NONPARAMETRIC COVARIANCE ESTIMATION IN FUNCTIONAL MAPPING OF COMPLEX DYNAMIC TRAITS

By

JOHN STEPHEN F. YAP

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2008

© 2008 John Stephen F. Yap

To my mother, Rhoda.

ACKNOWLEDGMENTS

I realized that I have been in school for most of my life and this dissertation is the

culmination of my formal education - but certainly not the end to learning. I would like to

thank everyone who has contributed to the accumulation of my knowledge and honing of

my skills, those who have helped me in all aspects of my career, and all who have affected

my life. In particular, thanks to

The people in my early years of education: Mabel Nakasas, my first tutor, for her tireless

effort even when I was daydreaming or falling asleep while she was teaching me Math;

Rico Santos who gave us the opportunity to better our Math skills; Mr. and Mrs. Yeban

for all their support and mentorship; Dr. Aurello Ramos, Jr. for giving me a job at LSC;

Dr. Augusto Hermosilla for all his precious pieces of advice regarding my career; all my

colleagues at the Ateneo de Manila University Math Department for all their friendship

and support.

All my recommenders: Dr. Jose Marasigan, Dr. Reginald Marcelo, and Dr. Gerry Salas

(for initial admission to graduate school); Dr. Stephen Agard, Dr. Dennis Cook, Dr. Chris

Bingham, and Dr. John Baxter (for admission to the Ph.D. program in Statistics at the

University of Florida); Dr. Rongling Wu, Dr. James Hobert, Dr. Mark Yang, and Dr.

Wendy London (for job applications).

Dr. Wendy London for giving me the opportunity to work at COG and learn about

children’s cancer and my COG colleagues Patrick McGrady, Chenguang Wang, and

Stephen Linda.

My colleagues, officemates and friends in the Statistics Department at UF: Aixin Tan

who has helped me a lot in my statistics career, Song Wu for all his help in statistical

genetics, Yao Li, Vivekananda Roy, Ruitao Liu, Bong-Rae Kim, Jie Yang, Jiahan Li,

Tezcan Orazgat, Tian Liu, Hongying Li, Wei Hou, Qin Li, Jiangtao Liu, and Guifang Fu.

Pinoy UF for their camaraderie and awesomeness.

Jordana Arzadon, Tita Rhoda and family, Tita Esther and Uncle Jim, Tita Rizza and

Manang Grace for being my surrogate family in the US.

Clouded Fury: Tobs, Alvin, Allan, and Anne.

Charlie, Junior, and Little Kitten.

Greg and the potluck gang.

My mom who has always supported me in everything that I did.

My Ph.D. Committee: Dr. Malay Ghosh, Dr. Ronald Randles, Dr. Xueli Liu, and Dr.

Wendy London.

My adviser, Dr. Rongling Wu, for being very patient, supportive, and generous, and for

being the best adviser ever!

TABLE OF CONTENTS

page

ACKNOWLEDGMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

CHAPTER

1 INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

1.1 Basic Genetics and QTL Mapping . . . . . . . . 14
1.1.1 Terminology . . . . . . . . 14
1.1.2 Experimental Crosses . . . . . . . . 15
1.1.3 Linkage and Markers . . . . . . . . 16
1.1.4 Interval Mapping . . . . . . . . 17
1.2 Functional Mapping of QTL . . . . . . . . 20
1.2.1 Model Formulation . . . . . . . . 20
1.2.2 Parameter Estimation via the EM Algorithm . . . . . . . . 23
1.2.3 Hypothesis Tests . . . . . . . . 25
1.3 Other QTL Mapping Models . . . . . . . . 26
1.4 Goals . . . . . . . . 28

2 NONPARAMETRIC COVARIANCE ESTIMATION IN FUNCTIONAL MAPPING OF QTL . . . . . . . . 30

2.1 Introduction . . . . . . . . 30
2.2 Covariance Estimation . . . . . . . . 31
2.2.1 Modified Cholesky Decomposition and Regression Interpretation . . . . . . . . 31
2.2.2 Regularized Covariance Estimators . . . . . . . . 33
2.2.3 Ridge Regression and LASSO . . . . . . . . 35
2.2.4 Penalized Likelihood . . . . . . . . 38
2.3 Covariance Estimation in Functional Mapping . . . . . . . . 41
2.3.1 Computing the Penalized Likelihood Estimates . . . . . . . . 41
2.3.2 From EM to ECM Algorithm . . . . . . . . 44
2.3.3 Selection of Tuning Parameter . . . . . . . . 46
2.4 Numerical Results . . . . . . . . 46
2.4.1 Simulations . . . . . . . . 46
2.4.2 Real Data Analysis . . . . . . . . 56
2.5 Summary and Discussion . . . . . . . . 62

3 NONPARAMETRIC COVARIANCE ESTIMATION IN FUNCTIONAL MAPPING OF REACTION NORMS TO TWO ENVIRONMENTAL SIGNALS . . . . . . . . 64

3.1 Introduction . . . . . . . . 64
3.2 Functional Mapping of Reaction Norms to Multiple Environmental Signals . . . . . . . . 66
3.2.1 Likelihood . . . . . . . . 68
3.2.2 Mean and Covariance Models . . . . . . . . 69
3.2.3 Hypothesis Tests . . . . . . . . 70
3.3 Spatio-temporal Covariance Functions . . . . . . . . 71
3.3.1 Introduction . . . . . . . . 71
3.3.2 Basic Ideas, Notation, and Assumptions . . . . . . . . 72
3.3.3 Separable Covariance Structures . . . . . . . . 73
3.3.4 Nonseparable Covariance Structures . . . . . . . . 75
3.3.4.1 Spectral method by Cressie and Huang (1999) . . . . . . . . 75
3.3.4.2 Monotone function method by Gneiting (2002) . . . . . . . . 77
3.4 Simulations . . . . . . . . 78
3.5 Summary and Discussion . . . . . . . . 90

4 CONCLUDING REMARKS . . . . . . . . 93
4.0.1 Summary . . . . . . . . 93
4.0.2 Future Directions . . . . . . . . 94

APPENDIX

A DERIVATION OF EM ALGORITHM FORMULAS . . . . . . . . . . . . . . . 97

B DERIVATION OF EQUATION 2-9 . . . . . . . . . . . . . . . . . . . . . . . . . 99

C MINIMIZATION OF 2-33 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

D DEFINITION OF KRONECKER PRODUCT . . . . . . . . . . . . . . . . . . . 102

E DERIVATION OF EQUATION 3-20 . . . . . . . . . . . . . . . . . . . . . . . . 103

REFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

BIOGRAPHICAL SKETCH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113

LIST OF TABLES

Table page

1-1 Conditional genotype probability in a backcross . . . . . . . . . . . . . . . . . . 19

1-2 Conditional genotype probability in an F2 . . . . . . . . . . . . . . . . . . . . . 19

2-1 Averaged QTL position, mean curve parameters, maximum log-likelihood ratios (maxLR), entropy and quadratic losses and their standard errors (given in parentheses) for three QTL genotypes in an F2 population under different sample sizes (n) based on 100 simulation replicates (Σ_NP, Normal Data). . . . . . . . . 52

2-2 Averaged QTL position, mean curve parameters, maximum log-likelihood ratios (maxLR), entropy and quadratic losses and their standard errors (given in parentheses) for three QTL genotypes in an F2 population under different sample sizes (n) based on 100 simulation replicates (Σ_AR(1), Normal Data). . . . . . . . . 53

2-3 Averaged QTL position, mean curve parameters, maximum log-likelihood ratios (maxLR), entropy and quadratic losses and their standard errors (given in parentheses) for three QTL genotypes in an F2 population under different sample sizes (n) based on 100 simulation replicates (Σ_NP, Data from t-distribution). . . . . . . . . 54

2-4 Averaged QTL position, mean curve parameters, maximum log-likelihood ratios (maxLR), entropy and quadratic losses and their standard errors (given in parentheses) for three QTL genotypes in an F2 population under different sample sizes (n) based on 100 simulation replicates (Σ_AR(1), Data from t-distribution). . . . . . . . . 55

2-5 Available markers and phenotype data of a linkage map in an F2 population of mice (data from Vaughn et al., 1999). . . . . . . . . 59

3-1 Averaged QTL position, mean curve parameters, maximum log-likelihood ratios (maxLR), entropy and quadratic losses and their standard errors (given in parentheses) for two QTL genotypes in a backcross population under different sample sizes (n) based on 100 simulation replicates (Nonseparable Model). . . . . . . . . 81

3-2 Averaged QTL position, mean curve parameters, maximum log-likelihood ratios (maxLR), entropy and quadratic losses and their standard errors (given in parentheses) for two QTL genotypes in a backcross population under different sample sizes (n) based on 100 simulation replicates (Nonseparable Model). . . . . . . . . 82

3-3 Averaged QTL position, mean curve parameters, maximum log-likelihood ratios (maxLR), entropy and quadratic losses and their standard errors (given in parentheses) for two QTL genotypes in a backcross population under different sample sizes (n) based on 100 simulation replicates (Σ_NP). . . . . . . . . 83

3-4 Averaged QTL position, mean curve parameters, maximum log-likelihood ratios (maxLR), entropy and quadratic losses and their standard errors (given in parentheses) for two QTL genotypes in a backcross population under different sample sizes (n) based on 100 simulation replicates (Σ_AR(1)). . . . . . . . . 84

3-5 Averaged QTL position, mean curve parameters, maximum log-likelihood ratios (maxLR), entropy and quadratic losses and their standard errors (given in parentheses) for two QTL genotypes in a backcross population under different sample sizes (n) based on 100 simulation replicates (C1 with n = 400 and σ² = 2, 4). . . . . . . . . 88

3-6 Averaged QTL position, mean curve parameters, maximum log-likelihood ratios (maxLR), entropy and quadratic losses and their standard errors (given in parentheses) for two QTL genotypes in a backcross population under different sample sizes (n) based on 100 simulation replicates (C1 with n = 400, increased irradiance and temperature levels, and σ² = 1, 2). . . . . . . . . 89

LIST OF FIGURES

Figure page

1-1 Experimental crosses from pure inbred line parents P1 and P2 . . . . . . . . . . 15

1-2 Crossing-over . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

1-3 Weights of mice measured every week for 10 weeks . . . . . . . . . . . . . . . . 22

1-4 Hypothetical plot of LR vs. linkage map . . . . . . . . . . . . . . . . . . . . . . 27

2-1 Penalized likelihood in curve estimation . . . . . . . . . . . . . . . . . . . . . . 40

2-2 Log-likelihood ratio (LR) plots based on simulated data under three different covariance structures . . . . . . . . 49

2-3 The profile of the log-likelihood ratios (LR) between the full model (there is a QTL) and reduced (there is no QTL) model for body mass growth trajectories across the genome in a mouse F2 population . . . . . . . . 58

2-4 Log-likelihood ratio (LR) plots for chromosomes 6 and 7 of the mice data . . . . . . . . 60

2-5 Three growth curves each presenting a genotype at each of seven QTLs detected on mouse chromosomes 1, 4, 6, 7, 10, 11, and 15 for growth trajectories of mice in an F2 population. . . . . . . . . 61

3-1 Reaction norm surface of photosynthetic rate as a function of irradiance and temperature . . . . . . . . 68

3-2 Boxplots of the values of the log-likelihood under the alternative model, H1 . . . 85

3-3 Covariance plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

3-4 Contour plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

4-1 Formation of a phenotype by a landscape . . . . . . . . . . . . . . . . . . . . . . 95

Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

NONPARAMETRIC COVARIANCE ESTIMATION IN FUNCTIONAL MAPPING OF COMPLEX DYNAMIC TRAITS

By

John Stephen F. Yap

August 2008

Chair: Rongling Wu
Major: Statistics

One of the fundamental objectives in agricultural, biological and biomedical research is the identification of genes that control the developmental pattern of complex traits, their responses to the environment, and the way these genes interact in a coordinated manner to determine the final expression of the trait. Recently, a new statistical framework, called functional mapping, has been developed to identify and map quantitative trait loci (QTLs) that determine developmental trajectories by integrating biologically meaningful mathematical models of trait progression into a mixture model for unknown QTL genotypes. Functional mapping has emerged as a powerful statistical tool for mapping QTLs controlling the responsiveness (reaction norm) of a trait to developmental and environmental signals.

From a statistical perspective, functional mapping, which is designed to study the genetic regulation and network of quantitative variation in dynamic complex traits, is essentially a joint mean-covariance likelihood model. Appropriate choices of the models for the mean and covariance structures are of critical importance to statistical inference about QTL locations and actions/interactions. While a battery of statistical and mathematical models has been proposed for modeling the mean vector, the modeling of the covariance structure has been mostly limited to parametric structures such as the first-order autoregressive (AR(1)) or structured antedependence (SAD) models. In functional mapping of reaction norms that respond to two environmental signals, a model expressed as a Kronecker product of two AR(1) structures has been proposed to test differences in the genetic control of responses to different environments. For practical longitudinal data sets, parametric modeling may be too simple to capture the complex pattern and structure of the covariance. There is a pressing need to develop a robust approach for modeling any possible structure of longitudinal covariance, ultimately broadening the use of functional mapping.

Our study proposes a nonparametric covariance estimator for functional mapping of quantitative trait loci. We adopt Huang et al.’s (2006) approach of invoking the modified Cholesky decomposition and converting the problem into modeling a sequence of regressions of responses. A regularized positive-definite covariance estimator is obtained using a normal penalized likelihood with an L2 penalty. This approach is embedded within the mixture likelihood framework of functional mapping by using a reparameterized version of the derivative of the log-likelihood. We extend the idea of functional mapping to model the covariance structure of interaction effects between the two environmental signals in a nonseparable way. The extended model allows quantitative tests of several fundamental biological questions. Is there a pleiotropic QTL that regulates genotypic responses to different environmental signals? What is the difference in the timing and duration of QTL expression between environment-specific responses? How is an environment-dependent QTL regulated by a development-related QTL? We performed various simulation studies to reveal the statistical properties of the new models and demonstrate the advantages of the proposed estimator. By analyzing real examples in genetic studies, we illustrated the use and usefulness of the methodology. The new methods will provide a useful tool for genome-wide scanning for the existence, distribution and interactions of QTLs underlying a dynamic trait important to agriculture, biology and health sciences.

CHAPTER 1
INTRODUCTION

A number of biological traits are quantitatively inherited. Examples of such traits

include the height of trees, the weight or body mass of animals, the yield of agricultural

crops, or even disease progression and drug response. Genetic mapping of quantitative

traits and subsequent cloning of the underlying genes have become a considerable focus

in agricultural, biological, and biomedical research. Since the publication of the seminal

mapping paper by Lander and Botstein (1989), there has been a large amount of literature

concerning the development of statistical methods for mapping complex traits (reviewed

in Jansen, 2000; Hoeschele, 2000; Wu et al., 2007b). Although the idea of associating a

continuously varying phenotype with a discrete trait (marker) dates back to the work

of Sax (1923), it was Lander and Botstein (1989) who first established an explicit

principle for linkage analysis. They also provided a tractable statistical algorithm for

dissecting a quantitative trait into its individual genetic locus components, referred to as

quantitative trait loci (QTLs).

The success of Lander and Botstein in developing a powerful method for linkage

analysis of a complex trait has roots in two different developments. First, the rapid

development of molecular technologies in the mid-1980s led to the generation of a

virtually unlimited number of markers that specify the genome structure and organization

of any organism (Drayna et al., 1984). Second, almost simultaneously, improved statistical

and computational techniques, such as the EM algorithm (Dempster et al., 1977), made it

possible to tackle complex genetic and genomic problems.

Lander and Botstein’s (1989) model for interval mapping of QTLs is regarded as appropriate for an ideal (simplified) situation, in which the segregation patterns of all markers can be predicted on the basis of the Mendelian laws of inheritance and a trait under study is strictly controlled by one QTL on a chromosome. This work was extended and improved by many researchers (Jansen and Stam, 1994; Zeng, 1994; Haley et al., 1994; Xu, 1996), with successful identification of so-called “outcrossing” QTLs in real-world data sets of pigs (Andersson et al., 1994) and pine (Knott et al., 1997). A general framework for QTL analysis in outcrossing populations was recently established by Lin et al. (2003).

While most statistical methods for QTL mapping simply combine statistics and genetics, Ma et al. (2002) pioneered a novel mapping framework, called functional mapping, in which statistics, genetics and biology are integrated through mathematical equations to pose and test biological hypotheses. Functional mapping provides a powerful tool for mapping and identifying QTLs that control the developmental pattern of complex traits.

This chapter provides an overview of basic genetic concepts related to QTL mapping of complex traits, with emphasis on the fundamental procedures for functional mapping (Ma et al., 2002), a statistical and genetic model for mapping QTLs that underlie a complex dynamic trait. The chapter is organized as follows. Section 1.1 introduces basic genetic concepts and describes how QTL mapping, via interval mapping, is done using the idea of linkage in structured populations called experimental crosses. Section 1.2 introduces the functional mapping model. Section 1.3 describes a few other QTL mapping methods. Finally, Section 1.4 states the main goals of this dissertation and outlines the remaining chapters.

1.1 Basic Genetics and QTL Mapping

1.1.1 Terminology

A gene exists in alternate forms, or alleles. The alleles usually occur as pairs in specific locations in a chromosome and are transmitted from one generation to another. An offspring (or progeny) inherits one allele from each parent, and the two alleles together constitute the offspring’s genotype. The usual textbook notation for a particular gene is a letter of the alphabet, with the uppercase and lowercase forms denoting the alleles. Thus, with alleles 'A' and 'a', the possible genotypes are AA, Aa and aa. The genotype determines the trait or phenotype. Variation due to a QTL results from phenotypes determined by different genotypes. However, because environmental factors also contribute to the total phenotypic variation, it is difficult to infer an offspring’s genotype from its phenotype.

Figure 1-1. Experimental crosses from pure inbred line parents P1 and P2: F1 × P1 or F1 × P2 produces a backcross, while F1 × F1 produces an intercross or F2. (Adapted from Broman, 1997).

1.1.2 Experimental Crosses

The identification of a QTL can be done through experimental crosses. The simplest experimental cross, called a backcross, starts by mating two pure inbred line parents (P1 and P2, with genotypes AA and aa, respectively) that differ in the phenotype under consideration (Figure 1-1). Each parent contributes a chromosome strand to create an offspring called the first filial generation, or F1, which has genotype Aa. If the F1 is mated to one of its parents, say P2, the offspring is called a backcross, with genotype either Aa or aa. During meiosis (the production of sex cells or gametes), each parental strand replicates and exchanges genetic material with other strands. This exchange is known as crossing-over. Thus, the strand contributed by the F1 parent to produce the backcross can contain alleles from both P1 and P2. If the F1 is selfed (mated with itself) or mated with another F1, the offspring is called an F2, with possible genotypes AA, Aa or aa. This experiment is called an intercross.

1.1.3 Linkage and Markers

Suppose we consider another gene, with alleles 'B' and 'b', on P1 and P2. The genotypes in this case are AABB and aabb, respectively. Crossing-over during meiosis is illustrated in Figure 1-2. The chromosome strands from P1 and P2 pair up and then replicate to form a tetrad. Crossing-over, as illustrated here, occurs at one point between the inner strands and involves an exchange of genetic material. The end result is four chromosomes that either resemble the parental strands (nonrecombinant, NR) or do not (recombinant, R). Crossing-over can also occur more than once and can involve the two outer strands. In general, recombinant strands are formed when there is an odd number of crossover points between the two genes.

Genes on the same chromosome have an association called linkage. The tendency during meiosis is for the genes to remain on the same strand. This means that there will be more nonrecombinant than recombinant chromosomes. If r is the recombination fraction, or the proportion of recombinant chromosomes, then the proportion of nonrecombinant chromosomes is 1 − r. Because of linkage, it is generally true that r < 1/2. The value of r depends on the distance between gene loci. Genes that are far apart usually have high values of r because the large portion of chromosome between them allows a better chance for crossing-over to occur. If two genes are very close to each other, there is a high probability that no crossing-over will occur between them and that they will end up on the same chromosome.

Linkage provides a way of locating a QTL in a chromosome by using known or identified genes called genetic or molecular markers. Markers do not affect the QTL’s phenotype directly and as such they are said to be phenotypically neutral. But they may affect other visible phenotypes such as eye color, making it possible to distinguish their genotypes. If a marker is closely linked with a QTL, then both of their alleles are likely to end up on the same chromosome in a backcross or F2 offspring. The resulting marker genotype, which can be identified, is informative in predicting the QTL genotype. Thus, a prerequisite for QTL mapping is the construction of a linkage map of markers that spans an entire chromosome or the set of all chromosomes in an organism (genome). The more markers there are, the greater the chance of QTL detection. Some of the popularly used markers include restriction fragment length polymorphisms (RFLPs), amplified fragment length polymorphisms (AFLPs), and single nucleotide polymorphisms (SNPs).

Figure 1-2. Crossing-over: (1) Parental chromosome strands align. (2) Each strand replicates to form a tetrad, and crossing-over starts between inner strands. (3) Recombinant (R) or nonrecombinant (NR) gametes.

1.1.4 Interval Mapping

In this section, we shall see exactly how the marker genotypes are useful in QTL mapping. The most popular approach in QTL mapping for experimental crosses is called interval mapping (Lander and Botstein, 1989; Broman, 2001). This approach is also the basis of functional mapping. We start by discussing the concept of genetic distance.

The map distance between two loci, expressed in centiMorgans (cM), is defined as the expected number of crossovers between the loci in 100 meiotic products. Assuming that crossovers occur at random and are independent of each other, the Haldane map function (Haldane, 1919; Wu et al., 2007c) can be used to relate a distance of d cM to the recombination fraction r in the following way:

$$ d = -\frac{100 \log(1 - 2r)}{2} \quad \text{or} \quad r = \frac{1}{2}\left(1 - e^{-2d/100}\right). \tag{1–1} $$

The distances between markers across the genome are known and usually expressed in cM. However, QTL mapping models utilize probabilities that are expressed in terms of r. Thus, when the distances in a linkage map of markers are given in cM, they are converted to r using Eq. 1–1.
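As a brief illustration of Eq. 1–1, the conversion between map distance and recombination fraction can be sketched in Python as follows. The function names and the marker positions are hypothetical and chosen only for illustration.

```python
import math

def haldane_r_from_d(d_cm: float) -> float:
    """Recombination fraction r for a map distance of d_cm centiMorgans (Eq. 1-1)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

def haldane_d_from_r(r: float) -> float:
    """Map distance in cM for a recombination fraction 0 <= r < 1/2 (Eq. 1-1)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must lie in [0, 0.5)")
    return -100.0 * math.log(1.0 - 2.0 * r) / 2.0

# Hypothetical marker positions (in cM) along one chromosome.
marker_positions_cm = [0.0, 12.0, 30.0, 55.0]

# Recombination fraction for each pair of adjacent markers.
interval_r = [haldane_r_from_d(right - left)
              for left, right in zip(marker_positions_cm, marker_positions_cm[1:])]
```

Under this map function r approaches 1/2 as the distance grows, which agrees with the earlier remark that r < 1/2 for linked genes.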

For simplicity, we assume a backcross population with two possible genotypes at a locus, denoted by 1 (for Aa) or 0 (for aa). Consider an interval on a chromosome with two linked markers, M and N, as endpoints and let r be the recombination fraction between them. We refer to M as the left marker and N as the right marker. Suppose there exists a QTL, Q, within the marker interval. Let r1 be the recombination fraction between M and Q and r2 that between Q and N. It is easy to show using Eq. 1–1 that r = r1 + r2 − 2r1r2. The QTL genotypes are unknown, but their conditional probabilities given the marker genotypes can be derived. These conditional probabilities are shown in Table 1-1. As an illustration, if an offspring has genotype 1 (Mm) at marker M and 0 (nn) at marker N, then the marker interval genotype is 10. The conditional probability that the QTL has genotype 1 (Qq) is

$$ P(1 \mid 10) = \frac{(1 - r_1)\, r_2}{r}. $$

Denote the QTL conditional probability by p_{k|i}, where k is an index for genotype and i is an index for progeny. Thus, the above example gives the value of p_{1|i} for a progeny i with marker interval genotype 10. We omit an index for the marker interval to simplify notation.
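The identity r = r1 + r2 − 2r1r2 can be verified in a few lines. The sketch below assumes that map distances are additive along the chromosome and writes d1 and d2 (symbols introduced here only for illustration) for the distances in cM from M to Q and from Q to N, so that d = d1 + d2. Eq. 1–1 gives 1 − 2r = e^{−2d/100} for each interval, hence

```latex
\begin{align*}
1 - 2r &= e^{-2(d_1 + d_2)/100}
        = e^{-2d_1/100}\, e^{-2d_2/100}
        = (1 - 2r_1)(1 - 2r_2) \\
       &= 1 - 2r_1 - 2r_2 + 4 r_1 r_2, \\
r      &= r_1 + r_2 - 2 r_1 r_2 .
\end{align*}
```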

Table 1-1. Conditional genotype probability in a backcross

Marker interval   QTL genotype
genotype          1                      0
11                (1−r1)(1−r2)/(1−r)     r1r2/(1−r)
10                (1−r1)r2/r             r1(1−r2)/r
01                r1(1−r2)/r             (1−r1)r2/r
00                r1r2/(1−r)             (1−r1)(1−r2)/(1−r)
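Table 1-1 translates directly into a small function. The following Python sketch is illustrative only: the function name and the values r1 = 0.05 and r2 = 0.10 are hypothetical, and the code assumes 0 < r1, r2 < 1/2 so that no denominator vanishes.

```python
def backcross_qtl_probs(r1: float, r2: float) -> dict:
    """Conditional QTL genotype probabilities in a backcross (Table 1-1).

    Keys are marker-interval genotypes ("11", "10", "01", "00"); each value is
    the pair (P(QTL = 1 | markers), P(QTL = 0 | markers)).
    """
    r = r1 + r2 - 2.0 * r1 * r2          # recombination fraction between M and N
    nonrec = (1.0 - r1) * (1.0 - r2)     # no recombination in either sub-interval
    double = r1 * r2                     # recombination in both sub-intervals
    left = r1 * (1.0 - r2)               # recombination between M and Q only
    right = (1.0 - r1) * r2              # recombination between Q and N only
    return {
        "11": (nonrec / (1.0 - r), double / (1.0 - r)),
        "10": (right / r, left / r),
        "01": (left / r, right / r),
        "00": (double / (1.0 - r), nonrec / (1.0 - r)),
    }

# Hypothetical recombination fractions for the two sub-intervals.
probs = backcross_qtl_probs(r1=0.05, r2=0.10)
```

Each pair in `probs` sums to 1, which is a quick sanity check on any implementation of the table.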

For an intercross or F2 population, there are three possible genotypes denoted by 2

(for AA), 1 (for Aa), and 0 (for aa). The conditional QTL probabilities given the marker

interval genotypes can also be derived and are shown in Table 1-2.

Table 1-2. Conditional genotype probability in an F2

Marker interval   QTL genotype
genotype          2                                1                                        0
22                (1−r1)^2(1−r2)^2/(1−r)^2         2r1(1−r1)r2(1−r2)/(1−r)^2                r1^2 r2^2/(1−r)^2
21                (1−r1)^2 r2(1−r2)/[r(1−r)]       r1(1−r1)(1−2r2+2r2^2)/[r(1−r)]           r1^2 r2(1−r2)/[r(1−r)]
20                (1−r1)^2 r2^2/r^2                2r1(1−r1)r2(1−r2)/r^2                    r1^2(1−r2)^2/r^2
12                r1(1−r1)(1−r2)^2/[r(1−r)]        (1−2r1+2r1^2)r2(1−r2)/[r(1−r)]           r1(1−r1)r2^2/[r(1−r)]
11                2r1(1−r1)r2(1−r2)/(1−2r+2r^2)    (1−2r1+2r1^2)(1−2r2+2r2^2)/(1−2r+2r^2)   2r1(1−r1)r2(1−r2)/(1−2r+2r^2)
10                r1(1−r1)r2^2/[r(1−r)]            (1−2r1+2r1^2)r2(1−r2)/[r(1−r)]           r1(1−r1)(1−r2)^2/[r(1−r)]
02                r1^2(1−r2)^2/r^2                 2r1(1−r1)r2(1−r2)/r^2                    (1−r1)^2 r2^2/r^2
01                r1^2 r2(1−r2)/[r(1−r)]           r1(1−r1)(1−2r2+2r2^2)/[r(1−r)]           (1−r1)^2 r2(1−r2)/[r(1−r)]
00                r1^2 r2^2/(1−r)^2                2r1(1−r1)r2(1−r2)/(1−r)^2                (1−r1)^2(1−r2)^2/(1−r)^2

Tables 1-1 and 1-2 were obtained using a three-point analysis of genes (see, for example, Chapter 4 of Wu et al., 2007c). A QTL is usually searched for at consecutive equidistant points in the genome. For example, a given chromosome is scanned from the leftmost marker to the opposite end at every 2 or 4 cM. At each search point, the numerical values of the conditional probabilities of a QTL can be calculated using Tables 1-1 and 1-2. These conditional probabilities form the weights for the phenotype densities in the mixture model in Section 1.2.1. Notice that for a given marker interval genotype in either table, $\sum_{k=1}^{J} p_{k|i} = 1$, where J is the number of genotypes (J = 2 for a backcross and J = 3 for an intercross). This means that all entries in a given row of either table add up to 1.
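One way to check entries of Tables 1-1 and 1-2 numerically is to enumerate the gametes produced by an F1 parent and pair them at random. The Python sketch below does this for the F2 case. It is an illustration rather than the derivation in Wu et al. (2007c); it assumes independent crossovers in the two sub-intervals (consistent with the Haldane map function), and the function names and the values of r1 and r2 are hypothetical.

```python
from itertools import product

def gamete_probs(r1, r2):
    """Probabilities of the eight F1 gametes, keyed by (M, Q, N) alleles in {1, 0}.

    Allele 1 comes from P1 and allele 0 from P2; crossovers in the two
    sub-intervals are assumed independent (no interference).
    """
    probs = {}
    for m, q, n in product((1, 0), repeat=3):
        p = 0.5                               # parental origin of the M allele
        p *= r1 if m != q else 1.0 - r1       # recombination between M and Q?
        p *= r2 if q != n else 1.0 - r2       # recombination between Q and N?
        probs[(m, q, n)] = p
    return probs

def f2_qtl_probs(r1, r2):
    """P(QTL genotype | marker-interval genotype) in an F2, by enumeration.

    Genotypes are coded by the number of P1 alleles (2, 1, or 0).
    Returns {(genotype at M, genotype at N): {QTL genotype: probability}}.
    """
    g = gamete_probs(r1, r2)
    joint = {}
    for (a, pa), (b, pb) in product(g.items(), repeat=2):
        markers = (a[0] + b[0], a[2] + b[2])
        qtl = a[1] + b[1]
        joint.setdefault(markers, {2: 0.0, 1: 0.0, 0: 0.0})[qtl] += pa * pb
    cond = {}
    for markers, row in joint.items():
        total = sum(row.values())             # marginal probability of the markers
        cond[markers] = {k: v / total for k, v in row.items()}
    return cond

# Hypothetical recombination fractions for the two sub-intervals.
r1, r2 = 0.05, 0.10
r = r1 + r2 - 2 * r1 * r2
table = f2_qtl_probs(r1, r2)
```

For instance, `table[(2, 1)][1]` corresponds to the entry for marker interval genotype 21 and QTL genotype 1 in Table 1-2.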

A complete interval mapping model involves the phenotype data in addition to the marker data. We shall see in the next section (1.2.1) how functional mapping, which is based on interval mapping, incorporates phenotype data.

1.2 Functional Mapping of QTL

1.2.1 Model Formulation

Assume there are J genotypes in a mapping population of n individuals due to a segregating QTL. Suppose each individual is measured for a trait at m equidistant points, with the phenotype observation vector expressed as y_i = (y_{i1}, ..., y_{im})′ for each i = 1, ..., n. Assuming multivariate normality, the density of the phenotype of individual i with genotype k is

$$ f_k(y_i) = (2\pi)^{-m/2} |\Sigma|^{-1/2} \exp\left\{ -(y_i - g_k)' \Sigma^{-1} (y_i - g_k)/2 \right\}, \tag{1–2} $$

where the genotype-specific mean vector g_k and the covariance matrix Σ are to be specified, and k = 1, ..., J.

The likelihood function can be represented by a multivariate mixture model

$$ L(\Omega) = \prod_{i=1}^{n} \left[ \sum_{k=1}^{J} p_{k|i}\, f_k(y_i \mid \Omega) \right], \tag{1–3} $$

where Ω is the parameter vector, which we will specify shortly, and p_{k|i} is the probability of a QTL genotype given the genotypes of two flanking markers (Section 1.1.4). As stated earlier, a QTL is searched for at different points throughout the genome. At any given point, the numerical value of p_{k|i} can be computed based on Tables 1-1 and 1-2. Thus, for a given search position, the log-likelihood, log L(Ω), can be maximized to obtain the maximum likelihood estimates (MLEs) of the mean parameters, Ω_µ, and the covariance parameters, Ω_Σ. Therefore, Ω = (Ω_µ, Ω_Σ).
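To make Eqs. 1–2 and 1–3 concrete, the following Python sketch evaluates the mixture log-likelihood at one search position. It assumes NumPy is available; the function names, array layout, and toy inputs are hypothetical choices made for illustration, and the maximization itself is not shown.

```python
import numpy as np

def mvn_logpdf(y, mean, cov):
    """Log of the multivariate normal density of Eq. 1-2 for one phenotype vector."""
    m = y.shape[0]
    diff = y - mean
    _, logdet = np.linalg.slogdet(cov)          # log |Sigma| (Sigma positive definite)
    quad = diff @ np.linalg.solve(cov, diff)    # (y - g)' Sigma^{-1} (y - g)
    return -0.5 * (m * np.log(2.0 * np.pi) + logdet + quad)

def mixture_loglik(Y, P, means, cov):
    """Log of the mixture likelihood of Eq. 1-3.

    Y     : (n, m) phenotype matrix, one row per individual
    P     : (n, J) conditional QTL genotype probabilities p_{k|i}
    means : (J, m) genotype-specific mean vectors g_k
    cov   : (m, m) common covariance matrix Sigma
    """
    n, J = P.shape
    total = 0.0
    for i in range(n):
        with np.errstate(divide="ignore"):      # allow log(0) = -inf when p_{k|i} = 0
            log_w = np.log(P[i])
        log_terms = log_w + np.array(
            [mvn_logpdf(Y[i], means[k], cov) for k in range(J)])
        top = log_terms.max()                   # log-sum-exp for numerical stability
        total += top + np.log(np.exp(log_terms - top).sum())
    return total

# Toy example: n = 5 backcross progeny (J = 2), m = 3 measurement points.
rng = np.random.default_rng(0)
means = np.array([[1.0, 2.0, 3.0], [1.5, 3.0, 4.5]])
cov = np.eye(3)
Y = means[0] + rng.normal(size=(5, 3))
P = np.full((5, 2), 0.5)
loglik = mixture_loglik(Y, P, means, cov)
```

Working on the log scale avoids underflow when the number of measurement points m is large and the individual densities become very small.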

Functional mapping is a general joint mean-covariance model (Pourahmadi, M., 1999,

2000; Pan and Mckenzie, 2003; Wu and Pourahmadi, 2003; Wu et al., 2007b) for QTL

mapping. Its generality stems from a wealth of possibilities in specifying the mean and

covariance structures to solve a large number of problems. With regards to the mean,

many phenomena have structural forms and can be modeled parametrically. For example,

growth (Figure 1-3) can be modeled by a logistic function defined by

$$g_k = [g_k(t)]_{m\times 1} = \left[\frac{a_k}{1 + b_k e^{-r_k t}}\right]_{m\times 1} \tag{1--4}$$

(Niklas, 1994; West et al., 2001; Zhao et al., 2004). This model has a few desirable

descriptive properties. The curve starts with an exponential phase and reaches an

inflection point where the maximum rate of growth occurs. Then growth continues

asymptotically towards the value a. The value at the onset of growth (t = 0) is a/(1 + b),

while the value at the point of inflection, t = (log b)/r, is a/2. These properties can be used to

derive hypothesis tests (Ma et al., 2002; Zhao et al., 2004). Other parametric mean models

include the sigmoid Emax equation which relates drug concentration and drug effect (Lin

et al., 2007), the Richards and Gompertzian curves for time-dependent tumor growth

(Li et al., 2006), and the biexponential model for HIV-1 dynamics (Wang et al., 2006).

In the absence of structural forms, semiparametric (Cui et al., 2006; Wu et al., 2007d;

Yang et al., 2007) or nonparametric (Zhao, W., 2005a; Yang, J., 2006; Yang et al., 2007)

approaches can be used to model the mean.
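The descriptive properties of the logistic curve in Eq. 1–4 can be checked numerically. The following is a minimal sketch, not part of the original analysis; the parameter values `a`, `b`, `r` and the function names are hypothetical and chosen only for illustration.

```python
import math

def logistic_growth(t, a, b, r):
    """Logistic curve g(t) = a / (1 + b * exp(-r * t)), as in Eq. 1-4."""
    return a / (1.0 + b * math.exp(-r * t))

def growth_rate(t, a, b, r, h=1e-5):
    """Central-difference approximation of dg/dt."""
    return (logistic_growth(t + h, a, b, r) - logistic_growth(t - h, a, b, r)) / (2 * h)

# Hypothetical parameter values, chosen only for illustration.
a, b, r = 40.0, 9.0, 0.6

initial_value = logistic_growth(0.0, a, b, r)                  # equals a / (1 + b)
t_inflection = math.log(b) / r                                 # time of maximum growth rate
value_at_inflection = logistic_growth(t_inflection, a, b, r)   # equals a / 2
```

Evaluating `growth_rate` slightly before and after `t_inflection` gives smaller values than at `t_inflection` itself, which is the maximum-rate property described above.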

The covariance, Σ, is assumed to be the same for each genotype group k. The usual

parametric model is the autoregressive one or AR(1),

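$$\Sigma = \sigma^2\begin{pmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{m-1}\\ \rho & 1 & \rho & \cdots & \rho^{m-2}\\ \rho^2 & \rho & 1 & \cdots & \rho^{m-3}\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ \rho^{m-1} & \rho^{m-2} & \rho^{m-3} & \cdots & 1 \end{pmatrix}, \tag{1--5}$$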

[Figure 1-3. Weights of mice measured every week for 10 weeks. Data from the study of Vaughn et al. (1999). Plot omitted; axes: Age (week) vs. Weight (g).]

which is popular in the longitudinal data literature (Diggle et al., 2002; Verbeke and

Molenberghs, 2005). This model assumes variance stationarity (equal residual variances

σ2 at each time point) and covariance stationarity (correlations that depend only on the time

lag, decaying as $\rho^{|j_1-j_2|}$ between time points $j_1$ and $j_2$). Explicit forms for the inverse and determinant are easily obtained

by matrix algebra:

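$$\Sigma^{-1} = \frac{1}{\sigma^2(1-\rho^2)}\begin{pmatrix} 1 & -\rho & 0 & \cdots & 0 & 0\\ -\rho & 1+\rho^2 & -\rho & \cdots & 0 & 0\\ 0 & -\rho & 1+\rho^2 & \cdots & 0 & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\ 0 & 0 & 0 & \cdots & 1+\rho^2 & -\rho\\ 0 & 0 & 0 & \cdots & -\rho & 1 \end{pmatrix} \tag{1--6}$$

and

$$|\Sigma| = [\sigma^2(1-\rho^2)]^{m-1}\sigma^2. \tag{1--7}$$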
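The closed forms for the AR(1) inverse and determinant can be verified numerically. The sketch below uses the standard tridiagonal form of the AR(1) inverse, with the factor 1/[σ2(1 − ρ2)] in front, and compares the determinant in Eq. 1–7 with one computed by Gaussian elimination. The helper names and the values of `m`, `sigma2`, and `rho` are hypothetical.

```python
def ar1_cov(m, sigma2, rho):
    """AR(1) covariance: sigma2 * rho**|i - j| (Eq. 1-5)."""
    return [[sigma2 * rho ** abs(i - j) for j in range(m)] for i in range(m)]

def ar1_inverse(m, sigma2, rho):
    """Closed-form tridiagonal inverse of the AR(1) covariance (m >= 2)."""
    c = 1.0 / (sigma2 * (1.0 - rho ** 2))
    inv = [[0.0] * m for _ in range(m)]
    for i in range(m):
        inv[i][i] = c * (1.0 if i in (0, m - 1) else 1.0 + rho ** 2)
        if i + 1 < m:
            inv[i][i + 1] = inv[i + 1][i] = -c * rho
    return inv

def ar1_det(m, sigma2, rho):
    """Closed-form determinant (Eq. 1-7)."""
    return (sigma2 * (1.0 - rho ** 2)) ** (m - 1) * sigma2

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def det(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in a]
    n, d = len(a), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d

m, sigma2, rho = 6, 2.0, 0.7   # hypothetical values
sigma = ar1_cov(m, sigma2, rho)
product = matmul(sigma, ar1_inverse(m, sigma2, rho))   # should be the identity matrix
```

Because both quantities are available in closed form, evaluating the normal density under AR(1) requires no numerical matrix inversion.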

Thus, an AR(1) model is computationally efficient. Furthermore, when using the

ECM-algorithm, it is possible to obtain CM-step iteration solutions for the parameters in

logistic (Ma et al., 2002) or rational function (Yap et al., 2007) mean models. Despite the

advantages of an AR(1) model, the assumptions of variance and covariance stationarity

may not always hold, especially for real data. This is evident from Figure 1-3 where

the data appear to 'fan out' across time instead of being stationary. Wu et al. (2004b)

tried to resolve this problem by applying a transform-both-sides (Carroll and Ruppert,

1984) log-transformation on the data to achieve approximate stationarity. The AR(1) can

then be used on the transformed data (Zhao et al., 2004). However, such transformation

may not produce a stationary covariance so that an AR(1) is still not an appropriate

model. To get rid of stationarity issues, Zhao et al. (2005) proposed using a structured

antedependence model (SAD) (Zimmerman and Nunez-Anton, 2001) which can handle

non-stationary data and is more robust and less data-dependent than AR(1). The

elements of an SAD covariance structure of order 1 are given by

$$\operatorname{var}(y_{ij}) = \frac{1-\phi^{2j}}{1-\phi^{2}}\,\nu^{2} \quad\text{and}\quad \operatorname{cov}(y_{ij_1}, y_{ij_2}) = \phi^{\,j_2-j_1}\,\frac{1-\phi^{2j_1}}{1-\phi^{2}}\,\nu^{2}, \qquad 1 \le j_1 \le j_2 \le m, \tag{1--8}$$

where φ is the generalized autoregressive parameter, $\nu^2$ is the innovation variance, and

j = 1, ..., m. Eq. 1–8 shows that the variances are not constant and that the covariances depend not

only on |j2 − j1| but also on the reference point j1. However, Zhao et al. (2005) still

recommend modeling data by SAD in conjunction with AR(1).
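To illustrate the non-stationarity of the first-order SAD structure, the sketch below builds the covariance matrix twice: once from the closed form in Eq. 1–8 and once from the recursion y_j = φ y_{j−1} + e_j with y_0 = 0 and var(e_j) = ν2. This recursion is the usual first-order antedependence formulation and is stated here as an assumption. Function names and parameter values are hypothetical.

```python
def sad1_cov_closed_form(m, phi, nu2):
    """SAD(1) covariance from Eq. 1-8 (requires phi**2 != 1)."""
    cov = [[0.0] * m for _ in range(m)]
    for j1 in range(1, m + 1):
        var_j1 = (1.0 - phi ** (2 * j1)) / (1.0 - phi ** 2) * nu2
        for j2 in range(j1, m + 1):
            cov[j1 - 1][j2 - 1] = cov[j2 - 1][j1 - 1] = phi ** (j2 - j1) * var_j1
    return cov

def sad1_cov_recursive(m, phi, nu2):
    """Same matrix built from y_j = phi * y_{j-1} + e_j with y_0 = 0."""
    cov = [[0.0] * m for _ in range(m)]
    var_prev = 0.0
    for j in range(m):
        var_j = phi ** 2 * var_prev + nu2          # variance grows with j
        cov[j][j] = var_j
        for i in range(j):
            # cov(y_i, y_j) = phi * cov(y_i, y_{j-1}) for i < j
            cov[i][j] = cov[j][i] = phi * cov[i][j - 1]
        var_prev = var_j
    return cov

m, phi, nu2 = 5, 0.8, 1.5   # hypothetical values
closed = sad1_cov_closed_form(m, phi, nu2)
recursive = sad1_cov_recursive(m, phi, nu2)
```

The diagonal of `closed` increases with time, and entries at the same lag (for example `closed[0][1]` and `closed[1][2]`) differ, which is exactly what AR(1) cannot represent.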

1.2.2 Parameter Estimation via the EM Algorithm

The log-likelihood function, Eq. 1–3, can be written as

$$\log L(\Omega) = \sum_{i=1}^{n}\log\left[\sum_{k=1}^{J} p_{k|i}\, f_k(y_i\mid\Omega)\right] \tag{1--9}$$

and the MLEs can be obtained by solving the likelihood or normal equation

$$\frac{\partial}{\partial\theta}\log L(\Omega) = 0 \tag{1--10}$$

where θ ∈ Ω. Note that for a logistic mean model, $\Omega_\mu = \{a_k, b_k, r_k \mid k = 1, \ldots, J\}$, and for an AR(1)

covariance model, $\Omega_\Sigma = \{\sigma^2, \rho\}$, so that Ω = (Ωµ, ΩΣ). The left side of Eq. 1–10 can

be rewritten as

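$$\begin{aligned}\frac{\partial}{\partial\theta}\log L(\Omega) &= \sum_{i=1}^{n}\sum_{k=1}^{J}\frac{p_{k|i}\,\frac{\partial}{\partial\theta} f_k(y_i\mid\Omega)}{\sum_{h=1}^{J} p_{h|i}\, f_h(y_i\mid\Omega)}\\ &= \sum_{i=1}^{n}\sum_{k=1}^{J}\frac{p_{k|i}\, f_k(y_i\mid\Omega)}{\sum_{h=1}^{J} p_{h|i}\, f_h(y_i\mid\Omega)}\cdot\frac{\partial}{\partial\theta}\log f_k(y_i\mid\Omega)\\ &= \sum_{i=1}^{n}\sum_{k=1}^{J} P_{k|i}\,\frac{\partial}{\partial\theta}\log f_k(y_i\mid\Omega)\end{aligned} \tag{1--11}$$

where

$$P_{k|i} = \frac{p_{k|i}\, f_k(y_i\mid\Omega)}{\sum_{h=1}^{J} p_{h|i}\, f_h(y_i\mid\Omega)} \tag{1--12}$$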

is interpreted as the posterior probability that progeny i has QTL genotype k (McLachlan

and Peel, 2000; Ma et al., 2002). Let $P = \{P_{k|i},\ k = 1, \ldots, J;\ i = 1, \ldots, n\}$. The Expectation-

Maximization (EM) algorithm (Dempster et al., 1977) at the (j + 1)th iteration

proceeds as follows:

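1. The current value of Ω is $\Omega^{(j)}$.

2. E-Step. Update $P^{(j)}$ using $\Omega^{(j)}$:

$$P^{(j)}_{k|i} = \frac{p_{k|i}\, f_k(y_i\mid\Omega^{(j)})}{\sum_{h=1}^{J} p_{h|i}\, f_h(y_i\mid\Omega^{(j)})}. \tag{1--13}$$

3. M-Step. Solve

$$\sum_{i=1}^{n}\sum_{k=1}^{J} P^{(j)}_{k|i}\,\frac{\partial}{\partial\theta}\log f_k(y_i\mid\Omega) = 0 \tag{1--14}$$

to get $\Omega^{(j+1)}$.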

4. Repeat until some convergence criterion is met.

The values at convergence are the MLEs of Ω.
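The iteration above can be sketched for a deliberately simplified case. The code below assumes a univariate trait, unstructured genotype means, and a common variance, so that the M-step has a closed form; it is not the logistic/AR(1) model of the text. The simulated data, the conditional probabilities in `prior` (playing the role of p_{k|i}), and all function names are hypothetical.

```python
import math
import random

def normal_pdf(y, mu, s2):
    return math.exp(-(y - mu) ** 2 / (2.0 * s2)) / math.sqrt(2.0 * math.pi * s2)

def log_likelihood(y, prior, mu, s2):
    """Mixture log-likelihood (Eq. 1-9) for a univariate trait."""
    return sum(math.log(sum(p * normal_pdf(yi, m, s2) for p, m in zip(pi, mu)))
               for yi, pi in zip(y, prior))

def e_step(y, prior, mu, s2):
    """Posterior genotype probabilities P_{k|i} (Eq. 1-13)."""
    post = []
    for yi, pi in zip(y, prior):
        w = [p * normal_pdf(yi, m, s2) for p, m in zip(pi, mu)]
        total = sum(w)
        post.append([wk / total for wk in w])
    return post

def m_step(y, post, n_genotypes):
    """Closed-form solution of Eq. 1-14 for genotype means and a common variance."""
    mu = []
    for k in range(n_genotypes):
        weight = sum(pi[k] for pi in post)
        mu.append(sum(pi[k] * yi for yi, pi in zip(y, post)) / weight)
    s2 = sum(pi[k] * (yi - mu[k]) ** 2
             for yi, pi in zip(y, post) for k in range(n_genotypes)) / len(y)
    return mu, s2

def em(y, prior, mu, s2, max_iter=200, tol=1e-8):
    """Alternate E- and M-steps until the log-likelihood gain falls below tol."""
    history = [log_likelihood(y, prior, mu, s2)]
    for _ in range(max_iter):
        post = e_step(y, prior, mu, s2)
        mu, s2 = m_step(y, post, len(mu))
        history.append(log_likelihood(y, prior, mu, s2))
        if history[-1] - history[-2] < tol:
            break
    return mu, s2, history

# Simulated backcross-like data; all numbers here are hypothetical.
rng = random.Random(1)
prior = [[0.9, 0.1] if rng.random() < 0.5 else [0.1, 0.9] for _ in range(200)]
true_mu = [10.0, 13.0]
y = [rng.gauss(true_mu[0 if rng.random() < p[0] else 1], 1.0) for p in prior]
mu_hat, s2_hat, loglik_history = em(y, prior, mu=[9.0, 14.0], s2=4.0)
```

A useful diagnostic is that `loglik_history` never decreases, which is the defining property of EM. With a structured mean such as the logistic curve, the M-step for the mean parameters would generally require a numerical solver instead of the closed form used here.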


Appendix A shows the derivation of Eqs. 1–13 and 1–14 based on a missing data argument.

This derivation can also be used to show the extension from a maximum likelihood to a maximum

penalized likelihood algorithm (Section 2.3.2). For a more thorough treatment of the EM

algorithm, the reader is referred to McLachlan and Peel (2000).

Although the EM algorithm provides efficient computation of the model parameters

in functional mapping, other methods can be used as well such as Newton-Raphson or

the Nelder-Mead simplex algorithm (Nelder and Mead, 1965; Zhao et al., 2004) which is a

direct nonlinear optimization procedure. These methods are particularly useful when no

closed form formulas for the parameter estimates can be obtained.

1.2.3 Hypothesis Tests

After obtaining the MLEs, we can formulate a hypothesis about the existence of a

QTL affecting genotype mean patterns as

$H_0: a_1 = \cdots = a_J,\ b_1 = \cdots = b_J,\ r_1 = \cdots = r_J$

$H_1$: at least one of the equalities above does not hold,

where H0 is the reduced (or null) model so that only a single logistic curve can fit the

phenotype data and H1 is the full (or alternative) model in which case there exists more

than one logistic curve that fits the phenotype data due to the existence of a QTL. Note

that the likelihood function corresponding to the null model is given simply by

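$$L(\Omega) = \prod_{i=1}^{n} f(y_i\mid\Omega) \tag{1--15}$$

where

$$f(y_i\mid\Omega) = (2\pi)^{-m/2}\,|\Sigma|^{-1/2}\exp\left\{-\tfrac{1}{2}(y_i - g)'\,\Sigma^{-1}(y_i - g)\right\} \tag{1--16}$$

and

$$g = [g(t)]_{m\times 1} = \left[\frac{a}{1 + b e^{-rt}}\right]_{m\times 1} \tag{1--17}$$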

in the case of a logistic mean model. A number of other important hypotheses can be

tested, as outlined in Wu et al. (2004a).


The evidence for the existence of a QTL can be displayed graphically using the

log-likelihood ratio (LR) test statistic

$$LR = -2\log\left[\frac{L(\tilde\Omega)}{L(\hat\Omega)}\right] \tag{1--18}$$

plotted over the entire linkage map, where $\tilde\Omega$ and $\hat\Omega$ denote the MLEs under H0 and H1,

respectively. The peak of the LR plot, which we shall from here on refer to as maxLR,

suggests a putative QTL because it corresponds to the position at which H1 is most strongly favored

over H0. The distribution of LR is difficult to determine because of two major issues: the

unidentifiability of the QTL position under H0 and a multiple testing problem that arises

because tests across the genome are not mutually independent (Wu et al., 2007c). However,

a nonparametric method called permutation tests by Doerge and Churchill (1996) can be

used to find an approximate distribution and a significance threshold for the existence of

a QTL. In permutation tests, the functional mapping model is applied to several random

permutations of the phenotype data on the markers and a threshold is determined from

the set of maxLR values obtained from each permutation test run. The idea here is to

disassociate the markers and phenotypes so that repeated application of the model on

permuted data will produce an approximate empirical null distribution.
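The permutation procedure can be sketched as follows. To keep the example short, the functional mapping model is replaced by a much simpler stand-in: a univariate trait and a single-marker LR statistic for a mean difference between two marker classes, scanned over markers. The data, the independence of the markers, and all function names are hypothetical assumptions made for illustration.

```python
import math
import random

def marker_lr(y, genotypes):
    """LR statistic for a mean difference between two marker classes (common variance)."""
    n = len(y)
    grand = sum(y) / n
    ss_null = sum((v - grand) ** 2 for v in y)
    ss_alt = 0.0
    for g in (0, 1):
        group = [v for v, gi in zip(y, genotypes) if gi == g]
        if group:
            mean_g = sum(group) / len(group)
            ss_alt += sum((v - mean_g) ** 2 for v in group)
    return n * math.log(ss_null / ss_alt)

def max_lr(y, markers):
    """Genome scan: largest LR over all markers (stand-in for maxLR)."""
    return max(marker_lr(y, m) for m in markers)

def permutation_threshold(y, markers, n_perm=200, alpha=0.05, seed=0):
    """Empirical (1 - alpha) quantile of maxLR under shuffled phenotypes."""
    rng = random.Random(seed)
    null = []
    for _ in range(n_perm):
        shuffled = y[:]
        rng.shuffle(shuffled)            # break the marker-phenotype association
        null.append(max_lr(shuffled, markers))
    null.sort()
    index = min(n_perm - 1, math.ceil((1.0 - alpha) * n_perm) - 1)
    return null[index], null

# Hypothetical data: 80 individuals, 6 independent markers, QTL effect at marker 2.
rng = random.Random(42)
markers = [[rng.randint(0, 1) for _ in range(80)] for _ in range(6)]
y = [5.0 + 3.0 * g + rng.gauss(0.0, 1.0) for g in markers[2]]
observed = max_lr(y, markers)
threshold, null_dist = permutation_threshold(y, markers)
```

Taking the maximum over the whole scan within each permutation is what makes the resulting threshold genome-wide rather than pointwise.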

Figure 1-4 shows a hypothetical plot of the log-likelihood ratio test statistic over a

linkage map on a chromosome. The markers are spaced out at 0, 32, 46, 58, 68, 82, 100, and 112 cM, and the QTL search was done at 2 cM intervals from the leftmost marker at 0. The

two horizontal lines are thresholds based on permutation tests. The solid red line that

crosses the LR plot suggests a significant QTL while the broken green line does not.

1.3 Other QTL Mapping Models

As mentioned in the previous section, functional mapping is based on interval

mapping. Unfortunately, interval mapping may be inadequate because only one marker

interval is considered at a time without regard to other QTLs that may exist beyond the

interval. Multiple QTL models (MQM; Jansen and Stam, 1994) and composite interval

[Figure 1-4. Hypothetical plot of LR vs. linkage map. The map consists of markers spaced out at 0, 32, 46, 58, 68, 82, 100, and 112 cM along the chromosome represented by the x-axis (Location, cM); the y-axis shows LR. The QTL search was at 2 cM intervals. The location corresponding to the peak of the plot, maxLR, suggests a QTL position. A threshold that crosses the plot (red horizontal line) indicates a significant QTL; if not (green horizontal broken line), the QTL is not significant. Plot omitted.]

mapping (CIM; Zeng, 1994) were proposed to address this issue. To increase the power of

interval mapping, both methods use a subset of marker loci beyond the marker interval

under consideration as covariates in a partial regression analysis. The effects of the

subset of markers are used to estimate the effects of other QTLs. The problem with these

methods though is how to select the appropriate markers to include in the model.

Multiple interval mapping (MIM; Kao et al., 1999) uses multiple marker intervals

simultaneously to identify multiple QTLs. This model allows estimation of main and

epistatic (interaction) effects among all detected QTLs. However, an issue with this

method is model comparison and searching through models (Broman, 2001).


Other approaches include Bayesian methods (Satagopan et al., 1996; Sillanpaa and

Arjas, 1999; Xu and Yi, 2000) and the use of a genetic algorithm (Carlborg et al., 2000;

Broman, 2001).

1.4 Goals

Although other QTL mapping methods have merits like being able to model multiple

QTLs and their interactions, they are mostly limited to using only single trait information

or measurements at one time point. When the trait being observed is actually affected by

a QTL over a continuous course of time, these models fall short in capturing the true QTL

dynamics. Many attempts to model this type of phenomenon are hindered by complexity

in structure and intensive computation. Fortunately, a novel approach called functional

mapping by Ma et al. (2002) (Section 1.2) provides a useful framework for genetic mapping

through mean and covariance modeling of multiple or longitudinal traits. Because it requires

a small number of model parameters to estimate, it is computationally efficient and can be

used on data that have limited sample sizes. Functional mapping has shown potential as a

powerful statistical method in QTL mapping.

Although parametric models such as AR(1) and SAD are suitable covariance

structures for the likelihood-based functional mapping, severe bias could be introduced

in the estimation process if the underlying covariance structure differs substantially from the assumed one.

Specifically, a biased covariance estimate can affect the estimates for QTL location,

QTL effects (the estimated mean model), and even the value of maxLR, which is needed

in permutation tests for significance. Thus, there is a need for a robust estimator

that can provide more accurate and precise results. In this regard, we propose a

nonparametric approach. A nonparametric covariance estimator was proposed by Huang

et al. (2006) for the null model (Section 1.2.3). These authors used a penalized likelihood

procedure in solving a set of regression equations obtained from the modified Cholesky

decomposition of the covariance matrix. Their estimator is regularized and guaranteed to


be positive-definite. In this dissertation, we extend their model to the mixture likelihood

framework of functional mapping.

The rest of the chapters are organized as follows:

In Chapter 2, we describe the modified Cholesky decomposition approach of

covariance estimation and Huang et al.’s nonparametric procedure. We provide some

discussion on ridge regression and LASSO techniques for solving a regression, and

penalized likelihood because these are the main concepts behind their method. Then

we show the extension to the mixture likelihood case and apply our model to simulated

and real longitudinal data.

In Chapter 3, we extend the use of our proposed estimator to functional mapping

of reaction norms with two environmental signals. We consider photosynthetic rate as

the reaction norm and irradiance and temperature as the two environment signals. The

previously proposed covariance model was parametric and separable. In situations when the

underlying data structure is nonseparable, our nonparametric estimator is shown to be

more reliable based on the simulation results.

In Chapter 4, we conclude this dissertation.


CHAPTER 2

NONPARAMETRIC COVARIANCE ESTIMATION IN FUNCTIONAL MAPPING OF QTL

2.1 Introduction

Covariance estimation is an important aspect of multivariate methods such as

principal component analysis, discriminant analysis, multivariate regression, and

longitudinal data modeling (Wong et al., 2003; Bickel and Levina, 2008; Levina et al.,

2008; Rothman et al., 2007). The most commonly used covariance estimator is the sample

covariance matrix which is unbiased and positive-definite (Huang et al., 2006 and 2007b).

However, with high-dimensional data like gene arrays, fMRI, or spectroscopy, the sample

covariance matrix is highly unstable. In longitudinal data modeling, covariances that have

definite structures, such as compound symmetry and AR(1), are popular model choices.

Aside from parsimony, these structures guarantee positive-definite estimators. However,

unsuitably chosen or misused parametric models can result in severe estimation bias.

Hence, there is a need for more robust estimators.

The main difficulties associated with covariance estimation are the number of

parameters that need to be estimated (which grows quadratically with dimension) and

the positive-definite constraint. Pourahmadi (1999) discovered that the latter can be

taken care of if one uses the modified Cholesky decomposition. Several published studies

that followed Pourahmadi’s suggestion proposed ways of regularization to provide an

efficient covariance estimator (Wu and Pourahmadi, 2003; Huang et al., 2006 and 2007b;

Levina et al., 2008). In this chapter, we adopt the approach proposed by Huang et al.

(2006) which uses a penalized likelihood procedure and extend it to the mixture likelihood

framework of functional mapping. Such extension is possible if the posterior probability

representation of the derivative of the log-likelihood function, Eq. 1–11, is used. The

new approach is a nonparametric covariance estimator in functional mapping. The term

nonparametric, which refers to distribution free methods, may not be exactly appropriate.

As we shall see in this chapter, the estimator is really for unstructured covariances and


it is nonparametric in this sense. But because nonparametric is the term used in the

literature (Huang et al., 2006 and 2007b), we shall adhere to it.

This chapter is organized as follows: In Section 2.2, we discuss the modified Cholesky

decomposition and its regression interpretation, review the methods proposed in the

literature for regularized covariance estimators, and discuss ridge regression (Hoerl and

Kennard, 1970), the least absolute shrinkage and selection operator or LASSO (Tibshirani,

1996), and penalized likelihood. In Section 2.3, we describe how Huang et al.’s approach

can be extended to functional mapping. Section 2.4 is devoted to simulations and an

analysis of real data from an intercross progeny of mice using our proposed methodology.

The last section (2.5) summarizes the chapter and provides some discussion.

2.2 Covariance Estimation

2.2.1 Modified Cholesky Decomposition and Regression Interpretation

Whereas a Cholesky decomposition expresses a symmetric positive-definite matrix in

terms of a lower triangular matrix and its transpose, the modified Cholesky decomposition

expresses it in terms of a unit lower triangular matrix and a diagonal matrix. That is, the symmetric

and positive-definite covariance matrix Σ of a zero-mean random longitudinal vector

y = (y1, ..., ym)′, can be uniquely diagonalized as

$$T\Sigma T' = D, \tag{2--1}$$

where

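$$T = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0\\ -\phi_{21} & 1 & 0 & \cdots & 0\\ -\phi_{31} & -\phi_{32} & 1 & \cdots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ -\phi_{m1} & -\phi_{m2} & \cdots & -\phi_{m,m-1} & 1 \end{pmatrix},$$

and

$$D = \begin{pmatrix} \sigma_1^2 & 0 & 0 & \cdots & 0\\ 0 & \sigma_2^2 & 0 & \cdots & 0\\ 0 & 0 & \ddots & \cdots & \vdots\\ \vdots & \vdots & \cdots & \ddots & 0\\ 0 & 0 & \cdots & 0 & \sigma_m^2 \end{pmatrix}.$$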

Aside from ensuring positive-definiteness, the modified Cholesky decomposition

provides a meaningful statistical interpretation from its components (Pourahmadi, 1999):

the subdiagonal entries of T are the negatives of the regression coefficients when each yt (t = 2, ..., m)

is regressed on its predecessors yt−1, ..., y1 and the entries of D are the corresponding

prediction error variances. More precisely, y1 = ε1 and for t = 2, ..., m,

$$y_t = \sum_{j=1}^{t-1}\phi_{tj}\,y_j + \varepsilon_t \tag{2--2}$$

where $-\phi_{tj}$ is the (t, j)th entry (for j < t) of T, and $\sigma_t^2 = \operatorname{var}(\varepsilon_t)$ is the tth diagonal

element of D. The parameters $\{\phi_{tj},\ j = 1, \ldots, t-1;\ t = 2, \ldots, m\}$ and $\{\sigma_t^2,\ t = 1, \ldots, m\}$ are referred

to as generalized autoregressive parameters (GARPs) and innovation variances (IVs),

respectively. Note that Eq. 2–2 can be generalized to the non-zero mean case, E(y) =

E[(y1, ..., ym)′] = (µ1, ..., µm)′ = µ, as

$$y_t - \mu_t = \sum_{j=1}^{t-1}\phi_{tj}\,(y_j - \mu_j) + \varepsilon_t. \tag{2--3}$$

Furthermore, Eq. 2–2 can be written in matrix form as

$$Ty = \varepsilon \tag{2--4}$$

where ε = (ε1, ..., εm)′ and cov(εi, εj) = 0 for all i ≠ j. Thus,

$$\operatorname{cov}(Ty) = T\operatorname{cov}(y)\,T' = T\Sigma T' = D = \operatorname{cov}(\varepsilon)$$

which yields Eq. 2–1.
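The decomposition can be computed directly from a given covariance matrix. The sketch below first factors Σ = L D L′ with L unit lower triangular and then sets T = L⁻¹, which gives TΣT′ = D; this is one of several equivalent ways to obtain T and D, and the function names are hypothetical. The test matrix is an AR(1) covariance with σ2 = 2 and ρ = 0.6, for which the only nonzero GARPs should be the lag-1 coefficients, all equal to ρ.

```python
def modified_cholesky(sigma):
    """Return (T, d): T unit lower triangular, d the diagonal of D, with T Sigma T' = D."""
    m = len(sigma)
    # Step 1: Sigma = L diag(d) L' with L unit lower triangular.
    L = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    d = [0.0] * m
    for j in range(m):
        d[j] = sigma[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, m):
            L[i][j] = (sigma[i][j] - sum(L[i][k] * L[j][k] * d[k] for k in range(j))) / d[j]
    # Step 2: T = L^{-1}, by forward substitution on each column of the identity.
    T = [[0.0] * m for _ in range(m)]
    for c in range(m):
        for i in range(m):
            rhs = 1.0 if i == c else 0.0
            T[i][c] = rhs - sum(L[i][k] * T[k][c] for k in range(i))
    return T, d

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(row) for row in zip(*a)]

# AR(1) covariance with sigma^2 = 2 and rho = 0.6 (hypothetical values).
sigma = [[2.0 * 0.6 ** abs(i - j) for j in range(5)] for i in range(5)]
T, d = modified_cholesky(sigma)
garps = [[-T[t][j] for j in range(t)] for t in range(1, 5)]   # phi_{tj}, row by row
check = matmul(matmul(T, sigma), transpose(T))                # should equal diag(d)
```

Here `d[0]` is σ2 and the remaining innovation variances equal σ2(1 − ρ2), in line with the regression interpretation of Eq. 2–2.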


2.2.2 Regularized Covariance Estimators

Ill-conditioned covariance estimators can cause overfitting of data by the model.

Thus, there is a need for regularization to prevent this kind of problem. Several

approaches that provide regularized covariance estimators have been proposed and we

shall explore some of them in this section.

Pourahmadi (1999) recognized that the GARPs and the logarithm of the IVs are

unconstrained parameters and hence can be modeled in terms of covariates. His approach

was to estimate the Cholesky components T and D, instead of estimating Σ directly.

That is, find estimates $\hat T$ and $\hat D$ of T and D, respectively, so that an estimator of Σ is

$\hat\Sigma = \hat T^{-1}\hat D\,(\hat T^{-1})'$, which is positive-definite. He did this by suggesting a link function g(·) for Σ defined by

$$g(\Sigma) = \log D + T + T' - 2I \tag{2--5}$$

(Wu and Pourahmadi, 2003) where log D means the matrix D where the logarithm is

taken on each diagonal element and I is the identity matrix. This formulation is analogous

to a link function for the mean in generalized linear model theory (McCullagh and Nelder,

1989).

Although the Cholesky components still have the same number of parameters

to estimate as any unstructured covariance matrix Σ, this number can be reduced

considerably by using covariates and modeling the entries of T and D either parametrically,

nonparametrically, or in a Bayesian way (Pourahmadi, 1999; Daniels and Pourahmadi,

2002; Wu and Pourahmadi, 2003). Pourahmadi (1999) illustrated the parametric approach

by using time lags as covariates for the entries of T and D in analyzing the cattle data

(Kenward, 1987). Wu and Pourahmadi (2003) and Huang et al. (2007b) each used a

nonparametric approach by capitalizing on the regression representation Eq. 2–2.

Intuitively, for longitudinal data, terms far away in the regression are expected to be

small. That is, the lag j regression coefficient φt,t−j is expected to be small for a fixed t

and large j. This means that for a given row on the T matrix, the terms are expected

to decrease in magnitude as they move farther away from the main diagonal. Thus, a

reasonable estimate for T would be a banded (or tapered) matrix $\hat T$ in which the first m0

subdiagonals of T are estimated or smoothed and the remaining subdiagonals are set to

zero, with m0 chosen using the Akaike Information Criterion (AIC; Akaike, 1974). Wu and

Pourahmadi (2003) used local polynomial estimators to estimate the first m0 subdiagonals

of T while Huang et al. (2007b) used B-splines. The latter authors claim that their

method is more efficient than the former in terms of reduction in risk as shown in their

simulations. Moreover, because their method employs maximum likelihood estimation,

the EM algorithm can be used to handle missing data and simultaneous modeling of

mean and covariance can be accommodated. Neither of these can be done under Wu

and Pourahmadi’s approach. In the Bayesian paradigm, Furrer and Bengtsson (2006)

considered a banded sample covariance matrix instead of T (Bickel and Levina, 2008).

Although Huang et al.’s nonparametric approach of smoothing the first few

subdiagonals of the T matrix and setting the rest to zero produces a statistically efficient

estimator of Σ, it may not be adequate in cases when the diagonals are not smooth or

when there may be small but nonzero elements in T . An alternative approach is to use

penalized likelihood as proposed by Huang et al. (2006). By imposing roughness penalties

on the negative log-likelihood function, this procedure essentially provides a solution to the

sequence of regression equations in Eq. 2–2. The class of Lp penalties with p = 1, 2 is considered

under the general framework of penalized likelihood for regression models (Fan and Li,

2001). The L1 and L2 penalties allow shrinkage on the GARPs in a similar fashion as

LASSO and ridge regression, respectively. Moreover, the L1 penalty can shrink some of

the GARPs exactly to zero and thus provide a selection scheme for regression coefficients.

This is however different from banding T where φt,t−j is set to zero for all j > m0. With

the L1 penalty, the zeroes can be irregularly placed in any given row of T . Levina et al.

(2008) proposed a similar penalized likelihood procedure called adaptive banding. Their

method uses a nested lasso penalty which sets φt,t−j to zero for all j > k, where k may


be different for each row of T . Adaptive banding is particularly useful in the case when

sparsity in the inverse covariance matrix is desirable such as in graphical models (Yuan

and Lin, 2007). Other approaches include using the L1 penalty directly on the covariance

matrix (Banerjee et al., 2006; Bickel and Levina, 2008), hierarchical priors to allow zeros

in T (Smith and Kohn, 2002; Rothman et al., 2007) or in the inverse covariance matrix

(Wong et al., 2003; Rothman et al., 2007), and Steinian shrinkage toward the identity

(Ledoit and Wolf, 2003; Bickel and Levina, 2008).

We adopt the L2 penalty approach of Huang et al. (2006) and propose an extension

of this method to covariance estimation in the mixture likelihood framework of functional

mapping. Such extension is possible by capitalizing on the posterior probability representation

of the mixture log-likelihood used in the implementation of the EM algorithm (Section

2.3.1). Estimation is then carried out by using the ECM algorithm (Meng and Rubin,

1993) with two CM-steps (Section 2.3.2).

2.2.3 Ridge Regression and LASSO

In this section, we discuss two techniques used in solving regression problems: ridge

regression and LASSO.

Assume the linear regression model, y = Xβ + ε, where y = (y1, y2, ..., yn)′ is the

vector of responses,

$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p}\\ x_{21} & x_{22} & \cdots & x_{2p}\\ \vdots & \vdots & & \vdots\\ x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix}$$

is the n × p design matrix with rank(X) = p, β = (β1, ..., βp)′ is the parameter vector,

ε = (ε1, ε2, ..., εn)′ is the vector of independent and identically distributed (iid) random

errors satisfying E(ε) = 0 and E(εε′) = σ2I, and I is the identity matrix. When ε is

normal, the ordinary least squares (OLS) estimate

$$\hat\beta = (X'X)^{-1}X'y, \tag{2--6}$$

is based on the minimum residual sum of squares

$$\varepsilon'\varepsilon = (y - X\beta)'(y - X\beta) = \sum_{i=1}^{n}\left(y_i - \sum_{j=1}^{p} x_{ij}\beta_j\right)^2, \tag{2--7}$$

and gives the minimum variance among unbiased linear estimators of β. A drawback of

the OLS estimator 2–6 is that it is not unique when the design matrix X is less than full

rank, i.e. rank(X) < p.

Assume X ′X is in correlation form and let its eigenvalues be given by

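$$\lambda_{\max} = \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p = \lambda_{\min} > 0. \tag{2--8}$$

The expected squared distance of $\hat\beta$ from β can be expressed in terms of eigenvalues as

$$E[(\hat\beta - \beta)'(\hat\beta - \beta)] = \sigma^2\,\mathrm{Trace}(X'X)^{-1} = \sigma^2\sum_{i=1}^{p}\frac{1}{\lambda_i} \tag{2--9}$$

and the variance, when ε is normal, is given by

$$V[(\hat\beta - \beta)'(\hat\beta - \beta)] = 2\sigma^4\,\mathrm{Trace}(X'X)^{-2} = 2\sigma^4\sum_{i=1}^{p}\left(\frac{1}{\lambda_i}\right)^2 \tag{2--10}$$

(Hoerl and Kennard, 1970; see Appendix B for a derivation). Eq. 2–9 can also be

expressed as

$$E[\hat\beta'\hat\beta] = \beta'\beta + \sigma^2\sum_{i=1}^{p}\frac{1}{\lambda_i}. \tag{2--11}$$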

Ideally, X′X is nearly the identity matrix, so that the predictor variables are

nearly orthogonal to each other, i.e., only weakly correlated. However, in the presence of

multicollinearity, this will not be the case and X′X can have at least one small eigenvalue.

Multicollinearity occurs when there are near linear dependencies or high correlations

among the regressors or predictor variables. Mathematically, this means that there exist

constants $\{c_j \mid j = 1, \ldots, p\}$, not all zero, such that

$$\sum_{j=1}^{p} c_j x_j \approx 0,$$

where $x_j$ denotes the jth column of X.

As can be seen from Eqs. 2–9 and 2–10, small eigenvalues can cause the expected value

and variance of the squared distance from $\hat\beta$ to β to be large or, as shown by Eq. 2–11,

the estimated regression coefficients themselves to be too large in magnitude. Similarly, the variance

of $\hat\beta$ can also be inflated since $\operatorname{var}(\hat\beta) = \sigma^2(X'X)^{-1}$. As a result, the mean squared error

(MSE) also becomes inflated and predictions based on the OLS estimator (2–6) are not

very reliable.

One way of resolving multicollinearity is through ridge regression. The idea of ridge

regression is to make X ′X close to the identity matrix by replacing it with X ′X + κI

where κ is some positive number and I is the identity matrix. The resulting estimator is

$$\hat\beta_r = (X'X + \kappa I)^{-1}X'y \tag{2--12}$$

which is essentially a version of $\hat\beta$ shrunk toward 0. The matrix X′X + κI has larger eigenvalues, λi + κ for

i = 1, ..., p, and therefore $\hat\beta_r$ has smaller variance. It should be noted, however, that

there is a trade-off because, unlike $\hat\beta$, $\hat\beta_r$ is biased. For a suitably chosen κ, we

still get lower MSE and better overall prediction. Hoerl and Kennard (1970) provide ways

for selecting κ.
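The effect of κ can be seen in a small two-predictor example with nearly collinear columns, where the 2 × 2 system in Eq. 2–12 can be solved in closed form. The data values and function names below are hypothetical, and the model has no intercept.

```python
def gram_and_moment(x1, x2, y):
    """Entries of X'X and X'y for a two-predictor design without intercept."""
    s11 = sum(a * a for a in x1)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s22 = sum(b * b for b in x2)
    m1 = sum(a * c for a, c in zip(x1, y))
    m2 = sum(b * c for b, c in zip(x2, y))
    return s11, s12, s22, m1, m2

def ridge_2d(x1, x2, y, kappa):
    """Solve (X'X + kappa I) beta = X'y (Eq. 2-12); kappa = 0 gives OLS."""
    s11, s12, s22, m1, m2 = gram_and_moment(x1, x2, y)
    a, d = s11 + kappa, s22 + kappa
    det = a * d - s12 * s12
    return [(d * m1 - s12 * m2) / det, (a * m2 - s12 * m1) / det]

def eigenvalues_2d(s11, s12, s22):
    """Eigenvalues of the symmetric 2x2 matrix [[s11, s12], [s12, s22]]."""
    mean = (s11 + s22) / 2.0
    half_gap = (((s11 - s22) / 2.0) ** 2 + s12 ** 2) ** 0.5
    return mean + half_gap, mean - half_gap

# Hypothetical, nearly collinear predictors: x2 is x1 plus a small perturbation.
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-1.99, -1.02, 0.02, 0.98, 2.01]
y = [-2.1, -0.8, 0.1, 1.2, 1.9]

s11, s12, s22, _, _ = gram_and_moment(x1, x2, y)
lam_max, lam_min = eigenvalues_2d(s11, s12, s22)   # lam_min is close to zero
beta_ols = ridge_2d(x1, x2, y, 0.0)
beta_ridge = ridge_2d(x1, x2, y, 0.5)
```

The smallest eigenvalue of X′X is close to zero here, so the OLS coefficients are unstable; adding κ to the diagonal yields a coefficient vector with a smaller norm.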

LASSO takes an approach similar to ridge regression, reducing variance at the cost of some

bias. LASSO also shrinks the regression coefficients towards 0 but goes further by possibly

setting some of them exactly to 0. This is desirable in the sense that the resulting model

is parsimonious and easier to interpret, because only the coefficients with strong

effects are included in the model. The LASSO estimate of β, which we shall denote as $\hat\beta_l$,

can be obtained by minimizing

$$\sum_{i=1}^{n}\left(y_i - \sum_{j=1}^{p} x_{ij}\beta_j\right)^2 \quad\text{subject to the constraint}\quad \sum_{j=1}^{p}|\beta_j| \le t \tag{2--13}$$

where t ≥ 0 is a tuning parameter. This is a quadratic programming problem with linear

inequality constraints and Tibshirani (1996) provides efficient and stable algorithms to

solve it. Eq. 2–13 is equivalent to minimizing the penalized residual sum of squares

$$\sum_{i=1}^{n}\left(y_i - \sum_{j=1}^{p} x_{ij}\beta_j\right)^2 + \lambda\sum_{j=1}^{p}|\beta_j|, \tag{2--14}$$

(Gill et al., 1981; Tibshirani, 1996) where λ is a tuning parameter.

Ridge regression can also be expressed as a constrained optimization problem, namely the

minimization of

$$\sum_{i=1}^{n}\left(y_i - \sum_{j=1}^{p} x_{ij}\beta_j\right)^2 \quad\text{subject to}\quad \sum_{j=1}^{p}\beta_j^2 \le t \tag{2--15}$$

where t ≥ 0 is a tuning parameter or, equivalently, the minimization of

$$\sum_{i=1}^{n}\left(y_i - \sum_{j=1}^{p} x_{ij}\beta_j\right)^2 + \lambda\sum_{j=1}^{p}\beta_j^2, \tag{2--16}$$

(Tibshirani, 1996) where λ is a tuning parameter. Therefore, Eq. 2–16 also leads to Eq. 2–12 with

κ = λ.
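The difference between the two penalties is easiest to see in the special case of an orthonormal design (X′X = I), which is an assumption made here for illustration only. The penalized criteria in Eqs. 2–14 and 2–16 then separate into one-dimensional problems in each coefficient, with the OLS estimate z as input: ridge rescales z, while LASSO soft-thresholds it. The function names and numbers below are hypothetical.

```python
def ridge_orthonormal(z, lam):
    """Minimizer of (z - b)**2 + lam * b**2: proportional shrinkage."""
    return z / (1.0 + lam)

def lasso_orthonormal(z, lam):
    """Minimizer of (z - b)**2 + lam * |b|: soft thresholding at lam / 2."""
    shrunk = abs(z) - lam / 2.0
    if shrunk <= 0.0:
        return 0.0                      # small coefficients are set exactly to zero
    return shrunk if z > 0 else -shrunk

ols = [3.0, -1.2, 0.4, -0.1]            # hypothetical OLS coefficients
lam = 1.0
ridge = [ridge_orthonormal(z, lam) for z in ols]
lasso = [lasso_orthonormal(z, lam) for z in ols]
```

With these inputs, ridge shrinks every coefficient by the same factor and none becomes zero, whereas LASSO sets the two smallest coefficients exactly to zero and moves the others toward zero by λ/2.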

2.2.4 Penalized Likelihood

Let L(θ) be a loss function of some variable θ. An estimation procedure usually

involves minimization of L(θ) to obtain an estimator $\hat\theta$ that is said to have a "good"

fit. However, there may be an aspect of θ, say p(θ), that a modeler wants to control or

consider besides it being a good fit. This can be achieved by minimizing the penalized

likelihood, instead of L(θ), which has the generic form

$$L(\theta) + \lambda\,p(\theta) \tag{2--17}$$

where p(θ) is called the penalty function and λ > 0 is a tuning parameter. The idea

behind penalized likelihood is, in some sense, similar to mean squared error

$$\mathrm{MSE} = \mathrm{bias}^2 + \mathrm{variance} \tag{2--18}$$

where a balance between bias and variance is often desired. In this context, the fit and

p(θ) correspond to bias and variance, respectively. As an example, consider the model

$$y_i = f(x_i) + \varepsilon_i, \qquad i = 1, \ldots, n \tag{2--19}$$

where yi is the response in a regression on xi, εi is i.i.d. N(0, σ2) and f is a function to be

estimated. Here, θ = f and $L(f) = \sum_{i=1}^{n}(y_i - f(x_i))^2$, the residual sum of squares, without the

constants. A completely unbiased estimate of f is a curve, $\hat f$, that fits all the yi's exactly.

However, this curve will have a high variance because of its rapid local variation. We

say that $\hat f$ in this case is "rough" so that if we want to control the "roughness" aspect of

f, we can use it as a penalty, p(f), in Eq. 2–17. Towards the other extreme, another estimate

of f can be very smooth but severely biased. Figure 2-1 shows three different

estimates of f with varying degrees of roughness. The "in between" curve seems to be

the best estimate because it has a good balance between fit and roughness. The tuning

parameter λ controls the relative importance of these two.

A popular choice for assessing roughness is the integrated squared second derivative

(ISSD)

$$p_{\mathrm{ISSD}}(f) = \int [f''(x)]^2\,dx \tag{2--20}$$

which measures the curvature of a function (Ramsay and Silverman, 1997; Green, 1999).

Highly variable functions will have high values for pISSD. A linear function has pISSD = 0

and therefore has the least curvature.
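The role of λ can be illustrated with a discrete analogue of the roughness penalty, in which the integral in Eq. 2–20 is replaced by a sum of squared second differences of the fitted values at equally spaced points. This substitution, the small dense linear solver, and the data are assumptions made to keep the sketch self-contained. Minimizing Σ(y_i − f_i)² + λ·roughness(f) leads to the linear system (I + λD′D)f = y, where D is the second-difference matrix.

```python
def second_difference_matrix(n):
    """(n-2) x n matrix D with rows (1, -2, 1): a discrete second derivative."""
    rows = []
    for i in range(n - 2):
        row = [0.0] * n
        row[i], row[i + 1], row[i + 2] = 1.0, -2.0, 1.0
        rows.append(row)
    return rows

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x

def roughness(f):
    """Sum of squared second differences, a discrete version of Eq. 2-20."""
    return sum((f[i] - 2.0 * f[i + 1] + f[i + 2]) ** 2 for i in range(len(f) - 2))

def penalized_fit(y, lam):
    """Minimize sum (y_i - f_i)^2 + lam * roughness(f) via (I + lam D'D) f = y."""
    n = len(y)
    d = second_difference_matrix(n)
    a = [[(1.0 if i == j else 0.0) + lam * sum(row[i] * row[j] for row in d)
          for j in range(n)] for i in range(n)]
    return solve(a, y)

y = [0.1, 1.4, 0.7, 2.2, 1.6, 3.1, 2.4, 3.9, 3.3, 4.8]   # hypothetical noisy observations
fits = {lam: penalized_fit(y, lam) for lam in (0.0, 1.0, 1000.0)}
```

With λ = 0 the fit reproduces the data exactly (the rough extreme); as λ grows, the roughness of the fit decreases and the fitted values approach a straight line, which has zero roughness.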

Another example of penalized likelihood is ridge regression which we have seen

in Section 2.2.3. In ridge regression, Eq. 2–16 has the same form as Eq. 2–17 with θ = β =

(β1, ..., βp)′. The aspect of β that is being controlled is the size of its elements (the

regression coefficients), quantified by $p_r(\beta) = \beta'\beta = \sum_{j=1}^{p}\beta_j^2$. LASSO also controls the

size of the regression coefficients but by using the penalty $p_l(\beta) = \sum_{j=1}^{p}|\beta_j|$ instead. Ridge

regression and LASSO are special cases of a family of penalized regressions called bridge

regression which imposes a penalty function of the form $\sum_{j=1}^{p}|\beta_j|^{\gamma}$, γ ≥ 1 (Fu, 1998).

[Figure 2-1. Penalized likelihood in curve estimation. Depending on the penalty, the estimated curve can either be rough, smooth or in between. The in between curve illustrates a balance between bias and variance. Plot omitted; axes: x vs. y; curves labeled smooth, in between, and rough.]

Penalized likelihood is also used in model selection. Suppose we have model Mi

with parameter vector θi and likelihood function Li(θi). Let $\Lambda = L_1(\hat\theta_1)/L_2(\hat\theta_2)$ where

$\hat\theta_i = \arg\max_{\theta_i} L_i(\theta_i)$, i = 1, 2. Suppose model M2 is nested within model M1. Then by

the likelihood ratio test, the simpler model M2 is rejected if 2 log Λ exceeds a certain

percentile of the $\chi^2_{p_1-p_2}$ distribution, where pi is the number of parameters in Mi. However,

if the amount of data is large, the more complex model M1 tends to be selected even when the

simpler model M2 is true (Lindley, 1957; Green, 1999). Various approaches that utilize

penalized likelihood have been developed to alleviate this problem. One such approach is

the Bayesian Information Criterion (BIC; Schwarz, 1978) which, for model Mi, is defined as

$$\mathrm{BIC}_i = -2\log L_i(\hat\theta_i) + p_i\log n \tag{2--21}$$

40

where $-2 \log L_i(\hat\theta_i)$ is the loss function, $p_i \log n$ is the penalty term, and $n$ is the sample size. The model with the lowest BIC is selected as the best model. Models $M_1$ and $M_2$ can be compared by

$$\mathrm{BIC}_1 - \mathrm{BIC}_2 = -2 \log \Lambda + (p_1 - p_2) \log n \tag{2–22}$$

which shows a penalized form of the likelihood ratio test statistic. Other approaches such as Akaike's Information Criterion (AIC; Akaike, 1974)

$$-2 \log \Lambda + 2(p_1 - p_2) \tag{2–23}$$

and Mallows' $C_p$ (Mallows, 1973)

$$-2 \log \Lambda + (p_1 - p_2) \tag{2–24}$$

use different forms of the penalties. The main idea behind these criteria is to penalize more complex models, based on the number of parameters in each, so that simpler models are favored.
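The comparison in 2–22 can be sketched in a few lines of Python. The log-likelihood values, parameter counts and sample size below are hypothetical numbers chosen for illustration, and the function names are assumptions.

```python
import math


def bic(loglik, n_params, n_obs):
    """BIC as in 2-21: -2 log L + p log n."""
    return -2.0 * loglik + n_params * math.log(n_obs)


def bic_difference(loglik1, p1, loglik2, p2, n_obs):
    """BIC_1 - BIC_2 as in 2-22, with log(Lambda) = log L1 - log L2."""
    log_lambda = loglik1 - loglik2
    return -2.0 * log_lambda + (p1 - p2) * math.log(n_obs)


# Hypothetical maximized log-likelihoods: M1 has 8 parameters, M2 has 5.
ll1, p1 = -512.3, 8
ll2, p2 = -515.9, 5
n = 200

d = bic_difference(ll1, p1, ll2, p2, n)
preferred = "M1" if d < 0 else "M2"  # the model with the lower BIC is preferred
```

In this made-up case the gain in log-likelihood from the three extra parameters is smaller than the penalty $(p_1 - p_2)\log n$, so the simpler model has the lower BIC.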

Recall that LASSO shrinks the regression coefficients towards zero and even allows them to be exactly zero. Therefore, in addition to controlling the size of the regression coefficients through $p_l(\beta) = \sum_{j=1}^{p} |\beta_j|$, LASSO also implements model selection by providing simpler or more parsimonious regression models.

For more about penalized likelihood see Ramsay and Silverman (1997) and Green

(1999), and for asymptotic theory, Cox et al. (1990).

2.3 Covariance Estimation in Functional Mapping

2.3.1 Computing the Penalized Likelihood Estimates

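Let the kth genotype density be written as

$$
\begin{aligned}
f_k(y_i) &= (2\pi)^{-m/2} |\Sigma|^{-1/2} \exp\left\{-(y_i - g_k)^T \Sigma^{-1} (y_i - g_k)/2\right\} \\
&= (2\pi)^{-m/2} |\Sigma|^{-1/2} \exp\left\{-(y_i^k)^T \Sigma^{-1} (y_i^k)/2\right\}
\end{aligned}
\tag{2–25}
$$

where $y_i^k = y_i - g_k$, $k = 1, ..., J$.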

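Recall that the log-likelihood function is

$$\log L(\Omega) = \sum_{i=1}^{n} \log \left[ \sum_{k=1}^{J} p_{k|i} f_k(y_i|\Omega) \right] \tag{2–26}$$

and that

$$\frac{\partial}{\partial\theta} \log L(\Omega) = \sum_{i=1}^{n} \sum_{k=1}^{J} P_{k|i} \frac{\partial}{\partial\theta} \log f_k(y_i|\Omega) \tag{2–27}$$

where

$$P_{k|i} = \frac{p_{k|i} f_k(y_i|\Omega)}{\sum_{h=1}^{J} p_{h|i} f_h(y_i|\Omega)} \tag{2–28}$$

and $\theta \in \Omega = (\Omega_\mu, \Omega_\Sigma)$.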

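It follows from $T\Sigma T' = D$ (Eq. 2–1) that $\Sigma^{-1} = T'D^{-1}T$ and $|\Sigma| = |D|$. Therefore, if $\Omega_\mu$ is given,

$$
\begin{aligned}
\frac{\partial}{\partial\theta} \log L(\Omega_\Sigma)
&= \sum_{i=1}^{n} \sum_{k=1}^{J} P_{k|i} \frac{\partial}{\partial\theta} \left( -\frac{m}{2} \log 2\pi - \frac{1}{2} \log |\Sigma| - \frac{1}{2} y_i^{k\prime} \Sigma^{-1} y_i^k \right) \\
&= -\frac{1}{2} \sum_{i=1}^{n} \sum_{k=1}^{J} P_{k|i} \frac{\partial}{\partial\theta} \left( \log |\Sigma| + y_i^{k\prime} \Sigma^{-1} y_i^k \right) \\
&= -\frac{1}{2} \sum_{i=1}^{n} \sum_{k=1}^{J} P_{k|i} \frac{\partial}{\partial\theta} \left( \sum_{t=1}^{m} \log \sigma_t^2 + \sum_{t=1}^{m} \frac{(\varepsilon_{it}^k)^2}{\sigma_t^2} \right)
\end{aligned}
\tag{2–29}
$$

where $\varepsilon_{i1}^k = y_{i1}^k$ and $\varepsilon_{it}^k = y_{it}^k - \sum_{j=1}^{t-1} \phi_{tj} y_{ij}^k$ for $t = 2, ..., m$. It is implicitly assumed, therefore, that $\sigma_t^2 = \mathrm{var}(\varepsilon_t^k)$ for $k = 1, ..., J$. Note that if $\varepsilon^k = (\varepsilon_1^k, ..., \varepsilon_m^k)'$ and $y^k = (y_1^k, ..., y_m^k)'$ then $\varepsilon^k = Ty^k$ so that $\mathrm{var}(\varepsilon^k) = T\Sigma T' = D$.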
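The decomposition $T\Sigma T' = D$ used above can be illustrated numerically. The following is a minimal pure-Python sketch, assuming a positive-definite input matrix; the function names and the 3 × 3 example matrix (an AR(1) correlation matrix with ρ = 0.5) are illustrative assumptions. It obtains T and D from the ordinary Cholesky factor rather than from the sequence of regressions, which gives the same result.

```python
import math


def cholesky(S):
    """Lower-triangular L with S = L L' (S assumed positive definite)."""
    m = len(S)
    L = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def modified_cholesky(S):
    """Return (T, d): T unit lower triangular and T S T' = diag(d)."""
    m = len(S)
    L = cholesky(S)
    d = [L[t][t] ** 2 for t in range(m)]  # innovation variances sigma_t^2
    # C = L diag(1/L_tt) is unit lower triangular and S = C diag(d) C'.
    C = [[L[i][j] / L[j][j] for j in range(m)] for i in range(m)]
    # T = C^{-1}, computed column by column with forward substitution.
    T = [[0.0] * m for _ in range(m)]
    for col in range(m):
        for i in range(m):
            rhs = 1.0 if i == col else 0.0
            T[i][col] = rhs - sum(C[i][k] * T[k][col] for k in range(i))
    return T, d


# Illustrative AR(1) correlation matrix with rho = 0.5.
sigma = [[1.0, 0.5, 0.25],
         [0.5, 1.0, 0.5],
         [0.25, 0.5, 1.0]]
T, d = modified_cholesky(sigma)
# The below-diagonal entries of T are the negatives of the phi_tj.
phi = [[-T[t][j] for j in range(t)] for t in range(len(sigma))]
```

For this AR(1) example each measurement regresses only on its immediate predecessor, so only the first subdiagonal of T is nonzero.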

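Define the penalized negative log-likelihood as

$$-2 \log L(\Omega_\Sigma) + \lambda p(\phi_{tj}). \tag{2–30}$$

Assuming the $L_2$ penalty $p(\phi_{tj}) = \sum_{t=2}^{m} \sum_{j=1}^{t-1} \phi_{tj}^2$, we have

$$-2 \log L(\Omega_\Sigma) + \lambda p(\phi_{tj}) = -2 \sum_{i=1}^{n} \log \left[ \sum_{k=1}^{J} p_{k|i} f_k(y_i|\Omega_\Sigma) \right] + \lambda \sum_{t=2}^{m} \sum_{j=1}^{t-1} \phi_{tj}^2 \tag{2–31}$$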

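Our problem is immediately solved if 2–31 can be expressed in the same form as 2–16 where the $\phi_{tj}$'s correspond to the $\beta_j$'s. However, the first term on the right side of 2–31 cannot be explicitly written in terms of the $\phi_{tj}$'s, unless it is in derivative form 2–27. Thus, by taking the derivative of 2–31 and using 2–29, we get

$$
\begin{aligned}
\frac{\partial}{\partial\theta} \left[ -2 \log L(\Omega_\Sigma) + \lambda p(\phi_{tj}) \right]
&= -2 \sum_{i=1}^{n} \sum_{k=1}^{J} P_{k|i} \frac{\partial}{\partial\theta} \log f_k(y_i|\Omega_\Sigma) + \lambda \frac{\partial}{\partial\theta} \sum_{t=2}^{m} \sum_{j=1}^{t-1} \phi_{tj}^2 \\
&= \sum_{i=1}^{n} \sum_{k=1}^{J} P_{k|i} \frac{\partial}{\partial\theta} \left( \sum_{t=1}^{m} \log \sigma_t^2 + \sum_{t=1}^{m} \frac{(\varepsilon_{it}^k)^2}{\sigma_t^2} \right) + \lambda \frac{\partial}{\partial\theta} \sum_{t=2}^{m} \sum_{j=1}^{t-1} \phi_{tj}^2 \\
&= \frac{\partial}{\partial\theta} \left[ \sum_{i=1}^{n} \sum_{k=1}^{J} P_{k|i} \left( \sum_{t=1}^{m} \log \sigma_t^2 + \sum_{t=1}^{m} \frac{(\varepsilon_{it}^k)^2}{\sigma_t^2} \right) + \lambda \sum_{t=2}^{m} \sum_{j=1}^{t-1} \phi_{tj}^2 \right] \\
&= \frac{\partial}{\partial\theta} \left[ \sum_{i=1}^{n} \sum_{k=1}^{J} P_{k|i} \left( \log \sigma_1^2 + \frac{(\varepsilon_{i1}^k)^2}{\sigma_1^2} \right) \right]
+ \frac{\partial}{\partial\theta} \sum_{t=2}^{m} \left[ \sum_{i=1}^{n} \sum_{k=1}^{J} P_{k|i} \left( \log \sigma_t^2 + \frac{(\varepsilon_{it}^k)^2}{\sigma_t^2} \right) + \lambda \sum_{j=1}^{t-1} \phi_{tj}^2 \right].
\end{aligned}
$$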

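Notice that, without the derivative, the third line in the preceding calculation has the same form as 2–16 when written in terms of $\phi_{tj}$ because $\varepsilon_{it}^k = y_{it}^k - \sum_{j=1}^{t-1} \phi_{tj} y_{ij}^k$ for $t = 2, ..., m$. Thus, we need to minimize

$$\sum_{i=1}^{n} \sum_{k=1}^{J} P_{k|i} \left( \log \sigma_1^2 + \frac{(\varepsilon_{i1}^k)^2}{\sigma_1^2} \right) \tag{2–32}$$

and

$$\sum_{i=1}^{n} \sum_{k=1}^{J} P_{k|i} \left( \log \sigma_t^2 + \frac{(\varepsilon_{it}^k)^2}{\sigma_t^2} \right) + \lambda \sum_{j=1}^{t-1} \phi_{tj}^2 \tag{2–33}$$

for each $t = 2, ..., m$.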

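The minimizer of 2–32 can be obtained by solving

$$\frac{\partial}{\partial\sigma_1^2} \left[ \sum_{i=1}^{n} \sum_{k=1}^{J} P_{k|i} \left( \log \sigma_1^2 + \frac{(\varepsilon_{i1}^k)^2}{\sigma_1^2} \right) \right] = \sum_{i=1}^{n} \sum_{k=1}^{J} P_{k|i} \left( \frac{1}{\sigma_1^2} - \frac{(y_{i1}^k)^2}{\sigma_1^4} \right) = 0$$

which yields

$$\hat\sigma_1^2 = \frac{\sum_{i=1}^{n} \sum_{k=1}^{J} P_{k|i} (y_{i1}^k)^2}{n}. \tag{2–34}$$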

For $t = 2, ..., m$, 2–33 can be minimized by alternating minimization over $\sigma_t^2$ and $\phi_{tj}$, $j = 1, 2, ..., t-1$ (see Appendix C). The solutions are

$$\hat\sigma_t^2 = \frac{\sum_{i=1}^{n} \sum_{k=1}^{J} P_{k|i} \left( y_{it}^k - \sum_{j=1}^{t-1} y_{ij}^k \phi_{tj} \right)^2}{n} \tag{2–35}$$

and

$$\hat\phi_{t(t)} = (H_t + \lambda I_t)^{-1} g_t \tag{2–36}$$

where $\phi_{t(t)} = (\phi_{t1}, \phi_{t2}, ..., \phi_{t,t-1})'$ and $H_t$, $I_t$ and $g_t$ are given in Appendix C. Notice the similarity of 2–36 to 2–12 and that in formulas 2–34, 2–35 and 2–36, the posterior probabilities, $P_{k|i}$'s, are the weights for the genotype groups, $k = 1, ..., J$. A nonparametric covariance estimate, $\hat\Sigma_{NP}$, can therefore be obtained through $\hat\Sigma_{NP} = \hat T^{-1} \hat D (\hat T^{-1})'$, where the elements of $\hat D$ are given by 2–34 and 2–35, and the elements of $\hat T$ are given by 2–36.
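The ridge-type form of 2–36 can be checked directly from 2–33. Appendix C is not reproduced in this section, so the expressions for $H_t$ and $g_t$ below are an assumption chosen to be consistent with 2–33, not necessarily the exact definitions used there. Holding $\sigma_t^2$ and the $P_{k|i}$ fixed, writing $y_{i(t)}^k = (y_{i1}^k, ..., y_{i,t-1}^k)'$, and setting the gradient of 2–33 with respect to $\phi_{t(t)}$ to zero gives

$$
\begin{aligned}
0 &= -\frac{2}{\sigma_t^2} \sum_{i=1}^{n} \sum_{k=1}^{J} P_{k|i}\, y_{i(t)}^k \left( y_{it}^k - y_{i(t)}^{k\prime} \phi_{t(t)} \right) + 2\lambda \phi_{t(t)} \\
\Longrightarrow \quad \phi_{t(t)} &= (H_t + \lambda I_t)^{-1} g_t, \qquad
H_t = \frac{1}{\sigma_t^2} \sum_{i=1}^{n} \sum_{k=1}^{J} P_{k|i}\, y_{i(t)}^k y_{i(t)}^{k\prime}, \quad
g_t = \frac{1}{\sigma_t^2} \sum_{i=1}^{n} \sum_{k=1}^{J} P_{k|i}\, y_{i(t)}^k y_{it}^k,
\end{aligned}
$$

with $I_t$ the $(t-1) \times (t-1)$ identity matrix. Under this reading, $H_t$ plays the role of a posterior-weighted cross-product matrix and $\lambda$ enters exactly as in ridge regression.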

The preceding calculations were based on the $L_2$ penalty, $p(\phi_{tj}) = \sum_{t=2}^{m} \sum_{j=1}^{t-1} \phi_{tj}^2$. If the $L_1$ penalty, $p(\phi_{tj}) = \sum_{t=2}^{m} \sum_{j=1}^{t-1} |\phi_{tj}|$, is used instead, closed form solutions like 2–35 and 2–36 cannot be obtained and an iterative algorithm is needed. This is carried out by using an iterative local quadratic approximation of $\sum_{j=1}^{t-1} |\phi_{tj}|$ (Fan and Li, 2001; Ojelund et al., 2001). The reader is referred to Huang et al. (2006) for additional details.

2.3.2 From EM to ECM Algorithm

The EM algorithm can also be used for penalized likelihood estimation (Green, 1990). In the E-step of the EM algorithm (Section 1.2.2), a penalty term, $p(\Omega)$, on the model parameters, can be introduced to the complete data log-likelihood, A–1, to get the penalized complete data log-likelihood

$$\log L_c^P(\Omega) = \sum_{i=1}^{n} \sum_{k=1}^{J} x_{ik} \left[ \log p_{k|i} + \log f_k(y_i|\Omega) \right] + \lambda p(\Omega). \tag{2–37}$$

Clearly, taking the conditional expectation of 2–37 does not affect the penalty term because the expectation is taken with respect to the missing variable $x$. Thus, at the E-step, we have

$$Q^P(\Omega|\Omega^{(j)}) = \sum_{i=1}^{n} \sum_{k=1}^{J} P_{k|i}^{(j)} \left[ \log p_{k|i} + \log f_k(y_i|\Omega) \right] + \lambda p(\Omega). \tag{2–38}$$

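The M-step, therefore, involves solving

$$\frac{\partial}{\partial\theta} Q^P(\Omega|\Omega^{(j)}) = \sum_{i=1}^{n} \sum_{k=1}^{J} P_{k|i}^{(j)} \frac{\partial}{\partial\theta} \log f_k(y_i|\Omega) + \lambda \frac{\partial}{\partial\theta} p(\Omega) = 0 \tag{2–39}$$

to get $\Omega^{(j+1)}$, where $\theta \in \Omega$.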

The derived formulas for ΩΣ in the preceding section cannot be directly used in the

EM algorithm because we assumed that Ωµ was given. We instead use a variant of the EM

algorithm called the Expectation and Conditional Maximization (ECM) algorithm (Meng

& Rubin, 1993) which partitions the parameter set Ω according to mean and covariance

parameters, Ωµ and ΩΣ, respectively. The ECM algorithm differs from EM in that the

M-step involves a conditional optimization with respect to each partition of Ω. More

precisely, for the (j + 1)th iteration, the ECM algorithm proceeds as follows:

1. Initialize $\Omega^{(j)} = (\Omega_\mu^{(j)}, \Omega_\Sigma^{(j)})$.

2. E-Step. Update $P^{(j)}$ using $\Omega^{(j)}$.

3. CM-Steps.

• Conditional on $P^{(j)}$ and $\Omega_\mu^{(j)}$, solve for $\Omega_\Sigma^{(j+1)}$ using 2–34, 2–35 and 2–36 (Section 2.3.1).

• Conditional on $P^{(j)}$ and $\Omega_\Sigma^{(j+1)}$, solve for $\Omega_\mu^{(j+1)}$.

4. Repeat steps (2)–(3) until some convergence criterion is met.
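The E-step and the two CM-steps can be sketched on a toy problem. The example below is a simplification and not the functional mapping model: it fits a univariate two-component normal mixture with known mixing proportions, where the common variance stands in for $\Omega_\Sigma$ and the component means stand in for $\Omega_\mu$. All names, starting values and simulated data are illustrative assumptions.

```python
import math
import random


def normal_pdf(y, mu, sigma2):
    return math.exp(-0.5 * (y - mu) ** 2 / sigma2) / math.sqrt(2 * math.pi * sigma2)


def ecm(y, mix, mu, sigma2, n_iter=200, tol=1e-8):
    """ECM for a normal mixture with known mixing proportions `mix`,
    component means `mu` and a common variance `sigma2`."""
    n, J = len(y), len(mu)
    mu = list(mu)
    for _ in range(n_iter):
        # E-step: posterior probabilities P[i][k] under the current parameters.
        P = []
        for yi in y:
            w = [mix[k] * normal_pdf(yi, mu[k], sigma2) for k in range(J)]
            total = sum(w)
            P.append([wk / total for wk in w])
        # CM-step 1: update the variance with the means held at current values.
        new_sigma2 = sum(P[i][k] * (y[i] - mu[k]) ** 2
                         for i in range(n) for k in range(J)) / n
        # CM-step 2: update the means given the new variance (in this toy
        # model the weighted-mean update happens not to depend on it).
        new_mu = [sum(P[i][k] * y[i] for i in range(n)) /
                  sum(P[i][k] for i in range(n)) for k in range(J)]
        change = abs(new_sigma2 - sigma2) + sum(abs(a - b) for a, b in zip(new_mu, mu))
        mu, sigma2 = new_mu, new_sigma2
        if change < tol:  # simple convergence criterion on parameter change
            break
    return mu, sigma2


# Simulated data: two well-separated groups with unit variance.
random.seed(1)
y = ([random.gauss(0.0, 1.0) for _ in range(200)] +
     [random.gauss(5.0, 1.0) for _ in range(200)])
mu_hat, sigma2_hat = ecm(y, mix=[0.5, 0.5], mu=[-1.0, 1.0], sigma2=4.0)
```

In functional mapping the second CM-step generally has no closed form, which is why a numerical optimizer is used for the mean parameters.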

Unless a structure, such as AR(1), is imposed on the covariance matrix, it is difficult

to find closed form CM-step solutions for the mean parameters in functional mapping.

Hence, estimation in this case is carried out by using the Nelder-Mead simplex algorithm

(Nelder and Mead, 1965; Zhao et al., 2004) which can be readily implemented by popular

software. See for example the fminsearch built-in function in Matlab or optim in R.

In a backcross population design, Ma et al. (2002) provide closed form iteration formulas for

$$\Omega^{(j+1)} = \left\{ a_k^{(j+1)}, b_k^{(j+1)}, r_k^{(j+1)}, \sigma^{2(j+1)}, \rho^{(j+1)} \mid k = 1, 2 \right\} \tag{2–40}$$

when the mean model is a logistic curve and the covariance structure is AR(1). Similar formulas can also be obtained when the mean model is in the form of a rational function and the covariance structure is AR(1) (Yap et al., 2007).

2.3.3 Selection of Tuning Parameter

The tuning parameter λ can be selected through either 5- or 10-fold cross-validation or generalized cross-validation (Huang et al., 2006; Fan and Li, 2001).

For a K-fold cross-validation, let Z denote the full data set. Z is randomly split into

K subsets of about the same size. Each subset, say Zs (s = 1, ..., K), is used to validate

the log-likelihood based on the parameters estimated using the data Z \ Zs. The value

of λ that maximizes the average of all cross-validated log-likelihoods is used to select an

estimate for Σ.

The cross-validated log-likelihood criterion is given by

$$C(\lambda) = \frac{1}{K} \sum_{s=1}^{K} \log L_s(\hat\Omega_{-s}) \tag{2–41}$$

where $\hat\Omega_{-s}$ is the estimate of $\Omega$ based on the data set $Z \setminus Z_s$ and $L_s$ is the likelihood based on $Z_s$. The value $\lambda = \hat\lambda$ is chosen to maximize $C(\lambda)$.
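The selection rule in 2–41 can be sketched on a deliberately simple stand-in model. Instead of the functional mapping likelihood, the example below uses a normal mean with unit variance and a ridge-type shrinkage estimate; the functions `fit_mean`, `loglik` and `cv_criterion`, the grid of λ values and the simulated data are all illustrative assumptions.

```python
import math
import random


def fit_mean(train, lam):
    """Ridge-type shrinkage estimate of a normal mean (variance fixed at 1)."""
    return sum(train) / (len(train) + lam)


def loglik(data, mu):
    """Normal log-likelihood with unit variance, evaluated on `data`."""
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (y - mu) ** 2 for y in data)


def cv_criterion(data, lam, K, seed=0):
    """C(lambda): average validated log-likelihood over K folds."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)        # same random split for every lambda
    folds = [idx[s::K] for s in range(K)]   # K subsets of about the same size
    total = 0.0
    for s in range(K):
        held_out = set(folds[s])
        train = [data[i] for i in idx if i not in held_out]
        valid = [data[i] for i in folds[s]]
        total += loglik(valid, fit_mean(train, lam))
    return total / K


rng = random.Random(42)
data = [rng.gauss(3.0, 1.0) for _ in range(50)]
grid = [0.0, 0.1, 1.0, 10.0, 100.0]
scores = {lam: cv_criterion(data, lam, K=5) for lam in grid}
lam_hat = max(grid, key=lambda lam: scores[lam])  # maximizer of C(lambda)
```

Using the same split for every λ keeps the comparison between grid values from being driven by fold assignment.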

Note that there really are two sets of tuning parameters in our setting - one under

the null model and another under the alternative. However, because the log-likelihood

under the null model is constant throughout a marker interval, we shall assume that the

corresponding tuning parameter has been estimated accordingly and in the succeeding

sections simply refer to the tuning parameters as the ones for the alternative model.

2.4 Numerical Results

2.4.1 Simulations

In this section, the performance of the nonparametric covariance estimator, ΣNP

(Section 2.3.1), is assessed and compared to an AR(1)-structured estimator, ΣAR(1)

(Section 1.2.1). We investigate data generated from both multivariate normal and

t-distributions. We begin with the former.


Assuming an F2 population (J = 3 genotype groups; Section 1.1.2) for QTL mapping,

we randomly generated 6 markers equally spaced on a chromosome 100 cM long with

1 QTL between the second and third markers, 12 cM from the second marker (or 32

cM from the leftmost marker in the chromosome). Each phenotype associated with

the simulated QTL had m = 10 measurements and was sampled from a multivariate

normal distribution, using logistic curves as genotype means under 3 different covariance

structures. The mean parameters were a1 = 30, a2 = 28.5, a3 = 27.5, b1 = b2 = b3 = 5, and

r1 = r2 = r3 = .5 and the covariance structures were as follows:

• $\Sigma_1$ = AR(1) with $\sigma^2 = 3$, $\rho = 0.6$;

• $\Sigma_2 = \sigma^2[(1 - \rho)I + \rho J]$, with $\sigma^2 = 3$, $\rho = 0.5$, where $J$ is a matrix of 1's and $I$ is the identity matrix (Compound Symmetry);

• an unstructured covariance matrix

Σ3 =

0.72 0.39 0.45 0.48 0.50 0.53 0.60 0.64 0.68 0.68

0.39 1.06 1.61 1.60 1.50 1.48 1.55 1.47 1.35 1.29

0.45 1.61 3.29 3.29 3.17 3.09 3.19 3.04 2.78 2.53

0.48 1.60 3.29 3.98 4.07 4.01 4.17 4.18 4.00 3.69

0.50 1.50 3.17 4.07 4.70 4.68 4.66 4.78 4.70 4.36

0.53 1.48 3.09 4.07 4.68 5.56 6.23 6.87 7.11 6.92

0.60 1.55 3.19 4.17 4.66 6.23 8.59 10.16 10.80 10.70

0.64 1.47 3.04 4.18 4.78 6.87 10.16 12.74 13.80 13.80

0.68 1.35 2.78 4.00 4.70 7.11 10.80 13.80 15.33 15.35

0.68 1.29 2.53 3.69 4.36 6.92 10.70 13.80 15.35 15.77

.

Σ1 and Σ2 were considered previously by Huang et al. (2006) and Σ1 by Levina et al.

(2008). Σ3 has increasing diagonal elements and decreasing long term dependence which is

typical of longitudinal growth data. It is based on the sample covariance matrix of a real

data set.
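The two structured covariance matrices can be generated as follows. This is a minimal sketch that assumes the usual parameterizations, AR(1) with entries $\sigma^2 \rho^{|i-j|}$ and compound symmetry with $\sigma^2$ on the diagonal and $\sigma^2\rho$ elsewhere; the function names are illustrative.

```python
def ar1_cov(m, sigma2, rho):
    """AR(1) covariance: entry (i, j) is sigma2 * rho**|i - j|."""
    return [[sigma2 * rho ** abs(i - j) for j in range(m)] for i in range(m)]


def compound_symmetry_cov(m, sigma2, rho):
    """Compound symmetry: sigma2 * ((1 - rho) * I + rho * J)."""
    return [[sigma2 if i == j else sigma2 * rho for j in range(m)]
            for i in range(m)]


# Parameter values taken from the simulation design above (m = 10).
sigma_1 = ar1_cov(10, 3.0, 0.6)
sigma_2 = compound_symmetry_cov(10, 3.0, 0.5)
```

Under AR(1) the correlation decays with the time lag, whereas under compound symmetry every pair of time points has the same correlation.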


Functional mapping was applied to the simulated data, with n = 100 and 400

samples, using a logistic model for the mean, and ΣNP and ΣAR(1) for the covariance

matrix. The simulated chromosomes were searched at every 4 cM (i.e. 0, 4, 8, ..., 100) for

a total of 26 search points across 5 marker intervals. The estimated model parameters

at each point were used to construct an LR plot for the QTL linkage map. For ΣNP ,

the LR plot is constructed from parameter estimates obtained out of individual tuning

parameters λc (c = 1, ..., 26), that are separately cross-validated. However, we focused our

attention only on those λ’s corresponding to the maximum LR at each marker interval.

An initial LR plot was constructed using an arbitrary λ0 (λc = λ0 for all c = 1, ..., 26), and

the maximum on each marker interval was located. At the point corresponding to each

maximum, λ = λd (d = 1, ..., 5) was selected using 5-fold cross-validation. The final model

parameter estimates were based on the λd that produced the maximum LR or maxLR.

In Figure 2-2, the broken line LR plot is the result of our procedure while the solid one

is based on individual λc’s that have each been separately cross-validated. For n = 400,

these two plots are indistinguishable. The reason is that the cross-validated λ’s at

each search point within a marker interval do not differ much from one another. Thus,

using one λ for each marker interval (the one that produces the maximum LR) will not

significantly alter the general shape of the LR plot. The two dotted line plots were based

on λc, for all c = 1, 2, ..., 26, set to two different arbitrary values of λ.

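To evaluate the estimate $\hat\Sigma_l$ ($l = 1, 2, 3$) of the true covariance structure $\Sigma_l$, a number of criteria can be used. Among them are the matrix norm losses

$$L(\Sigma_l, \hat\Sigma_l) = \frac{\|\hat\Sigma_l - \Sigma_l\|}{\|\Sigma_l\|}$$

where $\|\cdot\|$ denotes either the operator ($\|\Sigma\|_{op}^2 = \lambda_{\max}(\Sigma\Sigma')$), Frobenius ($\|\Sigma\|_F^2 = \sum_{i=1}^{m} \sum_{j=1}^{m} |\sigma_{ij}|^2$), or matrix 1-norm ($\|\Sigma\|_1 = \max_j \sum_i |\sigma_{ij}|$), the entropy loss

$$L_E(\Sigma_l, \hat\Sigma_l) = \mathrm{tr}(\Sigma_l^{-1} \hat\Sigma_l) - \log |\Sigma_l^{-1} \hat\Sigma_l| - m \tag{2–42}$$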

Figure 2-2. Log-likelihood ratio (LR) plots based on simulated data under three different covariance structures. The solid line plot is based on cross-validated (CV) tuning parameters at each search point (individual λ’s). The broken line plot is based on cross-validated tuning parameters (max λ’s) corresponding to the maximum LR in each marker interval. The dotted line plot is based on two different arbitrary tuning parameter values, each assumed at all search points. [Six panels: rows Σ1, Σ2, Σ3; columns n = 100 and n = 400; y-axis: LR; maxLR marked in each panel.]

and the quadratic loss

$$L_Q(\Sigma_l, \hat\Sigma_l) = \mathrm{tr}(\Sigma_l^{-1} \hat\Sigma_l - I)^2 \tag{2–43}$$

where $I$ is the identity matrix. These losses are all nonnegative and equal zero when $\hat\Sigma_l = \Sigma_l$. There is no agreement as to which of these losses is appropriate for a particular situation but any of them may be used and the results would qualitatively be the same (Levina et al., 2008). Here, we use $L_E$ and $L_Q$ which were also used by Wu and Pourahmadi (2003), Huang et al. (2006 and 2007b), and Levina et al. (2008). The corresponding risk functions are defined by

$$r_E(\Sigma_l, \hat\Sigma_l) = E[L_E(\Sigma_l, \hat\Sigma_l)]$$

and

$$r_Q(\Sigma_l, \hat\Sigma_l) = E[L_Q(\Sigma_l, \hat\Sigma_l)].$$

An estimator $\hat\Sigma_i$ is deemed better than $\hat\Sigma_j$, for some $i \ne j$, if its risk function is smaller, i.e. $r_E(\Sigma, \hat\Sigma_i) < r_E(\Sigma, \hat\Sigma_j)$ or $r_Q(\Sigma, \hat\Sigma_i) < r_Q(\Sigma, \hat\Sigma_j)$. The risk functions are estimated via Monte Carlo simulation.
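The entropy and quadratic losses can be computed with a short pure-Python sketch. It assumes both matrices are positive definite (no singularity checks are made), and the helper names and the 2 × 2 example matrices are illustrative assumptions.

```python
import math


def solve(A, B):
    """Return A^{-1} B by Gauss-Jordan elimination with partial pivoting."""
    m = len(A)
    aug = [list(A[i]) + list(B[i]) for i in range(m)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(m):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[m:] for row in aug]


def det(A):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = len(A)
    U = [list(row) for row in A]
    d = 1.0
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(U[r][c]))
        if U[p][c] == 0.0:
            return 0.0
        if p != c:
            U[c], U[p] = U[p], U[c]
            d = -d
        d *= U[c][c]
        for r in range(c + 1, m):
            f = U[r][c] / U[c][c]
            U[r] = [vr - f * vc for vr, vc in zip(U[r], U[c])]
    return d


def entropy_loss(sigma, sigma_hat):
    """L_E = tr(S^{-1} S_hat) - log|S^{-1} S_hat| - m, as in 2-42."""
    m = len(sigma)
    M = solve(sigma, sigma_hat)
    return sum(M[i][i] for i in range(m)) - math.log(det(M)) - m


def quadratic_loss(sigma, sigma_hat):
    """L_Q = tr((S^{-1} S_hat - I)^2), as in 2-43."""
    m = len(sigma)
    M = solve(sigma, sigma_hat)
    R = [[M[i][j] - (1.0 if i == j else 0.0) for j in range(m)] for i in range(m)]
    # tr(R^2) = sum over i, j of R_ij * R_ji
    return sum(R[i][j] * R[j][i] for i in range(m) for j in range(m))


# Hypothetical true matrix and estimate, for illustration only.
sigma_true = [[2.0, 1.0], [1.0, 2.0]]
sigma_est = [[2.2, 0.9], [0.9, 1.9]]
le = entropy_loss(sigma_true, sigma_est)
lq = quadratic_loss(sigma_true, sigma_est)
```

Averaging `le` and `lq` over simulation replicates gives the Monte Carlo estimates of the two risk functions.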

One hundred simulation runs were carried out and the averages over all runs of the estimated QTL location, logistic mean parameter estimates, maxLR, and entropy and quadratic losses, together with the respective Monte Carlo standard errors (SE), were recorded. The results are shown in Tables 2-1 and 2-2. For $\Sigma_1$, $\hat\Sigma_{AR(1)}$ does well as expected, but $\hat\Sigma_{NP}$ also does a good job. Both provide better precision with increased sample size. The maxLR values are comparable: 38.52 and 112.03 (n = 100 and 400) in Table 2-1 versus 37.78 and 128.21 in Table 2-2.

For $\Sigma_2$ and $\Sigma_3$, $\hat\Sigma_{NP}$ does better than $\hat\Sigma_{AR(1)}$. $\hat\Sigma_{AR(1)}$ shows high values for both averaged losses, which translates into substantially biased estimates of QTL location and poor mean parameter estimates, particularly for $\Sigma_3$ at the second and third genotype groups. Increased sample size does not help and even makes the mean parameter estimates worse in the case of $\Sigma_3$. Values of maxLR for $\hat\Sigma_{NP}$ and $\hat\Sigma_{AR(1)}$ are very different in these cases. But because the averaged losses for $\hat\Sigma_{NP}$ are much smaller, we can conclude that the corresponding maxLR values must be close to the true ones.

To assess the robustness of our proposed nonparametric estimator, we applied it to data simulated from a t-distribution with 5 degrees of freedom. That is, samples were taken from

$$Y = g_k + \frac{X}{\sqrt{Z/\nu}} \tag{2–44}$$

where $X \sim N(0, \Sigma)$, $Z \sim \chi^2(\nu)$ with $\nu = 5$, and $g_k$ is the logistic mean for genotype $k = 1, 2, 3$. The results are presented in Tables 2-3 and 2-4. We excluded the column for maxLR because it is not appropriate in this scenario. The results show that despite inflated average losses, $\hat\Sigma_{NP}$ still outperforms $\hat\Sigma_{AR(1)}$. Notice that the quadratic loss is severely inflated because of the fat tails of the t-distribution. It may not be a reliable measure of performance but we present the results here for illustration.
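Sampling from 2–44 can be sketched as follows. The logistic mean is assumed here to have the form $a/(1 + b e^{-rt})$ evaluated at $t = 1, ..., 10$ (Eq. 1–4 is not reproduced in this section, so this form is an assumption), and the function names are illustrative. The chi-square draw uses the fact that $\chi^2(\nu)$ is a gamma distribution with shape $\nu/2$ and scale 2.

```python
import math
import random


def cholesky(S):
    """Lower-triangular L with S = L L' (S assumed positive definite)."""
    m = len(S)
    L = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def logistic_mean(a, b, r, times):
    """Assumed logistic growth curve a / (1 + b * exp(-r * t))."""
    return [a / (1.0 + b * math.exp(-r * t)) for t in times]


def sample_t_phenotype(mean, chol, nu, rng):
    """One draw of Y = g + X / sqrt(Z / nu), X ~ N(0, Sigma), Z ~ chi2(nu)."""
    m = len(mean)
    e = [rng.gauss(0.0, 1.0) for _ in range(m)]
    x = [sum(chol[i][k] * e[k] for k in range(i + 1)) for i in range(m)]
    z = rng.gammavariate(nu / 2.0, 2.0)  # chi-square(nu) as Gamma(nu/2, scale 2)
    scale = math.sqrt(z / nu)
    return [mean[i] + x[i] / scale for i in range(m)]


# AR(1) covariance with sigma^2 = 3, rho = 0.6 and genotype-1 mean parameters.
m = 10
sigma = [[3.0 * 0.6 ** abs(i - j) for j in range(m)] for i in range(m)]
chol = cholesky(sigma)
mean = logistic_mean(30.0, 5.0, 0.5, range(1, m + 1))
rng = random.Random(2008)
y = sample_t_phenotype(mean, chol, 5.0, rng)
```

One chi-square variate is shared by all m time points of a phenotype vector, which is what makes the draw multivariate t rather than a vector of independent t variates.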


Table 2-1. Averaged QTL position, mean curve parameters, maximum log-likelihood ratios (maxLR), entropy and quadratic losses and their standard errors (given in parentheses) for three QTL genotypes in an F2 population under different sample sizes (n) based on 100 simulation replicates ($\hat\Sigma_{NP}$, Normal Data).

                     QTL genotype 1             QTL genotype 2             QTL genotype 3
Cov.  n    Location  a1       b1       r1       a2       b2       r2       a3       b3       r3       maxLR    LE       LQ
Σ1    100  32.84     30.11    5.04     0.50     28.52    4.97     0.50     27.47    5.06     0.50     38.52    0.53     1.00
           (0.99)    (0.07)   (0.04)   (0.00)   (0.05)   (0.03)   (0.00)   (0.07)   (0.04)   (0.00)   (1.27)   (0.01)   (0.02)
      400  31.52     30.00    4.99     0.50     28.49    5.01     0.50     27.52    4.97     0.50     112.03   0.14     0.28
           (0.28)    (0.03)   (0.01)   (0.00)   (0.02)   (0.01)   (0.00)   (0.03)   (0.02)   (0.00)   (1.80)   (0.00)   (0.01)
Σ2    100  32.56     30.07    4.98     0.50     28.55    4.99     0.50     27.38    5.07     0.51     47.05    0.44     0.83
           (0.76)    (0.06)   (0.03)   (0.00)   (0.04)   (0.02)   (0.00)   (0.06)   (0.04)   (0.00)   (1.32)   (0.01)   (0.02)
      400  31.68     30.04    4.97     0.50     28.48    5.01     0.50     27.54    4.98     0.50     145.83   0.13     0.26
           (0.26)    (0.02)   (0.01)   (0.00)   (0.02)   (0.01)   (0.00)   (0.02)   (0.01)   (0.00)   (2.00)   (0.00)   (0.01)
Σ3    100  33.24     30.07    5.04     0.50     28.59    5.01     0.50     27.66    5.01     0.50     19.57    0.56     1.09
           (2.22)    (0.10)   (0.03)   (0.00)   (0.06)   (0.02)   (0.00)   (0.09)   (0.02)   (0.00)   (0.59)   (0.01)   (0.02)
      400  32.32     29.99    5.00     0.50     28.50    5.00     0.50     27.62    5.01     0.50     38.90    0.14     0.29
           (1.19)    (0.04)   (0.01)   (0.00)   (0.03)   (0.01)   (0.00)   (0.05)   (0.01)   (0.00)   (1.06)   (0.00)   (0.01)
True values: 32      30       5        0.5      28.5     5        0.5      27.5     5        0.5

Table 2-2. Averaged QTL position, mean curve parameters, maximum log-likelihood ratios (maxLR), entropy and quadratic losses and their standard errors (given in parentheses) for three QTL genotypes in an F2 population under different sample sizes (n) based on 100 simulation replicates ($\hat\Sigma_{AR(1)}$, Normal Data).

                     QTL genotype 1             QTL genotype 2             QTL genotype 3
Cov.  n    Location  a1       b1       r1       a2       b2       r2       a3       b3       r3       maxLR    LE       LQ
Σ1    100  33.24     29.99    5.03     0.50     28.48    4.99     0.50     27.57    5.04     0.50     37.78    0.02     0.04
           (0.77)    (0.06)   (0.04)   (0.00)   (0.05)   (0.03)   (0.00)   (0.07)   (0.05)   (0.00)   (1.09)   (0.00)   (0.00)
      400  31.80     30.01    4.97     0.50     28.50    5.02     0.50     27.51    4.98     0.50     128.21   0.01     0.01
           (0.32)    (0.03)   (0.02)   (0.00)   (0.02)   (0.01)   (0.00)   (0.03)   (0.02)   (0.00)   (1.98)   (0.00)   (0.00)
Σ2    100  35.28     30.36    4.63     0.48     28.54    5.04     0.50     27.12    5.51     0.52     64.68    2.15     6.57
           (1.57)    (0.09)   (0.05)   (0.00)   (0.07)   (0.04)   (0.00)   (0.09)   (0.07)   (0.00)   (2.53)   (0.06)   (0.38)
      400  31.96     30.51    4.62     0.48     28.42    5.08     0.50     27.14    5.35     0.51     193.84   2.66     9.94
           (0.54)    (0.04)   (0.02)   (0.00)   (0.03)   (0.02)   (0.00)   (0.04)   (0.03)   (0.00)   (4.65)   (0.04)   (0.25)
Σ3    100  46.48     30.39    5.33     0.51     28.01    4.99     0.52     27.85    5.20     0.51     112.66   9.64     73.15
           (2.74)    (0.38)   (0.09)   (0.00)   (0.35)   (0.07)   (0.00)   (0.39)   (0.09)   (0.00)   (2.83)   (0.13)   (2.06)
      400  43.64     30.60    5.28     0.51     27.64    4.93     0.52     28.38    5.34     0.50     288.87   10.14    80.36
           (2.64)    (0.30)   (0.06)   (0.00)   (0.34)   (0.07)   (0.00)   (0.33)   (0.08)   (0.00)   (6.09)   (0.07)   (1.12)
True values: 32      30       5        0.5      28.5     5        0.5      27.5     5        0.5

Table 2-3. Averaged QTL position, mean curve parameters, entropy and quadratic losses and their standard errors (given in parentheses) for three QTL genotypes in an F2 population under different sample sizes (n) based on 100 simulation replicates ($\hat\Sigma_{NP}$, Data from t-distribution). The source caption also lists maximum log-likelihood ratios (maxLR), but no maxLR column is reported for this table.

                     QTL genotype 1             QTL genotype 2             QTL genotype 3
Cov.  n    Location  a1       b1       r1       a2       b2       r2       a3       b3       r3       LE       LQ
Σ1    100  32.52     30.07    5.02     0.50     28.58    5.02     0.50     27.53    5.07     0.50     2.56     10.51
           (1.34)    (0.08)   (0.04)   (0.00)   (0.06)   (0.04)   (0.00)   (0.09)   (0.06)   (0.00)   (0.12)   (0.75)
      400  32.88     30.03    5.01     0.50     28.46    5.00     0.50     27.59    4.99     0.50     1.84     6.24
           (0.49)    (0.04)   (0.02)   (0.00)   (0.03)   (0.02)   (0.00)   (0.03)   (0.02)   (0.00)   (0.06)   (0.25)
Σ2    100  32.56     30.15    4.94     0.50     28.54    5.02     0.50     27.47    5.09     0.50     2.27     8.81
           (1.08)    (0.07)   (0.03)   (0.00)   (0.05)   (0.03)   (0.00)   (0.08)   (0.04)   (0.00)   (0.11)   (0.66)
      400  32.84     30.06    4.97     0.50     28.48    5.01     0.50     27.53    5.02     0.50     1.78     5.86
           (0.03)    (0.03)   (0.02)   (0.00)   (0.03)   (0.01)   (0.00)   (0.03)   (0.02)   (0.00)   (0.05)   (0.22)
Σ3    100  40.92     29.95    5.03     0.50     28.63    5.00     0.50     27.78    5.05     0.50     2.68     11.82
           (2.76)    (0.13)   (0.03)   (0.00)   (0.09)   (0.02)   (0.00)   (0.14)   (0.04)   (0.00)   (0.14)   (1.26)
      400  33.08     29.95    5.00     0.50     28.56    5.02     0.50     27.51    4.99     0.50     1.90     6.55
           (1.37)    (0.06)   (0.01)   (0.00)   (0.04)   (0.01)   (0.00)   (0.06)   (0.02)   (0.00)   (0.06)   (0.26)
True values: 32      30       5        0.5      28.5     5        0.5      27.5     5        0.5

Table 2-4. Averaged QTL position, mean curve parameters, entropy and quadratic losses and their standard errors (given in parentheses) for three QTL genotypes in an F2 population under different sample sizes (n) based on 100 simulation replicates ($\hat\Sigma_{AR(1)}$, Data from t-distribution). The source caption also lists maximum log-likelihood ratios (maxLR), but no maxLR column is reported for this table.

                     QTL genotype 1             QTL genotype 2             QTL genotype 3
Cov.  n    Location  a1       b1       r1       a2       b2       r2       a3       b3       r3       LE       LQ
Σ1    100  34.00     30.04    5.01     0.50     28.61    5.00     0.50     27.51    5.06     0.51     1.65     5.03
           (1.12)    (0.08)   (0.04)   (0.00)   (0.06)   (0.03)   (0.00)   (0.08)   (0.06)   (0.00)   (0.10)   (0.39)
      400  33.04     29.98    4.99     0.50     28.48    5.01     0.50     27.61    4.98     0.50     1.61     4.75
           (0.40)    (0.03)   (0.02)   (0.00)   (0.03)   (0.02)   (0.00)   (0.03)   (0.02)   (0.00)   (0.07)   (0.28)
Σ2    100  38.92     30.57    4.62     0.48     28.48    5.09     0.50     27.13    5.58     0.52     6.24     35.25
           (1.91)    (0.13)   (0.06)   (0.00)   (0.09)   (0.05)   (0.00)   (0.14)   (0.09)   (0.00)   (0.25)   (2.50)
      400  32.16     30.61    4.55     0.48     28.35    5.13     0.51     27.22    5.30     0.51     7.35     45.86
           (0.48)    (0.05)   (0.02)   (0.00)   (0.04)   (0.02)   (0.00)   (0.05)   (0.03)   (0.00)   (0.17)   (1.75)
Σ3    100  49.12     29.71    5.23     0.58     28.80    5.21     0.51     27.04    5.37     0.53     22.04    301.53
           (2.96)    (0.50)   (0.11)   (0.06)   (0.38)   (0.08)   (0.00)   (0.49)   (0.19)   (0.01)   (0.56)   (14.94)
      400  42.64     30.78    5.38     0.51     28.21    5.08     0.52     27.12    5.05     0.52     24.45    366.54
           (2.39)    (0.38)   (0.09)   (0.00)   (0.35)   (0.08)   (0.00)   (0.36)   (0.08)   (0.00)   (0.49)   (15.65)
True values: 32      30       5        0.5      28.5     5        0.5      27.5     5        0.5

2.4.2 Real Data Analysis

We study a real mice data set from an experiment by Vaughn et al. (1999). Briefly,

the data consists of an F2 (J = 3 genotype groups; Section 1.1.2) population of 259 male

and 243 female progeny with 96 markers in a total of 19 chromosomes. The mice were

measured for their body mass at 10 weekly intervals starting at age 7 days. Corrections

were made for the effects due to dam, litter size at birth, parity, and sex (Cheverud et al.,

1996; Kramer et al., 1998). A plot of the weight data is shown in Figure 1-3.

Functional mapping was first used to analyze this data in Zhao et al. (2004), who

investigated QTL × sex interaction. The authors used a logistic curve (Eq. 1–4) to model

the genotype means and employed the transform-both-sides (TBS; Section 1.2.1) technique

for variance stabilization in order to utilize an AR(1) structure. Their method identified 4

of 19 chromosomes that each had significant QTLs and they concluded that there were sex

differences of body mass growth in mice. Zhao et al. (2005) applied an SAD covariance

structure in functional mapping and found 3 QTLs. Liu and Wu (2007) likewise analyzed

the same data using a Bayesian approach in functional mapping and detected only 3

significant QTLs.

Here, we applied our proposed nonparametric estimator, ΣNP , in a genome-wide scan

for growth QTL without regard to sex. We scanned the linkage map at intervals of 4 cM.

Figure 2-3 shows the LR plots for all 19 chromosomes. They were obtained using λ’s that

were cross-validated at each search point. We conducted a permutation test (Doerge and

Churchill, 1996; Section 1.2.3) to identify significant QTLs. For every permutation run,

we calculated maxLRe for chromosome e = 1, ..., 19 using the same general procedure

as in the simulations (Section 2.4.1). In this mice data set, however, some markers were

either missing or not genotyped and we used only the available markers (Table 2-5). Thus,

every marker interval had different sets of available phenotype data. But we believe this

did not affect the results because of the large sample size of the available data. We looked

at chromosomes 6 and 7 and found this to be the case. Figure 2-4 shows LR plots based


on tuning parameters cross-validated at each search point (solid line) and using the same

tuning parameter for each search point as the one corresponding to the maximum LR in

each marker interval (broken line; our procedure). The dotted line plots were again based

on arbitrary tuning parameters and presented here to illustrate shape consistency. Each

permutation run yielded the maximum maxLRe, for all e = 1, ..., 19, or the genome-wide

maxLR. The two horizontal lines in Figure 2-3 correspond to 95% (broken) and 99%

(solid) thresholds based on 100 permutation test runs. There were 9 chromosomes with

significant QTLs (1, 4, 6, 7, 9, 10, 11, 14 and 15) based on the 95% threshold but only 7

under 99% (1, 4, 6, 7, 10, 11 and 15). The two chromosomes that did not make the 99%

threshold (9 and 14) barely made the 95%. For this mice data set, we recommend using

the 99% threshold because there were only 100 permutation test runs. Zhao et al. (2004)

identified QTLs in chromosomes 6, 7, 11 and 15, and Zhao et al. (2005) and Liu and Wu

(2007) found QTLs in chromosomes 6, 7 and 10. These were all at the 95% threshold.

Our findings confirmed the results of these previous studies that made use of the functional mapping method and detected additional QTLs. Although our results differ from the earlier ones, this difference alone does not show that the additional QTLs detected by our proposed model are spurious. In fact, Vaughn et al. (1999) identified 17 QTLs, although most of them are suggestive, using simple interval mapping.
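The way genome-wide thresholds are read off a permutation test can be sketched as follows. This is a minimal illustration and not the procedure's actual implementation: `genome_scan` is a hypothetical stand-in for the function that returns the genome-wide maxLR, and the toy statistic (largest squared difference in group means across markers) and simulated data are assumptions made only so that the example runs.

```python
import math
import random


def permutation_thresholds(phenotypes, genome_scan, n_perm=100,
                           levels=(0.95, 0.99), seed=0):
    """Empirical genome-wide thresholds from permuted phenotypes.

    genome_scan(phenotypes) must return the genome-wide maximum of the
    test statistic, with the marker data held fixed.
    """
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    max_stats = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the phenotype-genotype association
        max_stats.append(genome_scan(shuffled))
    max_stats.sort()
    thresholds = {}
    for q in levels:
        k = max(0, min(n_perm - 1, int(math.ceil(q * n_perm)) - 1))
        thresholds[q] = max_stats[k]  # empirical q-quantile of the maxima
    return thresholds


# Toy data: 5 markers with two genotype classes and no true QTL.
rng = random.Random(7)
n = 60
markers = [[rng.randint(0, 1) for _ in range(n)] for _ in range(5)]
pheno = [rng.gauss(0.0, 1.0) for _ in range(n)]


def toy_scan(y):
    """Hypothetical statistic: largest squared mean difference over markers."""
    best = 0.0
    for g in markers:
        g1 = [yi for yi, gi in zip(y, g) if gi == 1]
        g0 = [yi for yi, gi in zip(y, g) if gi == 0]
        if g1 and g0:
            diff = sum(g1) / len(g1) - sum(g0) / len(g0)
            best = max(best, diff * diff)
    return best


thr = permutation_thresholds(pheno, toy_scan)
```

With only 100 permutations the 99% threshold is determined by the top one or two permuted maxima, so it is a coarse estimate.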

The estimated genotype mean curves for the detected QTLs are shown in Figure 2-5. The three genotypes at a QTL have different growth curves, indicating the temporal genetic effects of this QTL on growth processes for mouse body mass. Some QTLs, like those on chromosomes 6, 7 and 10, act in an additive manner because the heterozygote (Qq, broken curves) is intermediate between the two homozygotes (QQ, solid curves and qq, dotted curves). Other QTLs, such as the one on chromosome 11, act in a dominant way since the heterozygote is very close to one of the homozygotes.

Figure 2-3. The genomic position corresponding to the peak of the curve is the optimal likelihood estimate of the QTL localization indicated by vertical broken lines. The ticks on the x-axis indicate the positions of markers on the chromosome. The map distances (in centi-Morgan) between two markers are calculated using the Haldane mapping function. The thresholds for claiming the genome-wide existence of a QTL are shown by horizontal lines. [Panels: chromosomes 1 to 19; x-axis: Test Position (scale bar 20 cM); y-axis: LR; legend: 99% cut-off, 95% cut-off, QTL location.]

Table 2-5. Available markers and phenotype data of a linkage map in an F2 population of mice (data from Vaughn et al., 1999).

            Marker Intervals
Chromosome  1    2    3    4    5    6    7    8
1           378  433  483  467  450  440  466
2           414  404  453  465  430
3           477  491  489  476  475
4           461  475  481  481  491
5           441  439  449  381  385
6           467  483  485  481
7           407  424  459  452  378  372  428  415
8           395  453  472
9           498  496  498
10          401  406  481  490  497
11          431  451  468  464  446
12          497  489  483  488
13          450  443  466
14          443  475  495
15          491  494  468
16          498
17          371  394
18          487  479  420
19          445  468  468

Figure 2-4. Log-likelihood ratio (LR) plots for chromosomes 6 and 7 of the mice data. The solid line plot is based on cross-validated (CV) tuning parameters at each search point (individual λ’s). The broken line plot is based on cross-validated tuning parameters (max λ’s) corresponding to the maximum LR in each marker interval. The dotted line plot is based on two different arbitrary tuning parameter values, each assumed at all search points. Slight differences between the solid and broken line plots may be due to different sample sizes among marker intervals (see Table 2-5). [Panels: chrom 6 and chrom 7; y-axis: LR; legend: individual λ’s, CV; max λ’s, CV; arbitrary λ’s.]

Figure 2-5. [Caption text not recoverable; the source shows only the label "fig:Means". Panels: chrom 1, chrom 4, chrom 6, chrom 7, chrom 10, chrom 11 and chrom 15; x-axis: Time (week); y-axis: Weight (g); legend: Genotype 1, Genotype 2, Genotype 3.]

2.5 Summary and Discussion

Covariance estimation is an important aspect in modeling longitudinal data.

It is difficult, however, because of the large number of parameters to estimate and

the positive-definite constraint. Many longitudinal data models resort to structured

covariances which, although positive-definite and computationally favorable due to a

reduced number of parameters, are possibly highly biased. However, Pourahmadi (1999,

2000) recognized that a positive-definite estimator can be found if modeling is done

through the components of the modified Cholesky decomposition of the covariance matrix

which converts the problem into modeling a set of regression equations. Huang et al.

(2006) employed LASSO and ridge regression techniques through L1 and L2 penalties,

respectively, in a normal penalized likelihood framework to obtain a regularized covariance

estimator. Using these penalties allows shrinkage in the elements of T , even setting some

of them to zero in the case of the L1 penalty.

In this chapter, we adopted Huang et al.’s L2 penalty approach in functional

mapping. This penalty works best when the true T matrix has many small elements.

Using the L1 penalty gives a better estimator when some of the elements of T are actually

zero. However, we believe that the differences in results between the two penalties

will not be significant unless the dimension is very large. Nonetheless, the L1 penalty can

be easily incorporated into our scheme. We have shown how to integrate Huang et al.’s

procedure into the mixture likelihood framework of functional mapping. The key was

to utilize the posterior probability representation of the derivative of the log-likelihood,

Eq. 2–27, and apply an L2 penalty to the negative log-likelihood. Estimation was then carried

out using the ECM algorithm (Section 2.3.2) with two CM-steps, based on a partition of

the mean and covariance parameters. Our simulations have shown better accuracy and

precision in estimates for QTL location, genotype mean parameters, and maxLR values,

by ΣNP compared to ΣAR(1). The maxLR values are important because the complete LR

plot provides the amount of evidence for the existence of a QTL. LR values noticeably


change when very different covariance structures are used. This is of course under the

assumption of multivariate normal data. In our analysis of the mice data, although there

were a few chromosomes that were found to have significant QTLs, chromosomes 6 and

7 (Figure 2-3) seemed to have the largest evidence for QTL existence. The LR plots are

also used in permutation tests (Section 1.2.3) to find a significance threshold. More precise

estimates of the covariance structure mean better estimates of the peak of the LR

plot and therefore more reliable permutation test results.

With regard to the utilization of our proposed model, we suggest a preliminary

analysis of the data by checking variance and covariance stationarity. If these latter

conditions are satisfied then ΣAR(1) may be an appropriate model. If covariance stationarity

is not an issue then a TBS method (Section 1.2.1) coupled with using ΣAR(1) is applicable.

If no stationarity is detected then an SAD (Section 1.2.1) or ΣNP may be more useful.

Although we did not assess the comparative performance of these two models, we

think that SAD becomes more computationally intensive if the data exhibits long-term

dependence, in which case ΣNP may be more appropriate. ΣNP should also be considered

if other parametric structures are suspect.


CHAPTER 3

NONPARAMETRIC COVARIANCE ESTIMATION IN FUNCTIONAL MAPPING OF REACTION NORMS TO TWO ENVIRONMENTAL SIGNALS

3.1 Introduction

The phenotypic plasticity of a quantitative trait occurs if the trait changes its

phenotypes with changing environment. Such environment-dependent changes, also

called reaction norms, are ubiquitous in biology. For example, thermal reaction norms

show how performance, such as caterpillar growth rate (Kingsolver et al., 2004) or

growth rate and body size in ectotherms (Angilletta et al., 2004), varies continuously

with temperature (Yap et al., 2007). Another example is the flowering time of Arabidopsis

thaliana with respect to changing light intensity (Stratton, 1998). However,

reaction norms and their genetic basis are difficult to model because of the inherent

complexity in the interplay of a multitude of factors involved. An added difficulty is in

their being "infinite-dimensional" as they require an infinite number of measurements to

be completely described (Kirkpatrick and Heckman, 1989). Wu et al. (2007a) proposed

a functional mapping-based model (Section 1.2) which addresses the latter difficulty by

using a biologically relevant mathematical function to model reaction norms. The authors

considered a parametric model of photosynthetic rate as a function of light irradiance

and temperature and studied the genetic mechanism of this process. They showed,

through extensive simulations, that in a backcross population with one or two QTLs,

their method accurately and precisely estimated the QTL location(s) and the parameters

of the mean model. However, they assumed the covariance matrix to be a Kronecker

product of two AR(1) structures, each modeling a reaction norm due to one environmental

factor. This type of covariance model is said to be separable. Although computationally

attractive, such a model captures only separate reaction norm effects and fails to incorporate

interactions. A more general approach is therefore needed.


In the spatio-temporal (or space-time) literature, there exist separable and nonseparable

covariance structures which are used to model random processes in the spatial

and temporal domains. Nonseparable means the covariance cannot be expressed as a

Kronecker product of two matrices like separable structures can. The random processes

may be the concentration of pollutants in the atmosphere, groundwater contaminants,

wind speed, or even disposable household incomes. The main importance of the

covariance is in providing a better characterization of the random process to obtain

optimal prediction or kriging of unobserved portions of it. Unlike the separable ones,

nonseparable structures can model the interactions of the random processes in space

and time. Thus it seems natural to consider the utilization of nonseparable structures in

modeling reaction norms that react to two environmental factors. More concretely, we

consider the photosynthetic rate as a random process, and the irradiance and temperature

as the spatial (one dimension) and temporal domains, respectively.

In this chapter, we show through simulations that, in functional mapping of reaction

norms to two environmental signals, (1) nonseparable structures can be utilized as

covariance models and used to generate data of processes that exhibit interactions; (2)

the separable model proposed by Wu et al. (2007a), which we shall call ΣAR(1), may not

be appropriate for such data; and (3) the nonparametric covariance estimator, ΣNP,

developed in Chapter 2, is a more reliable covariance model than ΣAR(1). By utility in (1),

we mean that a nonseparable model can analyze data generated by the same nonseparable

model. With regard to (2), our results are surprising because, for some variance of the

process or a certain number of levels in the environmental signals, the estimated QTL

location and mean model parameters are generally robust to a biased separable covariance

estimate, ΣAR(1), of a nonseparable underlying structure. That is, if the covariance of

data generated from a nonseparable structure is estimated by the separable model, ΣAR(1),

the estimate is biased, as expected, but the QTL location and mean model parameters

are still accurately and precisely estimated. However, the estimated maxLR (Section


1.2.3) is not accurate because the true underlying covariance structure and the (biased)

estimate, ΣAR(1), produce different log-likelihood values. Recall that maxLR is important

because it is used in permutation tests to assess significance of QTL existence. But when

both the variance and the number of levels in the environmental signals are increased,

the estimated QTL location is severely biased while the mean parameters are only mildly

affected. ΣNP provides consistently better results over ΣAR(1). Of course, if nonseparable

covariance models themselves are used to analyze data that exhibit interactions, the

results are expected to be much better. However, in reality, the underlying structure of the

data is unknown and it is very difficult to identify an appropriate nonseparable model to

use in this case. Modelers often employ strategies that are mainly ad hoc or specific to a

problem. Unfortunately, no general guidelines are available for approaching

this type of problem. We will, however, use nonseparable covariance models to generate

simulated data with interactions and use them to compare ΣNP and ΣAR(1).

This chapter is organized as follows: In Section 3.2, we describe the functional

mapping model proposed by Wu et al. (2007a) for reaction norms. In Section 3.3, we

discuss separable and nonseparable models used in spatio-temporal modeling. In Section

3.4, we present a simulation study using some nonseparable structures introduced in

Section 3.3 and then conclude with a summary and discussion in Section 3.5. In this

chapter, we may alternately use the terms covariance matrix, structure or function. They

all refer to the same thing.

3.2 Functional Mapping of Reaction Norms to Multiple Environmental Signals

Wolf (2002) described a reaction norm as a surface landscape determined by genetic

and environmental factors. The surface is obtained as a phenotypic trait plotted against

different environmental factors such as temperature, light intensity, humidity, etc., and

corresponds to a specific genetic effect such as additive, dominant or epistatic (Wu et

al., 2007c). At least in three dimensions, the features of the surface such as "slope",

"curvature", "peak valley", and "ridge", can be described mathematically and these can

help elucidate how the underlying factors affect the phenotype.

An example of a reaction norm that illustrates a surface landscape is photosynthesis

(Wu et al., 2007a) which is the process by which light energy is converted

to chemical energy by plants and other living organisms. It is an important but

complex process because it involves several factors such as the age of a leaf (where

photosynthesis takes place in most plants), the concentration of carbon dioxide in the

environment, temperature, light irradiance, available nutrients and water in the soil,

etc. A mathematical expression for the rate of single-leaf photosynthesis, P , without

photorespiration is

\[
P = \frac{1}{2\theta}\left(\alpha I + P_m - \sqrt{(\alpha I + P_m)^2 - 4\theta\alpha I P_m}\right) \tag{3–1}
\]

(Thornley and Johnson, 1990), where θ ∈ (0, 1) is a dimensionless parameter, α is the

photochemical efficiency, I is the irradiance, and Pm is the asymptotic photosynthetic rate

at a saturating irradiance. Pm is a linear function of the temperature, T ,

\[
P_m =
\begin{cases}
P_m(20)\left(\dfrac{T - T^*}{20 - T^*}\right) & \text{if } T \ge T^* \\
0 & \text{if } T < T^*,
\end{cases}
\tag{3–2}
\]

where Pm(20) is the value of Pm at the reference temperature of 20°C and T ∗ is the

temperature at which photosynthesis stops. T ∗ is chosen over a range of temperatures,

such as 5°C to 25°C, to provide a good fit to observed data.

Wu et al. (2007a) studied the reaction norm of photosynthetic rate, defined by Eqs.

3–1 and 3–2, as a function of irradiance (I) and temperature (T ). That is, the authors

considered P = P (I, T ). Here, we assume that T ∗ = 5 so that the reaction norm model

parameters are (α, Pm(20), θ). The surface landscape that describes the reaction norm of

P (I, T ), with parameters (α, Pm(20), θ) = (0.02, 1, 0.9), is shown in Figure 3-1. As stated

earlier, each reaction norm surface corresponds to a specific genetic effect. Thus, if a QTL


is at work, the genetic effects produce different surfaces defined by distinct sets of model

parameters.
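The surface defined by Eqs. 3–1 and 3–2 can be sketched in a few lines of Python. This is only an illustration under the assumptions stated in the text (T ∗ = 5 and the Figure 3-1 parameter values as defaults); the function and argument names are ours and do not come from Wu et al. (2007a).

```python
import math

def photosynthetic_rate(irradiance, temperature, alpha=0.02, pm20=1.0,
                        theta=0.9, t_star=5.0):
    """Single-leaf photosynthetic rate P(I, T) from Eqs. 3-1 and 3-2."""
    # Eq. 3-2: asymptotic rate Pm, linear in temperature above T*, zero below.
    if temperature >= t_star:
        pm = pm20 * (temperature - t_star) / (20.0 - t_star)
    else:
        pm = 0.0
    # Eq. 3-1: smaller root of theta*P^2 - (alpha*I + Pm)*P + alpha*I*Pm = 0.
    total = alpha * irradiance + pm
    disc = total ** 2 - 4.0 * theta * alpha * irradiance * pm
    # Guard against a tiny negative discriminant caused by rounding.
    return (total - math.sqrt(max(disc, 0.0))) / (2.0 * theta)

# Evaluate the surface on a small, arbitrary grid of irradiance and temperature.
surface = [[photosynthetic_rate(i, t) for t in (15, 20, 25, 30)]
           for i in (0, 100, 200, 300)]
```

The rate is zero at zero irradiance and below T ∗, and it never exceeds either αI or Pm, which is the saturating behavior visible in the surface plot.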

Figure 3-1. Reaction norm surface of photosynthetic rate as a function of irradiance and temperature. Model is based on Eqs. 3–1 and 3–2 with parameters (α, Pm(20), θ) = (0.02, 1, 0.9). Adapted from Wu et al. (2007a). [Plot not reproduced: axes are Irradiance (I), Temperature (T), and Photosynthetic Rate (P).]

3.2.1 Likelihood

We consider only a backcross design (Section 1.1.2) with one QTL. Extensions

to more complicated designs and the two-QTL case, as in Wu et al. (2007a), are

straightforward. Assume a backcross plant population of size n with a single QTL

affecting the phenotypic trait of photosynthetic rate. The photosynthetic rate for each

progeny i (= 1, ..., n) is measured at different irradiance (s = 1, ..., S) and temperature

(t = 1, ..., T ) levels. This choice of variables is adopted for consistency in later discussions

as we will be working with spatio-temporal covariance models. The set of phenotype

measurements or observations can be written in vector form as

\[
y_i = [\underbrace{y_i(1,1), \ldots, y_i(1,T)}_{\text{irradiance } 1}, \ldots, \underbrace{y_i(S,1), \ldots, y_i(S,T)}_{\text{irradiance } S}]'. \tag{3–3}
\]


The progeny are genotyped for molecular markers to construct a genetic linkage map for

the segregating QTL in the population. This means that the genotypes of the markers are

observed and will be used, along with the phenotype measurements, to predict the QTL

(Section 1.1). Because we assume a backcross design, the QTL has two possible genotypes

(as do the markers) which shall be indexed by k = 1, 2. The likelihood function based on

the phenotype and marker data can be formulated as

\[
L(\Omega) = \prod_{i=1}^{n}\left[\sum_{k=1}^{2} p_{k|i}\, f_k(y_i \mid \Omega)\right] \tag{3–4}
\]

where pk|i is the conditional probability of a QTL genotype given the genotype of a marker

interval for progeny i (Section 1.1.4). We assume a multivariate normal density for the

phenotype vector yi with genotype-specific means

\[
\mu_k = [\underbrace{\mu_k(1,1), \ldots, \mu_k(1,T)}_{\text{irradiance } 1}, \ldots, \underbrace{\mu_k(S,1), \ldots, \mu_k(S,T)}_{\text{irradiance } S}]' \tag{3–5}
\]

and covariance matrix Σ = cov(yi).
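A minimal NumPy sketch of the log of the mixture likelihood in Eq. 3–4 is given below. It assumes the conditional genotype probabilities p_{k|i} have already been computed from the marker data; the function names and array layouts are illustrative choices, not the implementation used in this study.

```python
import numpy as np

def mvn_logpdf(y, mean, cov):
    """Log density of a multivariate normal, via a Cholesky factor of cov."""
    diff = np.asarray(y, dtype=float) - np.asarray(mean, dtype=float)
    chol = np.linalg.cholesky(cov)            # requires positive-definite cov
    z = np.linalg.solve(chol, diff)           # z'z = diff' cov^{-1} diff
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (diff.size * np.log(2.0 * np.pi) + logdet + z @ z)

def mixture_loglik(Y, P, means, cov):
    """log L(Omega) of Eq. 3-4 for a backcross (two QTL genotypes).

    Y: (n, S*T) phenotype vectors; P: (n, 2) probabilities p_{k|i};
    means: (2, S*T) genotype-specific means; cov: (S*T, S*T) covariance.
    """
    total = 0.0
    for y, p in zip(Y, P):
        logs = np.array([mvn_logpdf(y, means[k], cov) for k in range(2)])
        top = logs.max()                      # log-sum-exp for stability
        weights = np.asarray(p, dtype=float)
        total += top + np.log(np.dot(weights, np.exp(logs - top)))
    return total
```

Working on the log scale avoids underflow when the phenotype vector is long, which matters here because each y_i has ST elements.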

3.2.2 Mean and Covariance Models

The mean vector for photosynthetic rate in 3–5 can be modeled using Eqs. 3–1 and

3–2 as

\[
\mu_k(s,t) = \frac{1}{2\theta_k}\left(\alpha_k s + P_{mk} - \sqrt{(\alpha_k s + P_{mk})^2 - 4\theta_k\alpha_k s P_{mk}}\right) \tag{3–6}
\]

where

\[
P_{mk}(t) =
\begin{cases}
P_{mk}(20)\left(\dfrac{t - T^*}{20 - T^*}\right) & \text{if } t \ge T^* \\
0 & \text{if } t < T^*
\end{cases}
\tag{3–7}
\]

and k = 1, 2.

Wu et al. (2007a) used a separable structure (Mitchell et al., 2005) for the ST × ST

covariance matrix Σ as

ΣAR(1) = Σ1 ⊗ Σ2 (3–8)


where Σ1 and Σ2 are the (S × S) and (T × T ) covariance matrices among different

irradiance and temperature levels, respectively, and ⊗ is the Kronecker product operator

(see Appendix D). Note that Σ1 and Σ2 are unique only up to multiples of a constant

because, for any c > 0, cΣ1 ⊗ (1/c)Σ2 = Σ1 ⊗ Σ2. Each of Σ1 and Σ2 is modeled using an

AR(1) structure with a common error variance, σ2, and correlation parameters ρ1 and ρ2:

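\[
\Sigma_1 = \sigma^2
\begin{bmatrix}
1 & \rho_1 & \cdots & \rho_1^{S-1} \\
\rho_1 & 1 & \cdots & \rho_1^{S-2} \\
\vdots & \vdots & \ddots & \vdots \\
\rho_1^{S-1} & \rho_1^{S-2} & \cdots & 1
\end{bmatrix},
\qquad
\Sigma_2 = \sigma^2
\begin{bmatrix}
1 & \rho_2 & \cdots & \rho_2^{T-1} \\
\rho_2 & 1 & \cdots & \rho_2^{T-2} \\
\vdots & \vdots & \ddots & \vdots \\
\rho_2^{T-1} & \rho_2^{T-2} & \cdots & 1
\end{bmatrix}.
\tag{3–9}
\]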

Separable covariance structures, however, cannot model interaction effects of each reaction

norm to temperature and irradiance. Thus, there is a need for a more general model for

this purpose.
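For concreteness, the separable structure of Eqs. 3–8 and 3–9 can be assembled as follows. This is a sketch with arbitrary illustrative parameter values; the helper names are ours.

```python
import numpy as np

def ar1_cov(size, sigma2, rho):
    """AR(1) covariance matrix with entries sigma2 * rho**|i - j| (Eq. 3-9)."""
    idx = np.arange(size)
    return sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])

def separable_ar1_cov(S, T, sigma2, rho1, rho2):
    """Sigma_AR(1) = Sigma_1 (x) Sigma_2 (Eq. 3-8), of size ST x ST."""
    sigma_1 = ar1_cov(S, sigma2, rho1)   # among irradiance levels
    sigma_2 = ar1_cov(T, sigma2, rho2)   # among temperature levels
    # np.kron places block (s, s') = sigma_1[s, s'] * sigma_2, which matches
    # the ordering of y_i in Eq. 3-3 (temperature varies fastest).
    return np.kron(sigma_1, sigma_2)

# Illustrative values only: 5 irradiance levels, 4 temperature levels.
cov = separable_ar1_cov(S=5, T=4, sigma2=1.0, rho1=0.6, rho2=0.3)
```

Every entry of the result is a product of one irradiance term and one temperature term, which is exactly why this structure cannot represent an interaction between the two signals.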

Note that with 3–6, 3–7, 3–8 and 3–9, Ω = {α1, Pm1(20), θ1, α2, Pm2(20), θ2, σ2, ρ1, ρ2}

in 3–4. These model parameters may be estimated using the ECM algorithm but closed

form solutions at the CM-step could be very complicated. A more efficient method is the

Nelder-Mead simplex algorithm (Section 2.3.2).

3.2.3 Hypothesis Tests

The features of the surface landscape are important because they can be used as

a basis in formulating hypothesis tests. Let H0 and H1 denote the null and alternative

hypotheses, respectively. Then the existence of a QTL that determines the reaction norm

curves can be formulated as

H0 : α1 = α2, Pm1(20) = Pm2(20), θ1 = θ2

H1 : at least one of the equalities above does not hold.

This means that if the reaction norm curves are distinct (in terms of their respective

estimated parameters), then a QTL possibly exists. Of course a slight difference in

parameter estimates does not automatically mean a QTL exists. But the significance

of the results can be tested by doing permutation tests using the log-likelihood ratio


between the null and alternative hypotheses (Section 1.2.3). A procedure described in

Wu et al. (2004a) can be used to test the additive effects of a QTL. Other hypotheses

can be formulated and tested such as the genetic control of the reaction norm to each

environmental factor, interaction effects between environmental factors on the phenotype,

and the marginal slope of the reaction norm with respect to each environmental factor or

the gradient of the reaction norm itself. The reader is referred to Wu et al. (2007a) for

more details.
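The permutation idea mentioned above can be sketched generically. The code below assumes a user-supplied function `max_lr_fn` (hypothetical, not defined in this study) that fits the null and alternative models and returns the maximum LR over the genome; the sketch only shows how a significance threshold would be read off the permuted statistics.

```python
import math
import random

def permutation_threshold(phenotypes, markers, max_lr_fn, n_perm=1000,
                          level=0.05, seed=0):
    """Empirical (1 - level) quantile of maxLR under shuffled phenotypes.

    max_lr_fn(phenotypes, markers) is a hypothetical user-supplied function
    returning the maximum log-likelihood ratio for one data set.
    """
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    null_stats = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)        # break the phenotype-marker association
        null_stats.append(max_lr_fn(shuffled, markers))
    null_stats.sort()
    # Order statistic with at least a (1 - level) fraction at or below it.
    index = min(n_perm - 1, math.ceil((1.0 - level) * n_perm) - 1)
    return null_stats[index]
```

An observed maxLR above the returned threshold would be declared significant at the chosen level. Because each permutation refits the model, the cost of the test is dominated by the cost of `max_lr_fn`.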

3.3 Spatio-temporal Covariance Functions

3.3.1 Introduction

In this section, we investigate parametric nonseparable spatio-temporal covariance

structures for functional mapping of photosynthetic rate as a reaction norm to the

environmental factors irradiance and temperature. As stated earlier, the main idea is to

model irradiance as a one-dimensional spatial variable and temperature as a temporal

variable. Nonparametric methods are also available but are limited to either spatial

(Sampson and Guttorp, 1992; Li et al., 2007) or time series (Li et al., 2007) only and

not joint spatio-temporal. Schabenberger and Gotway (2005) noted that the statistical

methods available in analyzing spatio-temporal processes are not yet as fully developed as

those for spatial or time series alone. This is mainly because joint spatio-temporal analysis

is very difficult. One major difficulty is in producing a covariance function that is positive

definite. Until recently, some researchers have resorted to parametric separable models

to circumvent this difficulty. Aside from computational benefits, separable models allow

conditional analysis of processes with respect to the spatial and temporal domains which

can be combined to produce a joint spatio-temporal model. This strategy is helpful and

often used as an exploratory analysis tool prior to fitting a nonseparable model. Unlike

separable ones, nonseparable models cannot be expressed as a Kronecker product of two

matrices. But they are more general (and usually more complicated and have many more


parameters) because they can model interactions between spatial and temporal processes

and some of them allow separable models as special cases.

The construction of valid (positive-definite) nonseparable covariance models has

taken great strides in recent years. Schabenberger and Gotway (2005) describe four main

approaches: (1) Gneiting’s (2002) monotone function, (2) Cressie and Huang’s (1999)

spectral method, (3) mixture (Ma, 2007), and (4) Jones and Zhang’s (1997) partial

differential equation. (1) and (2) utilize mainly statistical principles whereas (3) and (4)

are mostly mathematical in nature. We shall discuss (1) and (2) in Section 3.3.4 and use

examples derived from these approaches in the simulations (Section 3.4).

3.3.2 Basic Ideas, Notation, and Assumptions

A spatio-temporal random process can be represented by

Y (s, t), (s, t) ∈ Rd × R, (3–10)

where observations are collected at N spatio-temporal coordinates (s1, t1), (s2, t2), ..., (sN , tN)

and d ∈ Z+. The data are only a partial realization of the process because, for practical

reasons, the process cannot be observed at each coordinate. Gneiting (2002) notes that

mathematically, the space-time domain Rd × R and the purely spatial domain Rd+1 are

equivalent. This means that the space-time covariance functions in Rd × R and spatial

covariance functions in Rd+1 belong to the same class. However, the notation Rd × R

is used to highlight the distinction between the respective domains. In this study, we

will only be concerned with the case d = 1 so that, from here on, we will use R instead

of Rd for the spatial domain. Aside from those mentioned in the introduction (Section

3.1), Y may also represent ozone levels, disease incidence, ocean current patterns, water

temperatures, etc. In our study, Y represents photosynthetic rate.

If var(Y (s, t)) < ∞ for all (s, t) ∈ R × R, then the mean E[Y (s, t)] and covariance

cov(Y (s, t), Y (s + u, t + v)), where u and v are spatial and temporal lags, respectively,

both exist. We assume that the covariance is stationary in space and time so that for some

function C,

cov(Y (s, t), Y (s + u, t + v)) = C(u, v). (3–11)

This means that the covariance function, C, depends only on the lags and not on the

values of the coordinates themselves. Stationarity is often assumed to allow estimation of

the covariance function from the data (Cressie and Huang, 1999). We also assume that the

covariance function is isotropic which means that it depends only on the absolute lags and

not on the direction or orientation of the coordinates relative to each other:

cov(Y (s, t), Y (s + u, t + v)) = C(|u|, |v|). (3–12)

Stationary and isotropic covariance functions are said to be translation and rotation-invariant

(about the origin) (Waller and Gotway, 2004). Note that C(u, 0) and C(0, v) correspond to

purely spatial and purely temporal covariance functions, respectively.

To be a valid covariance function, C must be positive definite. This means that for

any (s1, t1), ..., (sk, tk) ∈ R× R, any real coefficients a1, ..., ak, and any positive integer k,

\[
\sum_{i=1}^{k}\sum_{j=1}^{k} a_i a_j\, C(s_i - s_j, t_i - t_j) \ge 0. \tag{3–13}
\]

Note that based on Eq. 3–13, C should really be nonnegative-definite. However, this is the

way it is defined in the literature and we will adhere to this convention.
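Condition 3–13 can be checked numerically on any finite set of coordinates by forming the matrix of covariances and inspecting its eigenvalues. The sketch below is only a necessary check on one coordinate set, not a proof of validity; the two example functions are illustrative choices of ours.

```python
import numpy as np

def covariance_matrix(cov_fn, coords):
    """Matrix with entries C(s_i - s_j, t_i - t_j) for coords [(s, t), ...]."""
    k = len(coords)
    return np.array([[cov_fn(coords[i][0] - coords[j][0],
                             coords[i][1] - coords[j][1])
                      for j in range(k)] for i in range(k)])

def is_nonnegative_definite(cov_fn, coords, tol=1e-10):
    """Check Eq. 3-13 on one finite coordinate set via the smallest eigenvalue."""
    eigenvalues = np.linalg.eigvalsh(covariance_matrix(cov_fn, coords))
    return bool(eigenvalues.min() >= -tol)

coords = [(s, t) for s in range(4) for t in range(3)]

def valid(u, v):
    # Product of two exponential covariances: a known valid choice.
    return float(np.exp(-abs(u) - abs(v)))

def invalid(u, v):
    # Off-diagonal values exceed the variance, so this cannot be a covariance.
    return 1.0 if (u == 0 and v == 0) else 2.0
```

Here `is_nonnegative_definite(valid, coords)` holds while `is_nonnegative_definite(invalid, coords)` fails, since the second matrix has eigenvalue 1 − 2 = −1.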

In spatio-temporal analysis, the ultimate goal is optimal prediction (or kriging) of an

unobserved part of the random process using an appropriate covariance function model.

In this study, we utilize a nonseparable covariance to calculate the mixture likelihood

associated with functional mapping.

3.3.3 Separable Covariance Structures

A covariance function C(u, v|θ) of a spatio-temporal process is separable if it can be

expressed as

C(u, v|θ) = C1(u|θ1)C2(v|θ2), (3–14)


where C1(u|θ1) and C2(v|θ2) are purely spatial and purely temporal covariance functions,

respectively, and θ = (θ1, θ2)′. This representation implies that the observed joint

process can be seen as a product of two independent spatial and temporal processes. A

formulation in terms of the joint process is

\[
C(u, v \mid \theta) = \frac{C(u, 0 \mid \theta)\, C(0, v \mid \theta)}{\sigma^2}, \tag{3–15}
\]

where σ2 = C(0, 0) is the variance of the process.
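To see why Eq. 3–15 follows from Eq. 3–14, evaluate the separable form at zero lags (parameters are suppressed for brevity):

```latex
\begin{align*}
C(u, 0) &= C_1(u)\,C_2(0), \qquad
C(0, v) = C_1(0)\,C_2(v), \qquad
\sigma^2 = C(0, 0) = C_1(0)\,C_2(0), \\
\frac{C(u, 0)\,C(0, v)}{\sigma^2}
  &= \frac{C_1(u)\,C_2(0)\,C_1(0)\,C_2(v)}{C_1(0)\,C_2(0)}
   = C_1(u)\,C_2(v) = C(u, v).
\end{align*}
```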

With representation 3–14, separable models have an advantage. For example, models

for C(u, v|θ) can be easily constructed by selecting suitable and readily available choices

for each of C1(u|θ1) and C2(v|θ2). Because many of these choices are positive-definite,

C(u, v|θ) is guaranteed to be positive-definite also. An example is

C(u, v|a, b) = exp(−a|u| − b|v|) (3–16)

where C1(u|a) = exp(−a|u|) and C2(v|b) = exp(−b|v|). Notice that for any given spatial

lags u1 and u2, C(u1, v|a, b) and C(u2, v|a, b) are proportional to each other. This means

that the plots of the temporal covariances have the same shapes at these spatial lags. This

property is important in the spectral construction of valid nonseparable models proposed

by Cressie and Huang (1999) (Section 3.3.4.1). For separable models, the processes in

the spatial and temporal domains do not act on each other and hence the selection of

an appropriate model for C(u, v|θ) can be facilitated by doing separate (conditional)

exploratory data analyses of spatial and temporal patterns.

A more general definition for separability is as a Kronecker product, as in Eq. 3–8.

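From Eq. 3–8, it can be shown that $\Sigma_{AR(1)}^{-1} = \Sigma_1^{-1} \otimes \Sigma_2^{-1}$ and $|\Sigma_{AR(1)}| = |\Sigma_1|^{T}\,|\Sigma_2|^{S}$ (recall that Σ1 is S × S and Σ2 is T × T),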

where | · | denotes the determinant of a matrix. Thus, another advantage of separable

models is computational efficiency, particularly in likelihood models where the inverse

and determinant of the covariance matrix are calculated. For a large covariance matrix of

dimension UV , its inverse can be calculated from the inverses of its Kronecker component


matrices, Σ1 and Σ2, with dimensions U and V , respectively. Thus, the inversion of a

100 × 100 matrix, for example, may only require the inversion of two 10 × 10 matrices.

A similar argument can be used for the determinant. Note that ΣAR(1) can be put in the

form 3–14 as

\[
C(u, v \mid \sigma^2, \rho_1, \rho_2) = \sigma^2\rho_1^{u} \cdot \sigma^2\rho_2^{v} = \sigma^4 \rho_1^{u}\rho_2^{v}, \tag{3–17}
\]

where u = 1, ..., U , v = 1, ..., V . This model assumes equidistant or regularly spaced

coordinates. Thus, any two consecutive (closest neighbor) coordinates will have the same

correlation as any other consecutive pair even if their respective distances are different. A more

appropriate model might be

\[
C(u, v \mid \sigma^2, \rho_1, \rho_2, a, b) = \sigma^4 \rho_1^{u/a}\rho_2^{v/b} \tag{3–18}
\]

where a and b are scale parameters. However, this model is more complex than ΣAR(1) in

the sense that it has more parameters (5 vs 3) to estimate. The question of which model is

better will lead us to a model selection issue.
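The computational shortcut discussed earlier in this section rests on two standard Kronecker product identities: for an S × S matrix Σ1 and a T × T matrix Σ2, the inverse of Σ1 ⊗ Σ2 is Σ1⁻¹ ⊗ Σ2⁻¹ and its determinant is |Σ1|^T |Σ2|^S. The following NumPy sketch checks both numerically on random positive-definite matrices (chosen only for illustration).

```python
import numpy as np

rng = np.random.default_rng(0)

def random_spd(size):
    """Random symmetric positive-definite matrix (illustration only)."""
    a = rng.standard_normal((size, size))
    return a @ a.T + size * np.eye(size)

S, T = 4, 6
sigma_1, sigma_2 = random_spd(S), random_spd(T)
full = np.kron(sigma_1, sigma_2)                  # (S*T) x (S*T)

# Inverse of the Kronecker product from the two small inverses.
inv_small = np.kron(np.linalg.inv(sigma_1), np.linalg.inv(sigma_2))
inverse_ok = bool(np.allclose(inv_small, np.linalg.inv(full)))

# log|Sigma_1 (x) Sigma_2| = T * log|Sigma_1| + S * log|Sigma_2|.
_, logdet_full = np.linalg.slogdet(full)
_, logdet_1 = np.linalg.slogdet(sigma_1)
_, logdet_2 = np.linalg.slogdet(sigma_2)
logdet_ok = bool(np.isclose(logdet_full, T * logdet_1 + S * logdet_2))
```

In a likelihood evaluation only the two small factors ever need to be inverted, so the cost grows with S and T separately rather than with their product.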

3.3.4 Nonseparable Covariance Structures

In this section, we review two methods in the construction of nonseparable spatio-temporal

models: the spectral (Cressie and Huang, 1999) and monotone function (Gneiting, 2002)

approaches. The discussion is for d = 1 but this can be generalized to the case d > 1.

3.3.4.1 Spectral method by Cressie and Huang (1999)

We assume that the covariance function C is continuous. If C is positive definite,

then the process has a spectral distribution (Matern, 1960; Cressie and Huang, 1999). If

the spectral density exists then Bochner’s theorem (Bochner, 1955) states that C can be

represented as

\[
C(u, v) = \int\!\!\int e^{i(u\omega + v\tau)} g(\omega, \tau)\, d\omega\, d\tau, \tag{3–19}
\]


where g(ω, τ) is the spectral density. It can be shown (Cressie and Huang, 1999; Appendix

E) that Eq. 3–19 can be expressed as

\[
C(u, v) = \int e^{iu\omega} \rho(\omega, v)\, \kappa(\omega)\, d\omega. \tag{3–20}
\]

Eq. 3–20 can be used to find valid covariance functions by selecting appropriate forms for

ρ(ω, v) and κ(ω). To get nonseparable structures, ρ(ω, v) must not be independent of ω.

Otherwise, C(u, v) will be separable.
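As a one-line check of the last statement, suppose ρ(ω, v) = ρ(v) does not depend on ω. Then the integral in Eq. 3–20 factors into a purely temporal and a purely spatial term, which is the separable form of Eq. 3–14 with C2(v) = ρ(v):

```latex
\[
C(u, v) = \int e^{iu\omega}\rho(v)\,\kappa(\omega)\,d\omega
        = \rho(v)\int e^{iu\omega}\kappa(\omega)\,d\omega
        = C_2(v)\,C_1(u),
\qquad
C_1(u) = \int e^{iu\omega}\kappa(\omega)\,d\omega .
\]
```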

Cressie and Huang gave seven examples of valid nonseparable covariance functions

constructed from certain choices for ρ(ω, v) and κ(ω) and using equation 3–20. We present

three of them here and use the first two in the simulations.

Example 1

The three-parameter nonseparable stationary covariance function, Example 1 of

Cressie and Huang (1999), is given by

\[
C(u, v) = \frac{\sigma^2}{\sqrt{a^2v^2 + 1}} \exp\left(-\frac{b^2u^2}{a^2v^2 + 1}\right), \tag{3–21}
\]

where a, b ≥ 0 are the scaling parameters of time and space, respectively, and σ2 = C(0, 0).

Example 2

The three-parameter nonseparable stationary covariance function, Example 4 of

Cressie and Huang (1999), is given by

\[
C(u, v) = \frac{\sigma^2 (a|v| + 1)}{(a|v| + 1)^2 + b^2|u|^2}, \tag{3–22}
\]

where a, b ≥ 0 are the scaling parameters of time and space, respectively, and σ2 = C(0, 0).

Example 3

The four-parameter nonseparable stationary covariance function, Example 6 of Cressie

and Huang (1999), is given by

\[
C(u, v) = \sigma^2 \exp\left(-a|v| - b^2|u|^2 - c|v||u|^2\right), \tag{3–23}
\]

where a, b ≥ 0 are the scaling parameters of time and space, respectively, c is an

interaction parameter of time and space, and σ2 = C(0, 0). Note that when c = 0,

3–23 reduces to a separable model.
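The three examples are straightforward to code. The sketch below implements Eqs. 3–21 to 3–23 and a small diagnostic based on Eq. 3–15; the function names and default parameter values are illustrative assumptions.

```python
import math

def cressie_huang_1(u, v, sigma2=1.0, a=1.0, b=1.0):
    """Eq. 3-21 (Example 1 of Cressie and Huang, 1999)."""
    scale = a * a * v * v + 1.0
    return sigma2 / math.sqrt(scale) * math.exp(-(b * b * u * u) / scale)

def cressie_huang_4(u, v, sigma2=1.0, a=1.0, b=1.0):
    """Eq. 3-22 (Example 4 of Cressie and Huang, 1999)."""
    scale = a * abs(v) + 1.0
    return sigma2 * scale / (scale * scale + b * b * u * u)

def cressie_huang_6(u, v, sigma2=1.0, a=1.0, b=1.0, c=1.0):
    """Eq. 3-23 (Example 6 of Cressie and Huang, 1999)."""
    return sigma2 * math.exp(-a * abs(v) - b * b * u * u - c * abs(v) * u * u)

def separability_gap(cov_fn, u, v):
    """C(u, v) - C(u, 0) C(0, v) / C(0, 0); zero for a separable model (Eq. 3-15)."""
    return cov_fn(u, v) - cov_fn(u, 0.0) * cov_fn(0.0, v) / cov_fn(0.0, 0.0)
```

For instance, `separability_gap(lambda u, v: cressie_huang_6(u, v, c=0.0), 1.5, 2.0)` is zero up to rounding, whereas the same call with `c=1.0` or with `cressie_huang_1` gives a clearly nonzero value.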

3.3.4.2 Monotone function method by Gneiting (2002)

Although various nonnegative integrable functions can be used as a spectral density

function, the spectral method can be limited if no closed form solution can be obtained in

either 3–19 or 3–20. Gneiting (2002) developed an approach that does not rely on Fourier

transform pairs and avoids this kind of limitation.

Let φ(x) and ψ(x) be functions with nonnegative domains. Suppose φ(x) is

completely monotone and ψ(x) is positive with a completely monotone derivative. Then

\[
C(u, v) = \frac{\sigma^2}{\sqrt{\psi(|v|^2)}}\, \phi\!\left(\frac{|u|^2}{\psi(|v|^2)}\right), \quad (u, v) \in \mathbb{R} \times \mathbb{R}, \tag{3–24}
\]

where σ2 = C(0, 0) > 0, is a valid nonseparable spatio-temporal covariance model.

φ(x) and ψ(x) can be associated with spatial and temporal structures, respectively, and

Gneiting (2002) provides a list of functions that can be used for each. For example, using

\[
\phi(x) = \exp(-b x^{\beta}), \quad b > 0,\ \beta \in (0, 1],
\]

and

\[
\psi(x) = (a x^{\alpha} + 1)^{\gamma}, \quad a > 0,\ \alpha \in (0, 1],\ \gamma \in [0, 1],
\]

leads to

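\[
C(u, v) = \frac{\sigma^2}{(a|v|^{2\alpha} + 1)^{\gamma/2}} \exp\left(-\frac{b|u|^{2\beta}}{(a|v|^{2\alpha} + 1)^{\beta\gamma}}\right), \quad (u, v) \in \mathbb{R} \times \mathbb{R}. \tag{3–25}
\]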

Multiplying 3–25 by the purely temporal covariance function $(a|v|^{2\alpha} + 1)^{-\delta}$, v ∈ R, with

δ ≥ 0, produces

\[
C(u, v) = \frac{\sigma^2}{(a|v|^{2\alpha} + 1)^{\delta + \gamma/2}} \exp\left(-\frac{b|u|^{2\beta}}{(a|v|^{2\alpha} + 1)^{\beta\gamma}}\right), \quad (u, v) \in \mathbb{R} \times \mathbb{R}, \tag{3–26}
\]

where a, b > 0 are scaling parameters of time and space, respectively; α, β ∈ (0, 1] are

smoothness parameters of time and space, respectively; γ ∈ [0, 1], and σ2 ≥ 0. A useful

reparametrization of 3–26 is

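\[
C(u, v) = \frac{\sigma^2}{(a|v|^{2\alpha} + 1)^{\tau}} \exp\left(-\frac{b|u|^{2\beta}}{(a|v|^{2\alpha} + 1)^{\beta\gamma}}\right), \quad (u, v) \in \mathbb{R} \times \mathbb{R}, \tag{3–27}
\]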

where τ ≥ 1/2 replaces δ + γ/2. The parameter γ is a space-time interaction parameter: γ = 0 gives a

separable structure and γ > 0 a nonseparable one. Increasing values of γ

indicate strengthening spatio-temporal interaction.
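The role of γ can be illustrated numerically. The sketch below codes Eq. 3–27 directly and measures the departure from the separable identity C(u, v) = C(u, 0)C(0, v)/C(0, 0); the default parameter values and function names are arbitrary illustrative choices.

```python
import math

def gneiting_cov(u, v, sigma2=1.0, a=1.0, b=1.0, alpha=0.5, beta=0.5,
                 tau=1.0, gamma=0.5):
    """Eq. 3-27: nonseparable covariance from Gneiting's (2002) construction."""
    time_term = a * abs(v) ** (2.0 * alpha) + 1.0
    space_term = b * abs(u) ** (2.0 * beta) / time_term ** (beta * gamma)
    return sigma2 / time_term ** tau * math.exp(-space_term)

def interaction_gap(u, v, **params):
    """C(u, v) - C(u, 0) C(0, v) / C(0, 0), which is zero under separability."""
    c00 = gneiting_cov(0.0, 0.0, **params)
    separable_part = (gneiting_cov(u, 0.0, **params)
                      * gneiting_cov(0.0, v, **params) / c00)
    return gneiting_cov(u, v, **params) - separable_part

# Gap at lags (u, v) = (1, 2) for increasing interaction strength.
gaps = {g: interaction_gap(1.0, 2.0, gamma=g) for g in (0.0, 0.5, 1.0)}
```

With these settings the gap is zero at γ = 0 and grows as γ increases toward 1, consistent with the interpretation of γ as an interaction parameter.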

3.4 Simulations

In this section, we investigate the performance of the three nonseparable covariance

structures 3–21, 3–22 (Examples 1 and 2 of Section 3.3.4.1) and 3–27, denoted as follows:

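\[
C_1(u, v) = \frac{\sigma^2}{\sqrt{a^2v^2 + 1}} \exp\left(-\frac{b^2u^2}{a^2v^2 + 1}\right), \quad a, b \ge 0;\ \sigma^2 > 0, \tag{3–28}
\]

\[
C_2(u, v) = \frac{\sigma^2 (a|v| + 1)}{(a|v| + 1)^2 + b^2|u|^2}, \quad a, b \ge 0;\ \sigma^2 > 0, \tag{3–29}
\]

\[
C_3(u, v) = \frac{\sigma^2}{(a|v|^{2\alpha} + 1)^{\tau}} \exp\left(-\frac{b|u|^{2\beta}}{(a|v|^{2\alpha} + 1)^{\beta\gamma}}\right), \quad a, b \ge 0;\ \alpha, \beta \in (0, 1];\ \tau \ge 1/2;\ \gamma \in [0, 1];\ \sigma^2 > 0. \tag{3–30}
\]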

To simplify our analysis, we assume for C3(u, v) that α = 1/2, β = 1/2, and τ = 1 so that

\[
C_3(u, v) = \frac{\sigma^2}{a|v| + 1} \exp\left(-\frac{b|u|}{(a|v| + 1)^{\gamma/2}}\right), \quad a, b \ge 0;\ \gamma \in [0, 1];\ \sigma^2 > 0. \tag{3–31}
\]

We then generate data using these nonseparable structures to simulate interaction

effects between the two environmental signals in functional mapping of a reaction norm.

Simulations using separable and nonseparable covariance structures for spatio-temporal

processes were studied by Huang et al. (2007a). The generated data is analyzed using

the nonparametric estimator, ΣNP, developed in Chapter 2 and ΣAR(1) to assess their

performance. We also want to test whether the separable model, ΣAR(1), can be used to


analyze data generated from nonseparable covariance structures in three different cases.

Covariance fit is assessed using entropy and quadratic losses (Section 2.4.1).
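For reference, the two losses can be computed as below. The sketch assumes the usual definitions, L_E = tr(Σ⁻¹Σ̂) − log|Σ⁻¹Σ̂| − m and L_Q = tr[(Σ⁻¹Σ̂ − I)²] for an m × m true covariance Σ and estimate Σ̂; Section 2.4.1 is not reproduced here, so these formulas should be checked against it.

```python
import numpy as np

def entropy_loss(sigma_true, sigma_hat):
    """L_E = tr(S^{-1} S_hat) - log|S^{-1} S_hat| - m (assumed definition)."""
    ratio = np.linalg.solve(sigma_true, sigma_hat)     # S^{-1} S_hat
    _, logdet = np.linalg.slogdet(ratio)
    return float(np.trace(ratio) - logdet - sigma_true.shape[0])

def quadratic_loss(sigma_true, sigma_hat):
    """L_Q = tr[(S^{-1} S_hat - I)^2] (assumed definition)."""
    m = sigma_true.shape[0]
    diff = np.linalg.solve(sigma_true, sigma_hat) - np.eye(m)
    return float(np.trace(diff @ diff))
```

Both losses are zero when the estimate equals the true matrix and increase as the estimate moves away from it, so smaller averaged values indicate a better covariance fit.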

Using a backcross design (2 genotype groups; Section 1.1.2) for the QTL mapping

population, we randomly generated 6 markers equally spaced on a chromosome 100 cM

long. One QTL was simulated between the fourth and fifth markers, 12 cM from the

fourth marker (or 72 cM from the leftmost marker of the chromosome). The QTL had

two possible genotypes which determined two distinct mean photosynthetic rate reaction

norm surfaces defined by Eqs. 3–1 and 3–2 (see Figure 3-1). The surface parameters

for each genotype group were (α1, Pm1(20), θ1) = (0.02, 2, 0.9) and (α2, Pm2(20), θ2) =

(0.01, 1.5, 0.9). Phenotype observations were obtained by sampling from a multivariate

normal distribution with mean surface based on irradiance and temperature levels of

{0, 50, 100, 200, 300} and {15, 20, 25, 30}, respectively, and covariance matrix Cl(u, v), l =

1, 2, 3.
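The phenotype-generation step can be sketched as follows for C1. Several simplifications are assumptions of this sketch rather than details of the study: QTL genotypes are drawn directly with probability 1/2 each instead of through simulated markers, lags are taken as differences of level indices, and the covariance parameters σ2 = a = b = 1 are placeholders.

```python
import math
import numpy as np

irradiance = [0, 50, 100, 200, 300]
temperature = [15, 20, 25, 30]

def mean_surface(alpha, pm20, theta, t_star=5.0):
    """Mean vector of Eq. 3-5 from Eqs. 3-6 and 3-7, ordered as in Eq. 3-3."""
    mu = []
    for s in irradiance:
        for t in temperature:
            pm = pm20 * (t - t_star) / (20.0 - t_star) if t >= t_star else 0.0
            total = alpha * s + pm
            disc = max(total ** 2 - 4.0 * theta * alpha * s * pm, 0.0)
            mu.append((total - math.sqrt(disc)) / (2.0 * theta))
    return np.array(mu)

def c1(u, v, sigma2, a, b):
    """Nonseparable covariance C1 of Eq. 3-28."""
    scale = a * a * v * v + 1.0
    return sigma2 / math.sqrt(scale) * math.exp(-(b * b * u * u) / scale)

def c1_matrix(sigma2, a, b):
    # ASSUMPTION: lags are differences of level indices (unit spacing).
    grid = [(s, t) for s in range(len(irradiance))
            for t in range(len(temperature))]
    return np.array([[c1(p[0] - q[0], p[1] - q[1], sigma2, a, b) for q in grid]
                     for p in grid])

def simulate_backcross(n, sigma2=1.0, a=1.0, b=1.0, seed=0):
    """Draw QTL genotypes (0/1) and phenotype vectors for n progeny."""
    rng = np.random.default_rng(seed)
    means = np.vstack([mean_surface(0.02, 2.0, 0.9),
                       mean_surface(0.01, 1.5, 0.9)])
    genotype = rng.integers(0, 2, size=n)        # 1:1 segregation in a backcross
    noise = rng.multivariate_normal(np.zeros(means.shape[1]),
                                    c1_matrix(sigma2, a, b), size=n)
    return genotype, means[genotype] + noise
```

Calling `simulate_backcross(200)` returns a genotype vector and a 200 × 20 phenotype matrix, one row per progeny, with columns ordered by irradiance and then temperature.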

The functional mapping model was applied to the marker and phenotype data with

n = 200, 400 samples. The surface defined by Eqs. 3–1 and 3–2 was used as mean model

and Cl(u, v) as covariance model to analyze the data generated using Cl(u, v). That is,

we modeled data generated by the same mean and covariance used in the model. One hundred

simulation runs were carried out and the averages over all runs of the estimated QTL

location, mean parameter estimates, maxLR, entropy and quadratic losses (see Section

2.4.1), including the respective Monte Carlo standard errors (SE), were recorded. The

results are shown in Tables 3-1 and 3-2. Table 3-2 also includes the results for ΣAR(1).

Both tables show accurate and precise estimates of QTL location, mean surface and

covariance parameters.

Next, ΣNP and ΣAR(1) were used to analyze the data generated by each of Cl(u, v), l =

1, 2, 3. Tables 3-3 and 3-4 show the results of these respective analyses. The results for

ΣNP are very good. However, those for ΣAR(1) are somewhat unexpected. Apparently, the

estimated QTL location and mean parameters are accurate and precise! This would imply


that these aspects of the model are robust to misspecification of the covariance structure.

Even the maxLR values are very close to the corresponding ones in Tables 3-1 and 3-2,

which should be (almost) the true values. The average losses, however, are inflated for

C1 and C2. Upon close inspection, it turns out that it is misleading to look at maxLR

in this situation. What should be considered are the log-likelihood values under the null

and alternative models from which maxLR is derived. Figure 3-2 provides box plots of

the log-likelihood values under the alternative model based on the 100 simulation runs.

These plots reveal clearly biased estimates of C1 and C2 by ΣAR(1), and the degrees of bias

are consistent with the average losses. The results for the null model are very similar but

are not presented here. We also provide the covariance and corresponding contour plots of

Cl(u, v), l = 1, 2, 3 and the ΣAR(1) estimates of these in Figures 3-3 and 3-4.

We conducted further simulations under C1 with n = 400, the case where ΣAR(1)

performed the worst. We considered two scenarios: an increased variance (σ2 = 2, 4) and an increased

number of irradiance (0, 50, 100, 150, 200, 250, 300) and temperature (15, 18, 21, 24, 27, 30) levels. The results, shown in Tables 3-5 and 3-6, respectively, indicate that

under these two scenarios, the estimate of the QTL location is severely biased if one uses

ΣAR(1). This is not the case for ΣNP .
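The earlier caution about reading maxLR in isolation can be checked against the averaged log-likelihoods reported in Table 3-5 (σ2 = 2, n = 400). The sketch below assumes the usual definition LR = 2(ℓ1 − ℓ0), which agrees with the tabulated maxLR values up to rounding; the dictionary layout and function name are hypothetical.

```python
# Averaged log-likelihoods (H0, H1) from Table 3-5, sigma^2 = 2, n = 400.
loglik = {
    "AR(1)": (-5437.0, -5373.0),
    "C1": (-4088.0, -4021.0),
    "NP": (-3967.0, -3912.0),
}

def lr_statistic(l0, l1):
    """Log-likelihood ratio statistic 2 * (l1 - l0)."""
    return 2.0 * (l1 - l0)

lr = {name: lr_statistic(l0, l1) for name, (l0, l1) in loglik.items()}
for name, (l0, l1) in loglik.items():
    print(f"{name:6s} H1 = {l1:8.0f}   LR = {lr[name]:6.1f}")
```

The three LR values are of similar size, while the log-likelihood under H1 for ΣAR(1) is lower than the others by more than a thousand units. The ratio cancels a misfit that is common to both hypotheses, which is why the log-likelihoods themselves are the more informative diagnostic here.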

Table 3-1. Averaged QTL position, mean curve parameters, maximum log-likelihood ratios (maxLR), entropy and quadratic losses and their standard errors (given in parentheses) for two QTL genotypes in a backcross population under different sample sizes (n) based on 100 simulation replicates (Nonseparable Model).

                 QTL      QTL genotype 1             QTL genotype 2
Covariance  n    Location α1       Pm1(20)  θ1       α2       Pm2(20)  θ2       maxLR    LE       LQ       σ2       a        b
C1          200  71.96    0.02     2.00     0.90     0.01     1.54     0.88     131.46   0.02     0.19     1.00     0.50     0.01
                 (0.32)   (0.00)   (0.01)   (0.00)   (0.00)   (0.02)   (0.01)   (2.31)   (0.00)   (0.04)   (0.00)   (0.00)   (0.00)
            400  72.00    0.02     2.01     0.90     0.01     1.52     0.89     262.11   0.01     1.13     1.00     0.50     0.01
                 (0.20)   (0.00)   (0.01)   (0.00)   (0.00)   (0.01)   (0.01)   (3.00)   (0.00)   (0.02)   (0.00)   (0.00)   (0.00)
True:            72.00    0.02     2.00     0.90     0.01     1.50     0.90     -        -        -        1.00     0.50     0.01

                 QTL      QTL genotype 1             QTL genotype 2
Covariance  n    Location α1       Pm1(20)  θ1       α2       Pm2(20)  θ2       maxLR    LE       LQ       σ2       ρ1       ρ2
C2          200  72.00    0.02     2.00     0.90     0.01     1.54     0.88     149.63   0.02     0.19     1.00     1.02     0.01
                 (0.30)   (0.00)   (0.01)   (0.00)   (0.00)   (0.01)   (0.01)   (2.37)   (0.00)   (0.04)   (0.00)   (0.02)   (0.00)
            400  71.84    0.02     2.01     0.90     0.01     1.52     0.89     299.29   0.01     0.13     1.00     1.01     0.01
                 (0.18)   (0.00)   (0.01)   (0.00)   (0.00)   (0.01)   (0.01)   (3.06)   (0.00)   (0.02)   (0.00)   (0.01)   (0.00)
True:            72.00    0.02     2.00     0.90     0.01     1.50     0.90     -        -        -        1.00     1.00     0.01

Table 3-2. Averaged QTL position, mean curve parameters, maximum log-likelihood ratios (maxLR), entropy and quadratic losses and their standard errors (given in parentheses) for two QTL genotypes in a backcross population under different sample sizes (n) based on 100 simulation replicates (Nonseparable Model).

                 QTL      QTL genotype 1             QTL genotype 2
Covariance  n    Location α1       Pm1(20)  θ1       α2       Pm2(20)  θ2       maxLR    LE       LQ       σ2       a        b        c
C3          200  71.96    0.02     2.01     0.89     0.01     1.55     0.87     126.80   0.02     0.19     1.00     1.03     0.01     0.62
                 (0.34)   (0.00)   (0.01)   (0.01)   (0.00)   (0.02)   (0.01)   (2.20)   (0.00)   (0.04)   (0.00)   (0.02)   (0.00)   (0.02)
            400  71.92    0.02     2.01     0.90     0.01     1.52     0.89     253.38   0.01     0.13     1.00     1.01     0.01     0.61
                 (0.20)   (0.00)   (0.01)   (0.00)   (0.00)   (0.01)   (0.01)   (2.83)   (0.00)   (0.02)   (0.00)   (0.01)   (0.00)   (0.02)
True:            72.00    0.02     2.00     0.90     0.01     1.50     0.90     -        -        -        1.00     1.00     0.01     0.60

                 QTL      QTL genotype 1             QTL genotype 2
Covariance  n    Location α1       Pm1(20)  θ1       α2       Pm2(20)  θ2       maxLR    LE       LQ       σ2       ρ1       ρ2
ΣAR(1)      200  72.12    0.02     2.01     0.89     0.01     1.55     0.88     75.08    0.02     0.19     1.00     0.60     0.60
                 (0.44)   (0.00)   (0.01)   (0.01)   (0.00)   (0.02)   (0.01)   (1.80)   (0.00)   (0.04)   (0.00)   (0.00)   (0.00)
            400  72.00    0.02     2.01     0.90     0.01     1.52     0.89     149.14   0.01     0.13
                 (0.26)   (0.00)   (0.01)   (0.00)   (0.00)   (0.01)   (0.01)   (2.33)   (0.00)   (0.02)   (0.00)   (0.00)   (0.00)
True:            72.00    0.02     2.00     0.90     0.01     1.50     0.90     -        -        -        1.00     0.60     0.60

Table 3-3. Averaged QTL position, mean curve parameters, maximum log-likelihood ratios (maxLR), entropy and quadratic losses and their standard errors (given in parentheses) for two QTL genotypes in a backcross population under different sample sizes (n) based on 100 simulation replicates (ΣNP).

                 QTL      QTL genotype 1             QTL genotype 2
Covariance  n    Location α1       Pm1(20)  θ1       α2       Pm2(20)  θ2       maxLR    LE       LQ
C1          200  71.68    0.02     2.02     0.90     0.01     1.52     0.88     99.39    1.04     2.03
                 (0.28)   (0.00)   (0.01)   (0.00)   (0.00)   (0.02)   (0.01)   (5.10)   (0.01)   (0.02)
            400  72.16    0.02     2.00     0.90     0.01     1.52     0.88     189.34   0.53     1.06
                 (0.23)   (0.00)   (0.01)   (0.00)   (0.00)   (0.01)   (0.01)   (2.08)   (0.00)   (0.01)
C2          200  71.88    0.02     2.00     0.90     0.01     1.53     0.88     109.47   1.00     1.96
                 (0.29)   (0.00)   (0.01)   (0.00)   (0.00)   (0.01)   (0.01)   (5.65)   (0.01)   (0.02)
            400  71.92    0.02     2.00     0.90     0.01     1.52     0.89     214.76   0.52     1.02
                 (0.17)   (0.00)   (0.01)   (0.00)   (0.00)   (0.01)   (0.01)   (2.31)   (0.00)   (0.01)
C3          200  72.12    0.02     2.01     0.89     0.01     1.54     0.87     102.73   0.88     1.70
                 (0.37)   (0.00)   (0.01)   (0.01)   (0.00)   (0.02)   (0.01)   (1.90)   (0.01)   (0.02)
            400  72.08    0.02     2.01     0.90     0.01     1.52     0.89     198.32   0.48     0.94
                 (0.20)   (0.00)   (0.01)   (0.00)   (0.00)   (0.01)   (0.01)   (2.36)   (0.00)   (0.01)
True:            72.00    0.02     2.00     0.90     0.01     1.50     0.90

Table 3-4. Averaged QTL position, mean curve parameters, maximum log-likelihood ratios (maxLR), entropy and quadratic losses and their standard errors (given in parentheses) for two QTL genotypes in a backcross population under different sample sizes (n) based on 100 simulation replicates (ΣAR(1)).

                 QTL      QTL genotype 1             QTL genotype 2
Covariance  n    Location α1       Pm1(20)  θ1       α2       Pm2(20)  θ2       maxLR    LE       LQ
C1          200  72.32    0.02     2.03     0.90     0.01     1.53     0.87     122.96   19.43    681.78
                 (0.45)   (0.00)   (0.01)   (0.01)   (0.00)   (0.02)   (0.01)   (2.43)   (0.07)   (6.16)
            400  71.72    0.02     2.03     0.90     0.01     1.51     0.89     245.85   19.45    684.11
                 (0.27)   (0.00)   (0.01)   (0.00)   (0.00)   (0.01)   (0.01)   (3.33)   (0.05)   (4.40)
C2          200  71.96    0.02     2.01     0.90     0.01     1.55     0.87     130.93   4.83     58.60
                 (0.34)   (0.00)   (0.01)   (0.00)   (0.00)   (0.02)   (0.01)   (2.30)   (0.02)   (1.01)
            400  71.84    0.02     2.01     0.90     0.01     1.52     0.89     262.50   4.83     58.61
                 (0.20)   (0.00)   (0.01)   (0.00)   (0.00)   (0.01)   (0.01)   (3.06)   (0.02)   (0.77)
C3          200  72.00    0.02     2.01     0.89     0.01     1.54     0.87     124.98   0.60     1.51
                 (0.35)   (0.00)   (0.01)   (0.01)   (0.00)   (0.02)   (0.01)   (2.23)   (0.00)   (0.10)
            400  71.96    0.02     2.01     0.89     0.01     1.52     0.89     250.76   0.60     1.43
                 (0.22)   (0.00)   (0.01)   (0.00)   (0.00)   (0.01)   (0.01)   (2.97)   (0.00)   (0.08)
True:            72.00    0.02     2.00     0.90     0.01     1.50     0.90

[Figure 3-2 image: box plots of the log-likelihood under H1 for n = 200 (left column) and n = 400 (right column), with one row of panels for each of C1, C2 and C3; each panel compares NP, the generating model and AR(1).]

Figure 3-2. Boxplots of the values of the log-likelihood under the alternative model, H1. Significantly biased estimates by ΣAR(1) are apparent for C1.

[Figure 3-3 image: surface plots in three rows, C1(u, v), C2(u, v) and C3(u, v). Left column, "True nonseparable covariance": covariance (0 to 1) against |u| (0 to 300) and |v| (0 to 15). Right column, "AR(1)": the corresponding estimates on axes running from 0 to 3.]

Figure 3-3. Covariance plots. Plots of Cl, l = 1, 2, 3 versus irradiance (|u|) and temperature (|v|) lags are on the left column. On the right column are the estimates of Cl by ΣAR(1).

[Figure 3-4 image: contour plots in three rows, C1(u, v), C2(u, v) and C3(u, v). Left column, "True nonseparable covariance": |u| from 0 to 300 against |v| from 0 to 15. Right column, "AR(1)": axes running from 0 to 3.]

Figure 3-4. Contour plots. Contour plots of Cl, l = 1, 2, 3 on the left column. On the right column are the contour plots of the estimates of Cl by ΣAR(1).

Table 3-5. Averaged QTL position, mean curve parameters, maximum log-likelihood ratios (maxLR), entropy and quadratic losses and their standard errors (given in parentheses) for two QTL genotypes in a backcross population under different sample sizes (n) based on 100 simulation replicates (C1 with n = 400 and σ2 = 2, 4).

                 QTL      QTL genotype 1             QTL genotype 2             log-likelihood
Covariance  σ2   Location α1       Pm1(20)  θ1       α2       Pm2(20)  θ2       H0       H1       maxLR    LE       LQ
ΣAR(1)      2    72.40    0.02     2.05     0.89     0.01     1.52     0.87     -5437    -5373    128.51   19.45    684.37
                 (0.44)   (0.00)   (0.01)   (0.01)   (0.00)   (0.02)   (0.01)   (7.36)   (7.31)   (2.45)   (0.05)   (4.44)
            4    74.20    0.02     2.11     0.88     0.01     1.52     0.84     -8175    -8141    65.55    19.44    683.82
                 (0.69)   (0.00)   (0.02)   (0.01)   (0.00)   (0.03)   (0.02)   (7.32)   (7.31)   (1.80)   (0.05)   (4.46)
C1          2    71.96    0.02     2.01     0.90     0.01     1.54     0.88     -4088    -4021    133.41   0.01     0.13
                 (0.29)   (0.00)   (0.01)   (0.00)   (0.00)   (0.02)   (0.01)   (7.17)   (7.16)   (2.15)   (0.00)   (0.02)
            4    71.96    0.02     2.03     0.89     0.01     1.57     0.86     -6822    -6788    69.07    0.01     0.13
                 (0.44)   (0.00)   (0.01)   (0.01)   (0.00)   (0.03)   (0.02)   (7.16)   (7.16)   (1.57)   (0.00)   (0.02)
NP          2    72.16    0.02     2.01     0.89     0.01     1.54     0.87     -3967    -3912    109.79   0.53     1.05
                 (0.29)   (0.00)   (0.01)   (0.00)   (0.00)   (0.02)   (0.01)   (6.87)   (6.89)   (1.66)   (0.00)   (0.01)
            4    71.64    0.02     2.01     0.89     0.01     1.57     0.84     -6713    -6684    59.92    0.53     1.04
                 (0.49)   (0.00)   (0.01)   (0.01)   (0.00)   (0.03)   (0.02)   (6.89)   (6.93)   (1.27)   (0.00)   (0.01)
True:            72.00    0.02     2.00     0.90     0.01     1.50     0.90

Table 3-6. Averaged QTL position, mean curve parameters, maximum log-likelihood ratios (maxLR), entropy and quadratic losses and their standard errors (given in parentheses) for two QTL genotypes in a backcross population under different sample sizes (n) based on 100 simulation replicates (C1 with n = 400, increased irradiance and temperature levels, and σ2 = 1, 2).

                 QTL      QTL genotype 1             QTL genotype 2             log-likelihood
Covariance  σ2   Location α1       Pm1(20)  θ1       α2       Pm2(20)  θ2       H0       H1       maxLR    LE       LQ
ΣAR(1)      1    72.16    0.02     2.04     0.90     0.01     1.48     0.88     -1278    -1063    430.01   223      64090
                 (0.36)   (0.00)   (0.01)   (0.00)   (0.00)   (0.01)   (0.01)   (14.01)  (14.15)  (4.78)   (0.45)   (261.88)
            2    78.44    0.02     2.15     0.91     0.01     1.48     0.86     -6992    -6876    231.86   222      63923
                 (0.84)   (0.00)   (0.02)   (0.00)   (0.00)   (0.02)   (0.01)   (14.08)  (14.16)  (3.62)   (0.44)   (257.89)
C1          1    71.76    0.02     2.01     0.90     0.01     1.51     0.89     4913     5068     309.86   0.01     0.31
                 (0.18)   (0.00)   (0.00)   (0.00)   (0.00)   (0.01)   (0.00)   (11.04)  (11.10)  (3.17)   (0.00)   (0.04)
            2    71.76    0.02     2.01     0.90     0.01     1.52     0.88     -821.08  -743.76  154.64   0.01     0.31
                 (0.24)   (0.00)   (0.01)   (0.00)   (0.00)   (0.01)   (0.01)   (11.10)  (11.12)  (2.22)   (0.00)   (0.04)
NP          1    71.726   0.02     2.01     0.90     0.01     1.51     0.89     5431     5537     212.64   2.34     4.55
                 (0.18)   (0.00)   (0.01)   (0.00)   (0.00)   (0.01)   (0.00)   (11.22)  (11.11)  (2.20)   (0.01)   (0.03)
            2    72.126   0.02     2.01     0.90     0.01     1.49     0.89     -336     -273     127.37   2.37     4.53
                 (0.34)   (0.00)   (0.01)   (0.00)   (0.00)   (0.01)   (0.01)   (10.44)  (10.42)  (1.72)   (0.01)   (0.03)
True:            72.00    0.02     2.00     0.90     0.01     1.50     0.90

3.5 Summary and Discussion

In this chapter, we studied the covariance model in functional mapping of photosynthetic

rate as a reaction norm to irradiance and temperature as the environmental signals. In

the presence of interaction between the two signals simulated by nonseparable covariance

structures, our analysis showed that ΣNP is a more reliable estimator than ΣAR(1). The

advantage of ΣNP over ΣAR(1) is more pronounced when the variance of the reaction norm

process and the number of signal levels increase.

A few issues need to be discussed. First, ΣNP was developed in Chapter 2 based

on a sequence of regressions obtained from the modified Cholesky decomposition of the

covariance matrix of a one-dimensional (longitudinal) vector, whose variables have a natural

ordering. In this chapter, the phenotype vector consists of observations indexed by two

factors, the irradiance level (1, ..., S) and the temperature level (1, ..., T ), i.e.

\[
y_i = [\underbrace{y_i(1,1), \ldots, y_i(1,T)}_{\text{irradiance } 1}, \; \ldots, \; \underbrace{y_i(S,1), \ldots, y_i(S,T)}_{\text{irradiance } S}]'. \qquad \text{(3–32)}
\]

While the order of the variables in this vector is predefined, there is no natural ordering

as there is in longitudinal data. Instead of ΣNP , a more appropriate method might be to adopt

the sparse permutation invariant covariance estimator (SPICE) proposed by Rothman et

al. (2008), which is invariant to variable permutations. SPICE is derived by decomposing

the inverse covariance matrix as

\[
\Sigma^{-1} = C'C \qquad \text{(3–33)}
\]

where C = [ctj] is a lower triangular matrix. In terms of the components of the sequence of

regression equations,

\[
c_{tj} = -\frac{\phi_{tj}}{\sigma_{tt}}, \quad j < t, \qquad \text{and} \qquad c_{tt} = \frac{1}{\sigma_{tt}} \qquad \text{(3–34)}
\]

where φtj and σ2tt are the GARPs and IVs (Section 2.2.1). However, our simulation

results suggest that ΣNP can still be directly applied to observations that have no

natural variable ordering, such as Eq. 3–32. Furthermore, Rothman et al. stated that, under variable

permutations, the L1 penalty in Huang et al.’s (2006) method can still potentially

produce reasonable estimates.
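The link between the regression components and the lower triangular factor C can be checked numerically. The sketch below is an illustration under assumptions rather than the estimation procedure of Chapter 2: it computes the GARPs and IVs of a known covariance matrix by sequential regression, forms C with c_tj = −φ_tj/σ_t below the diagonal and c_tt = 1/σ_t on it, and compares C′C with the inverse of Σ. The function name and the example matrix are hypothetical.

```python
import numpy as np

def modified_cholesky(sigma):
    """Return (T, d) with T unit lower triangular and T @ sigma @ T.T = diag(d).

    Row t of T holds -phi_tj (the negated regression coefficients of variable t
    on its predecessors); d[t] is the corresponding innovation variance.
    """
    m = sigma.shape[0]
    T = np.eye(m)
    d = np.empty(m)
    d[0] = sigma[0, 0]
    for t in range(1, m):
        phi = np.linalg.solve(sigma[:t, :t], sigma[:t, t])   # regression on predecessors
        T[t, :t] = -phi
        d[t] = sigma[t, t] - sigma[:t, t] @ phi               # residual variance
    return T, d

# Small AR(1)-type covariance used only as a test case.
idx = np.arange(5)
sigma = 0.6 ** np.abs(idx[:, None] - idx[None, :])
T, d = modified_cholesky(sigma)
C = T / np.sqrt(d)[:, None]          # scale row t by 1 / sigma_t
precision = C.T @ C                  # agrees with inv(sigma) up to rounding
```

Reordering the variables changes T and d, which is the ordering dependence discussed above, whereas the product C′C depends only on Σ.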

A second issue pertains to using nonseparable models in functional mapping where

the simulations in this chapter showed very good results. This might be a good idea if

the model closely reflects the structure of the data. Unfortunately, this is not often the

case. In fact, it is not even known whether the data exhibits interactions or not. Before

deciding on what model to use, spatio-temporal modelers utilize tests for separability

(Mitchell et al., 2005; Fuentes et al., 2005). If separable models are appropriate, there

is a wealth of options. Otherwise, it is difficult to choose among the many complex

models because there are as yet no general guidelines that can help one decide

on a specific nonseparable model. The model C3 that was used in the simulations (Section

3.4) has an easy-to-interpret interaction parameter γ ∈ [0, 1]. However, despite an

interaction “strength” of γ = 0.6, the separable model, ΣAR(1), fit the data

generated by C3 quite well (Table 3-4). Thus, the added complexity of a nonseparable model

may not be worth it compared with a separable one. Another option is to use separable

approximations to nonseparable covariances (Genton, 2007). The nonseparable covariances

that we considered were assumed to be stationary and isotropic (Section 3.3.2). These two

assumptions may not always hold for real data. Although not specifically addressed here,

using ΣNP may work for data that do not satisfy these assumptions.

In this chapter, we only considered two environmental signals with interactions:

irradiance and temperature. However, the reaction norm of photosynthetic rate is a very

complex process because more environmental signals are at play than

these two. The spatial domain of spatio-temporal nonseparable covariance models

can be extended to more than one dimension. For example, a two-dimensional spatial

domain models an area on a flat surface, while a three-dimensional domain models space.

However, this extension cannot be used to increase the number of signals unless the signals

have the same unit of measurement or one assumes separability or no interaction. Thus,

it is difficult to simulate data from more than two signals with interactions. However, the

proposed nonparametric estimator can theoretically handle this type of data.

The analyses conducted in this chapter were all based on simulated data, which

makes our proposed model theoretical and not (yet) practical. However, we hope that

our theoretical framework can either motivate researchers to conduct

experiments and studies to produce data that our model can analyze or at least lead

research in a direction that we consider theoretically possible.

CHAPTER 4
CONCLUDING REMARKS

4.0.1 Summary

In this dissertation, we provided a new nonparametric covariance estimator for

functional mapping of complex dynamic traits. The estimator is positive-definite because

it was derived through the modified Cholesky decomposition. It was also regularized by

the application of ridge regression and LASSO techniques to a sequence of regressions

obtained from a statistical interpretation of the components of the modified Cholesky

decomposition. The estimator was obtained by using the ECM algorithm whereby the

posterior probability that each progeny had a particular genotype was utilized as the

weights in the closed form formulas 2–34, 2–35 and 2–36 based on an L2 penalty. This

provided not only an intuitive interpretation but more importantly, a way to extend a

null model covariance to an alternative (or mixture) model covariance. Thus, the nested

LASSO (Levina et al., 2008) and SPICE (Rothman et al., 2008) models can potentially be

implemented by our method to produce other regularized estimators.

We considered two main applications of functional mapping: traits measured

across time or longitudinal traits (chapter 2) and reaction norms to two environmental

signals (chapter 3). For longitudinal traits, simulations showed that our estimator

can more precisely estimate the QTL location and its effects compared to AR(1). For

reaction norms, simulations again showed our estimator to be more reliable than Wu

et al.’s (2007a) proposed estimator in the presence of interaction effects between the

two environmental signals. The interaction effects were simulated using nonseparable

covariance structures. In both applications, the nonparametric estimator is more flexible

because it does not assume a parametric or structural form and is therefore suited to

analyze data with varying structures. Therefore, the nonparametric estimator can be

used as an alternative to, or a guide for, parametric modeling of the covariance in the

practical deployment of functional mapping.

4.0.2 Future Directions

Although functional mapping only considers one QTL at a time, a breakthrough

model was developed by Yang et al. (2007). The authors proposed composite functional

mapping which is an integration of functional mapping and composite interval mapping

(Jansen and Stam, 1994; Zeng, 1994; section 1.3). Composite functional mapping allows

modeling of marker effects beyond the interval considered by using a partial regression

analysis. This significantly improves the accuracy and precision of functional mapping

in multiple QTL detection. However, composite functional mapping assumes an AR(1)

covariance structure. It would be advantageous to incorporate our proposed nonparametric

estimator into this method to further improve its power.

The development of complex traits is the consequence of interactions among a

multitude of genetic and environmental factors that each trigger an impact on every step

of trait development. This process is inherently complicated, but can be illustrated by a

landscape of phenotype formed by genetic and environmental variables (Rice 2002; Wolf

2002). The surface of such a phenotype landscape defines the phenotype determined by

a particular combination of underlying genetic (such as additive, dominant or epistatic)

and environmental factors (such as temperature, light or moisture) that interact with each

other through developmental pathways. The number of underlying factors contributing

to phenotypic variation defines the number of dimensions of the landscape space. In

theory, the number of underlying factors can be unlimited, implying that a landscape can

exist in very-high-dimensional space (i.e., hyperspace) (Wolf 2002). Figure 4-1 shows a

hypothetical landscape, where the phenotype of an individual is determined by the values

of two underlying factors. By characterizing the topographical features of such landscape,

a fundamental question of how each underlying factor contributes to the expression of

a particular trait individually or through an interactive web can be addressed. These

features typically include “slope”, “curvature”, “peak-valley” and “ridge”. The description

of the topography of a three-dimensional landscape (Fig. 4-1) is most intuitive, but

the same descriptors can also be applied to hyperdimensional landscapes, although

the intuitive interpretation of the features becomes increasingly abstract with increased

dimensionality (Wolf 2002). It is worthwhile to develop a model for testing whether these

landscapes are the same for the different genetic machineries that regulate phenotypic

responsiveness to multiple environments. For example, we do not know whether

the same genetic system regulates the reaction norms of a plant to photoperiod

and temperature. If there are different genetic systems, how do different genetic elements

interact with each other in a complicated web to determine the outcome of reaction

norms?

Figure 4-1. Formation of a phenotype by a landscape. The phenotypic formation is a function of the value of underlying factors 1 and 2 (u1 and u2) that interact during trait development. Two shaded ovals represent two different areas on the surface, one being steeper (pointing to Inset A) and the second being flatter (pointing to Inset B). The steeper one is associated with a dramatic change in phenotypic expression contributed by a small change in the underlying factors (indicated by the distribution in Inset A), whereas the flatter one is associated with a different pattern in which dramatic changes in the underlying factors only lead to a minor change in phenotypic expression (indicated by the distribution in Inset B). Adapted from Wolf (2002).

Because biological traits are derived from developmental processes and physiological

regulatory mechanisms, complex multivariate systems that undergo such processes should

be carefully studied. Considering the complex interplay among numerous interacting

genetic loci, developmental processes, and environmental aspects of trait expression, we

need to develop a systems approach that integrates analytic and synthetic methods,

encompassing both holism and reductionism. Such a systems approach enables the study

of all elements in a network in response to developmental or environmental perturbations.

Synthetic analyses of biological information from different elements through mathematical

modeling provide new insights into the operation of the system. For example, to better

study the genetics of seed production in the common bean, one should decompose this system

into its elements, seed size and seed number, and study the developmental trajectories

of these elements and their developmental interactions during ontogeny. We will need to

integrate a systems approach to study the genetic etiology of development through the

connection of its three fundamental aspects, allometry, ontogeny and plasticity.

APPENDIX A
DERIVATION OF EM ALGORITHM FORMULAS

Let x = (x′1, ..., x′n)′, where xi = (xi1, ..., xiJ)′, i = 1, ..., n, is a vector that indicates

the genotype group to which yi = (yi1, ..., yim)′ belongs. We assume that the xi’s are

independent and identically distributed (i.i.d.) realized values from a multinomial(1, pi)

distribution where pi = (p1|i, ..., pJ|i)′. Thus, xik = 1 or 0, depending on whether or not

yi belongs to genotype group k = 1, ..., J . In reality, x is unknown (or missing) so that

y = (y′1, ..., y′n)′ can be viewed as incomplete data. The complete data is (x′, y′)′ with

log-likelihood

\[
\log L_c(\Omega) = \log \prod_{i=1}^{n} \prod_{k=1}^{J} \left[ p_{k|i} f_k(y_i|\Omega) \right]^{x_{ik}}
= \sum_{i=1}^{n} \sum_{k=1}^{J} x_{ik} \left[ \log p_{k|i} + \log f_k(y_i|\Omega) \right]. \qquad \text{(A–1)}
\]

The EM algorithm at the (j + 1)th iteration proceeds as follows:

1. The current value of Ω is Ω(j).

2. E-Step. Calculate the conditional expectation of the complete data log-likelihood,

(1-11), given the observed data y and Ω(j):

\[
\begin{aligned}
Q(\Omega|\Omega^{(j)}) &= E[\log L_c(\Omega) \,|\, y, \Omega^{(j)}] \\
&= \sum_{i=1}^{n} \sum_{k=1}^{J} E(X_{ik} \,|\, y_i, \Omega^{(j)}) \left[ \log p_{k|i} + \log f_k(y_i|\Omega) \right] \\
&= \sum_{i=1}^{n} \sum_{k=1}^{J} P(X_{ik} = 1 \,|\, y_i, \Omega^{(j)}) \left[ \log p_{k|i} + \log f_k(y_i|\Omega) \right]. \qquad \text{(A–2)}
\end{aligned}
\]
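The weights P(Xik = 1 | yi, Ω(j)) in the E-step follow from Bayes' rule: each is proportional to pk|i fk(yi | Ω(j)) and is normalized over k. A minimal sketch follows, assuming that fk is a multivariate normal density with a genotype-specific mean and a common covariance matrix; the function names and example values are hypothetical.

```python
import numpy as np

def mvn_logpdf(y, mean, cov):
    """Log-density of N(mean, cov) evaluated at each row of y."""
    m = mean.size
    diff = y - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    return -0.5 * (m * np.log(2.0 * np.pi) + logdet + maha)

def e_step(y, prior, means, cov):
    """Posterior P(X_ik = 1 | y_i, Omega) for each progeny i and genotype k.

    y: (n, m) phenotypes; prior: (n, J) conditional probabilities p_{k|i};
    means: (J, m) genotype-specific mean vectors; cov: (m, m) common covariance.
    """
    log_w = np.log(prior) + np.column_stack(
        [mvn_logpdf(y, mu, cov) for mu in means]
    )
    log_w -= log_w.max(axis=1, keepdims=True)   # stabilize before exponentiating
    w = np.exp(log_w)
    return w / w.sum(axis=1, keepdims=True)

# Toy example: two genotype groups, three measurements per progeny.
rng = np.random.default_rng(1)
means = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
cov = np.eye(3)
y = rng.normal(size=(4, 3)) + means[1]
prior = np.full((4, 2), 0.5)
post = e_step(y, prior, means, cov)
```

Working on the log scale and subtracting the row maximum avoids underflow when the phenotype vector is long, which matters for designs with many irradiance and temperature levels.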

APPENDIX B
DERIVATION OF EQUATION 2-9

Suppose X ′X is in correlation form. Then the eigenvalue decomposition of X ′X is

\[
V'(X'X)V =
\begin{pmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_k
\end{pmatrix}
\]

where λi’s are the eigenvalues of X ′X and the columns of the orthogonal matrix V =

(v1, ..., vk) are the associated eigenvectors (Myers, 1990). Here, vi = (vi1, ..., vik)′. Thus,

\[
(X'X)^{-1} = V
\begin{pmatrix}
1/\lambda_1 & 0 & \cdots & 0 \\
0 & 1/\lambda_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1/\lambda_k
\end{pmatrix}
V'
\]

since V is orthogonal. Eq. 2–9 then follows.
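The identity above is easy to verify numerically. The following sketch uses a randomly generated design matrix (a hypothetical example), scales it so that X′X is in correlation form, and rebuilds the inverse from the eigenvalues and eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))
# Center and scale the columns so that X'X has ones on the diagonal.
X = (X - X.mean(axis=0)) / (X.std(axis=0) * np.sqrt(X.shape[0]))
xtx = X.T @ X

lam, V = np.linalg.eigh(xtx)                  # columns of V are the eigenvectors
inv_from_eig = V @ np.diag(1.0 / lam) @ V.T   # V diag(1/lambda) V'
diag_from_eig = (V**2 / lam).sum(axis=1)      # i-th entry: sum_j V[i, j]^2 / lambda_j
```

The last line shows that each diagonal element of the inverse is a sum of squared eigenvector entries divided by the eigenvalues, so a single small eigenvalue can inflate it.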

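APPENDIX C
MINIMIZATION OF 2-33

For fixed φtj, j = 1, 2, ..., t − 1, 2–33 is minimized with respect to σ2t by solving

\[
\frac{\partial}{\partial \sigma_t^2} \left[ \sum_{i=1}^{n} \sum_{k=1}^{J} P_{k|i} \left( \log \sigma_t^2 + \frac{(\varepsilon_{it}^{k})^2}{\sigma_t^2} \right) \right]
= \sum_{i=1}^{n} \sum_{k=1}^{J} P_{k|i} \left( \frac{1}{\sigma_t^2} - \frac{(\varepsilon_{it}^{k})^2}{\sigma_t^4} \right) = 0.
\]

Because the posterior probabilities Pk|i sum to one over k for each i, this yields

\[
\sigma_t^2 = \frac{\sum_{i=1}^{n} \sum_{k=1}^{J} P_{k|i} \left( y_{it}^{k} - \sum_{j=1}^{t-1} y_{ij}^{k} \phi_{tj} \right)^2}{n} \qquad \text{(C–1)}
\]

since \varepsilon_{it}^{k} = y_{it}^{k} - \sum_{j=1}^{t-1} y_{ij}^{k} \phi_{tj}.

For fixed σ2t , 2–33 is minimized with respect to φtj by the minimizer of

\[
\sum_{i=1}^{n} \sum_{k=1}^{J} P_{k|i} \frac{\left( y_{it}^{k} - \sum_{j=1}^{t-1} y_{ij}^{k} \phi_{tj} \right)^2}{\sigma_t^2} + \lambda \sum_{j=1}^{t-1} \phi_{tj}^2. \qquad \text{(C–2)}
\]

Let \phi_{t(t)} = (\phi_{t1}, \phi_{t2}, \ldots, \phi_{t,t-1})' and y_{i(t)}^{k} = (y_{i1}^{k}, y_{i2}^{k}, \ldots, y_{i,t-1}^{k})'. The first term of C–2 is

\[
\begin{aligned}
\frac{1}{\sigma_t^2} \sum_{i=1}^{n} \sum_{k=1}^{J} P_{k|i} \left( y_{it}^{k} - \sum_{j=1}^{t-1} y_{ij}^{k} \phi_{tj} \right)^2
&= \frac{1}{\sigma_t^2} \sum_{i=1}^{n} \sum_{k=1}^{J} P_{k|i} \left( y_{it}^{k} - y_{i(t)}^{k\prime} \phi_{t(t)} \right)^2 \\
&= \frac{1}{\sigma_t^2} \sum_{i=1}^{n} \sum_{k=1}^{J} P_{k|i} \left( (y_{it}^{k})^2 - 2 y_{it}^{k} y_{i(t)}^{k\prime} \phi_{t(t)} + \phi_{t(t)}' y_{i(t)}^{k} y_{i(t)}^{k\prime} \phi_{t(t)} \right) \\
&= \sum_{i=1}^{n} \sum_{k=1}^{J} \frac{P_{k|i} (y_{it}^{k})^2}{\sigma_t^2}
 - 2 \left( \sum_{i=1}^{n} \sum_{k=1}^{J} \frac{P_{k|i} \, y_{it}^{k} y_{i(t)}^{k\prime}}{\sigma_t^2} \right) \phi_{t(t)}
 + \phi_{t(t)}' \left( \sum_{i=1}^{n} \sum_{k=1}^{J} \frac{P_{k|i} \, y_{i(t)}^{k} y_{i(t)}^{k\prime}}{\sigma_t^2} \right) \phi_{t(t)} \\
&= c_t - 2 g_t' \phi_{t(t)} + \phi_{t(t)}' H_t \phi_{t(t)}
\end{aligned}
\]

where

\[
c_t = \sum_{i=1}^{n} \sum_{k=1}^{J} \frac{P_{k|i} (y_{it}^{k})^2}{\sigma_t^2}, \qquad
H_t = \sum_{i=1}^{n} \sum_{k=1}^{J} \frac{P_{k|i} \, y_{i(t)}^{k} y_{i(t)}^{k\prime}}{\sigma_t^2}, \qquad \text{and} \qquad
g_t = \sum_{i=1}^{n} \sum_{k=1}^{J} \frac{P_{k|i} \, y_{it}^{k} y_{i(t)}^{k}}{\sigma_t^2}.
\]

If It is a (t − 1) × (t − 1) identity matrix, then C–2 becomes

\[
\begin{aligned}
\sum_{i=1}^{n} \sum_{k=1}^{J} P_{k|i} \frac{\left( y_{it}^{k} - \sum_{j=1}^{t-1} y_{ij}^{k} \phi_{tj} \right)^2}{\sigma_t^2} + \lambda \sum_{j=1}^{t-1} \phi_{tj}^2
&= c_t - 2 g_t' \phi_{t(t)} + \phi_{t(t)}' H_t \phi_{t(t)} + \lambda \phi_{t(t)}' I_t \phi_{t(t)} \\
&= c_t - 2 g_t' \phi_{t(t)} + \phi_{t(t)}' (H_t + \lambda I_t) \phi_{t(t)}
\end{aligned}
\]

which can be minimized by

\[
\phi_{t(t)} = (H_t + \lambda I_t)^{-1} g_t \qquad \text{(C–3)}
\]

for fixed σ2t .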
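The two closed-form updates, C–1 and C–3, translate directly into array operations. The sketch below is illustrative only: it assumes that the genotype-specific quantities y^k_it have already been formed and stored as arrays, and the function names, array layout and random test data are hypothetical.

```python
import numpy as np

def update_sigma2(resid, post):
    """C-1: posterior-weighted mean of squared residuals; resid and post are (n, J)."""
    return np.sum(post * resid**2) / resid.shape[0]

def update_phi(y_t, y_prev, post, sigma2_t, lam):
    """C-3: phi = (H + lam * I)^{-1} g for one time point t.

    y_t: (n, J) values y_it^k; y_prev: (n, J, t-1) predecessor vectors y_i(t)^k;
    post: (n, J) posterior weights P_{k|i}.
    """
    H = np.einsum("nk,nkj,nkl->jl", post, y_prev, y_prev) / sigma2_t
    g = np.einsum("nk,nk,nkj->j", post, y_t, y_prev) / sigma2_t
    return np.linalg.solve(H + lam * np.eye(H.shape[0]), g)

# Random test data: n progeny, J genotypes, p = t - 1 predecessors.
rng = np.random.default_rng(3)
n, J, p = 30, 2, 3
y_prev = rng.normal(size=(n, J, p))
y_t = rng.normal(size=(n, J))
post = rng.dirichlet(np.ones(J), size=n)     # each row sums to one
phi = update_phi(y_t, y_prev, post, sigma2_t=1.5, lam=0.7)
resid = y_t - y_prev @ phi                   # residuals, shape (n, J)
sigma2 = update_sigma2(resid, post)
```

In an ECM-type iteration the two updates would alternate, each holding the other quantity fixed, with the weights `post` refreshed in the E-step.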

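APPENDIX D
DEFINITION OF KRONECKER PRODUCT

If

\[
A_{m \times n} =
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}
\quad \text{and} \quad
B_{p \times q} =
\begin{pmatrix}
b_{11} & b_{12} & \cdots & b_{1q} \\
b_{21} & b_{22} & \cdots & b_{2q} \\
\vdots & \vdots & \ddots & \vdots \\
b_{p1} & b_{p2} & \cdots & b_{pq}
\end{pmatrix}
\]

then the Kronecker product of A and B is the mp × nq matrix

\[
A \otimes B =
\begin{pmatrix}
a_{11}B & a_{12}B & \cdots & a_{1n}B \\
a_{21}B & a_{22}B & \cdots & a_{2n}B \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1}B & a_{m2}B & \cdots & a_{mn}B
\end{pmatrix}.
\]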
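As a small illustration of the definition, `numpy.kron` builds the block matrix directly. The second half of the sketch assembles a separable covariance for S irradiance and T temperature levels as a Kronecker product of two AR(1) correlation matrices. That this is the form of ΣAR(1) used in Chapter 3 is an assumption, and `ar1_corr` is a hypothetical helper that ignores unequal spacing between levels.

```python
import numpy as np

def ar1_corr(m, rho):
    """m x m AR(1) correlation matrix with entries rho ** |i - j|."""
    idx = np.arange(m)
    return rho ** np.abs(idx[:, None] - idx[None, :])

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])
K = np.kron(A, B)                 # block (i, j) of K equals A[i, j] * B

# Separable covariance (assumed form): irradiance is the outer index and
# temperature the inner index, matching the ordering of the phenotype vector.
S, T = 5, 4
sigma2, rho1, rho2 = 1.0, 0.6, 0.6
sigma_sep = sigma2 * np.kron(ar1_corr(S, rho1), ar1_corr(T, rho2))
```

Under this construction every covariance factors into an irradiance part and a temperature part, which is exactly the no-interaction restriction that the nonseparable models C1 to C3 relax.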

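APPENDIX E
DERIVATION OF EQUATION 3-20

By Fourier transformation,

\[
\begin{aligned}
g(\omega, \tau) &= \left( \frac{1}{2\pi} \right)^2 \int\!\!\int e^{-i(u\omega + v\tau)} C(u, v) \, du \, dv \\
&= \frac{1}{2\pi} \int e^{-iv\tau} \left[ \frac{1}{2\pi} \int e^{-iu\omega} C(u, v) \, du \right] dv \\
&= \frac{1}{2\pi} \int e^{-iv\tau} h(\omega, v) \, dv \qquad \text{(E–1)}
\end{aligned}
\]

where

\[
h(\omega, v) = \int_{-\infty}^{\infty} e^{iv\tau} g(\omega, \tau) \, d\tau \qquad \text{(E–2)}
\]

is the inverse Fourier transform of g in τ, or the spatial spectral density for temporal lag v.

Using E–2, 3–19 becomes

\[
C(u, v) = \int e^{iu\omega} h(\omega, v) \, d\omega. \qquad \text{(E–3)}
\]

Let

\[
h(\omega, v) = \rho(\omega, v) \kappa(\omega) \qquad \text{(E–4)}
\]

where ρ(ω, v) is a valid continuous autocorrelation function in v for each ω and κ(ω) > 0.

If \int \rho(\omega, v) \, dv < \infty and \int \kappa(\omega) \, d\omega < \infty, then in terms of E–4, E–3 becomes 3–20.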

REFERENCES

[1] Andersson, L., Haley, C.S., Ellegren, H., Knott, S.A., Johansson, M., Andersson,K., Anderssoneklund, L., Edforslilja, I., Fredholm, M., Hansson, I., Hakansson, J.,Hakansson, J. and Lundstrom, K. (1994). “Genetic mapping of quantitative trait locifor growth and fatness in pigs”, Science 263 1771-1774.

[2] Angilletta, Jr., M.J. and Sears, M.W. (2004). “Evolution of thermal reaction normsfor growth rate and body size in ectotherms: an introduction to the symposium”,Integr. Comp. Biol. 44, 401-402.

[3] Akaike, H. (1974) “A new look at the statistical model identification”, IEEETransactions on Automatic Control 19(6): 716723.

[4] Banerjee, O., ’dAspremont, A., and El Ghaouli, L. (2006). “Sparse covarianceselection via robust maximum likelihood estimation”, Proceedings of ICML.

[5] Bickel, P. and Levina, E. (2008) “Regularized estimation of large covariancematrices”, Ann. Statist. 36(1):199-227.

[6] Bochner, S. (1955). Harmonic Analysis and the Theory of Probability, University ofCalifornia Press, Berkley and Los Angeles.

[7] Broman, K. (1997) Identifying quantitative trait loci in experimental crosses, Ph.D.Dissertation, Department of Statistics, University of California, Berkley.

[8] Broman, K. (2001). “Review of statistical methods for QTL mapping inexperimental crosses”, Lab Animal 30, no. 7, 44-52.

[9] Carlborg, O., Andersson, L. and Kinghorn, B. (2000). “The use of a geneticalgorithm for simultaneous mapping of multiple interacting quantitative trait loci”,Genetics 155, 2003-2010.

[10] Carrol, R.J. and Rupert, D. (1984). “Power transformations when fitting theoreticalmodels to data”, J. Am. Statist. Assoc. 79, 321-328.

[11] Cox, D.D. and Sullivan, F. (1990). “Asymptotic analysis of penalized likelihood andrelated estimators”, Annals of Statistics 18, 1676-1695.

[12] Cressie, N. and Huang, H-C. (1999). “Classes of nonseparable, spatio-temporalstationarycovariance functions”, J. Am. Statist. Assoc 94, no. 448, 1330-1340.

[13] Cui, H.J., Zhu, J. and Wu, R. (2006) “Functional mapping for genetic control ofprogrammed cell death”, Physiol. Genom. 25, 458-469.

[14] Daniels, M.J. and Pourahmadi, M. (2002). “Bayesian analysis of covariance matricesand dynamic models for longitudinal data”, Biometrika 89, 553-566.

[15] de Boor, C. (2001) “A Practical Guide to Splines”, Revised ed. Springer New York.

104

[16] Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977). “Maximum likelihood fromincomplete data via the EM algorithm”, J. Roy. Statist. Soc. B 39, 1-38.

[17] Diggle, P.J., Heagerty, P., Liang, K.Y. and Zeger, S.L. (2002). Analysis of Longitudi-nal Data, Oxford University Press, UK.

[18] Doerge, R.W. (2002). “Mapping and analysis of quantitative trait loci inexperimental populations”, Nat. Rev. Genet. 3: 43-52.

[19] Doerge, R.W. and Churchill, G.A. (1996). “Permutation tests for multiple lociaffecting a quantitative character”, Genetics 142, 285-294.

[20] Drayne, D., Davies, K., Hartley, D., Mandel, J.L., Camerino, G., Williamson, R.and White, R. (1984). “Genetic mapping of the human X-chromosome by usingrestriction fragment length polymorphisms”, Proc. Natl. Acad. Sci. USA 812836-2839.

[21] Eilers, P.H.C. and Marx, B.D. (1996) “Flexible smoothing with B-splines and penalties”, Statistical Science 11, no. 2, 89-121.

[22] Fan, J. and Li, R. (2001). “Variable selection via nonconcave penalized likelihood and its oracle properties”, J. Am. Statist. Assoc. 96, 1348-1360.

[23] Fu, W. (1998). “Penalized regressions: The bridge versus the lasso”, Comput. Graph. Statist. 7, 397-416.

[24] Fuentes, M. (2005). “Testing separability of spatial-temporal covariance functions”, Journal of Statistical Planning and Inference 136, no. 2, 447-466.

[25] Furrer, R. and Bengtsson, T. (2007). “Estimation of high-dimensional prior and posteriori covariance matrices in Kalman filter variants”, Journal of Multivariate Analysis 98, no. 2, pp. 227-255.

[26] Genton, M. (2007). “Separable approximations of space-time covariance matrices”, Environmetrics 18, 681-695.

[27] Gill, P., Murray, W. and Wright, M. (1981). Practical Optimization, Academic Press, New York.

[28] Gneiting, T. (2002). “Nonseparable, stationary covariance functions for space-time data”, J. Am. Statist. Assoc. 97, no. 458, 590-600.

[29] Gneiting, T., Genton, M. and Guttorp, P. (2006). “Geostatistical space-time models, stationarity, separability and full symmetry”, Statistical Methods for Spatio-temporal Systems (Monographs on Statistics and Applied Probability) B. Finkenstadt, L. Held and V. Isham, editors, Chapman & Hall/CRC.

[30] Green, P. (1990). “On use of the EM algorithm for penalized likelihood estimation”, J. Roy. Statist. Soc. B 52, 443-452.

[31] Green, P. (1999). “Penalized likelihood”, Encyclopedia of Statistical Sciences 3, 578-586.

[32] Griffiths, A.J., Wessler, S.R., Lewontin, R.C., Gelbart, W.G., Suzuki, D.T. and Miller, J.H. (2005). Introduction to Genetic Analysis, W.H. Freeman and Company, New York.

[33] Haldane, J. B. S. (1919). “The combination of linkage values and the calculation of distance between the loci of linked factors”, Journal of Genetics 8, 299-309.

[34] Haley, C.S., Knott, S.A. and Elsen, J.M. (1994). “Genetic mapping of quantitative trait loci in cross between outbred lines using least squares”, Genetics 136, 1195-1207.

[35] Hoerl, A. and Kennard, R. (1970). “Ridge regression: biased estimation for nonorthogonal problems”, Technometrics 12, 55-67.

[36] Hoeschele, I. (2000). Mapping quantitative trait loci in outbred pedigrees. In: Handbook of Statistical Genetics Edited by D. J. Balding, M. Bishop and C. Cannings. Wiley New York. 567-597.

[37] Huang, H.C., Martinez, F., Mateu, J. and Montes, F. (2007a). “Model comparison and selection for stationary space-time models”, Comp. Statistics and Data Analysis 51, 4577-4596.

[38] Huang, J., Liu, L. and Liu, N. (2007b). “Estimation of large covariance matrices of longitudinal data with basis function approximations”, J. Comput. Graph. Statist. 16, 189-209.

[39] Huang, J., Liu, N., Pourahmadi, M. and Liu, L. (2006). “Covariance selection and estimation via penalised normal likelihood”, Biometrika 93, 85-98.

[40] Ibanez, M.V. and Simo, A. (2007). “Spatio-temporal modeling of perimetric test data”, Statistical Methods in Medical Research 16, no. 6, 497-522.

[41] Jansen, R.C. (2000). Quantitative trait loci in inbred lines. In: Handbook of Statistical Genetics Edited by D. J. Balding, M. Bishop and C. Cannings. Wiley New York. 567-597.

[42] Jansen, R.C. and Stam, P. (1994). “High resolution of quantitative traits into multiple loci via interval mapping”, Genetics 136, 1447-1455.

[43] Jones, R.H. and Zhang, Y. (1997). “Models for continuous stationary space-time process”, In Modelling Longitudinal and Spatially Correlated Data, Lecture Notes in Statistics 122, Springer, New York, 122, 289-298.

[44] Kao, C.H., Zeng, Z-B. and Teasdale, R.D. (1999). “Multiple interval mapping for quantitative trait loci”, Genetics 152, 1203-1216.

[45] Knott, S.A., Neale, D.B., Sewell, M.M. and Haley, C.S. (1997). “Multiple marker mapping of quantitative trait loci in an outbred pedigree of loblolly pine”, Theor. Appl. Genet. 94, 810-820.

[46] Kramer, M.G., Vaughn, T.T., Pletscher, L.S., King-Ellison, K., Erikson, C. and Cheverud, J.M. (1998). “Genetic variation in body weight growth and composition in the intercross of large (LG/J) and small (SM/J) inbred strains of mice”, Genetics and Molecular Biology 21, 211-218.

[47] Kenward, M.G. (1987). “A method for comparing profiles of repeated measurements”, Appl. Statist. 36, 296-308.

[48] Kingsolver, J.G., Izem, R. and Ragland, G.J. (2004). “Plasticity of size and growth in fluctuating thermal environments: comparing reaction norms and performance curves”, Integr. Comp. Biol. 44, 450-460.

[49] Kirkpatrick, M. and Heckman, N. (1989). “A quantitative genetic model for growth, shape, reaction norms, and other infinite-dimensional characters”, J. Math. Biol. 27, 429-450.

[50] Kolovos, A., Christakos, G., Hristopulos, D.T. and Serre, M.L. (2004). “Methods for generating non-separable spatiotemporal covariance models with potential environmental applications”, Advances in Water Resources 27, 815-830.

[51] Krishnaiah, P. (1985). “Multivariate Analysis”, Elsevier Science Publishers B.V., New York.

[52] Lander, E.S. and Botstein, D. (1989). “Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps”, Genetics 121, 185-199.

[53] Ledoit, O. and Wolf, M. (2003). “A well-conditioned estimator for large-dimensional covariance matrices”, Journal of Multivariate Analysis 88, 365-411.

[54] Levina, E., Rothman, A. and Zhu, J. (2008). “Sparse estimation of large covariance matrices via a nested lasso penalty”, Ann. Appl. Statist. 2, no. 1, 245-263.

[55] Li, H.Y., Kim, B-R. and Wu, R.L. (2006). “Identification of quantitative trait nucleotides that regulate cancer growth: A simulation approach”, J. Theor. Biol. 242, 426-439.

[56] Li, Y., Wang, N., Hong, M., Turner, N., Lupton, J. and Carroll, R. (2007). “Nonparametric estimation of correlation functions in longitudinal and spatial data, with applications to colon carcinogenesis experiments”, Annals of Statistics 35, no. 4, 1608-1643.

[57] Lin, M., Li, H.Y., Hou, W., Johnson, J.A. and Wu, R.L. (2007). “Modeling sequence-sequence interactions for drug response”, Bioinformatics 23, no. 10, 1251-1257.

[58] Lin, M., Lou, X-Y., Chang, M. and Wu, R.L. (2003). “A general statistical framework for mapping quantitative trait loci in non-model systems: Issue for characterizing linkage phases”, Genetics 165, 901-913.

[59] Lindley, D.V. (1957). “A statistical paradox”, Biometrika 44, 187-192.

[60] Liu, T., Liu, X-L, Chen, Y.M. and Wu, R.L. (2007). “A unifying differential equation model for functional genetic mapping of circadian rhythms”, Theor. Biol. Medical Model. 4, 5.

[61] Liu, T. and Wu, R.L. (2007). “A general Bayesian framework for functional mapping of dynamic complex traits”, Genetics (tentatively accepted 2007).

[62] Liu, T., Zhao, W., Tian, L. and Wu, R.L. (2005). “An algorithm for molecular dissection of tumor progression”, J. Math. Biol. 50, 336-354.

[63] Long, F., Chen, Y.Q., Cheverud, J.M. and Wu, R.L. (2006). “Genetic mapping of allometric scaling laws”, Genet. Res. 87, 207-216.

[64] Lynch, M. and Walsh, B. (1998). Genetics and Analysis of Quantitative Traits. Sinauer, Sunderland, MA.

[65] Ma, C. (2007). “Recent developments on the construction of spatial-temporal covariance models”, Stoch Environ Res Risk Assess, Springer-Verlag, 22, s39-s47.

[66] Ma, C., Casella, G. and Wu, R.L. (2002). “Functional mapping of quantitative trait loci underlying the character process: A theoretical framework”, Genetics 161, 1751-1762.

[67] Madsen, H. and Thyregod, P. (2001). “Calibration with absolute shrinkage”, J. Chemomet. 15, 497-509.

[68] Mallows, C.L. (1973). “Some Comments on Cp”, Technometrics, 15, 661-675.

[69] Matern, B. (1986). Spatial Variation, Springer New York, 2nd Ed.

[70] McCullagh, P. and Nelder, J.A. (1989). Generalized Linear Models, Chapman and Hall, London.

[71] McLachlan, G. and Peel, D. (2000). Finite Mixture Models, John Wiley and Sons, Inc., New York.

[72] Meng, X-L. and Rubin, D. (1993). “Maximum likelihood estimation via the ECM algorithm: A general framework”, Biometrika 80, 267-278.

[73] Mitchell, M.W., Genton, M.G. and Gumpertz, M.L. (2005) “Testing for separability of space-time covariances”, Environmetrics 16, 819-831.

[74] Molenberghs, G. and Verbeke, G. (2005). Models for Discrete Longitudinal Data, Springer Science+Business Media, Inc., New York.

[75] Myers, R. (1990). Classical and Modern Regression with Applications, PWS-Kent Publishing Company, Boston.

[76] Nelder, J.A. and Mead, R. (1965). “A simplex method for function minimization”, Comput. J. 7, 308-313.

[77] Newton, H.J. (1988). TIMESLAB: A Time Series Analysis Laboratory, Wadsworth & Brooks/Cole, Pacific Grove, CA.

[78] Niklas, K.L. (1994). Plant Allometry: The Scaling of Form and Process, University of Chicago, Chicago.

[79] Nychka, D., Wikle, C. and Royle, A. (2002). “Multiresolution models for nonstationary spatial covariance functions”, Statistical Modeling 2, 315-331.

[80] Ojelund, H., Madsen, H. and Thyregod, P. (2001). “Calibration with absolute shrinkage”, J. Chemomet. 15, 497-509.

[81] Pan, J.X. and Mackenzie, G. (2003). “On modelling mean-covariance in longitudinal studies”, Biometrika 90, 239-244.

[82] Porcu, E. and Mateu, J. (2006) “Nonseparable stationary anisotropic space-time covariance functions”, Stoch Environ Res. Risk Assess 21, 113-122.

[83] Porcu, E., Mateu, J., Zini, A. and Pini, R. (2007). “Modelling spatio-temporal data: A new variogram and covariance structure proposal”, Statistics and Probability Letters 77, 83-89.

[84] Pourahmadi, M. (1999). “Joint mean-covariance models with applications to longitudinal data: Unconstrained parameterization”, Biometrika 86, 677-690.

[85] Pourahmadi, M. (2000). “Maximum likelihood estimation of generalised linear models for multivariate normal covariance matrix”, Biometrika 87, 425-435.

[86] Ramsay, J.O. and Silverman, B.W. (1997). Functional Data Analysis, Springer-Verlag, New York.

[87] Rothman, A., Bickel, P., Levina, E. and Zhu, J. (2007). “Sparse permutation invariant covariance estimation”, Dept. of Statistics, Univ. of Michigan (Technical Report no. 467).

[88] Sampson, P. and Guttorp, P. (1992). “Nonparametric estimation of nonstationary spatial covariance structure”, J. Am. Statist. Assoc. 87, 108-119.

[89] Satagopan, J.M., Yandell, Y.S., Newton, M.A. and Osborn, T.C. (1996). “A Bayesian approach to detect quantitative trait loci using Markov chain Monte Carlo”, Genetics 144, 805-816.

[90] Sax, K. (1923). “The association of size difference with seed-coat pattern and pigmentation in Phaseolus vulgaris”, Genetics 8, 552-560.

[91] Schabenberger, O. and Gotway, C. (2005). Statistical Methods for Spatial Data Analysis, Chapman and Hall/CRC, Boca Raton.

[92] Schwarz, G. (1978). “Estimating the dimension of a model”, Annals of Statistics 6(2), 461-464.

[93] Sillanpaa, M.J. and Arjas, E. (1999). “Bayesian mapping of multiple quantitative trait loci from incomplete outbred offspring data”, Genetics 151, 1605-1619.

[94] Smith, M. and Kohn, R. (2002). “Parsimonious covariance matrix estimation for longitudinal data”, J. Am. Statist. Assoc. 97, no. 460, 1141-1153.

[95] Stein, M. (2005). “Space-time covariance functions”, J. Am. Statist. Assoc. 100, no. 469, 310-321.

[96] Stratton, D. (1998). “Reaction norm functions and QTL-environment interactions for flowering time in Arabidopsis thaliana”, Heredity 81, 144-155.

[97] Stroud, J. (2001). “Dynamic models for spatiotemporal data”, J. R. Statist. Soc. B 63, 673-698.

[98] Tibshirani, R. (1996). “Regression shrinkage and selection via the Lasso”, J. Roy. Statist. Soc. B 58, 267-288.

[99] Vaughn, T., Pletscher, S., Peripato, A., King-Ellison, K., Adams, E., Erikson, C. and Cheverud, J. (1999). “Mapping of quantitative trait loci for murine growth: A closer look at genetic architecture”, Genet. Res. 74, 313-322.

[100] Thornley, J.H.M. and Johnson, I.R. (1990). Plant and Crop Modelling: A Mathematical Approach to Plant and Crop Physiology, Clarendon Press, Oxford.

[101] Waller, L. and Gotway, C. (2004). Applied Spatial Statistics for Public Health Data, Wiley-Interscience, Hoboken, N.J.

[102] Wang, Z., Hou, W. and Wu, R.L. (2006). “A statistical model to analyse quantitative trait locus interactions for HIV dynamics from the virus and human genomes”, Statist. Med. 25, 495-511.

[103] Wang, Z. and Wu, R.L. (2004). “A statistical model for high-resolution mapping of quantitative trait loci determining HIV dynamics”, Statist. Med. 23, 3033-3051.

[104] Weiss, R. (2005). Modeling Longitudinal Data, Springer-Verlag, New York.

[105] West, G.B., Brown, J.H. and Enquist, B.J. (2001). “A general model for ontogenetic growth”, Nature 413, 628-631.

[106] Wolf, J.B. (2002). “The geometry of phenotypic evolution in developmental hyperspace”, Proceedings of the National Academy of Sciences of the USA 99, 15849-15851.

[107] Wong, F., Carter, C.K. and Kohn, R. (2003) “Efficient estimation of covariance selection models”, Biometrika 90, 809-830.

[108] Wu, J., Zeng, Y., Huang, J., Hou, W., Zhu, J. and Wu, R.L. (2007a). “Functional mapping of reaction norms to multiple environmental signals”, Genet. Res. Camb. 89, 27-38.

[109] Wu, R.L., Hou, W., Cui, Y., Li, H.Y., Wu, S., Ma, C-X. and Zeng, Y. (2007b) “Modeling the genetic architecture of complex traits with molecular markers”, Recent Patents on Nanotechnology 1, 41-49.

[110] Wu, R.L., Ma, C-X., and Casella, G. (2007c). Statistical Genetics of Quantitative Traits: Linkage, Maps, and QTL, Springer-Verlag, New York.

[111] Wu, R.L., Ma, C-X., Lin, M. and Casella, G. (2004a). “A general framework for analyzing the genetic architecture of developmental characteristics”, Genetics 166, 1541-1551.

[112] Wu, R.L., Ma, C-X., Lin, M., Wang, Z. and Casella, G. (2004b). “Functional mapping of quantitative trait loci underlying growth trajectories using a transform-both-sides logistic model”, Biometrics 60, 729-738.

[113] Wu, R.L., Ma, C-X., Littel, R. and Casella, G. (2002). “A statistical model for the genetic origin of allometric scaling laws in biology”, J. Theor. Biol. 217, 275-287.

[114] Wu, W.B. and Pourahmadi, M. (2003). “Nonparametric estimation of large covariance matrices of longitudinal data”, Biometrika 90, 831-844.

[115] Wu, S., Yang, J. and Wu, R.L. (2007d). “Semiparametric functional mapping of quantitative trait loci governing long-term HIV dynamics”, Bioinformatics 23, 569-576.

[116] Xu, S.Z. (1996). “Mapping quantitative trait loci using four-way crosses”, Genet. Res. 68, 175-181.

[117] Xu, S.Z. and Yi, N.J. (2000). “Mixed model analysis of quantitative trait loci”, Proc. Natl. Acad. Sci. USA 97, 14542-14547.

[118] Yang, J. (2006) Nonparametric functional mapping of quantitative trait loci, Ph.D. Dissertation, Department of Statistics, University of Florida.

[119] Yang, R.Q., Gao, H.J., Wang, X., Zheng, Z-B., and Wu, R. L. (2007). “A semiparametric model for composite functional mapping of dynamic quantitative traits”, Genetics 177, 1859-1870.

[120] Yap, J.S., Wang, C.G. and Wu, R.L. (2007). “A simulation approach for functional mapping of quantitative trait loci that regulate thermal performance curves”, PLoS ONE 2(6), e554.

[121] Yuan, M. and Lin, Y. (2007). “Model selection and estimation in the Gaussian graphical model”, Biometrika 94(1), 19-35.

[122] Zeng, Z-B. (1994). “Precision mapping of quantitative trait loci”, Genetics 136, 1457-1468.

[123] Zhao, W. (2005a). Statistical modelling for functional mapping of longitudinal and multiple longitudinal traits: structured antedependence model and wavelet dimensionality reduction, Ph.D. Dissertation, Department of Statistics, University of Florida.

[124] Zhao, W., Chen, Y., Casella, G., Cheverud, J.M. and Wu, R.L. (2005b). “A non-stationary model for functional mapping of complex traits”, Bioinformatics 21, 2469-2477.

[125] Zhao, W., Ma, C-X., Cheverud, J.M. and Wu, R.L. (2004). “A unifying statistical model for QTL mapping of genotype × sex interaction for developmental trajectories”, Physiol. Genomics 19, 218-227.

[126] Zimmerman, D. and Nunez-Anton, V. (2001). “Parametric modeling of growth curve data: An overview (with discussions)”, Test 10, 1-73.

BIOGRAPHICAL SKETCH

John Stephen F. Yap was born in the town of Tagoloan, Misamis Oriental, Philippines

to Rhoda and Lizardo Yap. He has an older brother, Mark. John earned a B.S. in

mathematics from Ateneo de Manila University in the Philippines and upon graduation

worked as an actuarial assistant at Watson Wyatt. He also spent a year as an assistant

instructor in the Mathematics Department of Ateneo de Manila University. John

obtained an M.S. in mathematics with emphasis in actuarial science from the University

of Minnesota in Minneapolis and a Ph.D. in statistics from the University of Florida

in Gainesville. He will work at the Food and Drug Administration as a mathematical

statistician.
