june 2002ncsu qtl ii © brian s. yandell1 efficient and robust model selection for quantitative...

193
June 2002 NCSU QTL II © Brian S. Ya ndell 1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University of Wisconsin-Madison www.stat.wisc.edu/~yandell with Jaya M. Satagopan, Sloan-Kettering Biostatistics, Fei Zou, UNC Biostatistics and Patrick J. Gaffney, Lubrizol NCSU Statistical Genetics, June 2002

Upload: theodore-wells

Post on 17-Jan-2016

216 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 1

Efficient and Robust Model Selectionfor Quantitative Trait Loci Analysis

in Inbred Lines

Brian S. YandellUniversity of Wisconsin-Madison

www.stat.wisc.edu/~yandell

with Jaya M. Satagopan, Sloan-Kettering Biostatistics,

Fei Zou, UNC Biostatistics

and Patrick J. Gaffney, Lubrizol

NCSU Statistical Genetics, June 2002

Page 2: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 2

Goals• model selection with one QTL

– review interval mapping basics– extensions of phenotype model

• how to map non-normal data?

– brief digression to multiple crosses– Bayesian interval mapping

• how to sample from the posterior?

• model selection over multiple QTL– how many QTL are supported by data?– what is the genetic architecture?– how to sample complicated model space?

Page 3: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 3

Main Topics

away from normality• fewer assumptions

– semi-parametric

– non-parametric

• extended phenotypes• check robustness• multiple crosses

how many QTL?• inferring the number• sampling all QT loci• Bayesian analysis• MCMC methodology• estimating heritability• model selection

Page 4: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 4

Overview• I: IM basics (review)• II: Beyond normal data• III: Bayesian idea

– Bayes theorem

– posterior & likelihood

• IV: MCMC samples– Monte Carlo idea

– study posterior

• V: Multiple QTL

• VI: How many QTL?– Reversible Jump

– analog to regression

• VII: RJ-MCMC details• VII: Model assessment• IX: References

– Software

– Articles

• X: Multiple crosses

Page 5: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 5

Part I: Interval Mapping Basics• observed measurements

– Y = phenotypic trait– X = markers & linkage map

• i = individual index 1,…,n• missing data

– missing marker data– Q = QT genotypes

• alleles QQ, Qq, or qq at locus• unknown quantities

= QT locus (or loci) = phenotype model parameters

• pr(Q|X,) recombination model– grounded by linkage map, experimental cross– recombination yields multinomial for Q given X

• pr(Y|Q,) phenotype model– distribution shape (could be assumed normal) – unknown parameters (could be non-parametric)

observed X Y

missing Q

unknown after

Sen Churchill (2001)

Page 6: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 6

2r

1X 2X 3X 4X 5X 6X

5r4r3r1r

recombination model components

distance along chromosome

Q?

recombination rates

markers

Page 7: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 7

Recombination and Distance• assume map and marker distances are known• useful approximation for QTL linkage

– Haldane map function: no crossover interference– independence implies crossover events are Poisson

• all computations consistent in approximation– rely on given map with known marker locations– 1-to-1 relation of distance to recombination– all map functions are approximate anyway

)21log(

1

21

221

r

er

0.0 0.5 1.0 1.5 2.0

0.0

0.2

0.4

distance

r

rate

HaldaneKosambiMorgan

Page 8: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 8

recombination model pr(Q|X,) • locus is distance along linkage map

– identifies flanking marker region

• flanking markers provide good approximation– map assumed known from earlier study– inaccuracy slight using only flanking markers

• extend to next flanking markers if missing data

– could consider more complicated relationship• but little change in results

pr(Q|X,) = pr(geno | map, locus) pr(geno | flanking markers, locus)

kX 1kXQ?

Page 9: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 9

idealized phenotype model• trait = mean + additive + error• trait = effect_of_geno + error• pr( trait | geno, effects )

),normal(

),|(pr

2

Q

Q

G

QY

EGY

8 9 10 11 12 13 14 15 16 17 18 19 20

qq QQ

Qq

Page 10: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 10

Simulated Data with 1 QTL6

810

1214

x=-1 x=1

68

1012

14

0 2 4 6 8frequency

trai

t

Page 11: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 11

Profile LOD for 1 QTL0

51

01

52

02

5

distance (cM)

LOD

0 10 20 30 40 50 60 70 80 90

QTLIM

Page 12: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 12

Interval Mapping Likelihood

• likelihood: basis for scanning the genome– product over i = 1,…,n individuals

L( ,|Y) = producti pr(Yi|Xi,)

= producti sumQ pr(Q|Xi,) pr(Yi|Q,)– complicated procedure to get estimates

• null likelihood: no QTLL( |Y) = producti pr(Yi|) = producti N(Yi|,2)

– usual mean and variance estimates

Page 13: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 13

Interval Mapping forQuantitative Trait Loci

• profile likelihood (LOD) across QTL– scan whole genome locus by locus

• use flanking markers for interval mapping

– maximize likelihood ratio (LOD) at locus• best estimates of effects for each locus• EM method (Lander & Botstein 1989)

)~

|(pr

)|(pr )ˆ,|(pr sumlogsum)( 10

i

iQi

Y

QQYLOD

Page 14: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 14

Interval Mapping Tests• profile LOD across possible loci in genome

– maximum likelihood estimates of effects at locus

– LOD is rescaling of L(effects, locus|y)

• test for evidence of QTL at each locus– LOD score (LR test)

– adjust (?) for multiple comparisons

Page 15: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 15

Interval Mapping Estimates

• confidence region for locus– based on inverting test of no QTL

– 2 LODs down gives approximate CI for locus

– based on chi-square approximation to LR

• confidence region for effects– approximate CI for effect based on normal

– point estimate from profile LOD

)ˆ(se96.1ˆeffects genetic

2)()ˆ(|CIlocus

QQ GGCI

LODLOD

Page 16: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 16

Part II: Extension of Phenotype Model

• limitations of parametric models

• quick fixes (but watch out!)

• semi-parametric approaches

• non-parametric approaches

• bottom line:– normal phenotype model works well to pick up

loci, but may be poor at estimates of effects

Page 17: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 17

Limitations of Parametric Models• measurements not normal

– counts (e.g. number of tumors)– survival time (e.g. days to flowering)

• false positives due to miss-specified model– check model assumptions?

• want more robust estimates of effects– parametric: only center (mean), spread (SD)– shape of distribution may be important

Page 18: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 18

What if data are far away from ideal?

8 9 10 11 12 13 14 15 16

qqQq

QQ

no QTL?

8 9 10 11 12 13 14 15 16

qqQq

QQ

skewed?

8 9 10 11 12 13 14 15 16 17 18 19 20 21

qq

Qq

QQ

dominance?

0 1 2 3 4 5 6 7 8 9

qq

QqQQ

zeros?

Page 19: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 19

What shape is your histogram?• histogram conditional on known QT genotype

– pr(Y|qq,) model shape with genotype qq

– pr(Y|Qq,) model shape with genotype Qq

– pr(Y|QQ,) model shape with genotype QQ

• is the QTL at a given locus ?– no QTL pr(Y|qq,) = pr(Y|Qq,) = pr(Y|

QQ,)

– QTL present mixture if genotype unknown

• mixture across possible genotypes– sum over Q = qq, Qq, QQ

– pr(Y|X,,) = sumQ pr(Q|X,) pr(Y|Q,)

Page 20: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 20

Interval Mapping Likelihood

• likelihood: basis for scanning the genome– product over i = 1,…,n individuals

L( ,|Y) = producti pr(Yi|Xi,)

= producti sumQ pr(Q|Xi,) pr(Yi|Q,)

• problem: unknown phenotype model– parametric pr(Y|Q,) = normal(GQ,2)– semi-parametric pr(Y|Q,) = f(Y)exp(YQ)– non-parametric pr(Y|Q,) = FQ(Y)

Page 21: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 21

Useful Models & Transformations• binary trait (yes/no, hi/lo, …)

– map directly as another marker– categorical: break into binary traits?– mixed binary/continuous: condition on Y > 0?

• known model for biological mechanism– counts Poisson – fractions binomial– clustered negative binomial

• transform to stabilize variance– counts Y = sqrt(Y)– concentration log(Y) or log(Y+c) – fractions arcsin(Y)

• transform to symmetry (approx. normal)– fraction log(Y/(1-Y)) or log((Y+c)/(1+c-Y))

• empirical transform based on histogram– watch out: hard to do well even without mixture– probably better to map untransformed, then examine residuals

Page 22: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 22

Semi-parametric QTL• phenotype model pr(Y|Q,) = f(Y)exp(YQ)

– unknown parameters = (f, )• f (Y) is a (unknown) density if there is no QTL = (qq, Qq, QQ)• exp(YQ) `tilts’ f based on genotype Q and phenotype Y

• test for QTL at locus Q = 0 for all Q, or pr(Y|Q,) = f(Y)

• includes many standard phenotype modelsnormal pr(Y|Q,) = N(GQ,2)

Poisson pr(Y|Q,) = Poisson(GQ)exponential, binomial, …, but not negative binomial

Page 23: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 23

Semi-parametric Empirical Likelihood• phenotype model pr(Y|Q,) = f(Y) exp(YQ)

– “point mass” at each measured phenotype Yi

– subject to distribution constraints for each Q:

1 = sumi f(Yi) exp(YiQ)

• non-parametric empirical likelihood (Owen 1988)L(,|Y,X) = producti [sumQ pr(Q|Xi,) f(Yi) exp(YiQ)]

= producti f(Yi) [sumQ pr(Q|Xi,) exp(YiQ)]

= producti f(Yi) wi

– weights wi = w(Yi|Xi,,) rely only on flanking markers• 4 possible values for BC, 9 for F2, etc.

• profile likelihood: L(|Y,X) = max L(,|Y,X)

Page 24: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 24

Semi-parametric Formal Tests• clever trick: use partial empirical LOD

– Zou, Fine, Yandell (2002 Biometrika)– LOD() log10 L(|Y,X)

• has same formal behavior as parametric LOD– single locus test: approximately 2 with 1 d.f.– genome-wide scan: can use same critical values– permutation test: possible with some work

• can estimate cumulative distributions– nice properties (converge to Gaussian processes)

Page 25: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 25

log empirical likelihood detailslog(L(,|Y,X)) = sumi log(f(Yi)) +log(wi)

now profile with respect to ,log(L(,|Y,X)) = sumi log(fi) +log(wi)

+ sumQ Q(1-sumi fi exp(YiQ))

partial likelihood: set Lagrange multipliers Q to 0

point mass density estimates ),|(prsum),|( with

),|()exp(sum 1

ii

QiQi

XQXQp

XQpYf

Page 26: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 26

Histograms and CDFs

histogram

Den

sity

6 8 10 12 14

0.00

0.10

0.20

Y = phenotype

fre

qu

en

cy

6 8 10 12 140.

00.

40.

8

+ +++++++++++++++++++

++++++++++++++++

++++++++++++++++

++++++++++++++++++++++++

++++++++++++++++

+++++++ +

Y = phenotypecu

mu

lativ

e fr

eq

ue

ncy

cumulative distribution

histograms capture shapebut are not very accurate

CDFs are more accuratebut not always intuitive

Page 27: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 27

Rat study of breast cancerLan et al. (2001 Genetics)

• rat backcross– two inbred strains

• Wistar-Furth susceptible

• Wistar-Kyoto resistant

– backcross to WF

– 383 females

– chromosome 5, 58 markers

• search for resistance genes

• Y = # mammary carcinomas

• where is the QTL? dash = normalsolid = semi-parametric

Page 28: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 28

What shape histograms by genotype?WF/WF WKy/WF

line = normal, + = semi-parametric, o = confidence interval

Page 29: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 29

Non-parametric Methods• phenotype model pr(Y|Q, ) = FQ(Y)

= F = (Fqq,FQq,FQQ) arbitrary distribution functions

• Kruglyak Lander (1995)– interval mapping rank-sum test, replacing Y by rank(Y)– claimed no estimator of QTL effects

• estimators are indeed possible– semi-parametric shift (Hodges-Lehmann)

• Zou (2001) thesis, Zou, Yandell, Fine (2002 in review)

– non-parametric cumulative distribution• Fine, Zou, Yandell (2001 in review)

Page 30: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 30

Rank-Sum QTL Methods• phenotype model pr(Y|Q, ) = FQ(Y) • replace Y by rank(Y) and perform IM

– extension of Wilcoxon rank-sum test– fully non-parametric

• Hodges-Lehmann estimator of shift – most efficient if pr(Y|Q, ) = F(Y+Q)– find that matches medians

• problem: genotypes Q unknown• resolution: Haley-Knott (1992) regression scan

– works well in practice, but theory is elusive• Zou, Yandell Fine (Genetics, in review)

Page 31: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 31

Non-Parametric QTL CDFs• estimate non-parametric phenotype model

– cumulative distributions FQ(y) = pr(Y y |Q)

– can use to check parametric model validity

• basic idea:

pr(Y y |X,) = sumQ pr(Q|X,)FQ(y)

– depends on X only through flanking markers– few possible flanking marker genotypes

• 4 for BC, 9 for F2, etc.

Page 32: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 32

Finding NP QTL CDFs

• cumulative distribution FQ(y) = pr(Y y |Q)

• F = {FQ, all possible QT genotypes Q}

– BC with 1 QTL: F = {FQQ, FQq}

• find F to minimize over all phenotypes ysumi [I(Yi y) – sumQ pr(Q|X,)FQ(y)]2

• looks complicated, but simple to implement

Page 33: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 33

Non-parametric CDF Properties

• readily extended to censored data– time to flowering for non-vernalized plants

• nice large sample properties– estimates of F(y) = {FQ(y)} jointly normal

– point-wise, experiment-wise confidence bands

• more robust to heavy tails and outliers

• can use to assess parametric assumptions

Page 34: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 34

What QTL influence flowering time? no vernalization: censored survival

• Brassica napus– Major female

• needs vernalization

– Stellar male• insensitive

– 99 double haploids

• Y = log(days to flower)– over 50% Major at QTL

never flowered

– log not fully effective grey = normal, red = non-parametric

Page 35: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 35

What shape is flowering distribution?B. napus Stellar B. napus Major

line = normal, + = non-parametric, o = confidence interval

Page 36: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 36

Part III: Bayesian Idea

• Thomas Bayes and the original idea• Bayes theorem• How do frequentist & Bayesian approaches

differ?• Choice of Bayesian priors

– normal data phenotype model– empirical Bayes

• Bayesian interval mapping basics

Page 37: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 37

Bayesian QTL Model Selection

• resampling from data– permutation tests

– bootstrap, jackknife

– MCMC• special Markov chain

• Monte Carlo sampling

• show MCMC ideas– Gibbs sampler

– Metropolis-Hastings

– reversible jump MCMC

• Bayesian perspective– common in animal model– use “prior” information

• previous experiments• related genomes

• inbred lines “easy”– can check against *IM– ready extension

• multiple experiments• pedigrees• non-normal data• epistasis

Page 38: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 38

• Reverend Thomas Bayes (1702-1761)– part-time mathematician– buried in Bunhill Cemetary, Moongate, London– famous paper in 1763 Phil Trans Roy Soc London

• Barnard (1958 Biometrika), Press (1989) Bayesian Statistics• Stigler (1986) History of Statistics• Carlin Louis (1996); Gelman et al. (1995) books

– Was Bayes the first with this idea? (Laplace)

• billiard balls on rectangular table– two balls tossed at random (uniform) on table– where is first ball if the second is to its right (left)?

Who was Bayes?

first ball

Page 39: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 39

Where is the first ball?

0)1(2

12)|(pr

2

1)1()(pr

)(pr

)(pr)|(pr)|pr(

1

0

1

Y

YY

dY

Y

YY

YY

prior pr() = 1likelihood pr(Y | )= Y(1- )1-Y

posterior pr( |Y) = ?

Y=0 Y=1

first ball

second ball

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.5

1.0

1.5

2.0

Y 1

Y 0

pr

|Y

(now throw second ball n times)

Page 40: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 40

What is Bayes Theorem?• before and after observing data

– prior: pr() = pr(parameters)– posterior: pr(|Y) = pr(parameters|data)

• posterior = likelihood * prior / constant– usual likelihood of parameters given data– normalizing constant pr(Y) depends only on data

• constant often drops out of calculation

)(pr

)(pr)|(pr

)(pr

),(pr)|(pr

Y

Y

Y

YY

Page 41: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 41

What is Probability?

Frequentist analysis•chance over many trials – long run average

• estimates• confidence intervals

– long term frequency• hypothesis tests• p-values

•Type I error rate– reject null when true– chance of extreme result

Bayesian analysis•uncertainty of true value

•prior

– uncertainty before data

– incorporate prior knowledge/experience

•posterior

– uncertainty after analyzing current data

– balance prior and data

Page 42: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 42

012345678910

0.0

0.2

parameter = 2012345678910

0.0

0.2

parameter = 4012345678910

0.0

0.2

parameter = 6

0 2 4 6 8

0.0

0.00

020.

0004

likel

ihoo

d

parameter

00.

10.

20.

3po

ster

ior

Likelihood and Posterior Example

comm.) pers. Newton, M.(

!)|(pr

?:parameter

8,3,1:data

y

eyY

Y

y

Page 43: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 43

Frequentist or Bayesian?

• Frequentist approach– fixed parameters

• range of values

– maximize likelihood• ML estimates• find the peak

– confidence regions• random region• invert a test

– hypothesis testing• 2 nested models

• Bayesian approach– random parameters

• distribution

– posterior distribution• posterior mean• sample from dist

– credible sets• fixed region given data• HPD regions

– model selection/critique• Bayes factors

Page 44: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 44

Frequentist or Bayesian?

• Frequentist approach– maximize over mixture

of QT genotypes– locus profile likelihood

• max over effects

– HPD region for locus• natural for locus

– 1-2 LOD drop

• work to get effects– approximate shape of

likelihood peak

• Bayesian approach– joint distribution over

QT genotypes

– sample distribution• joint effects & loci

– HPD regions for• joint locus & effects

• use density estimator

Page 45: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 45

Choice of Bayesian priors• elicited priors

– higher weight for more probable parameter values• based on prior empirical knowledge

– use previous study to inform current study• weather prediction, previous QTL studies on related organisms

• conjugate priors– convenient mathematical form– essential before computers, helpful now to simply computation– large variances on priors reduces their influence on posterior

• non-informative priors – may have “no” information on unknown parameters– prior with all parameter values equally likely

• may not sum to 1 (improper), which can complicate use

• always check sensitivity of posterior to choice of prior

Page 46: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 46

Y = G + E posterior for single individualenviron E ~ N(0,2), 2 known likelihood pr(Y | G,2) = N(Y | G,2)

prior pr(G | 2,,) = N(G | ,2/)

posterior N(G | +B1(Y-), B12)

Yi = G + Ei posterior for sample of n individuals

shrinkage weights Bn go to 1

Bayes for normal data

1,sum with

),(N),,,|pr(2

2

n

nB

n

YY

nBYBGYG

ni

nn

Page 47: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 47

effect of prior variance on posterior

normal prior, posterior for n = 1, posterior for n = 5 , true mean

2 3 4 5 6 7 8 9 11 13 15 17

0.5

2 3 4 5 6 7 8 9 11 13 15 17

1

2 3 4 5 6 7 8 9 11 13 15 17

2

Page 48: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 48

Yi = G(Qi) + Ei genetic Qi = qq,Qq,QQ

environ E ~ N(0,2), 2 known

parameters = (G,2)

likelihood pr(Yi | Qi,G,2) = N(Yi | G(Qi),2)

prior pr(GQ | 2,,) = N(GQ | ,2/)

posterior:

posterior by QT genetic value

1,sum},{count

),(N),,,,|(pr

}:{

22

Q

QQ

Q

i

QQiQiQ

QQQQQQ

n

nB

n

YYQQn

nBYBGQYG

i

Page 49: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 49

How do we choose hyper-parameters ,?

Empirical Bayes: marginalize over prior

estimate , from marginal posterior

likelihood pr(Yi | Qi,G,2) = N(Yi | G(Qi),2)

prior pr(GQ | 2,,) = N(GQ | ,2/)

marginal pr(Yi | 2,,) = N(Yi | ,2 ( + 1)/)

estimates

EB posterior

Empirical Bayes: choosing hyper-parameters

QQQQQQ

ii

nBYYBYGYG

s

nYYsY

2

22

22

ˆ),(ˆN)|(pr

/or 1

/)(sum,ˆ

Page 50: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 50

What if variance 2 is unknown?• recall that sample variance is proportional to chi-square

– pr(s2 | 2) = 2 (ns2/2 | n)– or equivalently, ns2/2 | 2 ~ n

2 • conjugate prior is inverse chi-square

– pr(2 | ,2) = inv-2 (2 | ,2 )– or equivalently, 2/2 | ,2 ~

2 – empirical choice: 2= s2/3, =6

• E(2 | ,2) = s2/2, Var(2 | ,2) = s4/4

• posterior given data– pr(2 | Y,,2) = inv-2 (2|+n, (2+ns2)/(+n) )

Page 51: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 51

Yi = G(Qi) + Ei genetic Qi = qq,Qq,QQ

environ E ~ N(0,2)parameters = (G,2)

likelihood pr(Yi | Qi,G,2) = N(Yi | G(Qi),2)

prior pr(GQ | 2,,) = N(GQ | , 2/)pr(2 | ,2) = inv-2 (2 | ,2 )

posterior:

joint effects posterior details

nQGYsn

nB

nns

nGQY

nBYYBYGQYG

iiiQQ

QQ

QQ

QQQQQQ

/)(sum, with

,-inv),,,,|(pr

),(N),,,,|(pr

22

222222

22

Page 52: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 52

Bayesian Idea for QTLs

• key idea– sample missing genotypes Q

• using recombination model

– phenotype model given Q• see previous slides

• methods and philosophy– EM & MCMC

– Frequentists & Bayesians

• review interval maps & profile LODs• case study: simulated single QTL

observed X Y

missing Q

unknown

Page 53: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 53

QTL Full Posterior

• posterior = likelihood * prior / constant

• posterior( paramaters | data )pr( loci, genos, effects | trait, map )

),|(pr),|(pr product ),|(pr),|(pr

),|(pr),|(pr)(pr)(prsum)(prconstant

constant

),|(pr),|(pr)(pr)(pr),|,,(pr

},,{

QYXQQYXQ

QYXQY

QYXQXYQ

iii

Q

Page 54: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 54

marginal posteriors

• joint posterior– pr(,Q, |Y,X) = pr()pr()pr(Q|X,)pr(Y|Q,) /constant

• genetic effects– pr( |Y,X) = sumQ pr( |Y,Q) pr(Q|Y,X)

• QTL locus– pr( |Y,X) = sumQ pr( |X,Q) pr(Q|Y,X)

• QTL genotypes more complicated– pr(Q|Y,X) = sum, pr(Q|Y,X,, ) pr(, |Y,X)– impossible to separate and in sum

observed X Y

missing Q

unknown

Page 55: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 55

prior & posterior for genotypes Q• prior is recombination model

pr(Q|Xi ,)• can explicitly decompose by individual i

– binomial (or trinomial) probability

• posterior for genotype depends on– effects via trait model– locus via recombination model

• posterior agrees exactly with interval mapping– used in EM: estimation step– but need to know locus and effects

),|(pr),|(pr sum

),|(pr),|(pr),,,|(pr

iiQ

iiiiQi XQQY

XQQYXYQP

Page 56: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 56

prior & posterior for QT locus• no prior information on locus

– uniform prior over genome

– use framework map• choose interval proportional to

length

• then pick uniform position within interval

0 20 40 60 80

0.0

0.2

distance (cM)

• prior information from other studies

•concentrate on credible regions•use posterior of previous study as new prior

posterior

prior

Page 57: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 57

QTL Bayesian Inference

• study posterior distribution of locus & effects– sample joint distribution

• locus, effects & genotypes

– study marginal distribution of• locus

• effects – overall mean, genotype difference, variance

• locus & effects together

• estimates & confidence regions– histograms, boxplots & scatter plots

– HPD regions

Page 58: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 58

Marginal Posterior Summary

• marginal posterior for locus & effects• highest probability density (HPD) region

– smallest region with highest probability– credible region for locus & effects

• HPD with 50,80,90,95%– range of credible levels can be useful– marginal bars and bounding boxes– joint regions (harder to draw)

Page 59: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 59

Posterior for locus & effect

36 38 40 42

1.8

2.0

2.2

distance (cM)

36 38 40 42

0.0

0.1

0.2

0.3

distance (cM)

1.8

2.0

2.2

QTL 1

addi

tive

summaries of MC samplesjoint scatter plot

(lines for true, estimates)marginal histogrammarginal box-plot

Page 60: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 60

36 38 40 42

1.6

1.8

2.0

2.2

2.4

distance (cM)

addi

tive

HPD region for locus & effect

BayesianHPDs

frequentist CIs

Page 61: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 61

Part IV: Monte Carlo Samplingfrom the Bayesian posterior

• How can we study Bayesian posteriors?– frequentist & Bayesian methods

– EM & MCMC duality

• What is a Markov chain?• How does Markov chain Monte Carlo work?

– use Markov chain to sample from posterior

– Gibbs sampler for effects , genotypes Q

– Metropolis-Hastings for loci

Page 62: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 62

How can we studyBayesian posteriors?

• exact methods if possible– manipulate math formula

– can be difficult or impossible to analyze

• approximate methods– importance sampling

– numerical integration

– Monte Carlo & other

• Monte Carlo methods– easy to implement

– independent samples

– just draw large sample

• MCMC methods– handle hard problems

– art to efficient use

– iterative: Markov chain

– correlated samples

Page 63: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 63

How to study the QTL likelihood?

• frequentist approach– maximize (*IM)

• find the peak

• avoid local maxima

– profile LOD across locus• maximize for effects

• approximate methods– EM (Lander Botstein 1989)

– MCMC (Guo Thompson 1994)

• Bayesian approach– sample from posterior

– examine whole posterior• summarize later

• joint locus & effects

• approximate methods– MCMC (Satagopan et al. 1996)

– imputation (Sen Churchill 2001)

Page 64: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 64

EM-MCMC duality• EM approaches can be redone with MCMC

– EM estimates & maximizes– MCMC draws random samples

• simulated annealing: gradually cool towards peak

– both can address same problem

• sometimes EM is hard (impossible) to use• MCMC is tool of “last resort”

– use exact methods if you can– try other approximate methods– be clever! (math, computing tricks)– very handy for hard problems in genetics

Page 65: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 65

Simulation Study

• 200 simulation runs

• n = 100, 200; h2 = 10, 20%

• 1 QTL at 15cM

• markers at 0, 10, 20, 40, 60, 80

• effect = 1

• variance depends on heritability h2

Page 66: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 66

200 Simulations: Locus

0 10 20 30 40

515

2535

QTL Cartographer

Bay

esia

n M

CM

C

n = 100 h^2 = 10

0 10 20 30 40

1020

3040

QTL Cartographer

Bay

esia

n M

CM

C

n = 100 h^2 = 20

5 15 25 35

1020

3040

QTL Cartographer

Bay

esia

n M

CM

C

n = 200 h^2 = 10

0 5 10 15

510

1520

25

QTL Cartographer

Bay

esia

n M

CM

C

n = 200 h^2 = 20

Page 67: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 67

200 Simulations: Effect

1.0 1.5 2.0

0.5

1.0

1.5

QTL Cartographer

Bay

esia

n M

CM

C

n = 100 h^2 = 10

0.6 1.0 1.4

0.2

0.6

1.0

1.4

QTL Cartographer

Bay

esia

n M

CM

C

n = 100 h^2 = 20

0.6 1.0 1.4

0.4

0.8

1.2

QTL Cartographer

Bay

esia

n M

CM

C

n = 200 h^2 = 10

0.8 1.0 1.2

0.6

0.8

1.0

1.2

QTL Cartographer

Bay

esia

n M

CM

C

n = 200 h^2 = 20

Page 68: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 68

Why not Ordinary Monte Carlo?• independent samples of joint distribution• chaining (or peeling) of effects

pr(|Y,Q)=pr(GQ | Y,Q,2) pr(2 | Y,Q) • possible analytically here given genotypes Q• Monte Carlo: draw N samples from posterior

– sample variance 2

– sample genetic values GQ given variance 2

• but we know markers X, not genotypes Q!– would have messy average over possible Q– pr( |Y,X) = sumQ pr( |Y,Q) pr(Q|Y,X)

Page 69: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 69

What is a Markov chain?• future given present is independent of past• update chain based on current value

– can make chain arbitrarily complicated– chain converges to stable pattern π() we wish to study

0 1 1-q

q

p

1-p

)/()1(pr qpp

Page 70: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 70

Markov chain idea

0 1 1-qq

p

1-p

)/()1(pr qpp

Mitch’s

otherpubs

0

1

1 3 5 7 9 11 13 15 17 19 21

Series1

Series2

Page 71: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 71

Markov chain Monte Carlo• can study arbitrarily complex models

– need only specify how parameters affect each other– can reduce to specifying full conditionals

• construct Markov chain with “right” model– joint posterior of unknowns as limiting “stable” distribution– update unknowns given data and all other unknowns

• sample from full conditionals• cycle at random through all parameters

– next step depends only on current values

• nice Markov chains have nice properties– sample summaries make sense– consider almost as random sample from distribution– ergodic theorem and all that stuff

Page 72: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 72

MCMC Idea for QTLs• construct Markov chain around posterior

– want posterior as stable distribution of Markov chain

– in practice, the chain tends toward stable distribution• initial values may have low posterior probability

• burn-in period to get chain mixing well

• update components from full conditionals– update effects given genotypes & traits

– update locus given genotypes & marker map

– update genotypes Q given traits, marker map, locus & effects

NQQQ

XYQQ

),,(),,(),,(

),|,,(pr~),,(

21

Page 73: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 73

sample from full conditionals

• hard to sample from joint posterior– pr(,Q, |Y,X) = pr()pr()pr(Q|X,)pr(Y|Q,) /constant

• easy to sample parameters from full conditionals– full conditional for genetic effects

• pr( |Y,X,,Q) = pr( |Y,Q) = pr() pr(Y|Q,) /constant

– full conditional for QTL locus• pr( |Y,X,,Q) = pr( |X,Q) = pr() pr(Q|X,) /constant

– full conditional for QTL genotypes• pr(Q|Y,X,, ) = pr(Q|X,) pr(Y|Q,) /constant

observed X Y

missing Q

unknown

Page 74: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 74

MCMC full conditional updates

effects

locus

traitsY

genosQ

mapX

Page 75: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 75

Gibbs Sampler: effects & genotypes• for given locus , can sample effects and genotypes Q

– effects parameter vector = (G,2) with G=(Gqq,GQq,GQQ)

– missing genotype vector Q = (Q1,Q2,…, Qn)

• Gibbs sampler: update one at a time via full conditionals– randomly select order of unknowns

– update each given current values of all others, locus and data (Y,X)• sample variance 2 given Y,Q and genetic values G• sample genotype Qi given markers Xi and locus

– can do block updates if more efficient

• sample all genetic values G given Y,Q and variance 2

Page 76: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 76

phenotype model: alternate form• genetic value G(Q)=GQ in “cell means” form easy• but often useful to model effects directly

– sort out additive and dominance effects– useful for reduced models with multiple QTL

• QTL main effects and interactions (pairwise, 3-way, etc.)

• we only consider additive effects here– Gqq = – a, GQq = , GQQ = + a

• recoding for regression model– Qi = –1 for genotype qq– Qi = 0 for genotype Qq– Qi = 1 for genotype QQ– G(Qi) = + aQi

Page 77: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 77

MCMC run of mean & additive

1

1

111

1

1

111

1

11

1

1

111

1

1

1

1111

1

1

1

11

111

1

11

1

111

11

1

111

1

11

11

1

1

1

1

1

1

1

111

1

111111

1

111

1

11

1

1

1

1

1

11111

1

1111

1

111

1

1111

111

1

11111

1

1

11

1

111111

11

1

1111

1

1

1

1

11

1

1

111

1

11111

1111

1

11

111

111

11

1

1

11

111

1

11111111

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

111

1

1

11

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

111

1

1

11

11

1

1

1

1

1

11

1

11

1

1

1

1

11

11

11

111111

111

1

11111

1

1

1

1

111

11

111

1

111

11

1

1

111

1

11

11

1

1

1

111

1111

1

11

1

11

1

1

11

1

1111

1

11

1111

1

1

1111

11

111

11

1

1

1

1

1

1

1

111

1

11

11

11

11

1

111

1

1

11

111

1

11

1

1

11

1

111

1

111111

11

1

11

11

111

111

1

11111

111

11

11

1

1

1

1

111

1

111

1

11

1

1

1

1

111

1

1

1

11

1

1

1

1

1

1

1

1

111

1

1

1

11

1

1

111

11

1111

1

1

1111

11

1

11

1

1

1

1

1

1

1

11

11

1

1

1

1

1

1

1

11

11

111

1

1

1

1

111

1

111111

111

1

11

1

11

111

1

1

11

11

111111

1

1

11

1

1

111

1111

11

1

1

11

111

1

1

1

111

1

11

1

111

1

11

1

1

111

1

11

1

1

1111

11111

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

11

1

11

1

1

1

1

1

1

1

1

11

11

1

1

1

1

11

11

1

11

11

1

1

1

1

1

1

1

1

1

1111

1

11

11

1

1

1

11

1

11111

1

1

11

11111

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

11

1

1

1

111

11

1

1

1111

1

11

1

1

1

1

11

1

1

1

1

11

1

1

1

1

1

1

1

1

111

1

1

1

111

1

1

1

1

1

111

1

1

1

1

1

11

1

111

11

1

1

1

1

1

1

1

1

1

11111

1

11

1

1

1

1

1

1

1111

1

1

1

11

1

1

1

1

1

1

11

1

1

11

1

1

1

1

1

111

1

11

1

11

1

1

11111

1

1

1

1

1

1

1

1

111

1

1

11

1

11

111

1

11

1

1

1

1

1

111

111

1

1111111

11

1

11

1

1111

1

111

1

1

1

1

11

1111

1

1

1

11

1

1

1

1

11

1

1

1

1

11

111

1

11

1

11

11

11

1

1111

1

11

1

1

11

1

1

1

1

1

1

11

11

1

1

1

1

11

11

111

111

1111

1

111

111

1

1

1

1

11

1

1

11

11

1

1

11

1

1111

11

0 200 400 600 800 1000

9.6

9.8

10

.01

0.2

10

.4

MCMC run/100

9.6

9.8

10

.01

0.2

10

.4

0 20 40 60

mea

n

frequency

1

11

1111

1

11

111

1

1

111

111

1

1

111

11

1

1

1

1

11

1

11

11

1

111

1

1

1

1

1

1

1

11111

11

111

11

11

1

1

11

1

11

1

1

1

1

1

1

1

11

1

1

1

11

11

11

111

111

11

1

1

1

1

1

1

111

1

1

1

1111

1

1

111

1

1

11

1

11

1

1

1

11

1

111111

1

1

1

1

11

1

11

1

1

11

1

1

11

1

1

1

1

1

11

1

1

1

1

1

1

1

11

1

1

11

11

1

111

11

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

111

1

1

1

1

1

1

11

1

11

1

111

1

1

11

1

1

11

1

111

1

1

1

1

1

1

1111

1

1

1

11

1

1

1

1

11

1111

1

1

11

1

1

1

11

11

1

1

1

1

1

11

11

1

1

1

1

11

1

1

11

1

11

1

1

1

11

11

1

1

1

1

1

1

1

1

1

1

1

11111

1

1

1

1

111

1

1

1

11

11

1

1

1

1

11

1

11

1

111

1

1

11

1

1111

11

11

1

1

1

1

1

1

1

11111

1

11

1

1

111

1

1

1

1

1

1

11

1

1111

11

1

1

1

1

1

1

1

1

1

11

1

1

1

111

1

1111

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

11

1

1

1

1

1

1

1

11

1

111

111

11

111

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

111111

11

11

111

1

1

1

1

1

1

11

1

1

1

11

1

11

1

1

1

1

1

1

1

1

11

1

111

1

1

11

11

1

1

1

1

111

1

1

1

1

1

1

11

1

111

111

1

111

111

11

1

111

1

1

1

1

111

1

11

1

1

1

1

1

11111

1

1

1

1

1

1

1

11

1

1

1

11

1

1

11

11

1

1

1

1

11

1

1

1

11

1

1

11

1

11

1

1

11111

111

1

1

11

1

1

1

11

1111

1

111

1

1

1

1

1

1111

1

1

1

1

1

1

1

1

1

11

1111

1

11

1

1

1

11

1

1

1

1

1

1

11

1

1

1

1

1

1

111

1

11

11

1

11

1

11

11111

1

1

11

1

1

1

1

11

1

1

1

1

1

1

111

1

1

1

1

11

1

111

1

1

11

111

111111

111

111

1

1

1

1

1

1

1

111

1

1

11

1

111

1111

11

1111

1

1

1

1

1

1

1

1

111

11

11

1

1

1

1

1

11

11

111

1

1111

1

11

11

111

11

1

11

111

11

1

11

1

1

1

111

1

11

1

1

1

11

1

1

11

1111

1

1

1

11111

11

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

111

1

1

1

11111

11

11

1

1111

1

1

1

11

1

11

1

11

1

111

1

1

1

1

1

111

1

11

1

1

111

1111

1

1

1

11

1

11

11

1

1

1

1

1

1

1

11

1

1

1

1

1

111

1

1

1

111

1

1

11

11

1

1

1

1

1

1

11

1

1

1

1

0 200 400 600 800 1000

1.8

2.0

2.2

MCMC run/100

1.8

2.0

2.2

0 20 40 60

addi

tive

frequency

Page 78: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 78

MCMC run for variance

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

11

1

11

1

1

1

1

1

11

1

1

1

11

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

111

1

11

1

1

1

1

1

111

1

1

11

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

111

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

11

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

11

1

111

1

1

1

1

11

1

1

111

1

1

1

1

1

1

1

1

11

1

1

1

1

11

11

1

1

11

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

11

1

1

1

1

1

11

1

1

11

1

1

11

1

1

1

1

1

1

1

11

1

1

11

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1111

111

1

1

1

1

1

1

111

1

1

1

1

1

1

1

1

1

11

1

1

11

1

1

1

1

1

1

1

11

11

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

11

11

1

11

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

11

11

11

1

1

1

1

1

1111

1

1

11

1

1

1

11

1

1

1

1

11

1

1

1

11

1

1

1

1

1

1

1

1

11

1

111

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

11

11

1

11

1

1

1

1

11

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

11

1

1

111

1

1

11

111

1

1

11

11

1

1

1

1

11

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

11

1

1

11

11

1

1

1

1

11

1

11

11

1

11

11

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

11

1

1

1

1

11

11

1

1

1

1

1

1

11

1

11

1

1

11

1

11

1

1

1

1

1

1

1

1

1

111

1

111

111

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

111

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

11

1

1

1

11

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

11

1

1

11

1

1

1

1

1

1

1

11

11

1

1

1

1

1

1

1

1

111

1

1

1

11

1

1

1

1

1

1

1

1

1

11

1

1

1

111

1

11

1

1

1

1

1

1

11

1

1

1

1

1

11

1

1

1

11

1

1

1

1

11

1

1

1

1

11

1

1

1

1

1

1

1

111

1

1

1

11

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

11

1

1

11

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

0 200 400 600 800 1000

1.0

1.2

1.4

1.6

1.8

2.0

MCMC run

1.0

1.2

1.4

1.6

1.8

2.0

0 10 20 30 40

varia

nce

frequency

Page 79: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 79

missing marker data• sample missing marker data a la QT genotypes• full conditional for missing markers depends on

– flanking markers– possible flanking QTL

• can explicitly decompose by individual i– binomial (or trinomial) probability

),,|(pr),,,,|(pr

AAor Aa aa,

iiikiiiik

ik

QXXQXYX

X

Page 80: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 80

full conditional for locus• cannot easily sample from locus full conditional

pr( |Y,X,,Q) = pr( |X,Q)= pr() pr(Q|X,) /constant

• cannot explicitly determine full conditional– difficult to normalize– need to average over all possible genotypes over entire map

• Gibbs sampler will not work – but can use method based on ratios of probabilities…

Page 81: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 81

Metropolis-Hastings Step• pick new locus based upon current locus

– propose new locus from distribution q( )• pick value near current one?• pick uniformly across genome?

– accept new locus with probability a()

• Gibbs sampler is special case of M-H– always accept new proposal

• acceptance insures right stable distribution– accept new proposal with probability A– otherwise stick with current value

),()|(

),()|(,1min),(

*

*

newoldold

oldnewnewnewold q

qA

xx

Page 82: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 82

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

111

1

1

11

1

11

1

1

1

11

1

1

1

11

1

1

1

1

1

1

1

111

1

1

1

11

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

11

1

111

1

1

1

1

1

1

1

111

1

11

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

11

1

1

1

1

1

1

111

1

1

1

1

1

1

1

1

1

1

1

11

1

11

1

1

11

1

1

1

1

1

11

1

1

1

11

11

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

11

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

11

1

11

11

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

11

1

1

1

1

111

1

1

1

11

1

11

1

1

11

1

1

1

1

1

11

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

111

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

11

11

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

11

1

1

1

11

111

1

1

11

1

1

1

1

1

1

1

1

1

1

1

11

11

1

1

1

1

11

1

11

1

1

1

11

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

11

1

1

1

1

1

1

1

1

11

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

11

1

1

11

1

1

1

1

1

1

1

1

1

111

1

1

1

1

11

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

111

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1111

1

1

1

1

111

1

11

1

1

1

1

1

1

11

1

1

1

1

1

1

11

1

1

11

1

1

11

1

1

1

1

1

1

1

11

1

11

1

1

1

1

1

1

1

11

1111

1

1

1

1

1

1

1

111

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1111

1

1

11

1

1

111

11

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

111

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

11

1

1

1

11

1

1

111

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

11

1

1

111

1

11

1

1

11

11

1

1

11

1

1

1

1

1

11

1

1

1

1

1

1

1

1

11

1

1

1

11

0 200 400 600 800 1000

36

38

40

42

MCMC run/100

36

38

40

42

0 20 40 60

dist

ance

(cM

)

frequency

MCMC Run for 1 locus at 40cM

Page 83: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 83

Care & Use of MCMC• sample chain for long run (100,000-1,000,000)

– longer for more complicated likelihoods– use diagnostic plots to assess “mixing”

• standard error of estimates– use histogram of posterior– compute variance of posterior--just another summary

• studying the Markov chain– Monte Carlo error of series (Geyer 1992)

• time series estimate based on lagged auto-covariances

– convergence diagnostics for “proper mixing”

Page 84: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 84

Part V: Multiple QTL• multiple QTL phenotype model

• issues for 2 QTL

• MCMC sampling from the posterior

• Simulated data for 0,1,2 QTL

• Brassica data on days to flowering

Page 85: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 85

MCMC run: 2 loci assuming only 1

111

1

1111

1

111111

1

11

111

1

1

11

1

1111

11111111

1

1

11

11111111

11

111

1

11

1

1

1

1

1

1111

11

11

1

1

1

1

11

1

111

1

1

1

1

1

1

11

1

1

1

1

111

11

1

1

1

1

11

1

1

1

111

11

111111

1

1

1

1

1

1

11

1

111

1

1111

1

1

1

1

1

1

1

111

111

11

11

1

1

11

1

1111111

1

1

1

11

1

11

11

11

1111

111111

1

1

11

11

1

11

1

1

1

111

1

1

1

11

1

1

11

1

1

1

1

1

1

111

1

1

1

11

1

1

11

1

1

1111

11

1

1

1

1

1

11

1

1

11

1

1

1

1

111

1111

1

11111111

1111

1

1

111111

1

111

1

1111

1

1111

1

111111

1

111

1

1

1

1

1

1

1

1

11

1

1111111111

1

1

1

1111

1

1111

1

111

1

111

1

1

1111

11

11

1

111

1

11

1

111

1

11

1

1

11

1

11

11

1111111111

11

1

1

1

11

1

1

11

11

1

1

1

1

1

1111

1

1

1

1

11

1

1

1

1

1

1

1

1

111

11

1

1

1

1

1

1

11

11

11111

1

1

1

111

1

1

11

111

1

1

1

1

1

1

1

111

1

1

1

11

1

11

1

1

1

111

1

1

111

1

1

1

1

111

1

1

1

1

1

11

1

1

1

1

1

11

1

1

111

1

111

1

1

1

1

1

1

1

1

111

1

1

1

1111

11

1

1

1

1

1

11

1

11

1

1

1

1

111

11

1

1

111

1

1

11

11

1

11

11

1

1

1

1

1

1

1

1

1

1

11111

1

11

1

1

111

1

1

1

1

11

11111111

1

1

1111

1

11

11

1

111

11

1

1

1

1

1

1

1

1

1

111111

1

1

1

1

1

1111111

1

111

1

1

11111

111

1

1

1

1

1

1

1111

111

1

1

1

11

11

11

1

111111

1

1

1

11

11

1

1

1

111

1

11

1

1

1

1

1

1

11

1

1

1

1

1

11

1

1

11

1

1

1

1

1

1

11

1

1

11

1

111

1

111

11

11

1

11

1

1

11

1

1

1

11111

1

1111

1

1

1

1

11

1

111

1

1

1

111

1

1

1

1

111

11

1

11

1

1

1

11

11

11

1

1

1

1

1

111

11

1

1

1

1

1

1

1

1

1

111

1

1

1111

1

1

1

111

1

1

11

11

1

1

1

1

1

11

11111

111

1

1

111

1

1

11

1

11

1

11

1

1

1

1111111

1

1

11111111111111

1

11

111

1111

1

1

1

111

1111111111

1111

1

1111

11

111

1

1

1

1

1

1111111

11

1

1

11

1

1

11

1

1

1

1111

111

11

1

11111111111111

0 200 400 600 800 1000

40

50

60

70

MCMC run/1000

40

50

60

70

0 50 100 150 200 250

dist

ance

(cM

)

frequency

Page 86: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 86

Multiple QTL model

• trait = mean + add1 + add2 + error

• trait = genetic effect + error

• pr( trait | genos, effects )

?),/2,0(N~),,(N~ 22

1

2211

msasY

eQaY

eQaQaY

r

i

m

rriri

iiii

Page 87: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 87

Simulated Data with 2 QTL4

68

1012

1416

-1,-1 -1,1 1,-1 1,1recombinants

46

810

1214

16

0 1 2 3 4 5 6frequency

trai

t

Page 88: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 88

Issues for Multiple QTL• how many QTL influence a trait?

– 1, several (oligogenic) or many (polygenic)?– or do most genes influence most complex traits?– how many are supported by the data?

• effects can be localized

• searching for 2 or more QTL– conditional search (IM, CIM)– simultaneous search (MIM)

• epistasis (inter-loci interaction)– many more parameters to estimate– effects of ignored QTL

Page 89: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 89

0 5 10 15 20 25 30

01

23

rank order of QTL

addi

tive

eff

ect

Pareto diagram of QTL effects

54

3

2

1

major QTL onlinkage map

majorQTL

minorQTL

polygenes

Page 90: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 90

interval mapping approach

• interval mapping (IM)– scan genome for 1 QTL

• composite interval mapping (CIM)– scan for 1 QTL while adjusting for others– use markers as surrogates for other QTL

• multiple interval mapping (MIM)– use CIM as basic model– forward selection/backward elimination

Page 91: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 91

LOD for 2 QTL0

510

1520

2530

distance (cM)

LO

D

0 10 20 30 40 50 60 70 80 90

QTLIMCIM

Page 92: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 92

Bayesian approach• simultaneous search for multiple QTL• use Bayesian paradigm

– easy to consider joint distributions– easy to modify later for other types of data

• counts, proportions, etc.

– employ MCMC to estimate posterior dist

• study estimates of loci and effects• model selection and assessment

– Bayes factors for number of QTL– posterior model averaging for loci

Page 93: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 93

MCMC for multiple QTLs• posterior now has m loci rather than 1 locus

– just change interpretation of unknowns– add extra subscript to keep track of loci

• construct Markov chain around posterior– “easy” extension of 1 QTL approach– now randomly pick which loci to update

• update all terms for each locus at one time?– open questions of efficient mixing

NQQQ

XYQQ

),,(),,(),,(

),|,,(pr~),,(

21

Page 94: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 94

MCMC run with 2 loci

111

1

1

1

11

1111111

1

1111

1111

1111

1

1

11

1111

1111

1

1

111

1

1111

11111

1

111

1

1111111111

111

1111

1111

1111111

1

1111

11

1

11

11111111

1

1

1

111111111

1111111111111111111111111

1

11

1

1

1

11111

1

11111111111111

11

111111111111

11

11111111111111

111111

11

1

1

1

1111

1

11

1111111

1

11111111111

11

1

1

1

1111111111111

11111111111

111

11

11111111

1

11111

1

11

1

1

1111111111

1

1

111111

11111

1111

1111

1

1

1111

1

1111

11111

111

1

1

1

1

1

1

111111111111

1

111

1

1

1

1

1

1111

11

1

1

1

1

1

111

1111

1

1

1

11

111111

1

1

11111111111111111

1

1

11111111

1

111111111111111

11

1111

1111

11111

111111

1111

11

111111

111

1

111

111111

111111

1111

1

111

11111

1111111

111111

111

111111

11

1

11111111111

1

1111

111

11

1

1

1111111

11

111111111111

1

1

11111

1

1111

11111111

1

111

1

1

1

111111

1

11111

111111

1

1

11111

1

111

1

1

1

1

1

11111

11111111

111111111

111111

1

1111

1

11111

11

11

11

11111111111111

1111

1111

111

11111

11

11111

1111111111

1

1111

111

1

11111

1

1111111

111

111

111111

1

1111111

1111111111

1

111

1

11111

11111

1

111111111

1

11111

11

1111111111

1

1

1

1

1

1

1

1

1

11

111111111

1

1

1

1111

1

11

1111111

1

111111111

1

1

1

11

11

111111111

1

1111111

11111111111

11

11

111111111

11

1111

1111111111

111

1

11

111

1

1

111111

1

1

1

1

1

1

11

1

1

111

11111111

1

1

1

11

1

11111

1

11

111111

1

1

1

11

22

2

2

2

22

2

2

22

2

2

2

22

2

22

2

2

22

22

2

2

222

2

2

22

2

2

22

2

2

22222

2

2

22

2

22

2222222222

2

2

2

2222

22

2

2222

22

2

2222222

2

2

2

2

2

222222

222

2

2

2

2

2222

2

2

2222

2

2

2

22

222

2

2

2

2

2222

2

22

222

22

222222

22

222

2

2

2

222222222

22222

2

2

22

22

22

2

2

2

222

2

2

2

2

222222

2

2

222

222

2

2

2

2

22222

22

2

2

2

222

2

2

2222

2

222

2

22

2

22

2222

2

22

2

2222222

2

2

2

22

2

2

2

2

2

2

2222

2

2222

2222

22

2

2

22

2

22

2

22

222

2

222222

2

2

2

2

2

2

22222

22222

2

2

2

2

22

22

22

2

2

2

22222

2

2

22

22

222

2

2222

22

22

2

2

2

2

222

2

2

22

2

2

222

22222

2222

2

2

2

2222

2

2222

2

222

2

222

2

22

2

2

2222

2

2

2

2

2

22

2

2

222

222

2

2

22

2

222

2

2

22

222

2

2

2

22

222

222

2

2

22

22

2

2

22

222222222

2

22

222

2222222

22

22

2

22222222

22

2

2

2

2

2

2

2

2222222

22222

2

2

22

2222

2

2

2

222222

2

2

2

2

222222

2

222222

222

22

2

22

22

222

22

2

2

2

2

2

22

22222222222

22

2

222

22

222222222

2

222222

2

2222

2

222

2

2

2

2222

222

2

2

22

22

22

2222

2

22

22

22

22

2

2

2

2

22

2

2

2222

22

22

2

22222

2222

2

2

2

2

2

22

22

22

22

222222

22

222

2

2

2222222

22

2

22

22

22222

2

2

2

2222

2

22

22

2

2222

2

2

22222

2

2

2

2222

2

2

2222

2

2

2

22

2

2

2

2

2

2

2

2

22

222

222

2

2222

2

222

22222

2

22

22222

2

222

222

2

2

222

22

2

2

2

2222

2222

22

2

22

222222

22

22

2

2

2

2

222

2

2

2

2

22

2

2

2222

2

2

22

22

2

2

2

22

22

2

2

222222

2

2

2

2

2222

2

2

2

2

2

2

2

222

22

2

2

2

2

2

22

222222222222

22

22

2

2

2

222

2222

22

2

2

222

2

22

22

22

22

222

2

2

2

2

2

2

2

2

2

22

22

22222

22

2

2

2

2

22

2

22

2

2

2

2222222

2222

2

222

2

2

2

2

222222

0 200 400 600 800 1000

40

50

60

70

80

MCMC run/100

40

50

60

70

80

0 50 100 150 200 250 300

dist

ance

(cM

)

frequency

Page 95: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 95

distance (cM)

effe

ct

40 50 60 70 80

1.8

2.0

2.2

2.4

2.6

40 50 60 70 80

0.0

0.0

50

.10

0.1

5

distance (cM)

1.8

2.0

2.2

2.4

2.6

effect 1 effect 2

effects for 2 simulated QTL

summaries of MC samplesjoint scatter plots by loci

(lines for true, estimates)marginal histogramsmarginal box-plots

Page 96: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 96

Brassica napus data• 4-week & 8-week vernalization effect

– log(days to flower)

• genetic cross of– Stellar (annual canola)

– Major (biennial rapeseed)

• 105 F1-derived double haploid (DH) lines– homozygous at every locus (QQ or qq)

• 10 molecular markers (RFLPs) on LG9– two QTLs inferred on LG9 (now chromosome N2)

– corroborated by Butruille (1998)

– exploiting synteny with Arabidopsis thaliana

Page 97: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 97

2.5 3.0 3.5 4.0

2.5

3.0

3.5

4-week

8-w

eek

2.5

3.5

0 2 4 6 8 108-week vernalization

2.5 3.0 3.5 4.0

02

46

8

4-week vernalization

Brassica 4- & 8-week data

summaries of raw datajoint scatter plots

(identity line)separate histograms

Page 98: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 98

Brassica data LOD maps0

24

68

distance (cM)

LO

D

0 10 20 30 40 50 60 70 80 90

QTLIMCIM

8-w

eek

05

1015

distance (cM)

LO

D

0 10 20 30 40 50 60 70 80 90

QTLIMCIM

4-w

eek

Page 99: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 99

11

1

1

1

1

1111

1111

111

1

11

11

11

1

1

1

11

111

1

1

1

11

111

1

11

1

111

1

1

1

1

11

11111

1

1

1

1

111

1

1

1

1

1

111

1

1

1

11

1

11

1

1

111

1

1

1

1

1

11

1

1

1111

111

1

1

1

111

1

11

1111

11

111

1

1

1

11

11

11

1

11

1

11

1

1

1

1111

11

1

1

1

11

1

11

11

1

1

1

1

11

11

111

11

1

11

11

1

1

1

1

11

11

1

1

11

11

1

1

1

111

1

1

111111111

11

1

11

11

1

11

1

1

11

1

1

11

11

11

1

11

111

1

1

1

1

1

11

11

111

1

1

11

1

1

1

111

1

11

111

111

1

11

1

111

1

111

11

1

1

1

1

1111

11

1

1

111111

1

1111

111

11

1

1

1

1

1

1

1

1

11

1

1

111

111

1

1

1

1

1

11111

11

11

1

1

11

11

1

1

1111

1

1

1

11

1

1

11

11

1

1

1

11

1

1

11

11111

1

1

1

1

1

1

1

111

1

11

1

1

1

1

1

1

1

1

1

1

11

1

1

11

1

111

11

11

1

1

1

1

111111

11

1

111

1

11

1

1

1

11

1

1

11

1

1

111

11

1

1

1

1

1

1

1

11

1

1

1

1

1

1111

1

11

11

11

1

11

1

1

1

11

1

1

1111

1

1

1

1

11

1

1

1

1

11

1

1

1

11

11

1

1

1

1

1111

11

1111

111

1

1

11

1

1

1

11

11

1

11

1

1

11

1111

1

111

1

11

11

1

1

1

1

1

1

11

111

11

1

1

11

1

11

1

1

111

11

1

1

1

1

1

1

11

11

11

11

1

1

111

1

1

111111

1

1

1

1

11

11

1

1

1

1

1

111

1

1

1

1

1

1

11

11

111

1111

1

1

111

1

11

111

1

1

1

1

11

1

1

1

11

11

11111

1

11111

1

111

1

1111

1

11

1

1

1

1

1

1

11

1

1

111

1

1

1

1

11

1

11

111111

1111

1111

1

1

1

111

1

1

1

11

1

1

1

1

1

1111

11

1

11

1

1

1

11

1

1

1

1

11111

1

1

1111

11

111

11

11

1

1

1

1

1

1

111

1

11

1

11

11

1

11

11

1

1

1

11

111

11

111

11

1

1

1

111

1

1

1

1

1

1

1

1

1

1

111

11111

1

11

1

11

111

11

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

11

1

1

11

1

1

1111

1

1

1

1

1

11

1

1

1

1

111

111

1

1

1

1

1

1

1

1

111

1

1

11

1

111

111

11

111

11

1

1111

1

1

1

1

1

1

1

1

1

1

11

111

1

1

111111

1

111

11

11

11

11

1

11

11

111

111

11111

111

1

1

1

11

11

1

1

1

1

11

1

1

2

2

22222

22222

22222

22

2

2

222222

2222222222222222

2

2222222

2

2

222

22

22

2

2

22

222222222222222

222222222222

222

2222

2

22

22222

2

2

222

222222

2

2222

2

222222222222

2

2222222222222

2222

2

2

2

2

2

22

2

2222222

2

222222

2

22222

2222

22

2222

2

222

22

222222

2

2222

2222

22

22222222

22

2

222

22

222222

222

2

22222222

2

22222

222222

222222222222

2

2222222

22

2222222222222

2

22222

22222222

22222

2

2

2

2222222222

22222

2

2

222222

222222

2

2222

2

2

22

22222222

2

222

22

22222

222222222222

2

22222

2222

222

2

2

2

2222222

222222

222222

2

222

2

222

22

22222222222

22222

222

222222222

222

22

2222222

2222

2

2

222

22222

2

22222222222

2

222222

2222

222

2

222222222222

2

22

2

222

2

22222222

2

22222222222

22222222

222

2

2222

2

2222222

2222222222222

2

2222

222222222

2

22222

2

22222222

22222222222222

222222

2

22222

222222222222222

222222222

2

222222

2

2222

2222222222222222222222

222

2

222222

22222222222

22222

22222222222

222222

2222

2

2222222222222222

2222

22222

2

22222222

222

2

22

2222222

2

2

2

22222

2

222222

2

2222222

222222

2222

22

222

2

22

2222222

22

2

22

22

22222222

22222222222

2222222222

2

22

2

2222222222

222

22222222222222222222222222222

222

2222

2

222222

22

22222222

2222

2

22

222222

2

22

2222

22222222

222

22222222222

222

0 200 400 600 800 1000

20

40

60

80

MCMC run

20

40

60

80

0 50 100 150 200 250 300

dist

ance

(cM

)

frequency

Brassica 8-week data locus MCMCwith m=2

Page 100: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 100

4-week vs 8-week vernalization

4-week vernalization

• longer time to flower

• larger LOD at 40cM

• modest LOD at 80cM

• loci well determined

cM add

40 .30

80 .16

8-week vernalization

• shorter time to flower

• larger LOD at 80cM

• modest LOD at 40cM

• loci poorly determined

cM add

40 .06

80 .13

Page 101: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 101

20 40 60 80

-0.6

-0.4

-0.2

0.0

0.2

distance (cM)

addi

tive

4-week

20 40 60 80

-0.3

-0.2

-0.1

0.0

0.1

0.2

distance (cM)

addi

tive

8-weekBrassica credible regions

Page 102: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 102

collinearity of QTLs• multiple QT genotypes are correlated

– QTL linked on same chromosome– difficult to distinguish if close

• estimates of QT effects are correlated– poor identifiability of effects parameters– correlations give clue of how much to trust

• which QTL to go after in breeding?– largest effect?– may be biased by nearby QTL

Page 103: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 103

-0.6 -0.4 -0.2 0.0 0.2

-0.6

-0.4

-0.2

0.0

additive 1

addi

tive

2

cor = -0.81

4-week

-0.2 -0.1 0.0 0.1 0.2

-0.3

-0.2

-0.1

0.0

additive 1

addi

tive

2

cor = -0.7

8-week

Brassica effect correlations

Page 104: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 104

simulation: Bayesian vs composite IM

• 11 markers at 10cM spacing

• 2 QTL at 15cM, 55cM– effect size 1, variance 1, heritability 74%

• sample sizes n = 100, 200, 500– 100 independent trials

• comparison of methods– Bayesian interval mapping, 400,000 scans– composite interval mapping (QTL Cart)

Page 105: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 105

Bayesian vs. composite IM: 2 QTL

0 10 30 50

3040

5060

7080

(a) BIM, n=100

position, QTL 1

posi

tion,

QTL

2

0 10 30 50

3040

5060

7080

(b) BIM, n=200

position, QTL 1

posi

tion,

QTL

2

0 10 30 50

3040

5060

7080

(c) BIM, n=500

position, QTL 1

posi

tion,

QTL

2

0 10 30 50

3040

5060

7080

(d) CIM, n=100

position, QTL 1

posi

tion,

QTL

2

0 10 30 50

3040

5060

7080

(e) CIM, n=200

position, QTL 1

posi

tion,

QTL

2

0 10 30 50

3040

5060

7080

(f) CIM, n=500

position, QTL 1

posi

tion,

QTL

2

Page 106: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 106

ghost QTL detected with CIMtwo selected examples

0 20 40 60 80

05

10

15

20

25

30

(a) CIM, n=200

Position (cM)

LO

D

0 20 40 60 80

010

20

30

40

50

(b) CIM, n=500

Position (cM)

LO

D

Page 107: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 107

another 2-QTL simulation

• CIM vs. Bayesian QTL estimates– locus: 15, 65cM– effect: 1, 1

• sample sizes and heritabilities– n = 100, h2 = 30– n = 200, h2 = 25, 30, 40– 100 independent trials

• examine loci and effects

Page 108: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 108

2 QTL: loci estimates

0 10 20 30 40 50 60

1020

3040

50

QTL Cart Locus

Bay

esia

n Q

TL

Loc

us

locus 1: n = 100, h^2 = 30

20 40 60 80 100 12060

7080

9010

0QTL Cart Locus

Bay

esia

n Q

TL

Loc

us

locus 2: n = 100, h^2 = 30

Page 109: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 109

2 QTL: effect estimates

0.8 1.0 1.2 1.4 1.6 1.8

0.6

1.0

1.4

1.8

QTL Cart Effect

Bay

esia

n Q

TL

Eff

ect

locus 1: n = 100, h^2 = 30

-0.5 0.5 1.0 1.5 2.00.

00.

51.

01.

5QTL Cart Effect

Bay

esia

n Q

TL

Eff

ect

locus 2: n = 100, h^2 = 30

Page 110: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 110

2 QTL: loci and effects

10 30 50

1020

3040

QTL Cart Locus

Bay

esia

n Q

TL

Loc

us

locus 1: n = 200, h^2 = 40

50 60 70 80

5565

7585

QTL Cart Locus

Bay

esia

n Q

TL

Loc

us

locus 2: n = 200, h^2 = 40

0.8 1.2 1.6

0.8

1.0

1.2

1.4

QTL Cart Effect

Bay

esia

n Q

TL

Eff

ect

locus 1: n = 200, h^2 = 40

0.8 1.2 1.6

0.6

1.0

1.4

QTL Cart Effect

Bay

esia

n Q

TL

Eff

ect

locus 2: n = 200, h^2 = 40

Page 111: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 111

Part VI: How many QTLs?

• How many QTLs?– number of QTL is uncertain– estimate the number m

• What is the genetic architecture?– model architecture is uncertain

• number, gene action, interaction among QTL

– estimate the model M

• How does reversible jump MCMC work?– basic idea of Green(1995)– model selection in regression

Page 112: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 112

QTL Full Posterior

• posterior = likelihood * prior / constant

• posterior( paramaters | data )pr(loci, genos, effects, model | trait, map )

),,|(pr),,|(pr product ),,|(pr),,|(pr

)|(prsum)(prconstant constant

)(pr),,|(pr),,|(pr)|(pr)|(pr),|,,,(pr

MQYMXQMQYMXQ

MYY

MMQYMXQMMXYMQ

iii

M

Page 113: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 113

jumping the number of QTL

• model changes with number of QTL– analogous to stepwise regression if Q known

– use reversible jump MCMC to change number• book keeping to compare models

• change of variables between models

• what prior on number of QTL?– uniform over some range

– Poisson with prior mean

– exponential with prior mean

e

e

e

e

ee

e e e e e

0 2 4 6 8 100.

000.

100.

200.

30

p

p

p p

p

p

pp

p p p

u u u u u u u

u u u u

m = number of QTL

prio

r pr

obab

ility

epu

exponentialPoissonuniform

Page 114: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 114

Markov chain for number m

• add a new locus

• drop a locus

• update current model

0 1 mm-1 m+1...

m

Page 115: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 115

111111111111111111111111

1

11111

111111

11111

1111

111

1111

11111

11111

11111111

111111111

11

111111111

1111111

111

22222222222222222222222

2 22222222222 22

2222 22222

22222222

222222222 222222222

22 22

23333

333333333333 333

3

0 20 40 60 80 100

20

40

60

80

dist

ance

(cM

)

MCMC run

0 20 40 60 80 1000

1

2

3

num

ber

of Q

TL

jumping QTL number and loci

Page 116: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 116

RJ-MCMC full conditional updates

effects

locus

traitsY

genosQ

mapX

Mmodel

Page 117: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 117

reversible jump choicesaction step: draw one of three choices(m = number of QTL in model)• update step with probability 1-b(m+1)-d(m)

– update current model– loci, effects, genotypes as before

• add a locus with probability b(m+1)– propose a new locus– innovate effect and genotypes at new locus– decide whether to accept the “birth” of new locus

• drop a locus with probability d(m)– pick one of existing loci to drop– decide whether to accept the “death” of locus

Page 118: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 118

RJ-MCMC updates

effects

loci

traits

Y

genos

Q

addlocus

drop locus

b(m+1)

d(m)

1-b(m+1)-d(m)

map

X

Page 119: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 119

propose to drop a locus

• choose an existing locus– equal weight for all loci ?

– more weight to loci with small effects?

• “drop” effect & genotypes at old locus– adjust effects at other loci for collinearity

– this is reverse jump of Green (1995)

• check acceptance …– do not drop locus, effects & genotypes

– until move is accepted

1

1)1;(

mmrqd

1 2 m+13 …

Page 120: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 120

• propose a new locus– uniform chance over genome– actually need to be more careful (R van de Ven, pers. comm.)

• choose interval between loci already in model (include 0,L)– probability proportional to interval length (2–1)/L

• uniform chance within this interval 1/(2–1)

– need genotypes at locus & model effect

• innovate effect & genotypes at new locus– draw genotypes based on recombination (prior)

• no dependence on trait model yet

– draw effect as in Green’s reversible jump• adjust for collinearity: modify other parameters accordingly

• check acceptance ...

propose to add a locus

Lqb /1)(

0 L1 m+1 m2 …

Page 121: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 121

acceptance of reversible jump

• accept birth of new locus with probabilitymin(1,A)

• accept death of old locus with probabilitymin(1,1/A)

mQ

Jmrq

q

mb

md

XYm

XYmA

m

d

mb

m

m

,,,

1

)1;(

)(

)(

)1(

),|,(pr

),|1,(pr 11

Page 122: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 122

• move probabilities

• birth & death proposals

• Jacobian between models–fudge factor–see stepwise regression example

acceptance of reversible jump

nsJ

mrq

q

mb

md

othersr

d

mb

|

1

)1;(

)(

)(

)1(

m m+1

Page 123: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 123

RJ-MCMC: m = number of QTL

MCMC run0 200 400 600 800

01

23

45

6

MCMC run/10

num

ber

of Q

TL

0 200 400 600 800

01

23

45

6

MCMC run/1000 200 400 600 800

01

23

45

6

MCMC run/1000

num

ber

of Q

TL

0 200 400 600 800

01

23

45

6

Page 124: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 124

posterior for m for 8-week data

0 1 2 3 4 5 6

0.0

0.2

0.4

98% credible region for m: (1,3)based on 1 million steps

Poisson prior with mean 3

Page 125: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 125

Part VII: Reversible Jump Details

• reversible jump MCMC details– can update model with m QTL

– have basic idea of jumping models

– now: careful bookkeeping between models

• RJ-MCMC & Bayes factors– Bayes factors from RJ-MCMC chain– components of Bayes factors

Page 126: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 126

reversible jump idea• expand idea of MCMC to compare models

• adjust for parameters in different models– augment smaller model with innovations

– constraints on larger model

• calculus “change of variables” is key– add or drop parameter(s)

– carefully compute the Jacobian

• consider stepwise regression– Mallick (1995) & Green (1995)

– efficient calculation with Hausholder decomposition

Page 127: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 127

model selection in regression

• known regressors (e.g. markers)– models with 1 or 2 regressors

• jump between models– centering regressors simplifies calculations

iiii

iii

eQQaQQaYm

eQQaYm

)()(:2

)(:1

222111

11

Page 128: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 128

slope estimate for 1 regressor

recall least squares estimate of slope

note relation of slope to correlation

n

iiy

n

ii

y

n

iii

yyy

nYYsnQQs

ss

nYYQQr

s

sra

1

22

1

211

21

1

111

11

1

/)(,/)(

/))((,ˆ

Page 129: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 129

2 correlated regressors

slopes adjusted for other regressors

n

QQcQQs

s

srrra

s

srcc

s

sra

s

srrra

n

iii

yyy

yyyyy

1

2112122

212

2

11222

1

2122121

2

2

1

21211

))((,

)(ˆ

,ˆ)(

ˆ

Page 130: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 130

Gibbs Sampler for Model 1

• mean

• slope

• variance

nv

QQaYYvnv

nsB

ns

YYQQBa

n

nB

nBYB

n

iii

n

n

iii

n

nnn

1

2

112

22

21

2

21

111

2

)(,-inv~

,))((

~

,),(~

Page 131: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 131

Gibbs Sampler for Model 2

• mean

• slopes

• variance

nv

QQaYYv

nv

nsB

ns

QQaYYQQBa

nBYB

n

i kkikki

n

n

iiii

n

nn

1

22

1

2

22

212

2

212

111122

2

2

)(

,-inv~

,))()((

~

),(~

Page 132: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 132

updates from 2->1

• drop 2nd regressor

• adjust other regressor

02

2121

a

caaa

Page 133: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 133

updates from 1->2• add 2nd slope, adjusting for collinearity

• adjust other slope & variance

212212121

212

111122

222

12

ˆ

)(ˆˆ)(ˆ,ˆ

),1,0(~

caJczacaaa

ns

QQaYQQaJzaa

nsJz

n

iiii

Page 134: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 134

model selection in regression• known regressors (e.g. markers)

– models with 1 or 2 regressors

• jump between models– augment with new innovation z

0),,,(12

ˆ,1)0(~);,,(21

tionstransformasinnovationparameters

2121221

2121

222

z

caaaaa

caaa

Jzaazza

m

Page 135: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 135

change of variables• change variables from model 1 to model 2• calculus issues for integration

– need to formally account for change of variables– infinitessimal steps in integration (db)– involves partial derivatives (next page)

JdadzQQYzadadaQQYaa

QQYzag

a

z

a

J

cJc

a

a

),,|;(),,|,(

),,|;(

ˆ10

1

21212121

21

2

2121

2

1

Page 136: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 136

Jdadzdadaza

zagdada

JJcJJ

Jc

J

Jc

za

zagaazag

21

2

21

2121

2121

);,,(det

)(010

1det

0

1);(),,();(

Jacobian & the calculus

• Jacobian sorts out change of variables– careful: easy to mess up here!

Page 137: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 137

0.0 0.2 0.4 0.6 0.8

0.0

0.2

0.4

0.6

0.8

b1

b2

c21 = 0.7

Move Between Models

m=1

m=2

0.0 0.2 0.4 0.6 0.8

0.0

0.2

0.4

0.6

0.8

b1

b2

Reversible Jump Sequence

geometry of reversible jump

a1 a1

a 2a 2

Page 138: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 138

0.05 0.10 0.15

0.0

0.05

0.10

0.15

b1

b2

a short sequence

-0.3 -0.1 0.1

0.0

0.1

0.2

0.3

0.4

first 1000 with m<3

b1

b2

-0.2 0.0 0.2

QT additive reversible jump

a1 a1

a 2a 2

Page 139: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 139

credible set for additive

-0.1 0.0 0.1 0.2

-0.1

0.0

0.1

0.2

0.3

b1

b2

90% & 95% setsbased on normal

regression linecorresponds toslope of updates

a1

a 2

Page 140: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 140

multivariate updating of effects

• more computations when m > 2• avoid matrix inverse

– Cholesky decomposition of matrix

• simultaneous updates– effects at all loci

• accept new locus based on– sampled new genos at locus

– sampled new effects at all loci

• also long-range positions updates

before

after

Page 141: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 141

Part VII: Model Assessment

• posterior histograms – number of QTL

• effect of prior on posterior

– genetic architecture across genome

– model averaging: loci and effects

• Bayes factors– What are Bayes factors?

– Bayes factors from RJ-MCMC chain

– components of Bayes factors

Page 142: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 142

model selection and assessment• how many QTL are supported by data?

– parsimony: balance model fit to model complexity

• posterior as tool for model insight– number and chromosome distribution of QTL– loci and effects across genome– heritability, variance and LOD by number of QTL

• Bayes factors– advantages

• related to likelihood ratio test• modest sensitivity to prior• intuitive interpretation

– disadvantages/criticisms• do not work for improper priors• theoretical problems (Gelfand Day 1994; Draper 1995)

Page 143: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 143

0 20 40 60 80

0.0

0.01

0.02

0.03

0.04

distance (cM)

8-week vernalizationraw histogram of loci

Page 144: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 144

20 40 60 80

0.0

0.02

0.05

distance (cM)

m =

1

0 20 40 60 80

0.0

0.02

0.04

distance (cM)

m =

2

0 20 40 60 80

0.0

0.02

0.04

distance (cM)

m =

3

0 20 40 60 80

0.0

0.02

0.04

distance (cM)

m >

3

conditional histograms

Page 145: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 145

8 QTL simulation (Stevens Fisch 1998)• n = 200, h2 = .5

– SF detected 3 QTL

• Bayesian IM

n h2 detect

200 .5 2

200 .8 4

500 .9 7

500 .97 8

Location, j

QTL

No.

j

Chrom. cj

Marker mj

Position (cM)

Additive effect

j

Dominance

Effect

j

1 1 1 11 -3 0

2 1 3 10 -5 0

3 3 4 2 2 0

4 6 6 7 -3 0

5 6 8 12 3 0

6 8 2 12 -4 0

7 8 3 14 1 0

8 9 10 15 2 0

Page 146: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 146

posterior number of QTL

7 8 9 10 11 12 13

0.0

0.1

0.2

0.3

0.4

number of QTL

freq

uenc

y

geometric prior with mean 0.5seems to have no influence on posterior here

Page 147: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 147

posterior genetic architecture  Chromosome count vector

m 1 2 3 4 5 6 7 8 9 10 Count

8 2 0 1 0 0 2 0 2 1 0 3371

9 3 0 1 0 0 2 0 2 1 0 751

7 2 0 1 0 0 2 0 1 1 0 377

9 2 0 1 0 0 3 0 2 1 0 218

9 2 0 1 0 0 2 0 2 2 0 198

Page 148: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 148

model averaging for 8 QTL(a)

scal

edQ

TL in

tens

ity0 500 1000 1500 2000

010

0020

00

0 500 1000 1500 2000

-6-2

02

4

(b)

Position

addi

tive

effe

ct

0 500 1000 1500 2000

-10

12

(c)

Position

dom

inan

ce e

ffect

Page 149: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 149

(a)

position(cM)

scal

edQ

TL

inte

nsity

40 60 80 100 120

050

150

(b)

position(cM)

Fre

quen

cy

40 60 80 100 120

020

60

model averaging: focus on chr. 8all samples

samples with 8 QTL

Page 150: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 150

(a)

scal

edQ

TL in

tens

ity

0 200 400 600 800 10000

1000

2000

0 200 400 600 800 1000

-6-2

02

4

(b)

Position

addi

tive

effe

ct

0 200 400 600 800 1000

-10

12

(c)

Position

dom

inan

ce e

ffect

focus on key chromosomes

Page 151: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 151

B. napus 8-week vernalizationwhole genome study

chrom position LOD effect

n2 66.4 25.87 21.3

n3 106.8 13.33 12.95

n10 43.3 13.14 12.77

n2 154.0 10.69 11.3

n13 126.7 32.4 -5.78

Table 8.5: Result of CIM analysis for B. napus dataset.

chrom position effect

n10 45.0 9.24

n2 66.9 22.4

n2 142.6 9.01

n3 103.4 8.36

Table 8.6: Estimates of QTL location and effect using BIM.

Page 152: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 152

(a)

scal

edQ

TL

inte

nsity

0 500 1000 1500 2000

050

015

00

0 500 1000 1500 2000

-10

1030

(b)

Position

addi

tive

effe

ct

8-week vern: model averaging

Page 153: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 153

(a) QTL 1 (n2)

Position

Fre

quen

cy

75 80 85 90 95 105

010

2030

4050

(b) QTL 2 (n3)

Position

Fre

quen

cy

0 20 60 100

050

100

150

(c) QTL 3 (n10)

Position

Fre

quen

cy

20 30 40 50 60

020

4060

80

B. rapa: effect of flc markerKole et al. (1997)

(a)

scal

edQ

TL

inte

nsity

0 200 400 600 800

020

040

0

0 200 400 600 800

-4-2

02

(b)

Position

addi

tive

effe

ct no flcmarker

Page 154: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 154

B. rapa with added flc marker(a)

scal

edQ

TL

inte

nsity

0 200 400 600 800

040

010

00

0 200 400 600 800

-4-2

02

(b)

Position

addi

tive

effe

ct

(a) QTL 1 (n2)

Position

Fre

quen

cy

75 80 85 90 95 100

020

4060

80

(b) QTL 2 (n3)

Position

Fre

quen

cy

0 20 60 100

020

4060

(c) QTL 3 (n3)

Position

Fre

quen

cy

0 20 60 100

020

4060

8010

0

(d) QTL 4 (n10)

Position

Fre

quen

cy

20 30 40 50 60

020

6010

0

Page 155: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 155

How sensitive is posteriorto choice of prior?

• simulations with 0, 1 or 2 QTL– strong effects (additive = 2, variance = 1)– linked loci 36cM apart

• differences with number of QTL– clear differences by actual number– works well with 100,000, better with 1M

• effect of Poisson prior mean– larger prior mean shifts posterior up– but prior does not take over

Page 156: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 156

simulation study: prior

• 2 QTL at 15, 65cM

• n = 100, 200; h2 = 40%

• vary prior mean from 1 to 10 QTL– Poisson prior

• 10 independent simulations

• examine posterior mean, probability

Page 157: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 157

2 4 6 8 10

1.5

2.0

2.5

prior mean no. QTL

post

erio

r mea

n no

. QT

L

2 4 6 8 10

1.6

2.0

2.4

2.8

prior mean no. QTL

post

erio

r mea

n no

. QT

L

2 4 6 8 10

0.0

0.4

0.8

prior mean no. QTL

post

erio

r pro

b. o

f 2 Q

TL

n = 100, h^2 = 40

2 4 6 8 10

0.0

0.4

0.8

prior mean no. QTL

post

erio

r pro

b. o

f 2 Q

TL

n = 200, h^2 = 40

posterior m depends on prior

Page 158: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 158

0 1 2 3 4

0.0

0.2

0.4

0.6

0.8

1.0

prior mean

cum

ulat

ive

prob

.

n = 100, h^2 = 40

1

1

1 1 1

2

2

2 2 2

3

3

33 3

4

4

44 4

5

5

5

5 5

6

6

6

6 6

7

7

7

7 7

8

8

8

8 8

9

9

9

99

10

10

10

1010

0 1 2 3 40.

00.

20.

40.

60.

81.

0

prior mean

cum

ulat

ive

prob

.

n = 200, h^2 = 40

11

11 1

2 2

22 2

3 3

3

3 3

4 4

4

4 4

5 5

5

5 5

6 6

6

6 6

7 7

7

7 7

8 8

8

88

9 9

9

99

10 10

10

1010

cumulative posterioras prior mean changes

Page 159: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 159

0 1 2 3 4 5

0.0

0.4

0.8

prio

r m

ean

= 2

priorpost.

0 QTL present0 1 2 3 4 5

0.0

0.4

1 QTL present0 1 2 3 4 5

0.0

0.4

0.8

2 QTL present

0 1 2 3 4 5

0.0

0.2

0.4

prio

r m

ean

= 4

0 QTL present0 1 2 3 4 5

0.0

0.2

0.4

0.6

1 QTL present0 1 2 3 4 5

0.0

0.2

0.4

2 QTL present

effect of prior mean on posterior m

Page 160: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 160

prior effect on posterior (a) Poisson(1)

nQtl

Den

sity

0 2 4 6 8 10

0.0

0.1

0.2

0.3

0.4

0.5

0.6

(b) Poisson(3)

nQtl

Den

sity

0 2 4 6 8 10

0.0

0.1

0.2

0.3

0.4

0.5

0.6

(c) Poisson(6)

nQtl

Den

sity

0 5 10 20 30

0.00

0.05

0.10

0.15

0.20

(d) Geom(2/3)

nQtl

Den

sity

0 2 4 6 8 10

0.0

0.1

0.2

0.3

0.4

0.5

0.6

(e) S&A(4)

nQtl

Den

sity

0 2 4 6 8 10

0.0

0.1

0.2

0.3

0.4

0.5

0.6

(f) uniform(0,30)

nQtl

Den

sity

0 5 10 20 30

0.00

0.05

0.10

0.15

0.20

Page 161: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 161

Bayes factorsWhich model (1 or 2 or 3 QTLs?) has higher probability of supporting the data?

– ratio of posterior odds to prior odds

– ratio of model likelihoods

)model|(pr

)model|(pr

)model(pr/)model(pr

)|model(pr/)|model(pr

2

1

21

2112

Y

YYYB

BF(1:2) 2log(BF) evidence for 1st

< 1 < 0 negative

1 to 3 0 to 2 negligible

3 to 12 2 to 5 positive

12 to 150 5 to 10 strong

> 150 > 10 very strong

Page 162: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 162

computing marginal means

• very difficult based on separate model runs– run MCMC for model k– average pr(Y|k) across model parameters k

• arithmetic mean– can be inefficient if prior differs from posterior

• weighted harmonic mean– more efficient but less stable

• stabilized harmonic mean (SHM)– average over “nuisance parameters” (e.g. variance)– more work, but estimate is more stable (Satagopan et al. 2000)

• easy when model itself is a parameter– RJ-MCMC: marginal summaries of number of QTL– posterior/prior provides Bayes factor yardstick

kkkkkk dYY )model(pr)model,|(pr)model|(pr |

Page 163: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 163

estimating Bayes factors(a)

log BF(1,2)D

ensi

ty

-2.20 -2.05 -1.90

02

46

810

(b)

log BF(2,3)

Den

sity

-0.40 -0.25 -0.10

05

1015

(c)

log BF(3,4)

Den

sity

-0.20 -0.05 0.05

02

46

810

(d)

log BF(1,2)

Den

sity

-6 -4 -2 0 2

0.0

0.1

0.2

0.3

0.4

0.5

0.6

(e)

log BF(2,3)

Den

sity

-4 -2 0 2 4

0.0

0.1

0.2

0.3

0.4

0.5

(f)

log BF(3,4)

Den

sity

-4 -2 0 2 4

0.0

0.1

0.2

0.3

0.4

0.5

combined RJ-MCMCposterior freq/prior

separate MCMCs stabilizedharmonic mean

8-wk vern data100 MCMC repeats(note different scales)

Page 164: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 164

Bayes factors & likelihood ratio

• equivalent to LR statistic when– comparing two nested models– simple hypotheses (e.g. 1 vs 2 QTL)

• Bayes Information Criteria (BIC) in general– Schwartz introduced for model selection– penalty for different number of parameters p

)log()()log(2)log(2 1212 nppLRB

Page 165: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 165

Bayesian LOD• Bayesian “LOD” computed at each step

– based on LR given sampled genotypes and effects– can be larger or smaller than profile LOD– informal diagnostic of fit– geometric marginal means for pr(Y|model)– log posterior odds (LPD; see Sen Churchill 2000)

n

i i

ii

n

i i

Qii

Y

QYeBLOD

Y

XQQY

eLOD

110

110

)~

|(pr

),|(prln)(log

)~

|(pr

),|(pr)ˆ;|(pr

ln)(log)(

Page 166: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 166

compare LODs• 200 simulations (only 100 for some)• n = 100, 200; h2 = 25, 30, 40%• 2 QTL at 15, 65cM• Bayesian vs. CIM-based LODs

– Bayesian for simultaneous fit

– classical for sum of CIM LODs at peaks

• plot symbol is number of CIM peaks

Page 167: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 167

comparing LODs

0 4 8 12

04

812

QTL Cart sum of LODs

RJ-

MC

MC

mea

n L

OD

22

0 1

12

21 222

1 2

2111 2

2

0

212

2112222

1 221 2

222

2

2

1

2

2

21

2

1221

2

21

112

211

2

21 2

23

1

211 2

111

1

2

2

3

2

2

1 2

111

212

1

11

1

2

20

21

1

22221

1 21

1

1

11 11

21 32

2

1

2

22

2

2

2

221

2

12

2

2

22

2

11

2

3

2

3122

1

122

11

1

1

1

1

2121 21

1

1

2

22

2

22

2

22

22

2

22

1

12

22

21

00

112

11

2111

11221

1

2

n = 100, h^2 = 25

2 6 10

05

1015

QTL Cart sum of LODs

RJ-

MC

MC

mea

n L

OD

2212

22

211

22

22121

2

11

31

222222

3

2

22 2 321 2

222 22

2222

222

111

22

21

2

112

2 211 221

1

22

1

2

22

222

221 222

2

1

12

222

2 22222

222

1

2

22

23

21

212 2

2

21 22

21

23

22

2

2

22

2123

2

221

2

12222

21

2

1

22

1

1

2

22

2

22

2 3

222 2

3

2

221

1

22

2

1

21

21 2

12 2

1

222

222

2

1

2111

2222

2

1

22

2

2

n = 100, h^2 = 30

5 10 20

510

15

QTL Cart sum of LODs

RJ-

MC

MC

mea

n L

OD

2

2

3 3

2 22

3

4

222

2

32223

243

3

3322

33

222

2 3

2221

2

33

323 3

3

322

3 3

222

3

1

3

2

33

3

22

3

22

223322

2

33

333

3

2

2 22

233

23

23

2223

32 2

32332

2

2

4

2

2

2 2

33213 3

3221 33

33312

3 3

2

4

2 2

2

13

3

1

132

22 3

3

2

331 2

333 333

2

3

2

2

2

222 2

32

2222

3

3

22

24

23

33 32

32

2

32 33

3

3

232

2

222 33

3

n = 100, h^2 = 40

4 8 12

510

20

QTL Cart sum of LODs

RJ-

MC

MC

mea

n L

OD

221

2

2

2222

2

21

21 222

2

22 2

222 2

22222222

211222

2

2

3222

11 222

22

2222

1

2

2

1

2

2

2

2

2

22

2

22

2

222

12 2

2

1

22

2

2

12

2

2222 2

22

2

2 2

22

2

222

222

1

221

2

1

2

2

2

1122

21

22

2

2

2

22

221

2222

1 2

22

1

2

1

2 2222

23 22

22

22 2

2

22

22 3

22

2

22

23

2

2

2

2 2

2

22

2

2

12

2

2

2

22

2

1

2

22

31

2 222

222

2

n = 200, h^2 = 25

5 10 20

515

25

QTL Cart sum of LODs

RJ-

MC

MC

mea

n L

OD

2

222

2

2

222

22

2

2

22

2

22 222

2

222

222 2

2

2

2

222

22

2 32 2222

22

2

22 22

22

2

2

2

22222

2

2 222

322

2222

2 3

2

22

2

21 2

22 222

23

2

2

2

22

2

22

322

n = 200, h^2 = 30

10 20 30 40

1525

QTL Cart sum of LODs

RJ-

MC

MC

mea

n L

OD

22222

22

232

22 322

22 2

2

2

2

322

22222322

2 2 4

2 3

2 2

22

3

22

22

2 22

222

2

2

2 2

2

22

23

22 32 2

2

222

2

22

2

2 2

2

22 2

3 22 2

32 322

22

3222

2222

3

n = 200, h^2 = 40

Page 168: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 168

prior sensitivity of Bayes factor• adjust effects prior as m grows

– otherwise prior dominates for large models

• BF sensitive to fixed prior for effects– use hyperprior to soften effect

?)(?,Beta~),/2,0(N~ 2

1

msa

eQaY

r

i

m

rriri

Page 169: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 169

BF sensitivity to fixed prior for effects

Bayes factors for 8-week vernalization data

0.05 0.20 0.50 2.00 5.00 20.00

12

34

1

1 11

11

1

1

1

1

122

2 2 2 22

22

2

2

3 33 3 3 3

3 33

33

4 4 4 4 4 4 4

4

4 4

4

0.2

0.5

hyper-parameter

Bay

es f

acto

rs

1234

B12B23B34B45

Page 170: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 170

BF insensitivity to random effects prior

?)(?,Beta~),/2,0(N~ 2 msar

0.0 0.2 0.4 0.6 0.8 1.0

01

23

45

6

0.25,9.750.5,9.51,92,101,31,1

hyper-parameter

dens

ity

density for hyper-priors

0.1 0.2 0.3 0.50.

20.

40.

61.

0

1 1 1 1 11

2 2 2 2 2 2

33 3 3 3

3

EB

ayes

fac

tors

123

B12B23B34

insensitivity to hyper-prior

0.025 0.05 0.5

Page 171: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 171

BF approximately invariantto form of prior

Prior, pr(m) B12 B23 B34 B45

Geometric(2/3) 0.129 0.773 0.954 1.019

Poisson(1) 0.128 0.775 0.941 1.013

Poisson(3) 0.130 0.766 0.954 1.003

Poisson(6) 0.132 0.775 0.963 1.009

Fast-decay poisson(1)

0.128 0.764 0.941 1.022

Fast-decay Poisson(4)

0.129 0.773 9.963 1.032

Uniform 0.133 0.774 0.960 0.99

Page 172: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 172

power study of Bayes factor• large B01 favors model with 1 QTL

• n = 100, 200, 500

• size of genome– c = 1, 5, 10 number of chromosomes

• environmental variance– V = 1, 4, 9 with effect of size 1

• 11 markers per chromosome, 10cM apart

• 100 independent trials

Page 173: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 173

(a) n = 100 c= 1

Bayes factor

Den

sity

0 1 2 3 4 50.

00.

20.

4

(b) n = 100 c= 5

Bayes factor

Den

sity

0 1 2 3 4

0.0

0.2

0.4

(c) n = 100 c= 10

Bayes factor

Den

sity

0 1 2 3 4

0.0

0.2

0.4

(d) n = 200 c= 1

Bayes factor

Den

sity

0 2 4 6 8

0.00

0.15

(e) n = 200 c= 5

Bayes factor

Den

sity

0 1 2 3 4 5 60.

00.

2

(f) n = 200 c= 10

Bayes factor

Den

sity

0 1 2 3 4 5

0.0

0.2

0.4

(g) n = 500 c= 1

Bayes factor

Den

sity

0 2 4 6 8 10

0.00

0.10

(h) n = 500 c= 5

Bayes factor

Den

sity

0 2 4 6 8 10

0.00

0.10

0.20

(i) n = 500 c= 10

Bayes factor

Den

sity

0 2 4 6 80.

000.

100.

20

B01 “power” vs. n & genome size c

Page 174: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 174

B12 “size of test” vs. n & variance v(a) n = 100, v = 16

Bayes factor

Den

sity

0.5 1.5

0.0

1.0

(b) n = 100, v = 9

Bayes factor

Den

sity

0.0 1.0

0.0

1.0

2.0

(c) n = 100, v = 4

Bayes factor

Den

sity

0.0 1.0

0.0

0.6

1.2

(d) n = 100, v = 1

Bayes factor

Den

sity

0.0 1.5 3.0

0.0

0.3

(e) n = 200, v = 16

Bayes factor

Den

sity

0.2 0.8 1.4

0.0

1.0

(f) n = 200, v = 9

Bayes factor

Den

sity

0.0 1.0

0.0

0.6

1.2

(g) n = 200, v = 4

Bayes factor

Den

sity

0.0 1.0 2.0

0.0

1.0

(h) n = 200, v = 1

Bayes factor

Den

sity

0 2 4

0.0

0.2

0.4

(i) n = 500, v = 16

Bayes factor

Den

sity

0.0 1.5

0.0

0.4

0.8

(j) n = 500, v = 9

Bayes factor

Den

sity

0.5 1.5

0.0

0.6

(k) n = 500, v = 4

Bayes factor

Den

sity

0.0 1.5 3.0

0.0

0.3

0.6

(l) n = 500, v = 1

Bayes factor

Den

sity

0 2 4

0.0

0.2

0.4

Page 175: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 175

Part IX: Software & References• MCMC software

– general

– specific to QTL problems

• References– MCMC, reversible jump

– Bayesian approach

– Bayesian QTL analysis

Page 176: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 176

RJ-MCMC software• General MCMC software

– U Bristol links

• www.stats.bris.ac.uk/MCMC/pages/links.html– BUGS (Bayesian inference Using Gibbs Sampling)

• www.mrc-bsu.cam.ac.uk/bugs/

• MCMC software for QTLs– Bmapqtl (Satagopan Yandell 1996; Gaffney 2001)

• www.stat.wisc.edu/~yandell/qtl/software/Bmapqtl

– Bayesian QTL / Multimapper (Sillanpää Arjas 1998)• www.rni.helsinki.fi/~mjs

– Yi, Xu ([email protected])

– Stephens, Fisch (email)

Page 177: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 177

Bmapqtl: our RJ-MCMC software• www.stat.wisc.edu/~yandell/qtl/software/Bmapqtl

– module using QtlCart format– compiled in C for Windows/NT– extensions in progress– R post-processing graphics (Yandell)

• Bayes factor and reversible jump MCMC computation• enhances MCMCQTL and revjump software

– initially designed by JM Satagopan (1996)– major revision and extension by PJ Gaffney (2001)

• whole genome• multivariate update of effects• long range position updates• substantial improvements in speed, efficiency• pre-burnin: initial prior number of QTL very large

Page 178: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 178

R library(Bmapqtl)> library(Bmapqtl)> Bmapqtl.plot( vern8.exp, "exp", nqtl = 2 )posterior for number of QTL 1 2 3 4 5 6 7 0.21 0.49 0.21 0.07 0.02 0.00 0.00 Loading required package: modreg 2 estimated loci: 42.22 80.64 marginal heritability 0.356 conditional heritability 1 2 3 4 5 6 7 0.325 0.360 0.373 0.363 0.361 0.368 0.357 marginal LOD 9.726 conditional LOD 1 2 3 4 5 6 7 8.576 9.933 10.193 10.079 10.233 10.823 9.489

Page 179: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 179

Bmapqtl.nqtl( data, prior )posterior

number of QTL

Den

sity

1 2 3 4 5 6 7

0.0

0.2

0.4

0.6

0.8

1.0

0 1 2 3 4 5 6 7

12

34

5

number of QTL

post

erio

r /

prio

r

weak

moderate

strong

Bayes Factor ratio

Page 180: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 180

Bmapqtl.loci( data, nqtl )marginal summaries

scal

edQ

TL

inte

nsity

0 20 40 60 80

050

150

1

0 20 40 60 80

-0.2

0.0

Position

addi

tive

effe

ct

1

conditional on m >= 2

scal

edQ

TL

inte

nsity

0 20 40 60 80

050

150

1

0 20 40 60 80

-0.2

0.0

Position

addi

tive

effe

ct

1

Page 181: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 181

Bmapqtl.hist( data, histograms, mains )

0.1 0.2 0.3 0.4 0.5 0.6

01

23

45

marginal heritability

N = 1050 Bandwidth = 0.01679

Den

sity

1 2 3 4 5 6 7

0.1

0.3

0.5

number of QTL

herit

abili

ty

conditional heritability

6 8 10 12 14

0.00

0.15

0.30

marginal LOD

N = 1050 Bandwidth = 0.3047

Den

sity

1 2 3 4 5 6 7

68

1014

number of QTL

LOD

conditional LOD

Page 182: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 182

the art of MCMC• convergence issues

– burn-in period & when to stop

• proper mixing of the chain– smart proposals & smart updates

• frequentist approach– simulated annealing: reaching the peak– simulated tempering: heating & cooling the chain

• Bayesian approach– influence of priors on posterior– Rao-Blackwell smoothing

• bump-hunting for mixtures (e.g. QTL)– model averaging

Page 183: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 183

Bayes factor references

• MA Newton, AE Raftery (1994) Approximate Bayesian inference with the weighted likelihood bootstrap, J Royal Statist Soc B 56: 3-48.

• RE Kass, AE Raftery (1995) Bayes factors, J Amer Statist Assoc 90: 773-795.

• JM Satagopan, MA Newton, AE Raftery (2000) Easy estimation of normalizing constants and Bayes factors from posterior simulation: Stabilizing the harmonic mean estimator. Technical Report 1028, Department of Statistics, University of Wisconsin.

Page 184: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 184

reversible jumpMCMC references

• PJ Green (1995) Reversible jump Markov chain Monte Carlo computation and Bayesian model determination, Biometrika 82: 711-732.

• L Kuo & B Mallick (1998) Variable selection for regression models, Sankhya B 60: 65-81.

• BK Mallick & AE Gelfand (1994) Generalized linear models with unknown number of components, Biometrika 81: 237-245.

• S Richardson & PJ Green (1997) On Bayesian analysis of mixture with an unknown of components, J Royal Statist Soc B 59: 731-792.

Page 185: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 185

QTL reversible jumpMCMC: inbred lines

• Gaffney PJ (2001) An efficient reversible jump Markov chain Monte Carlo approach to detect multiple loci and their effects in inbred crosses. PhD dissertation, Department of Statistics, UW-Madison.

• JM Satagopan, BS Yandell (1996) Estimating the number of quantitative trait loci via Bayesian model determination, Proc JSM Biometrics Section.

• DA Stephens, RD Fisch (1998) Bayesian analysis of quantitative trait locus data using reversible jump Markov chain Monte Carlo, Biometrics 54: 1334-1347.

• MJ Sillanpää, E Arjas (1998) Bayesian mapping of multiple quantitative trait loci from incomplete inbred line cross data, Genetics 148: 1373-1388.

• R Waagepetersen, D Sorensen (1999) Understanding reversible jump MCMC, mailto:[email protected].

• N Yi, S Xu (2002) Mapping quantitative trait loci with epistatic effects. Genet. Res. Camb. 00: 000-000.

Page 186: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 186

Bayes & MCMC references

• CJ Geyer (1992) Practical Markov chain Monte Carlo, Statistical Science 7: 473-511

• L Tierney (1994) Markov Chains for exploring posterior distributions, The Annals of Statistics 22: 1701-1728 (with disc:1728-1762).

• A Gelman, J Carlin, H Stern & D Rubin (1995) Bayesian Data Analysis, CRC/Chapman & Hall.

• BP Carlin & TA Louis (1996) Bayes and Empirical Bayes Methods for Data Analysis, CRC/Chapman & Hall.

• WR Gilks, S Richardson, & DJ Spiegelhalter (Ed 1996) Markov Chain Monte Carlo in Practice, CRC/Chapman & Hall.

• MH Chen, QM Shao, JG Ibrahim (200?) Monte Carlo Methods in Bayesian Computation, Springer Series in Statistics.

Page 187: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 187

QTL references• D Thomas & V Cortessis (1992) A Gibbs sampling approach to

linkage analysis, Hum. Hered. 42: 63-76.

• I Hoeschele & P vanRanden (1993) Bayesian analysis of linkage between genetic markers and quantitative trait loci. I. Prior knowledge, Theor. Appl. Genet. 85:953-960.

• I Hoeschele & P vanRanden (1993) Bayesian analysis of linkage between genetic markers and quantitative trait loci. II. Combining prior knowledge with experimental evidence, Theor. Appl. Genet. 85:946-952.

• SW Guo & EA Thompson (1994) Monte Carlo estimation of mixed models for large complex pedigrees, Biometrics 50: 417-432.

• JM Satagopan, BS Yandell, MA Newton & TC Osborn (1996) A Bayesian approach to detect quantitative trait loci using Markov chain Monte Carlo, Genetics 144: 805-816.

Page 188: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 188

Part X: Combining Multiple Crosses• combining inbred lines in search of QTL

– most IM methods limited to single cross– animal model largely focuses on polygenes– individuals no longer independent given Q

• recent work in plant sciences– Bernardo (1994) Wright’s relationship matrix A– Rebai et al. (1994) regression method– Xu, Atchley (1995) IBD & A for QTLs & polygenes– Liu, Zeng (2000) multiple inbred lines, fixed effect IM– Zou, Yandell, Fine (2001 Genetics) power, threshold– Yi, Xu (2002) Genetica

Page 189: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 189

thresholds for multiple crosses• permutation test

– Churchill Doerge (1994); Doerge Churchill (1996)– computationally intensive– difficult to compare different designs

• theoretical approximation– Lander Botstein (1989) Dupuis Siegmund (1999)

• single cross, dense linkage map

– Rebai et al. (1994, 1995) approximate extension• Piepho (2001) improved calculation of efficiency

– Zou, Yandell, Fine (2001) extend original theory

Page 190: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 190

extension of threshold theory• likelihood for multiple crosses of inbred

lines with m founders– approximately 2 with m degrees of freedom– genome-wide threshold theory

• extends naturally based on Ornstein-Uhlenbeck

– threshold based on dense or sparse linkage map

• some calculations based on BC1, F2, BC2– Liu Zeng (2000) ECM method to estimate

Yj ~ Normal(GQj, j2 ), j = cross

Page 191: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 191

literature on outbred studies• Interval Mapping for Outbred Populations

– Haley, Knott & Elsen (1994) Genetics

– Thomas & Cortessis (1992) Hum. Hered.

– Hoeschele & vanRanden (1993ab) Theor. Appl. Genet. (etc.)

– Guo & Thompson (1994) Biometrics

• Experimental Outbred Crosses (BC, F2, RI)– collapse markers from 4 to 2 alleles

• Multiple Cross Pedigrees– polygenic effects not modeled here

– related individuals are correlated (via coancestry) – Liu & Zeng (2000) Genetics

– Zou, Yandell & Fine (2001) Genetics

Page 192: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 192

QTL reversible jumpMCMC: pedigrees

• S Heath (1997) Markov chain Monte Carlo segregation and linkage analysis for oligenic models, Am J Hum Genet 61: 748-760.

• I Hoeschele, P Uimari , FE Grignola, Q Zhang & KM Gage (1997) Advances in statistical methods to map quantitative trait loci in outbred populations, Genetics 147:1445-1457.

• P Uimari and I Hoeschele (1997) Mapping linked quantitative trait loci using Bayesian analysis and Markov chain Monte Carlo algorithms, Genetics 146: 735-743.

• MJ Sillanpaa & E Arjas (1999) Bayesian mapping of multiple quantitative trait loci from incomplete outbred offspring data, Genetics 151, 1605-1619.

Page 193: June 2002NCSU QTL II © Brian S. Yandell1 Efficient and Robust Model Selection for Quantitative Trait Loci Analysis in Inbred Lines Brian S. Yandell University

June 2002 NCSU QTL II © Brian S. Yandell 193

many thanksMichael Newton

Daniel Sorensen

Daniel Gianola

Jaya Satagopan

Patrick Gaffney

Fei Zou

Liang Li

Yang Song

Chunfang Jin

Tom Osborn

David Butruille

Marcio Ferrera

Josh Udahl

Pablo Quijada

Alan Attie

Jonathan Stoehr

Hong Lan

USDA Hatch Grants