taking the broad view of model selection for qtl in...

26
January 2003 PAG XI © Brian S. Yandell 1 Taking the Broad View of Model Selection for QTL in Experimental Crosses Brian S. Yandell University of Wisconsin-Madison www.stat.wisc.edu/~yandell/statgen with Chunfang “Amy” Jin, UW-Madison, Patrick J. Gaffney, Lubrizol, and Jaya M. Satagopan, Sloan-Kettering Plant & Animal Genome XI, January 2003

Upload: others

Post on 21-Mar-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Taking the Broad View of Model Selection for QTL in ...pages.cs.wisc.edu/~yandell/talk/pag/pag2003.pdf · interval mapping basics • observed measurements – Y = phenotypic trait

January 2003 PAG XI © Brian S. Yandell 1

Taking the Broad Viewof Model Selection for QTL

in Experimental Crosses

Brian S. YandellUniversity of Wisconsin-Madison

www.stat.wisc.edu/~yandell/statgenwith Chunfang “Amy” Jin, UW-Madison,

Patrick J. Gaffney, Lubrizol,and Jaya M. Satagopan, Sloan-Kettering

Plant & Animal Genome XI, January 2003

Page 2: Taking the Broad View of Model Selection for QTL in ...pages.cs.wisc.edu/~yandell/talk/pag/pag2003.pdf · interval mapping basics • observed measurements – Y = phenotypic trait

January 2003 PAG XI © Brian S. Yandell 2

0 5 10 15 20 25 30

01

23

rank order of QTL

addi

tive

effe

ct

Pareto diagram of QTL effects

54

3

2

1

major QTL onlinkage map

majorQTL

minorQTL

polygenes

(modifiers)

Page 3: Taking the Broad View of Model Selection for QTL in ...pages.cs.wisc.edu/~yandell/talk/pag/pag2003.pdf · interval mapping basics • observed measurements – Y = phenotypic trait

January 2003 PAG XI © Brian S. Yandell 3

how many (detectable) QTL?• build m = number of QTL detected into model

– directly allow uncertainty in genetic architecture– model selection over number of QTL, architecture– use Bayes factors and model averaging

• to identify “better” models• many, many QTL may affect most any trait

– how many QTL are detectable with these data?• limits to useful detection (Bernardo 2000)• depends on sample size, heritability, environmental variation

– consider probability that a QTL is in the model• avoid sharp in/out dichotomy• major QTL usually selected, minor QTL sampled infrequently

Page 4: Taking the Broad View of Model Selection for QTL in ...pages.cs.wisc.edu/~yandell/talk/pag/pag2003.pdf · interval mapping basics • observed measurements – Y = phenotypic trait

January 2003 PAG XI © Brian S. Yandell 4

interval mapping basics• observed measurements

– Y = phenotypic trait– X = markers & linkage map

• i = individual index 1,…,n• missing data

– missing marker data– Q = QT genotypes

• alleles QQ, Qq, or qq at locus• unknown genetic architecture

– λ = QT locus (or loci)– θ = genetic action– m = number of QTL

• pr(Q|X,λ,m) recombination model– grounded by linkage map, experimental cross– recombination yields multinomial for Q given X

• pr(Y|Q,θ,m) phenotype model– distribution shape (assumed normal here) – unknown parameters θ (could be non-parametric)

observed X Y

missing Q

unknown λ θafter

Sen Churchill (2001)

Page 5: Taking the Broad View of Model Selection for QTL in ...pages.cs.wisc.edu/~yandell/talk/pag/pag2003.pdf · interval mapping basics • observed measurements – Y = phenotypic trait

January 2003 PAG XI © Brian S. Yandell 5

Classical vs. Bayesian IM• MIM: classical LOD: mix over genotypes Q

– L(λ,θ|Y,m) = pr(Y|X,λ,θ,m)= producti [sumQ pr(Q|Xi,λ,m) pr(Yi|Q,θ,m)]

– maximize LOD(λ) = 2.3log(LR(λ)) = maxθ log10 L(λ,θ|Y,m)/L(µ|Y)– threshold for testing presence of QTL– Kao Zeng Teasdale 1999; Zeng et al. 2000; Broman Speed 2002

• BIM: Bayesian posterior: Q as missing data– sample genotypes Q, loci λ, effects θ and number of QTL m

• pr(λ,Q,θ,m|Y,X) = [producti pr(Qi|Xi,λ,m) pr(Yi|Qi,θ,m)] pr(λ,θ|X,m)pr(m)– study marginal posteriors

• pr(λ,θ|Y,X,m) = sumQ pr(λ,Q,θ|Y,X,m) with m fixed• pr(m|Y,X) = sum(λ,θ) pr(λ,θ|Y,X,m)pr(m)

– threshold for posterior “power” (positive false discovery rate)– Satagopan et al. 1996; Gaffney 2001; Yi Xu 2002

Page 6: Taking the Broad View of Model Selection for QTL in ...pages.cs.wisc.edu/~yandell/talk/pag/pag2003.pdf · interval mapping basics • observed measurements – Y = phenotypic trait

January 2003 PAG XI © Brian S. Yandell 6

Model Selection for QTL• what is the genetic architecture?

– M = model = (λ,θ,m)– λ = QT locus (or loci)– θ = genetic action (additive, dominance, epistasis)– m = number of QTL

• how to assess models?– MIM: various flavors of AIC, BIC– BIM: Bayes factors

• how to search model space?– MIM: sequential forward selection/backward elimination

• scan loci systematically across genome– BIM: sample forward/backward: transdimensional MCMC

• sample loci at random across genome

Page 7: Taking the Broad View of Model Selection for QTL in ...pages.cs.wisc.edu/~yandell/talk/pag/pag2003.pdf · interval mapping basics • observed measurements – Y = phenotypic trait

January 2003 PAG XI © Brian S. Yandell 7

Bayes factors to assess models• Bayes factor: which model best supports the data?

– ratio of posterior odds to prior odds– ratio of model likelihoods

• equivalent to LR statistic when– comparing two nested models– simple hypotheses (e.g. 1 vs 2 QTL)

• related to Bayes Information Criteria (BIC)– Schwartz introduced for model selection in general settings– penalty to balance model size (p = number of parameters)

)log(2)log(2)log(2),1|pr(

),|pr()1pr(/)pr(

),|1(pr/),|(pr

nLRBFXmY

XmYmm

XYmXYmBF

−−=−+

=++=

Page 8: Taking the Broad View of Model Selection for QTL in ...pages.cs.wisc.edu/~yandell/talk/pag/pag2003.pdf · interval mapping basics • observed measurements – Y = phenotypic trait

January 2003 PAG XI © Brian S. Yandell 8

QTL Bayes factors & RJ-MCMC• easy to compute Bayes factors from samples

– posterior pr(m|Y,X) is marginal histogram– posterior affected by prior pr(m)

• BF insensitive to shape of prior– geometric, Poisson, uniform– precision improves when prior mimics posterior

• BF sensitivity to prior variance on effects θ– prior variance should reflect data variability– resolved by using hyper-priors

• automatic algorithm; no need for user tuning

e

e

e

ee

ee e e e e

0 2 4 6 8 10

0.00

0.10

0.20

0.30

p

p

p p

p

p

pp p p p

u u u u u u u

u u u u

m = number of QTL

prio

r pro

babi

lity

epu

exponentialPoissonuniform

Page 9: Taking the Broad View of Model Selection for QTL in ...pages.cs.wisc.edu/~yandell/talk/pag/pag2003.pdf · interval mapping basics • observed measurements – Y = phenotypic trait

January 2003 PAG XI © Brian S. Yandell 9

• Y = µ + GQ + environment• partition genotypic effect into separate QTL effects

GQ = main QTL effects + epistatic interactionsGQ = θ1Q + . . . + θmQ + θ12Q + . . .

• priors on mean and effectsGQ ~ N(0, h2s2) model independent genotypic priorθjQ ~ N(0, κ1s2/m.) effects and interactionsθj2Q ~ N(0, κ2s2/m.) down-weighted

• hyperparameters (to reduce sensitivity of Bayes factors to prior)– s2 = total sample variance– m.= m+m2 = number of QTL effects and interactions– h2 = κ1+κ2 = unknown heritability, h2/2~Beta(a,b)

multiple QTL phenotype model

Page 10: Taking the Broad View of Model Selection for QTL in ...pages.cs.wisc.edu/~yandell/talk/pag/pag2003.pdf · interval mapping basics • observed measurements – Y = phenotypic trait

January 2003 PAG XI © Brian S. Yandell 10

Markov chain Monte Carlo idea

0 2 4 6 8 10

0.0

0.1

0.2

0.3

0.4

θ

pr(θ

|Y)

have posterior pr(θ |Y)want to draw samples

propose θ ~ pr(θ |Y)(ideal: Gibbs sample)

propose new θ “nearby”accept if more probabletoss coin if less probablebased on relative heights(Metropolis-Hastings)

Page 11: Taking the Broad View of Model Selection for QTL in ...pages.cs.wisc.edu/~yandell/talk/pag/pag2003.pdf · interval mapping basics • observed measurements – Y = phenotypic trait

January 2003 PAG XI © Brian S. Yandell 11

MCMC realization

0 2 4 6 8 10

020

0040

0060

0080

0010

000

θ

mcm

c se

quen

ce

2 3 4 5 6 7 8

0.0

0.1

0.2

0.3

0.4

0.5

0.6

θ

pr(θ

|Y)

added twist: occasionally propose from whole domain

Page 12: Taking the Broad View of Model Selection for QTL in ...pages.cs.wisc.edu/~yandell/talk/pag/pag2003.pdf · interval mapping basics • observed measurements – Y = phenotypic trait

January 2003 PAG XI © Brian S. Yandell 12

a complicated simulation•simulated F2 intercross, 8 QTL

– (Stephens, Fisch 1998)– n=200, heritability = 50%– detected 3 QTL

•increase to detect all 8– n=500, heritability to 97%

QTL chr loci effect1 1 11 –32 1 50 –53 3 62 +24 6 107 –35 6 152 +36 8 32 –47 8 54 +18 9 195 +2

7 8 9 10 11 12 13

010

2030

40

number of QTL

frequ

ency

in %

0 50 100 150 200

ch1ch2ch3ch4ch5ch6ch7ch8ch9

ch10

Genetic map

posterior

Page 13: Taking the Broad View of Model Selection for QTL in ...pages.cs.wisc.edu/~yandell/talk/pag/pag2003.pdf · interval mapping basics • observed measurements – Y = phenotypic trait

January 2003 PAG XI © Brian S. Yandell 13

loci pattern across genome• notice which chromosomes have persistent loci• best pattern found 42% of the time

Chromosome m 1 2 3 4 5 6 7 8 9 10 Count of 80008 2 0 1 0 0 2 0 2 1 0 33719 3 0 1 0 0 2 0 2 1 0 7517 2 0 1 0 0 2 0 1 1 0 3779 2 0 1 0 0 2 0 2 1 0 2189 2 0 1 0 0 3 0 2 1 0 2189 2 0 1 0 0 2 0 2 2 0 198

Page 14: Taking the Broad View of Model Selection for QTL in ...pages.cs.wisc.edu/~yandell/talk/pag/pag2003.pdf · interval mapping basics • observed measurements – Y = phenotypic trait

January 2003 PAG XI © Brian S. Yandell 14

Bmapqtl: our RJ-MCMC software• www.stat.wisc.edu/~yandell/qtl/software/Bmapqtl

– module using QtlCart format– compiled in C for Windows/NT– extensions in progress– R post-processing graphics

• library(bim) is cross-compatible with library(qtl)

• Bayes factor and reversible jump MCMC computation• enhances MCMCQTL and revjump software

– initially designed by JM Satagopan (1996)– major revision and extension by PJ Gaffney (2001)

• whole genome• multivariate update of effects; long range position updates• substantial improvements in speed, efficiency• pre-burnin: initial prior number of QTL very large

Page 15: Taking the Broad View of Model Selection for QTL in ...pages.cs.wisc.edu/~yandell/talk/pag/pag2003.pdf · interval mapping basics • observed measurements – Y = phenotypic trait

January 2003 PAG XI © Brian S. Yandell 15

B. napus 8-week vernalizationwhole genome study

• 108 plants from double haploid– similar genetics to backcross: follow 1 gamete– parents are Major (biennial) and Stellar (annual)

• 300 markers across genome– 19 chromosomes– average 6cM between markers

• median 3.8cM, max 34cM– 83% markers genotyped

• phenotype is days to flowering– after 8 weeks of vernalization (cooling)– Stellar parent requires vernalization to flower

Page 16: Taking the Broad View of Model Selection for QTL in ...pages.cs.wisc.edu/~yandell/talk/pag/pag2003.pdf · interval mapping basics • observed measurements – Y = phenotypic trait

January 2003 PAG XI © Brian S. Yandell 16

Markov chain Monte Carlo sequence

burnin (sets up chain)mcmc sequence

number of QTLenvironmental varianceh2 = heritability(genetic/total variance)LOD = likelihood

-50000 -40000 -30000 -20000 -10000 0

05

1015

2025

burnin sequence

nqtl

-50000 -40000 -30000 -20000 -10000 00.

006

0.01

00.

014

burnin sequenceen

vvar

-50000 -40000 -30000 -20000 -10000 0

0.0

0.2

0.4

0.6

burnin sequence

herit

-50000 -40000 -30000 -20000 -10000 0

05

1015

20

burnin sequence

LOD

0 e+00 2 e+05 4 e+05 6 e+05 8 e+05 1 e+0

24

68

10

mcmc sequence

nqtl

0 e+00 2 e+05 4 e+05 6 e+05 8 e+05 1 e+0

0.00

40.

010

0.01

6

mcmc sequence

envv

ar

0 e+00 2 e+05 4 e+05 6 e+05 8 e+05 1 e+0

0.1

0.3

0.5

mcmc sequence

herit

0 e+00 2 e+05 4 e+05 6 e+05 8 e+05 1 e+05

1015

20mcmc sequence

LOD

Page 17: Taking the Broad View of Model Selection for QTL in ...pages.cs.wisc.edu/~yandell/talk/pag/pag2003.pdf · interval mapping basics • observed measurements – Y = phenotypic trait

January 2003 PAG XI © Brian S. Yandell 17

MCMC sampled loci

subset of chromosomesN2, N3, N16

points jittered for viewblue lines at markers

note concentrationon chromosome N2

020

4060

8010

012

0

N2 N3 N16chromosome

MC

MC

sam

pled

loci

ec2e5a

E33M49.117

ec3b12wg2d11b

wg1g8a

E32M49.73wg3g11ec3a8wg7f3aE33M59.59

E35M62.117wg6b10

wg8g1bwg5a5

tg6a12

Aca1E38M50.133

E35M59.117

ec2d1a

wg1a10tg2h10b

tg2f12

ec3e12b

wg5a1bwg2a3c

ec5e12awg7f5bE32M47.159

tg5d9bslg6E35M59.85

E33M49.491E38M62.229wg1g10bec4h7

wg4d10

E32M50.409

E38M50.159

E33M47.338ec2d8a

E33M49.165wg4f4cE35M48.148wg6c6ec4g7bwg7a11

wg5b1aec4e7aE32M61.218

wg4a4bE33M50.183E33M62.140

wg9c7

wg6b2

E33M49.211wg2e12bisoIdh

wg3f7c

Page 18: Taking the Broad View of Model Selection for QTL in ...pages.cs.wisc.edu/~yandell/talk/pag/pag2003.pdf · interval mapping basics • observed measurements – Y = phenotypic trait

January 2003 PAG XI © Brian S. Yandell 18

Bayesian model assessment

row 1: # QTLrow 2: pattern

col 1: posteriorcol 2: Bayes factornote error bars on bf

evidence suggests4-5 QTLN2(2-3),N3,N16

1 3 5 7 9 110.

00.

10.

20.

3number of QTL

QTL

pos

terio

r

QTL posterior

15

5050

0

number of QTL

post

erio

r / p

rior

Bayes factor ratios

1 3 5 7 9 11

weak

moderate

strong

1 3 5 7 9 11 13

0.0

0.1

0.2

0.3

2*2

2:2,

123:

2*2,

122:

2,13

2:2,

32:

2,16

2:2,

113:

2*2,

32:

2,15

4:2*

2,3,

163:

2*2,

132:

2,14

2

model index

mod

el p

oste

rior

pattern posterior

5 e

-01

1 e

+01

5 e

+02

model indexpo

ster

ior /

prio

r

Bayes factor ratios

1 3 5 7 9 11 13

12

23

2 22

23

24

32

weak

moderate

strong

Page 19: Taking the Broad View of Model Selection for QTL in ...pages.cs.wisc.edu/~yandell/talk/pag/pag2003.pdf · interval mapping basics • observed measurements – Y = phenotypic trait

January 2003 PAG XI © Brian S. Yandell 19

Bayesian model diagnostics pattern: N2(2),N3,N16col 1: densitycol 2: boxplots by m

environmental varianceσ2 = .008, σ = .09

heritabilityh2 = 52%

LOD = 16(highly significant)

but note change with m

0.004 0.006 0.008 0.010 0.012

050

100

200

dens

ity

marginal envvar, m ≥ 44 5 6 7 8 9 11 12

0.00

60.

008

0.01

0

envvar conditional on number of QTL

envv

ar

0.2 0.3 0.4 0.5 0.6 0.7

01

23

45

dens

ity

marginal heritability, m ≥ 44 5 6 7 8 9 11 12

0.30

0.40

0.50

0.60

heritability conditional on number of QT

herit

abilit

y5 10 15 20 25

0.00

0.04

0.08

0.12

dens

ity

marginal LOD, m ≥ 44 5 6 7 8 9 11 12

1012

1416

1820

LOD conditional on number of QTLLO

D

Page 20: Taking the Broad View of Model Selection for QTL in ...pages.cs.wisc.edu/~yandell/talk/pag/pag2003.pdf · interval mapping basics • observed measurements – Y = phenotypic trait

January 2003 PAG XI © Brian S. Yandell 20

Bayesian estimates of loci & effects

histogram of lociblue line is densityred lines at estimates

estimate additive effects(red circles)

grey points sampledfrom posterior

blue line is cubic splinedashed line for 2 SD

0 50 100 150 200 2500.

000.

020.

040.

06lo

ci h

isto

gram

napus8 summaries with pattern 1,1,2,3 and m ≥ 4

N2 N3 N16

0 50 100 150 200 250

-0.0

8-0

.02

0.02

addi

tive

N2 N3 N16

Page 21: Taking the Broad View of Model Selection for QTL in ...pages.cs.wisc.edu/~yandell/talk/pag/pag2003.pdf · interval mapping basics • observed measurements – Y = phenotypic trait

January 2003 PAG XI © Brian S. Yandell 21

loci marginal posteriors

unlinked loci linked loci

Page 22: Taking the Broad View of Model Selection for QTL in ...pages.cs.wisc.edu/~yandell/talk/pag/pag2003.pdf · interval mapping basics • observed measurements – Y = phenotypic trait

January 2003 PAG XI © Brian S. Yandell 22

mapping gene expression

• 108 F2 mice• mRNA to RT-PCR• multivariate screen

• clustering• PC analysis

• highlight SCD• Lan et al. (2003)

• ch2 dominance

Page 23: Taking the Broad View of Model Selection for QTL in ...pages.cs.wisc.edu/~yandell/talk/pag/pag2003.pdf · interval mapping basics • observed measurements – Y = phenotypic trait

January 2003 PAG XI © Brian S. Yandell 23

false detection rates and posteriors

• multiple comparisons: test QTL across genome– size = Pr( LOD(λ) > t | no QTL at λ )– genome-wise threshold

• theoretical value or permutation value (Churchill Doerge 1995)

– threshold guards against a single false detection– difficult to extend to multiple QTL

• positive false discovery rate (Storey 2001)– pFDR = Pr( no QTL at λ | LOD(λ) > t )– consider proportion of false detections for threshold– related to Bayesian posterior– extends naturally to multiple QTL

Page 24: Taking the Broad View of Model Selection for QTL in ...pages.cs.wisc.edu/~yandell/talk/pag/pag2003.pdf · interval mapping basics • observed measurements – Y = phenotypic trait

January 2003 PAG XI © Brian S. Yandell 24

pFDR and QTL posterior• single QTL case

– pick a rejection region R = {λ|LOD(λ)>t} for some t– pFDR = Pr(m=0)*size/[Pr(m=0)*size+Pr(m=1)*power]– power = Pr(λ in R | Y, X, m = 1 )– size = (length of R) / (length of genome)

• multiple QTL case– pFDR = Pr(m=0)*size/[Pr(m=0)*size+Pr(m>1)*power]– power = Pr(λ in R | Y, X, m > 1 )

• extends to other null hypotheses– pFDR = Pr(m=1)*size/[Pr(m=1)*size+Pr(m>2)*power]

Page 25: Taking the Broad View of Model Selection for QTL in ...pages.cs.wisc.edu/~yandell/talk/pag/pag2003.pdf · interval mapping basics • observed measurements – Y = phenotypic trait

January 2003 PAG XI © Brian S. Yandell 25

B napus with m~Poisson(1)

Page 26: Taking the Broad View of Model Selection for QTL in ...pages.cs.wisc.edu/~yandell/talk/pag/pag2003.pdf · interval mapping basics • observed measurements – Y = phenotypic trait

January 2003 PAG XI © Brian S. Yandell 26

Summary• Bayesian posteriors and Bayes factors

– Bayes factors for model assessment– posteriors can reveal subtle hints of QTL

• graphical tools for model selection– Bayes factor ratios on log scale– model identified by m or genetic architecture

• connection to false discovery rate– whole genome evaluation– calibrate posterior region with pFDR