introduction to qtl mapping in model organismskbroman/teaching/... · interval mapping (im) lander...
TRANSCRIPT
![Page 1: Introduction to QTL mapping in model organismskbroman/teaching/... · Interval mapping (IM) Lander & Botstein (1989) •Assume a single QTL model. •Each position in the genome,](https://reader031.vdocument.in/reader031/viewer/2022040215/5f0243fc7e708231d403677b/html5/thumbnails/1.jpg)
Introduction to QTL mapping
in model organisms
Karl W Broman
Department of BiostatisticsJohns Hopkins University
www.biostat.jhsph.edu/˜kbroman
Outline
•Experiments and data
•Models
•ANOVA at marker loci
• Interval mapping
•Epistasis
• LOD thresholds
•CIs for QTL location
•Selection bias
• The X chromosome
•Selective genotyping
•Covariates
•Non-normal traits
• The need for good data
![Page 2: Introduction to QTL mapping in model organismskbroman/teaching/... · Interval mapping (IM) Lander & Botstein (1989) •Assume a single QTL model. •Each position in the genome,](https://reader031.vdocument.in/reader031/viewer/2022040215/5f0243fc7e708231d403677b/html5/thumbnails/2.jpg)
Backcross experiment
P1
A A
P2
B B
F1
A B
BCBC BC BC
Intercross experiment
P1
A A
P2
B B
F1
A B
F2F2 F2 F2
![Page 3: Introduction to QTL mapping in model organismskbroman/teaching/... · Interval mapping (IM) Lander & Botstein (1989) •Assume a single QTL model. •Each position in the genome,](https://reader031.vdocument.in/reader031/viewer/2022040215/5f0243fc7e708231d403677b/html5/thumbnails/3.jpg)
Phenotype distributions
• Within each of the parentaland F1 strains, individuals aregenetically identical.
• Environmental variation mayor may not be constant withgenotype.
• The backcross generation ex-hibits genetic as well as envi-ronmental variation.
A B
Parental strains
0 20 40 60 80 100
0 20 40 60 80 100
F1 generation
0 20 40 60 80 100
Phenotype
Backcross generation
Data
Phenotypes: yi = trait value for individual i
Genotypes: xij = 0/1 if mouse i is BB/AB at marker j
(or 0/1/2, in an intercross)
Genetic map: Locations of markers
![Page 4: Introduction to QTL mapping in model organismskbroman/teaching/... · Interval mapping (IM) Lander & Botstein (1989) •Assume a single QTL model. •Each position in the genome,](https://reader031.vdocument.in/reader031/viewer/2022040215/5f0243fc7e708231d403677b/html5/thumbnails/4.jpg)
80
60
40
20
0
Chromosome
Loca
tion
(cM
)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 X
Genetic map
20 40 60 80 100 120
20
40
60
80
100
120
Markers
Indi
vidu
als
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 X
Genotype data
![Page 5: Introduction to QTL mapping in model organismskbroman/teaching/... · Interval mapping (IM) Lander & Botstein (1989) •Assume a single QTL model. •Each position in the genome,](https://reader031.vdocument.in/reader031/viewer/2022040215/5f0243fc7e708231d403677b/html5/thumbnails/5.jpg)
Goals
•Detect QTLs (and interactions between QTLs)
•Confidence intervals for QTL location
•Estimate QTL effects (effects of allelic substitution)
Statistical structure
QTL
Markers Phenotype
Covariates
The missing data problem:
Markers ←→ QTL
The model selection problem:
QTL, covariates −→ phenotype
![Page 6: Introduction to QTL mapping in model organismskbroman/teaching/... · Interval mapping (IM) Lander & Botstein (1989) •Assume a single QTL model. •Each position in the genome,](https://reader031.vdocument.in/reader031/viewer/2022040215/5f0243fc7e708231d403677b/html5/thumbnails/6.jpg)
Models: Recombination
We assume •Mendel’s rules
• no crossover interference
1. Pr(xij = 0) = Pr(xij = 1) = 1/2
2. Pr(xi,j+1 = 1 | xij = 0) = Pr(xi,j+1 = 0 | xij = 1) = rj
where ri is the recombination fraction for the interval
3. The genotype at a position depends only on thegenotypes at the nearest flanking typed markers
Example
A B − B A A
B A A − B A
A A A A A B
?
![Page 7: Introduction to QTL mapping in model organismskbroman/teaching/... · Interval mapping (IM) Lander & Botstein (1989) •Assume a single QTL model. •Each position in the genome,](https://reader031.vdocument.in/reader031/viewer/2022040215/5f0243fc7e708231d403677b/html5/thumbnails/7.jpg)
Models: Genotype ←→ Phenotype
Let y = phenotypeg = whole genome genotype
Imagine a small number of QTLs with genotypes g1, . . . , gp.(2p distinct genotypes)
E(y|g) = µg1,...,gpvar(y|g) = σ2
g1,...,gp
Models: Genotype ←→ Phenotype
Homoscedasticity (constant variance): σ2g ≡ σ2
Normally distributed residual variation: y|g ∼ N (µg, σ2).
Additivity: µg1,...,gp= µ +
∑pj=1 ∆j gj (gj = 1 or 0)
Epistasis: Any deviations from additivity.
![Page 8: Introduction to QTL mapping in model organismskbroman/teaching/... · Interval mapping (IM) Lander & Botstein (1989) •Assume a single QTL model. •Each position in the genome,](https://reader031.vdocument.in/reader031/viewer/2022040215/5f0243fc7e708231d403677b/html5/thumbnails/8.jpg)
The simplest method: ANOVA
• Also known as markerregression.
• Split mice into groupsaccording to genotypeat a marker.
• Do a t-test / ANOVA.
• Repeat for each marker.40
50
60
70
80
Phe
noty
pe
BB AB
Genotype at D1M30
BB AB
Genotype at D2M99
ANOVA at marker loci
Advantages• Simple.
• Easily incorporatescovariates.
• Easily extended to morecomplex models.
• Doesn’t require a geneticmap.
Disadvantages•Must exclude individuals
with missing genotype data.
• Imperfect information aboutQTL location.
• Suffers in low density scans.
• Only considers one QTL at atime.
![Page 9: Introduction to QTL mapping in model organismskbroman/teaching/... · Interval mapping (IM) Lander & Botstein (1989) •Assume a single QTL model. •Each position in the genome,](https://reader031.vdocument.in/reader031/viewer/2022040215/5f0243fc7e708231d403677b/html5/thumbnails/9.jpg)
Interval mapping (IM)
Lander & Botstein (1989)
• Take account of missing genotype data
• Interpolate between markers
•Maximum likelihood under a mixture model
0 20 40 60
0
2
4
6
8
Map position (cM)
LOD
sco
re
Interval mapping (IM)
Lander & Botstein (1989)
• Assume a single QTL model.
• Each position in the genome, one at a time, is posited as theputative QTL.
• Let z = 1/0 if the (unobserved) QTL genotype is BB/AB.Assume y ∼ N (µz, σ)
• Given genotypes at linked markers, y ∼mixture of normal dist’nswith mixing proportion Pr(z = 1|marker data):
QTL genotypeM1 M2 BB ABBB BB (1− rL)(1− rR)/(1− r) rLrR/(1− r)BB AB (1− rL)rR/r rL(1− rR)/rAB BB rL(1− rR)/r (1− rL)rR/rAB AB rLrR/(1− r) (1− rL)(1− rR)/(1− r)
![Page 10: Introduction to QTL mapping in model organismskbroman/teaching/... · Interval mapping (IM) Lander & Botstein (1989) •Assume a single QTL model. •Each position in the genome,](https://reader031.vdocument.in/reader031/viewer/2022040215/5f0243fc7e708231d403677b/html5/thumbnails/10.jpg)
The normal mixtures
M1 M2Q
7 cM 13 cM
• Two markers separated by 20 cM,with the QTL closer to the leftmarker.
• The figure at right show the dis-tributions of the phenotype condi-tional on the genotypes at the twomarkers.
• The dashed curves correspond tothe components of the mixtures.
20 40 60 80 100
Phenotype
BB/BB
BB/AB
AB/BB
AB/AB
µ0
µ0
µ0
µ0
µ1
µ1
µ1
µ1
Interval mapping (continued)
Let pi = Pr(zi = 1|marker data)
yi|zi ∼ N (µzi, σ2)
Pr(yi|marker data, µ0, µ1, σ) = pif (yi; µ1, σ) + (1− pi)f (yi; µ0, σ)
where f (y; µ, σ) = density of normal distribution
Log likelihood: l(µ0, µ1, σ) =∑
i log Pr(yi|marker data, µ0, µ1, σ)
Maximum likelihood estimates (MLEs) of µ0, µ1, σ:
values for which l(µ0, µ1, σ) is maximized.
![Page 11: Introduction to QTL mapping in model organismskbroman/teaching/... · Interval mapping (IM) Lander & Botstein (1989) •Assume a single QTL model. •Each position in the genome,](https://reader031.vdocument.in/reader031/viewer/2022040215/5f0243fc7e708231d403677b/html5/thumbnails/11.jpg)
LOD scores
The LOD score is a measure of the strength of evidence for thepresence of a QTL at a particular location.
LOD(z) = log10 likelihood ratio comparing the hypothesis of aQTL at position z versus that of no QTL
= log10
{
Pr(y|QTL at z,µ̂0z,µ̂1z,σ̂z)Pr(y|no QTL,µ̂,σ̂)
}
µ̂0z, µ̂1z, σ̂z are the MLEs, assuming a single QTL at position z.
No QTL model: The phenotypes are independent and identicallydistributed (iid) N (µ, σ2).
An example LOD curve
0 20 40 60 80 100
0
2
4
6
8
10
12
Chromosome position (cM)
LOD
![Page 12: Introduction to QTL mapping in model organismskbroman/teaching/... · Interval mapping (IM) Lander & Botstein (1989) •Assume a single QTL model. •Each position in the genome,](https://reader031.vdocument.in/reader031/viewer/2022040215/5f0243fc7e708231d403677b/html5/thumbnails/12.jpg)
0 500 1000 1500 2000 2500
0.0
0.5
1.0
1.5
2.0
2.5
3.0
Map position (cM)
lod
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19LOD curves
Interval mapping
Advantages• Takes proper account of
missing data.
• Allows examination ofpositions between markers.
• Gives improved estimatesof QTL effects.
• Provides pretty graphs.
Disadvantages• Increased computation
time.
• Requires specializedsoftware.
• Difficult to generalize.
• Only considers one QTL ata time.
![Page 13: Introduction to QTL mapping in model organismskbroman/teaching/... · Interval mapping (IM) Lander & Botstein (1989) •Assume a single QTL model. •Each position in the genome,](https://reader031.vdocument.in/reader031/viewer/2022040215/5f0243fc7e708231d403677b/html5/thumbnails/13.jpg)
Multiple QTL methods
Why consider multiple QTLs at once?
•Reduce residual variation.
•Separate linked QTLs.
• Investigate interactions between QTLs (epistasis).
Epistasis in a backcross
Additive QTLs
Interacting QTLs
0
20
40
60
80
100
Ave
. phe
noty
pe
A HQTL 1
A
H
QTL 2
0
20
40
60
80
100
Ave
. phe
noty
pe
A HQTL 1
A
H
QTL 2
![Page 14: Introduction to QTL mapping in model organismskbroman/teaching/... · Interval mapping (IM) Lander & Botstein (1989) •Assume a single QTL model. •Each position in the genome,](https://reader031.vdocument.in/reader031/viewer/2022040215/5f0243fc7e708231d403677b/html5/thumbnails/14.jpg)
Epistasis in an intercross
Additive QTLs
Interacting QTLs
0
20
40
60
80
100
Ave
. phe
noty
pe
A H BQTL 1
AH
BQTL 2
0
20
40
60
80
100
Ave
. phe
noty
pe
A H BQTL 1
A
H
BQTL 2
LOD thresholds
Large LOD scores indicate evidence for the presence of a QTL.
Q: How large is large?
→ We consider the distribution of the LOD score under the nullhypothesis of no QTL.
Key point: We must make some adjustment for our examination ofmultiple putative QTL locations.
→We seek the distribution of the maximum LOD score, genome-wide. The 95th %ile of this distribution serves as a genome-wideLOD threshold.
Estimating the threshold: simulations, analytical calculations, per-mutation (randomization) tests.
![Page 15: Introduction to QTL mapping in model organismskbroman/teaching/... · Interval mapping (IM) Lander & Botstein (1989) •Assume a single QTL model. •Each position in the genome,](https://reader031.vdocument.in/reader031/viewer/2022040215/5f0243fc7e708231d403677b/html5/thumbnails/15.jpg)
Null distribution of the LOD score
• Null distribution derived bycomputer simulation of backcrosswith genome of typical size.
• Solid curve: distribution of LODscore at any one point.
• Dashed curve: distribution ofmaximum LOD score,genome-wide.
0 1 2 3 4
LOD score
Permutation tests
mice
markers
genotypedata
phenotypes
-
LOD(z)
(a set of curves)-
M =
maxz LOD(z)
• Permute/shuffle the phenotypes; keep the genotype data intact.
• Calculate LOD?(z) −→ M ? = maxz LOD?(z)
• We wish to compare the observed M to the distribution of M ?.
• Pr(M ? ≥M) is a genome-wide P-value.
• The 95th %ile of M ? is a genome-wide LOD threshold.
• We can’t look at all n! possible permutations, but a random set of 1000 is feasi-ble and provides reasonable estimates of P-values and thresholds.
• Value: conditions on observed phenotypes, marker density, and pattern of miss-ing data; doesn’t rely on normality assumptions or asymptotics.
![Page 16: Introduction to QTL mapping in model organismskbroman/teaching/... · Interval mapping (IM) Lander & Botstein (1989) •Assume a single QTL model. •Each position in the genome,](https://reader031.vdocument.in/reader031/viewer/2022040215/5f0243fc7e708231d403677b/html5/thumbnails/16.jpg)
Permutation distribution
maximum LOD score
0 1 2 3 4 5 6 7
95th percentile
1.5-LOD support interval
0 5 10 15 20 25 30 35
0
1
2
3
4
5
6
Map position (cM)
lod
1.5
1.5−LOD support interval
![Page 17: Introduction to QTL mapping in model organismskbroman/teaching/... · Interval mapping (IM) Lander & Botstein (1989) •Assume a single QTL model. •Each position in the genome,](https://reader031.vdocument.in/reader031/viewer/2022040215/5f0243fc7e708231d403677b/html5/thumbnails/17.jpg)
Selection bias
• The estimated effect of a QTL willvary somewhat from its true effect.
• Only when the estimated effect islarge will the QTL be detected.
• Among those experiments in whichthe QTL is detected, the estimatedQTL effect will be, on average,larger than its true effect.
• This is selection bias.
• Selection bias is largest in QTLswith small or moderate effects.
• The true effects of QTLs that weidentify are likely smaller than wasobserved.
Estimated QTL effect
0 5 10 15
QTL effect = 5Bias = 79%
Estimated QTL effect
0 5 10 15
QTL effect = 8Bias = 18%
Estimated QTL effect
0 5 10 15
QTL effect = 11Bias = 1%
Implications of selection bias
•Estimated % variance explained by identified QTLs
•Repeating an experiment
•Congenics
•Marker-assisted selection
![Page 18: Introduction to QTL mapping in model organismskbroman/teaching/... · Interval mapping (IM) Lander & Botstein (1989) •Assume a single QTL model. •Each position in the genome,](https://reader031.vdocument.in/reader031/viewer/2022040215/5f0243fc7e708231d403677b/html5/thumbnails/18.jpg)
The X chromosome
In a backcross, the X chromosome may or may not besegregating.
(A × B) × A
Females: XA·B XA
Males: XA·B YA
A × (A × B)
Females: XA XA
Males: XA YB
The X chromosome
In an intercross, one must pay attention to the paternalgrandmother’s genotype.
(A × B) × (A × B) or (B × A) × (A × B)
Females: XA·B XA
Males: XA·B YB
(A × B) × (B × A) or (B × A) × (B × A)
Females: XA·B XB
Males: XA·B YA
![Page 19: Introduction to QTL mapping in model organismskbroman/teaching/... · Interval mapping (IM) Lander & Botstein (1989) •Assume a single QTL model. •Each position in the genome,](https://reader031.vdocument.in/reader031/viewer/2022040215/5f0243fc7e708231d403677b/html5/thumbnails/19.jpg)
Selective genotyping
• Save effort by onlytyping the mostinformative individuals(say, top & bottom 10%).
• Useful in context of asingle, inexpensive trait.
• Tricky to estimate theeffects of QTLs: use IMwith all phenotypes.
• Can’t get at interactions.
• Likely better to alsogenotype some randomportion of the rest of theindividuals.
40
50
60
70
80
Phe
noty
pe
BB AB
All genotypes
BB AB
Top and botton 10%
Covariates
• Examples: treatment, sex,litter, lab, age.
• Control residual variation.
• Avoid confounding.
• Look for QTL × environ’tinteractions
• Adjust before intervalmapping (IM) versus adjustwithin IM.
0 20 40 60 80 100 120
0
1
2
3
4
Map position (cM)
lod
7
Intercross, split by sex
0 20 40 60 80 100 120
0
1
2
3
4
Map position (cM)
lod
7
Reverse intercross
![Page 20: Introduction to QTL mapping in model organismskbroman/teaching/... · Interval mapping (IM) Lander & Botstein (1989) •Assume a single QTL model. •Each position in the genome,](https://reader031.vdocument.in/reader031/viewer/2022040215/5f0243fc7e708231d403677b/html5/thumbnails/20.jpg)
Non-normal traits
• Standard interval mapping assumes normally distributedresidual variation. (Thus the phenotype distribution is amixture of normals.)
• In reality: we see dichotomous traits, counts, skeweddistributions, outliers, and all sorts of odd things.
• Interval mapping, with LOD thresholds derived frompermutation tests, generally performs just fine anyway.
• Alternatives to consider:– Nonparametric approaches (Kruglyak & Lander 1995)
– Transformations (e.g., log, square root)
– Specially-tailored models (e.g., a generalized linear model, the Coxproportional hazard model, and the model in Broman et al. 2000)
Check data integrity
The success of QTL mapping depends crucially on the integrity ofthe data.
• Segregation distortion
• Genetic maps / marker positions
• Genotyping errors (tight double crossovers)
• Phenotype distribution / outliers
• Residual analysis
![Page 21: Introduction to QTL mapping in model organismskbroman/teaching/... · Interval mapping (IM) Lander & Botstein (1989) •Assume a single QTL model. •Each position in the genome,](https://reader031.vdocument.in/reader031/viewer/2022040215/5f0243fc7e708231d403677b/html5/thumbnails/21.jpg)
Summary I
• ANOVA at marker loci (aka marker regression) is simple and easily extendedto include covariates or accommodate complex models.
• Interval mapping improves on ANOVA by allowing inference of QTLs topositions between markers and taking proper account of missing genotypedata.
• ANOVA and IM consider only single-QTL models. Multiple QTL methods allowthe better separation of linked QTLs and are necessary for the investigation ofepistasis.
• Statistical significance of LOD peaks requires consideration of the maximumLOD score, genome-wide, under the null hypothesis of no QTLs. Permutationtests are extremely useful for this.
• 1.5-LOD support intervals indicate the plausible location of a QTL.
• Estimates of QTL effects are subject to selection bias. Such estimated effectsare often too large.
Summary II
• The X chromosome must be dealt with specially, and can be tricky.
• Study your data. Look for errors in the genetic map, genotyping errors andphenotype outliers. But don’t worry about them too much.
• Selective genotyping can save you time and money, but proceed with caution.
• Study your data. The consideration of covariates may reveal extremelyinteresting phenomena.
• Interval mapping works reasonably well even with non-normal traits. Butconsider transformations or specially-tailored models. If interval mappingsoftware is not available for your preferred model, start with some version ofANOVA.
![Page 22: Introduction to QTL mapping in model organismskbroman/teaching/... · Interval mapping (IM) Lander & Botstein (1989) •Assume a single QTL model. •Each position in the genome,](https://reader031.vdocument.in/reader031/viewer/2022040215/5f0243fc7e708231d403677b/html5/thumbnails/22.jpg)
References
• Broman KW (2001) Review of statistical methods for QTL mapping in experi-mental crosses. Lab Animal 30(7):44–52
A review for non-statisticians.
• Doerge RW (2002) Mapping and analysis of quantitative trait loci in experimen-tal populations. Nat Rev Genet 3:43–52
A very recent review.
• Doerge RW, Zeng Z-B, Weir BS (1997) Statistical issues in the search forgenes affecting quantitative traits in experimental populations. Statistical Sci-ence 12:195–219
Review paper.
• Jansen RC (2001) Quantitative trait loci in inbred lines. In Balding DJ et al.,Handbook of statistical genetics, John Wiley & Sons, New York, chapter 21
Review in an expensive but rather comprehensive and likely useful book.
• Lynch M, Walsh B (1998) Genetics and analysis of quantitative traits. SinauerAssociates, Sunderland, MA, chapter 15
Chapter on QTL mapping.
• Lander ES, Botstein D (1989) Mapping Mendelian factors underlying quantita-tive traits using RFLP linkage maps. Genetics 121:185–199
The seminal paper.
• Churchill GA, Doerge RW (1994) Empirical threshold values for quantitative traitmapping. Genetics 138:963–971
LOD thresholds by permutation tests.
• Kruglyak L, Lander ES (1995) A nonparametric approach for mapping quanti-tative trait loci. Genetics 139:1421–1428
Non-parameteric interval mapping.
• Boyartchuk VL, Broman KW, Mosher RE, D’Orazio S, Starnbach M, DietrichWF (2001) Multigenic control of Listeria monocytogenes susceptibility in mice.Nat Genet 27:259–260
• Broman KW (2003) Mapping quantitative trait loci in the case of a spike in thephenotype distribution. Genetics 163:1169-1175
QTL mapping with a special model for a non-normal phenotype.