Many loci
Effect on trait small
Combine together to affect phenotype
Environmental sensitivity
Introduction to QTL Mapping
Genetic Architecture of Quantitative Traits
Loci?
Distribution of effects on trait?
Distribution of pleiotropic effects (including fitness)
Distribution of context-dependent effects?
Sex
Environment
Genetic background (epistasis)
Allele frequency?
Causal molecular variant?
QTL Mapping
QTL effects too small to be detected by Mendelian segregation
Need to map QTLs by linkage to marker loci with genotypes that can be unambiguously scored
The principle dates back to 1923, but abundant, polymorphic molecular markers have only relatively recently become available
Most studies use single nucleotide polymorphism (SNP) markers and insertion/deletion (indel) markers
Massively parallel sequencing technology is revolutionizing our ability to rapidly map QTLs
Genotype (markers M1–M8)
Individual  M1 M2 M3 M4 M5 M6 M7 M8  Phenotype
1           1  1  2  2  2  3  3  1   20
2           3  3  1  1  1  1  1  3   8
3           1  1  2  2  1  1  2  1   14
4           3  2  3  2  2  2  3  2   24
5           1  3  2  2  2  2  3  3   28
6           3  3  2  3  1  3  2  1   18
7           3  2  3  1  3  1  3  3   10
8           2  3  1  3  2  2  1  2   22
9           2  1  3  3  3  3  1  3   20
10          1  2  1  3  3  3  2  2   17
QTL Mapping Data
QTL Mapping: A Primer
Linkage Mapping:
Two (or more) parental strains that differ genetically for the trait
Molecular markers that distinguish the parental strains
Mapping population: genotype all individuals for markers
Measure trait phenotype
Map QTLs by linkage to markers: single marker analysis, interval mapping
Map QTLs in pedigrees or populations derived from crosses of inbred lines
Association Mapping:
Population sample of individuals with genetic variation for the trait
Molecular markers (whole genome or candidate gene)
Mapping population: genotype all individuals for markers
Measure trait phenotype
Map QTLs by linkage disequilibrium (LD) with markers
Map QTLs in individuals from an outbred population
Linkage Mapping: Find Parental Strains
[Figure: four panels, Starvation Resistance (hours), Locomotor Reactivity (seconds), Chill Coma Recovery (minutes) and Lifespan (hours); H² = 0.56, 0.23, 0.58, 0.54]
Linkage Mapping: Create Mapping Population
P1 × P2
F1
BC1: F1 × P1; BC2: F1 × P2; F2: F1 × F1
RILs
M1 --- A1 -- N1 ------- O1 / M2 --- A2 -- N2 ------- O2
M1 --- A1 -- N1 ------- O1 vs. M2 --- A2 -- N2 ------- O2
Test for:
Linkage of a QTL (A) to individual markers (M, N, or O) = single marker analysis
A QTL in each interval in turn (M-N and N-O) = interval mapping
If there is a difference in trait mean between marker genotype classes, then a QTL is linked to the marker
Infer chromosomal locations and effects (a, d) of QTLs
Linkage Mapping: Test for Associations Between Markers and Trait
M: marker locus; A: QTL; c: recombination fraction between M and A
Line Cross Analysis: Single Markers
M --- c --- A
Generation  Genotype     Value
P1          M1A1/M1A1    a
P2          M2A2/M2A2    –a
F1          M1A1/M2A2    d
F1 gametes:
Genotype  Frequency
M1A1      (1 – c)/2   (non-recombinant)
M2A2      (1 – c)/2   (non-recombinant)
M1A2      c/2         (recombinant)
M2A1      c/2         (recombinant)
Random mating of the F1 gives 10 possible F2 genotypic classes.
The contribution of each marker genotype class to the F2 mean is obtained by multiplying the frequency of each genotype by its genotypic value, then summing within marker genotype classes.
We want actual means, which are obtained by dividing the contribution to the F2 mean by the frequency of that marker class, i.e., the Mendelian segregation ratio of ¼ for the homozygotes and ½ for the heterozygotes.
Line Cross Analysis: Single Markers, F2 Mapping Population
Genotype     Freq.        Value  Marker class  Class freq.  Contribution to F2 mean      Actual mean
M1A1/M1A1    (1 – c)²/4   a      M1/M1         ¼            a(1 – 2c)/4 + dc(1 – c)/2    a(1 – 2c) + 2dc(1 – c)
M1A1/M1A2    c(1 – c)/2   d
M1A2/M1A2    c²/4         –a
M1A1/M2A1    c(1 – c)/2   a      M1/M2         ½            d[(1 – c)² + c²]/2           d[(1 – c)² + c²]
M1A1/M2A2    (1 – c)²/2   d
M1A2/M2A1    c²/2         d
M1A2/M2A2    c(1 – c)/2   –a
M2A1/M2A1    c²/4         a      M2/M2         ¼            –a(1 – 2c)/4 + dc(1 – c)/2   –a(1 – 2c) + 2dc(1 – c)
M2A1/M2A2    c(1 – c)/2   d
M2A2/M2A2    (1 – c)²/4   –a
F2 Genotypes With One Marker Locus, M, and a Linked QTL, A
The following two contrasts of marker class means are functions of a and d:
Contrast 1:
(M1/M1 – M2/M2)/2 = a(1 – 2c)
Contrast 2:
M1/M2 – [(M1/M1 + M2/M2)/2] = d(1 – 2c)²
This contrast, in combination with the first, therefore allows estimation of d/a: the ratio Contrast 2/Contrast 1 = (d/a)(1 – 2c), so the estimate of d/a is always shrunk toward zero by the factor (1 – 2c).
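The bookkeeping described above (genotype frequency × genotypic value, summed within marker classes, then divided by the class frequency) can be checked numerically. The Python sketch below is one way to do it, assuming F1 gamete frequencies (1 – c)/2 and c/2 and genotypic values a, d, –a for A1A1, A1A2, A2A2 as in the tables; the function names are hypothetical. Exact fractions make it easy to compare the two contrasts with a(1 – 2c) and d(1 – 2c)².

```python
from fractions import Fraction
from itertools import product

def f1_gametes(c):
    """Gametes of the F1 M1A1/M2A2 and their frequencies (c = recombination fraction)."""
    return {("M1", "A1"): (1 - c) / 2, ("M2", "A2"): (1 - c) / 2,   # non-recombinant
            ("M1", "A2"): c / 2, ("M2", "A1"): c / 2}               # recombinant

def genotypic_value(q1, q2, a, d):
    """QTL genotypic values: A1A1 = a, A1A2 = d, A2A2 = -a."""
    n_a1 = (q1 == "A1") + (q2 == "A1")
    return {2: a, 1: d, 0: -a}[n_a1]

def f2_marker_means(a, d, c):
    """Mean trait value of each F2 marker class (M1/M1, M1/M2, M2/M2)."""
    gametes = f1_gametes(c)
    freq, contrib = {}, {}
    for (m1, q1), (m2, q2) in product(gametes, repeat=2):   # random union of F1 gametes
        f = gametes[(m1, q1)] * gametes[(m2, q2)]
        cls = "/".join(sorted((m1, m2)))
        freq[cls] = freq.get(cls, 0) + f                      # totals 1/4, 1/2, 1/4
        contrib[cls] = contrib.get(cls, 0) + f * genotypic_value(q1, q2, a, d)
    # actual mean = contribution to the F2 mean / marker-class frequency
    return {cls: contrib[cls] / freq[cls] for cls in freq}

a, d, c = Fraction(1), Fraction(1, 2), Fraction(1, 10)
means = f2_marker_means(a, d, c)
contrast1 = (means["M1/M1"] - means["M2/M2"]) / 2                   # a(1 - 2c)
contrast2 = means["M1/M2"] - (means["M1/M1"] + means["M2/M2"]) / 2  # d(1 - 2c)^2
print(contrast1, contrast2, contrast2 / contrast1)  # 4/5 8/25 2/5
```

With a = 1, d = ½ and c = 0.1 the ratio of the contrasts is 0.4 rather than the true d/a = 0.5, which is the (1 – 2c) shrinkage.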
In summary:
A significant difference in the mean value of a quantitative trait between homozygous marker genotype classes indicates linkage of a QTL and the marker locus.
Estimates of a and d/a from single marker analysis are confounded with recombination frequency, and will generally underestimate the true values by a factor of (1 – 2c).
Example: The true effect is a = 1, d = 0. Expected estimates for a as a function of c:
c      Expected estimate of a
0      1
0.1    0.8
0.25   0.5
0.5    0
With complete crossover interference (no double crossovers in the interval): c = c1 + c2
(A reasonable approximation for c < 0.1, i.e., 10 cM)
Interval Mapping Analysis
Line Cross Analysis: Interval Mapping
M --- c1 --- A --- c2 --- N   (c = recombination fraction between M and N)
Generation  Genotype          Value
P1          M1A1N1/M1A1N1     a
P2          M2A2N2/M2A2N2     –a
F1          M1A1N1/M2A2N2     d
F1 gametes:
Genotype  Frequency
M1A1N1    (1 – c)/2   (non-recombinant)
M2A2N2    (1 – c)/2   (non-recombinant)
M1A2N2    c1/2        (recombinant)
M2A1N1    c1/2        (recombinant)
M1A1N2    c2/2        (recombinant)
M2A2N1    c2/2        (recombinant)
Example: Backcross (BC) mapping population. Tabulate BC genotypes, frequencies and means, assuming no double recombination, and calculate the expected marker genotype means.
F1 backcrossed to P1 (M1A1N1/M1A1N1).
F1 gamete  Freq.      Value  BC marker class  Class freq.  Contribution to BC mean  Actual mean
M1A1N1     (1 – c)/2  a      M1N1/M1N1        (1 – c)/2    a(1 – c)/2               a
M1A1N2     c2/2       a      M1N1/M1N2        c/2          (ac2 + dc1)/2            (ac2 + dc1)/c
M1A2N2     c1/2       d
M2A1N1     c1/2       a      M1N1/M2N1        c/2          (ac1 + dc2)/2            (ac1 + dc2)/c
M2A2N1     c2/2       d
M2A2N2     (1 – c)/2  d      M1N1/M2N2        (1 – c)/2    d(1 – c)/2               d
BC Genotypes With Two Linked Markers, M and N, and a linked QTL, A
In a manner similar to the single marker example, contrasts between backcross marker class means (γ and δ below) estimate the effects of the QTL.
In contrast to the single marker example, the map position relative to the flanking markers can also be estimated:
M1N1/M1N1 – M1N1/M2N2 = a – d = γ
M1N1/M1N2 – M1N1/M2N1 = (a – d)(c2 – c1)/c = δ
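The estimate of a is unbiased only if d = 0, so recessive QTLs may not be detected: if A1 is dominant (d = a), then γ = 0 and the backcross to P1 shows no difference between marker classes.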
This problem can be overcome by backcrossing to both parental lines, or by using an F2 design.
Note: c is assumed to be known, so c1 and c2 can be estimated:
δ/γ = (c2 – c1)/c = (c – 2c1)/c
Solving for c1 gives c1 = c(1 – δ/γ)/2, and c2 = c – c1.
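A minimal Python sketch of the backcross calculation, assuming no double recombination (c = c1 + c2) and the backcross to P1 tabulated above; the function names and the QTL parameters in the usage lines are hypothetical. It builds the expected marker-class means from the F1 gamete frequencies, then inverts γ and δ to recover c1.

```python
from fractions import Fraction

def bc_marker_means(a, d, c1, c2):
    """Expected marker-class means in the backcross F1 x P1 (M1A1N1/M1A1N1).

    The QTL A lies between markers M and N; c1 = M-A and c2 = A-N recombination
    fractions. Assumes no double recombination, so c = c1 + c2.
    """
    c = c1 + c2
    # (F1 gamete, frequency, value of the BC offspring: A1A1 = a, A1A2 = d)
    gametes = [("M1A1N1", (1 - c) / 2, a), ("M1A1N2", c2 / 2, a),
               ("M1A2N2", c1 / 2, d), ("M2A1N1", c1 / 2, a),
               ("M2A2N1", c2 / 2, d), ("M2A2N2", (1 - c) / 2, d)]
    freq, contrib = {}, {}
    for gamete, f, value in gametes:
        cls = "M1N1/" + gamete[:2] + gamete[4:]   # marker class, e.g. "M1N1/M1N2"
        freq[cls] = freq.get(cls, 0) + f
        contrib[cls] = contrib.get(cls, 0) + f * value
    return {cls: contrib[cls] / freq[cls] for cls in freq}

def estimate_from_contrasts(means, c):
    """gamma = a - d, delta = (a - d)(c2 - c1)/c, and c1 from delta/gamma = (c - 2c1)/c."""
    gamma = means["M1N1/M1N1"] - means["M1N1/M2N2"]
    delta = means["M1N1/M1N2"] - means["M1N1/M2N1"]
    c1_hat = c * (1 - delta / gamma) / 2   # undefined when a = d (gamma = 0)
    return gamma, delta, c1_hat

# Hypothetical QTL: a = 1, d = 1/4, with c1 = 0.03 and c2 = 0.07
means = bc_marker_means(Fraction(1), Fraction(1, 4), Fraction(3, 100), Fraction(7, 100))
gamma, delta, c1_hat = estimate_from_contrasts(means, Fraction(10, 100))
print(gamma, delta, c1_hat)  # 3/4 3/10 3/100
```

Because d ≠ 0 here, γ recovers a – d = ¾ rather than a, while the position estimate c1 = 0.03 is still exact.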
Association Mapping: Collect Population Phenotypes and Genotypes
[Figure: four panels, Starvation Resistance (hours), Locomotor Reactivity (seconds), Chill Coma Recovery (minutes) and Lifespan (hours); H² = 0.56, 0.23, 0.58, 0.54]
Association Mapping
Association mapping uses historical recombination in random mating populations to identify QTLs; the association between marker and QTL is measured by linkage disequilibrium (LD).
LD is a measure of the correlation in gene frequencies between two loci.
Consider locus A with alleles A1 and A2 at frequencies p1 and p2 respectively, and locus B with alleles B1 and B2 at frequencies q1 and q2 respectively.
If the gene frequencies at these loci are uncorrelated, the expected frequency of each gamete type is the product of the allele frequencies at each locus separately.
The gamete types are called HAPLOTYPES because we describe the genetic constitution of a haploid gamete.
For two loci there are only 4 gamete types: A1B1, A1B2, A2B1 and A2B2.
Linkage Disequilibrium (LD)
Gamete type (haplotype)  Expected frequency     Observed frequency
A1B1                     p1q1               =   P11
A1B2                     p1q2               =   P12
A2B1                     p2q1               =   P21
A2B2                     p2q2               =   P22
where p1 + p2 = 1 and q1 + q2 = 1
If allele frequencies are uncorrelated, the population is in ‘linkage equilibrium’, and P11P22 - P12P21 = 0
If allele frequencies are non-randomly associated, the gamete frequencies are not the simple product of the allele frequencies, but depart from this by amount D
D is the coefficient of linkage disequilibrium
Gamete type (haplotype)  Expected frequency (disequilibrium)     Observed frequency
A1B1                     p1q1 + D                           =    P11
A1B2                     p1q2 – D                           =    P12
A2B1                     p2q1 – D                           =    P21
A2B2                     p2q2 + D                           =    P22
and P11P22 – P12P21 = D
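To make the definition concrete, the Python sketch below (the function name is ours, not from the source) estimates allele frequencies and D from haplotype counts using P11P22 – P12P21. The counts in the usage lines are hypothetical: one population carrying only A1B1 and A2B2 haplotypes, and one carrying all four haplotypes at equal frequency.

```python
def haplotype_ld(n11, n12, n21, n22):
    """Allele frequencies p1 (A1), q1 (B1) and D from counts of A1B1, A1B2, A2B1, A2B2."""
    total = n11 + n12 + n21 + n22
    P11, P12, P21, P22 = (n / total for n in (n11, n12, n21, n22))
    p1 = P11 + P12             # frequency of A1
    q1 = P11 + P21             # frequency of B1
    D = P11 * P22 - P12 * P21  # equivalently P11 - p1 * q1
    return p1, q1, D

print(haplotype_ld(50, 0, 0, 50))    # (0.5, 0.5, 0.25): only A1B1 and A2B2 present
print(haplotype_ld(25, 25, 25, 25))  # (0.5, 0.5, 0.0): linkage equilibrium
```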
[Figure: Linkage Disequilibrium, a sample of chromosomes carrying only A1B1 and A2B2 haplotypes; Linkage Equilibrium, a sample carrying all four haplotypes A1B1, A1B2, A2B1 and A2B2]
Numerical value of D depends on gene frequencies at the two loci.
Sign of D is arbitrary for molecular markers; consider absolute value.
D takes its highest value when p1 = p2 = q1 = q2 = 0.5 and gamete types A1B2 and A2B1 are missing (complete linkage disequilibrium); D is then 0.25.
Because of the dependence on gene frequency, values of D are typically scaled by the observed gene frequencies.
1. D' = D/Dmax
For positive D, Dmax is the smaller of p1q2 and p2q1, because haplotype frequencies cannot be negative:
P12 = p1q2 – D ≥ 0, so D ≤ p1q2
P21 = p2q1 – D ≥ 0, so D ≤ p2q1
The maximum value of D' is 1.
2. r² = D²/(p1p2q1q2)
The expected value in an equilibrium population is E(r²) = 1/(1 + 4Nc), where N is the effective population size and c is the recombination fraction between the two loci.
In principle one can use this relationship to estimate c, but r² has very large statistical and genetic sampling variances, so in practice this relationship is not very useful.
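A sketch of the two standardized measures and the equilibrium expectation, under hypothetical haplotype frequencies (P11 = 0.3, P12 = 0.1, P21 = 0.1, P22 = 0.5). The branch for negative D is an addition not stated on the slide: the same non-negativity argument applied to P11 and P22 gives Dmax = min(p1q1, p2q2) in that case, and the sketch reports |D'| since the sign is arbitrary.

```python
def d_prime(p1, q1, D):
    """|D'| = |D| / Dmax, with Dmax chosen so no haplotype frequency goes negative."""
    p2, q2 = 1 - p1, 1 - q1
    d_max = min(p1 * q2, p2 * q1) if D >= 0 else min(p1 * q1, p2 * q2)
    return abs(D) / d_max

def r_squared(p1, q1, D):
    """r^2 = D^2 / (p1 p2 q1 q2)."""
    return D ** 2 / (p1 * (1 - p1) * q1 * (1 - q1))

def expected_r_squared(N, c):
    """E(r^2) = 1 / (1 + 4Nc), with N the effective population size."""
    return 1 / (1 + 4 * N * c)

p1, q1 = 0.4, 0.4                 # allele frequencies implied by the haplotypes above
D = 0.3 * 0.5 - 0.1 * 0.1         # P11*P22 - P12*P21
print(round(D, 4), round(d_prime(p1, q1, D), 4), round(r_squared(p1, q1, D), 4))  # 0.14 0.5833 0.3403
print(expected_r_squared(1000, 0.001))
```

The same D gives quite different values of D' and r², which is why the two measures are not interchangeable.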
Causes of LD:
Mutation (a new mutant allele is initially in complete linkage disequilibrium with all other loci in the genome)
Admixture between populations with different gene frequencies
Natural selection for particular combinations of alleles
Population bottlenecks (chance sampling of small number of haplotypes)
[Figure: Dt/D0 versus generation t (0–30) for c = 0.001, 0.005, 0.01, 0.05, 0.1 and 0.5]
D declines in successive generations in a random mating population by an amount which depends on the recombination fraction, c.
Dt = D0(1 – c)^t after t generations of random mating.
With unlinked loci and free recombination (c = 0.5) D is halved by each generation of random mating; with linked loci D decays more slowly.
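The decay formula can be tabulated directly. The sketch below also computes the number of generations needed to halve D, ln(0.5)/ln(1 – c), which follows from the same formula; the helper names are hypothetical.

```python
import math

def ld_after(D0, c, t):
    """D_t = D0 * (1 - c)^t after t generations of random mating."""
    return D0 * (1 - c) ** t

def generations_to_halve(c):
    """Generations of random mating needed for D to fall to half its starting value."""
    return math.log(0.5) / math.log(1 - c)

for c in (0.5, 0.1, 0.01, 0.001):
    print(f"c = {c}: D10/D0 = {ld_after(1.0, c, 10):.3f}, "
          f"half-life = {generations_to_halve(c):.1f} generations")
```

Free recombination halves D in a single generation, whereas loci with c = 0.01 need roughly 69 generations.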
[Figure: panels "Then" and "Now"]
Disequilibrium between pairs of loci in random mating populations depends on population history, but is expected to be small unless the loci are very tightly linked.
Use molecular polymorphism and phenotypic information from samples of alleles from a random mating population to determine whether there is an association with the trait phenotype.
Can be done for candidate gene, QTL region, or whole genome.
Depending on the scale of LD, one can use LD for fine-mapping QTL, and even causal variants.
LD large in populations that have undergone recent bottlenecks in population size, from a founder event or artificial selection
LD small in large, near equilibrium outbred populations (e.g., Drosophila).
CAVEAT: Population admixture can cause false positive associations if marker frequencies and trait values are different between populations
[Figure: Frequency versus Phenotype]
Quantitative traits: Group data by genotype for each marker. Assess whether the mean of the trait differs between the genotype classes of the marker.
If so, the locus affecting the trait is in LD with the marker locus.
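A stdlib-only Python sketch of the quantitative-trait test, using marker M1 and the phenotypes from the QTL Mapping Data table as stand-in data (that table comes from the linkage example, but the grouping logic is identical). It groups phenotypes by marker genotype and computes a one-way ANOVA F; turning F into a P-value needs the F distribution with (k – 1, n – k) degrees of freedom, for example SciPy's scipy.stats.f.sf, which is left out here.

```python
from statistics import mean

# Marker M1 genotype codes and phenotypes of the 10 individuals in the QTL Mapping Data table
m1 = [1, 3, 1, 3, 1, 3, 3, 2, 2, 1]
phenotype = [20, 8, 14, 24, 28, 18, 10, 22, 20, 17]

def group_by_genotype(genotypes, values):
    """Collect phenotypes into lists keyed by marker genotype."""
    groups = {}
    for g, y in zip(genotypes, values):
        groups.setdefault(g, []).append(y)
    return dict(sorted(groups.items()))

def anova_f(groups):
    """One-way ANOVA F: variance between marker-class means relative to variance within classes."""
    values = [y for ys in groups.values() for y in ys]
    grand, k, n = mean(values), len(groups), len(values)
    ss_between = sum(len(ys) * (mean(ys) - grand) ** 2 for ys in groups.values())
    ss_within = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

groups = group_by_genotype(m1, phenotype)
print({g: mean(ys) for g, ys in groups.items()})   # marker-class means
print(round(anova_f(groups), 3), "on", len(groups) - 1, "and", len(phenotype) - len(groups), "df")
```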
[Figure: Cases versus Controls]
Categorical traits: Group data according to whether individuals are affected (cases) or not affected (controls).
Determine whether genotype frequencies or allele frequencies differ between cases and controls.
If so, the locus affecting the trait is in LD with the marker locus.
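For the case-control design, one common choice is a Pearson chi-square on the 2 × 2 table of allele counts; the sketch below assumes that test (the source does not name one) and uses hypothetical counts. For 1 degree of freedom the upper-tail probability is erfc(√(χ²/2)), so no statistics library is required.

```python
import math

def allelic_chi_square(case_counts, control_counts):
    """Pearson chi-square (1 df) for a 2 x 2 table of allele counts in cases vs. controls."""
    table = [case_counts, control_counts]
    total = sum(case_counts) + sum(control_counts)
    row_totals = [sum(row) for row in table]
    col_totals = [case_counts[j] + control_counts[j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))   # upper tail of chi-square with 1 df
    return chi2, p_value

# Hypothetical allele counts (allele 1, allele 2): 100 cases and 100 controls, 200 alleles each
chi2, p = allelic_chi_square([60, 140], [40, 160])
print(round(chi2, 3), round(p, 4))
```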
Association mapping underestimates QTL effects unless the molecular marker genotyped is the causal variant.
Let α be the effect attributable to the causal variant, and a the estimated effect.
α = [p(1 – p)/D]a, where p is the frequency of the polymorphic site and D is the LD between the causal QTN and the polymorphic site associated with it.
D ≤ p(1 – p) (maximum p(1 – p) = 0.25), so a ≤ α.
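The relation between the estimated and causal effects can be evaluated for a few values of D. In this sketch α is the effect of the causal variant and a the expected marker effect; the numbers are hypothetical, with the marker at p = 0.5, where D = 0.25 corresponds to complete LD and the marker recovers the full effect.

```python
def expected_marker_effect(alpha, D, p):
    """Expected marker effect a when the causal variant has effect alpha: a = alpha * D / [p(1 - p)]."""
    return alpha * D / (p * (1 - p))

def causal_effect(a, D, p):
    """Inverse relation: alpha = [p(1 - p) / D] * a."""
    return p * (1 - p) / D * a

# Marker at frequency p = 0.5, causal effect alpha = 1 (hypothetical values)
for D in (0.25, 0.1, 0.05):
    print(D, expected_marker_effect(1.0, D, 0.5))
```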
t-tests, ANOVA, marker regressions or more sophisticated maximum likelihood (ML) methods can be used to assess differences in trait phenotype between marker genotypes.
The parental lines will differ at many loci affecting the trait of interest; therefore QTLs unlinked to the markers under consideration will segregate in the F2 or backcross generation.
Methods for dealing with multiple QTL simultaneously (e.g., composite interval mapping) reduce the variance within marker genotype classes and improve estimates of map positions and of effects.
Linkage Mapping: Statistical Considerations
Many markers are tested for linkage to a QTL in a genome scan, and the number of false positives increases with the number of tests. With n independent tests, the significance level for each should be set to α/n (a Bonferroni correction). The number of independent tests will be less than the number of markers because linked markers are correlated. Permutation tests are typically used to determine appropriate experiment-wise significance levels, accounting for multiple tests and correlated markers.
[Figure: likelihood ratio (0–50) versus testing position (cM) along Chromosomes I, II and III]
Permutation Test
Genotype (markers M1–M8) and phenotype (O = observed; P1–P4 = permuted)
Ind.  M1 M2 M3 M4 M5 M6 M7 M8   O   P1  P2  P3  P4
1     1  1  2  2  2  3  3  1    20  14  24  8   20
2     3  3  1  1  1  1  1  3    8   20  22  17  28
3     1  1  2  2  1  1  2  1    14  18  20  28  10
4     3  2  3  2  2  2  3  2    24  8   18  20  8
5     1  3  2  2  2  2  3  3    28  22  10  18  17
6     3  3  2  3  1  3  2  1    18  24  20  14  22
7     3  2  3  1  3  1  3  3    10  17  14  24  20
8     2  3  1  3  2  2  1  2    22  28  17  20  18
9     2  1  3  3  3  3  1  3    20  10  8   22  24
10    1  2  1  3  3  3  2  2    17  20  28  10  14
[Figure: –logP]
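The permutation test can be sketched as follows: shuffle the phenotype column (as in columns P1–P4), rerun the genome scan, and record the largest test statistic over all markers each time. The sketch assumes a one-way ANOVA F as the per-marker statistic and 1,000 permutations; both are illustrative choices, and with only 10 individuals the resulting threshold does no more than demonstrate the procedure.

```python
import random

# Marker genotype codes (M1-M8) and observed phenotypes from the Permutation Test table
GENOTYPES = [
    [1, 1, 2, 2, 2, 3, 3, 1], [3, 3, 1, 1, 1, 1, 1, 3], [1, 1, 2, 2, 1, 1, 2, 1],
    [3, 2, 3, 2, 2, 2, 3, 2], [1, 3, 2, 2, 2, 2, 3, 3], [3, 3, 2, 3, 1, 3, 2, 1],
    [3, 2, 3, 1, 3, 1, 3, 3], [2, 3, 1, 3, 2, 2, 1, 2], [2, 1, 3, 3, 3, 3, 1, 3],
    [1, 2, 1, 3, 3, 3, 2, 2],
]
OBSERVED = [20, 8, 14, 24, 28, 18, 10, 22, 20, 17]

def f_stat(codes, y):
    """One-way ANOVA F for phenotype y across the genotype classes of one marker."""
    groups = {}
    for g, v in zip(codes, y):
        groups.setdefault(g, []).append(v)
    n, k = len(y), len(groups)
    grand = sum(y) / n
    means = {g: sum(v) / len(v) for g, v in groups.items()}
    ss_between = sum(len(v) * (means[g] - grand) ** 2 for g, v in groups.items())
    ss_within = sum((x - means[g]) ** 2 for g, v in groups.items() for x in v)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def genome_scan_max(y):
    """Largest single-marker F across all markers: the experiment-wise test statistic."""
    return max(f_stat([row[j] for row in GENOTYPES], y) for j in range(len(GENOTYPES[0])))

def permutation_test(n_perm=1000, alpha=0.05, seed=1):
    rng = random.Random(seed)
    observed = genome_scan_max(OBSERVED)
    null = []
    for _ in range(n_perm):
        y = OBSERVED[:]
        rng.shuffle(y)   # breaks the trait-marker association but keeps marker correlations
        null.append(genome_scan_max(y))
    null.sort()
    threshold = null[round((1 - alpha) * n_perm) - 1]   # experiment-wise critical F
    p_value = (1 + sum(f >= observed for f in null)) / (n_perm + 1)
    return observed, threshold, p_value

observed, threshold, p_value = permutation_test()
print(f"max F = {observed:.2f}, 5% experiment-wise threshold = {threshold:.2f}, P = {p_value:.3f}")
```

Taking the maximum over markers within each permutation is what makes the threshold experiment-wise rather than per-marker.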
How large must the experiment be to detect a difference δ between the two homozygous marker genotypes?
For simplicity, assume the QTL is completely linked to the marker (c = 0) and that a t-test is used to judge the significance of the difference between two marker class means.
n ≥ 2(zα + z2β)²/(δ/σP)²
σP: phenotypic standard deviation within marker classes
α: false positive (Type I) error rate (0.05)
β: false negative (Type II) error rate (0.1)
z: two-tailed standard normal deviate for the probability in its subscript
zα = 1.96 and z2β = 1.28
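The sample-size inequality can be evaluated with Python's statistics.NormalDist. The sketch assumes the F2 total is four times n (each homozygous marker class is ¼ of an F2) and the backcross total twice n (each marker class is ½); those multipliers and the helper names are our assumptions. With the rounded deviates above and δ/σP = 0.5 it gives n = 84 per class, i.e., 336 F2 individuals, the figure quoted in the F2 example; the unrounded deviates give just over 84 per class.

```python
import math
from statistics import NormalDist

def n_per_marker_class(effect, alpha=0.05, beta=0.10, z_alpha=None, z_2beta=None):
    """n >= 2 (z_alpha + z_2beta)^2 / (delta / sigma_P)^2, with effect = delta / sigma_P."""
    std_normal = NormalDist()
    if z_alpha is None:
        z_alpha = std_normal.inv_cdf(1 - alpha / 2)   # two-tailed deviate: 1.96 for alpha = 0.05
    if z_2beta is None:
        z_2beta = std_normal.inv_cdf(1 - beta)        # two-tailed deviate for 2*beta: 1.28 for beta = 0.1
    return 2 * (z_alpha + z_2beta) ** 2 / effect ** 2

def total_mapping_population(n, design="F2"):
    """Assumed multipliers: homozygous class = 1/4 of an F2, marker class = 1/2 of a backcross."""
    return math.ceil(n) * {"F2": 4, "BC": 2}[design]

n = n_per_marker_class(0.5, z_alpha=1.96, z_2beta=1.28)   # rounded deviates from the text
print(math.ceil(n), total_mapping_population(n, "F2"))    # 84 336
```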
Linkage Mapping: Power and Sample Size
n = number per marker class; N = total size of the mapping population
For strictly additive effects, σA²/σP² = 2pqα²/σP²
Easy to detect QTLs with large effects; need large sample sizes to detect QTLs with moderate to small effects.
The power to detect a difference in mean between two marker genotypes depends on δ/σP; strategies to reduce σP can increase power (e.g., progeny testing, RI lines).
[Figure: sample size for F2 and BC designs]
Linkage Mapping: Recombination and Sample Size
Number of individuals needed to detect at least one recombinant in an interval of size c (c = 100cM)
Number of marker genotypes needed to localize QTLs per 100 cM
Linkage Mapping: Power, Recombination and Sample Size
Large numbers necessary to detect QTL AND estimate location.
For an F2 design, one needs 336 individuals to detect a QTL with a large effect (δ/σP = 0.5) × 59 individuals to ensure the QTL is mapped to a 5 cM region = 19,824 individuals in total, and 416,304 marker genotypes per 100 cM.
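The arithmetic in this F2 example can be reproduced under two assumptions that the slide does not state: the 59 is taken to be the smallest number of individuals giving a 95% chance of at least one recombinant in a 5 cM interval (c ≈ 0.05, one informative meiosis per individual), and the marker count is taken to be one marker every 5 cM, i.e., 21 per 100 cM.

```python
import math

def individuals_for_recombinant(c, prob=0.95):
    """Smallest n with P(at least one recombinant) = 1 - (1 - c)^n >= prob (assumed rule)."""
    return math.ceil(math.log(1 - prob) / math.log(1 - c))

n_detect = 336                                  # F2 size for delta/sigma_P = 0.5, from the text
n_localize = individuals_for_recombinant(0.05)  # 5 cM interval, c ~ 0.05
total = n_detect * n_localize
markers_per_100cM = 100 // 5 + 1                # assumed: one marker every 5 cM, both ends counted
print(n_localize, total, total * markers_per_100cM)  # 59 19824 416304
```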
QTL mapping is in practice an iterative procedure, where QTLs are first mapped to broad genomic regions in a genome scan, followed by high resolution mapping to localize genes within each QTL region.
Genotyping by sequencing is changing this strategy, facilitating rapid, fine mapping of QTLs.
Association Mapping: Power and Sample Size
[Figure: curves for q = 0.1, 0.25 and 0.5]
q = frequency of the rare allele
LD mapping has the same power as linkage mapping in an F2 population for intermediate gene frequencies, but much reduced power as the frequency of the rare allele decreases (the frequency of rare-allele homozygotes in the population is q²).
This calculation assumes the marker is the causal variant; even larger samples are necessary if the marker is only in LD with the causal variant.
Easy to detect intermediate-frequency variants with large effects; hard to detect rare variants with small effects.
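One way to read the q² remark as a sample-size rule, assuming Hardy–Weinberg proportions and that the rarer homozygote class must reach the same n per class as in the F2 calculation, is N = n/q². This is our reading, not a formula stated in the source; with n = 84 it gives 336 at q = 0.5, matching the F2 figure, and grows steeply as q falls.

```python
import math

def association_total_n(n_per_class, q):
    """Total sample so the rare-allele homozygote class (frequency q**2 under HWE) reaches n_per_class."""
    return math.ceil(n_per_class / q ** 2)

n_per_class = 84   # per-class size implied by the F2 example (336 / 4)
for q in (0.5, 0.25, 0.1):
    print(q, association_total_n(n_per_class, q))
```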
Association Mapping: Recombination and Sample Size
Expected frequency of recombinants after t generations of recombination in a random mating population
Higher frequency of recombinants in random mating population means smaller sample sizes required for high resolution mapping than linkage studies
[Figure: expected frequency of recombinants versus generation t for c = 0.001, 0.005 and 0.01]
Number of markers depends on scale and pattern of LD
Small population size = large LD tracts = few markers required for QTL detection, but localization poor (dogs). Favorable situation for whole genome LD scan.
Large population size = small LD tracts = many markers required for QTL detection, but localization precise, maybe to the level of the QTN (Drosophila). Favorable situation for candidate gene re-sequencing.
LD patterns are not constant across the genome, but vary with local recombination rate and in regions under natural selection
Knowing patterns of LD can guide experimental design
Strategies to Increase Power
Selective genotyping: Measure many individuals (several thousands), but only genotype the extreme tails
Selective genotyping with pooling: detect gene frequency differences between the tails of the distribution by pooling high and low samples (bulk segregant analysis), followed by next-generation sequencing of the pools
Strategies to Increase Genetic Diversity
Estimates of the number of QTL are minimum estimates:
Experiments are limited in their power to separate closely linked loci.
There must always be other loci with effects too small to be detected by an experiment of a particular size.
The loci found are those differentiating the two strains compared; other loci would probably be found in other strains.
Can increase genetic diversity by:
Artificial selection for high and low trait values from a large heterogeneous base population, then inbreeding to construct parental stocks for mapping
Mapping populations derived from crosses of several inbred strains, either RI lines or a large outbred population maintained for many generations
Construction of near-isoallelic lines (NIL): backcross to one of the parental strains, selecting for markers flanking the QTL and against markers flanking other QTL
Fine-scale recombination: backcross the NIL to one of the parental strains, select for recombinants within the NIL interval using additional markers, and progeny test recombinant genotypes to map the QTL to 2 cM or less
Deficiency mapping (in Drosophila)
Change strategy from linkage to association mapping
High Resolution Mapping
[Figure: likelihood ratio (0–50) versus testing position (cM, 0–120); recombinant genotypes G1, G2, G3 and Gt with their effects]
QTL End Game: Proving QTL Corresponds to Candidate Gene
Supporting evidence:
Potentially functional DNA polymorphisms
Differences in mRNA expression between alleles
Expression of RNA/protein in relevant tissues
Replicated associations in different populations
Quantitative complementation tests between QTL alleles and mutant alleles
More concrete evidence:
Create mutants in the candidate gene that affect the trait (transposon tagging)
Transgenic rescue
Demonstrate functional differences between alleles by knocking in alternate alleles by homologous recombination