
Many loci

Effect on trait small

Combine together to affect phenotype

Environmental sensitivity

Introduction to QTL Mapping

Genetic Architecture of Quantitative Traits

Loci?

Distribution of effects on trait?

Distribution of pleiotropic effects (including fitness)

Distribution of context-dependent effects?

Sex

Environment

Genetic background (epistasis)

Allele frequency?

Causal molecular variant?

QTL Mapping


QTL effects too small to be detected by Mendelian segregation

Need to map QTLs by linkage to marker loci with genotypes that can be unambiguously scored

Principle dates back to 1923, but abundant, polymorphic molecular markers only relatively recently available

Most studies use single nucleotide polymorphism (SNP) markers and insertion/deletion (indel) markers

Massively parallel sequencing technology is revolutionizing our ability to rapidly map QTLs

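QTL Mapping Data

Marker genotypes (M1 to M8) and trait phenotype for ten individuals:

| Individual | M1 | M2 | M3 | M4 | M5 | M6 | M7 | M8 | Phenotype |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 2 | 2 | 2 | 3 | 3 | 1 | 20 |
| 2 | 3 | 3 | 1 | 1 | 1 | 1 | 1 | 3 | 8 |
| 3 | 1 | 1 | 2 | 2 | 1 | 1 | 2 | 1 | 14 |
| 4 | 3 | 2 | 3 | 2 | 2 | 2 | 3 | 2 | 24 |
| 5 | 1 | 3 | 2 | 2 | 2 | 2 | 3 | 3 | 28 |
| 6 | 3 | 3 | 2 | 3 | 1 | 3 | 2 | 1 | 18 |
| 7 | 3 | 2 | 3 | 1 | 3 | 1 | 3 | 3 | 10 |
| 8 | 2 | 3 | 1 | 3 | 2 | 2 | 1 | 2 | 22 |
| 9 | 2 | 1 | 3 | 3 | 3 | 3 | 1 | 3 | 20 |
| 10 | 1 | 2 | 1 | 3 | 3 | 3 | 2 | 2 | 17 |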

QTL Mapping: A Primer

Linkage Mapping

Two (or more) parental strains that differ genetically for trait

Molecular markers that distinguish the parental strains

Mapping population:
  Genotype all individuals for markers
  Measure trait phenotype

Map QTLs by linkage to markers:
  Single marker analysis
  Interval mapping

Map QTLs in pedigrees or populations derived from crosses of inbred lines

Association Mapping

Population sample of individuals with genetic variation for the trait

Molecular markers (whole genome or candidate gene)

Mapping population:
  Genotype all individuals for markers
  Measure trait phenotype

Map QTLs by Linkage Disequilibrium (LD) with markers

Map QTLs in individuals from an outbred population

Linkage Mapping: Find Parental Strains

[Figure: four panels with axes labelled Starvation Resistance (hours), Locomotor Reactivity (seconds), Chill Coma Recovery (minutes) and Lifespan (hours). Heritabilities shown on the panels: H2 = 0.56, H2 = 0.23, H2 = 0.58, H2 = 0.54.]

Linkage Mapping: Create Mapping Population

P1 × P2

F1

BC1: F1 × P1
BC2: F1 × P2
F2: F1 × F1

RILs

M1 - - - A1 - - N1 - - - - - - - O1
vs.
M2 - - - A2 - - N2 - - - - - - - O2

Test for:
  Linkage of a QTL (A) to individual markers (M, N, or O) = single marker analysis
  QTL in each interval in turn (M-N and N-O) = interval mapping

If there is a difference in trait mean between marker genotype classes, then a QTL is linked to the marker

Infer chromosomal locations and effects (a, d) of QTLs
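The comparison of trait means between marker genotype classes can be sketched in a few lines of Python. This is a minimal illustration, not a full single marker analysis: it uses the ten individuals from the QTL mapping data table, treats the genotype codes 1, 2 and 3 as opaque class labels (their meaning is not stated in the slides), and the function name `marker_class_means` is hypothetical.

```python
from statistics import mean

# Marker genotypes (M1..M8) and phenotypes of the ten individuals
# in the QTL mapping data table.
GENOTYPES = [
    [1, 1, 2, 2, 2, 3, 3, 1],
    [3, 3, 1, 1, 1, 1, 1, 3],
    [1, 1, 2, 2, 1, 1, 2, 1],
    [3, 2, 3, 2, 2, 2, 3, 2],
    [1, 3, 2, 2, 2, 2, 3, 3],
    [3, 3, 2, 3, 1, 3, 2, 1],
    [3, 2, 3, 1, 3, 1, 3, 3],
    [2, 3, 1, 3, 2, 2, 1, 2],
    [2, 1, 3, 3, 3, 3, 1, 3],
    [1, 2, 1, 3, 3, 3, 2, 2],
]
PHENOTYPES = [20, 8, 14, 24, 28, 18, 10, 22, 20, 17]


def marker_class_means(genotypes, phenotypes, marker_index):
    """Return {genotype code: mean phenotype} for one marker."""
    classes = {}
    for row, y in zip(genotypes, phenotypes):
        classes.setdefault(row[marker_index], []).append(y)
    return {g: mean(ys) for g, ys in sorted(classes.items())}


# Print the class means marker by marker; a large difference between
# classes is the signal that a formal test would then evaluate.
for m in range(8):
    print(f"M{m + 1}", marker_class_means(GENOTYPES, PHENOTYPES, m))
```

With only ten individuals the class means are very noisy, so the output illustrates the bookkeeping rather than any real evidence of linkage.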

Linkage Mapping: Test for Associations Between Markers and Trait

Line Cross Analysis: Single Markers

M = marker locus
A = QTL
c = recombination fraction between M and A

| Generation | Genotype | Value |
|---|---|---|
| P1 | M1A1/M1A1 | a |
| P2 | M2A2/M2A2 | –a |
| F1 | M1A1/M2A2 | d |

F1 gametes:

| Genotype | Frequency | Type |
|---|---|---|
| M1A1 | (1 – c)/2 | Non-recombinant |
| M2A2 | (1 – c)/2 | Non-recombinant |
| M1A2 | c/2 | Recombinant |
| M2A1 | c/2 | Recombinant |

Random mating of the F1 gives 10 possible F2 genotypic classes.

The contribution of each marker genotype class to the F2 mean is obtained by multiplying the frequency of each genotype by its genotypic value, then summing within marker genotype classes.

We want the actual class means, which are obtained by dividing the contribution to the F2 mean by the frequency of that marker class, which is the Mendelian segregation ratio of ¼ for the homozygotes and ½ for the heterozygotes.

Line Cross Analysis: Single Markers, F2 Mapping Population

| Genotype | Freq. | Value | Marker class | Class freq. | Contribution to F2 mean | Actual mean |
|---|---|---|---|---|---|---|
| M1A1/M1A1 | $(1-c)^2/4$ | a | M1/M1 | ¼ | $a(1-2c)/4 + dc(1-c)/2$ | $a(1-2c) + 2dc(1-c)$ |
| M1A1/M1A2 | $c(1-c)/2$ | d | | | | |
| M1A2/M1A2 | $c^2/4$ | –a | | | | |
| M1A1/M2A1 | $c(1-c)/2$ | a | M1/M2 | ½ | $d[(1-c)^2 + c^2]/2$ | $d[(1-c)^2 + c^2]$ |
| M1A1/M2A2 | $(1-c)^2/2$ | d | | | | |
| M1A2/M2A1 | $c^2/2$ | d | | | | |
| M1A2/M2A2 | $c(1-c)/2$ | –a | | | | |
| M2A1/M2A1 | $c^2/4$ | a | M2/M2 | ¼ | $-a(1-2c)/4 + dc(1-c)/2$ | $-a(1-2c) + 2dc(1-c)$ |
| M2A1/M2A2 | $c(1-c)/2$ | d | | | | |
| M2A2/M2A2 | $(1-c)^2/4$ | –a | | | | |

F2 Genotypes With One Marker Locus, M, and a Linked QTL, A

The following two contrasts of marker class means are functions of a and d:

Contrast 1:

$(M_1/M_1 - M_2/M_2)/2 = a(1 - 2c)$

Contrast 2:

$M_1/M_2 - [(M_1/M_1 + M_2/M_2)/2] = d(1 - 2c)^2$

The ratio of the second contrast to the first is $(d/a)(1 - 2c)$, so the two contrasts together allow estimation of d/a, but the estimate is always biased downward by a factor of (1 – 2c).
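The two contrasts can be checked numerically by enumerating the F2 genotypes from the F1 gamete frequencies. The following is a minimal sketch under the model stated above (genotypic values a, d, –a and recombination fraction c); the function names `f2_marker_means` and `contrasts` are illustrative, not taken from any package.

```python
from itertools import product


def f2_marker_means(a, d, c):
    """Expected trait mean of each F2 marker class (M1/M1, M1/M2, M2/M2)."""
    # F1 gametes keyed by (marker allele, QTL allele) with their frequencies.
    gametes = {
        (1, 1): (1 - c) / 2,  # M1A1, non-recombinant
        (2, 2): (1 - c) / 2,  # M2A2, non-recombinant
        (1, 2): c / 2,        # M1A2, recombinant
        (2, 1): c / 2,        # M2A1, recombinant
    }
    # Genotypic value by number of A1 alleles: A2A2 = -a, A1A2 = d, A1A1 = a.
    value = {0: -a, 1: d, 2: a}
    contribution = {"M1/M1": 0.0, "M1/M2": 0.0, "M2/M2": 0.0}
    frequency = dict.fromkeys(contribution, 0.0)
    # Random union of two F1 gametes gives every F2 genotype.
    for ((m1, q1), f1), ((m2, q2), f2) in product(gametes.items(), repeat=2):
        marker_class = {2: "M1/M1", 3: "M1/M2", 4: "M2/M2"}[m1 + m2]
        n_a1 = (q1 == 1) + (q2 == 1)
        contribution[marker_class] += f1 * f2 * value[n_a1]
        frequency[marker_class] += f1 * f2
    # Actual mean = contribution to the F2 mean / marker class frequency.
    return {k: contribution[k] / frequency[k] for k in contribution}


def contrasts(a, d, c):
    """Return (contrast 1, contrast 2) of the marker class means."""
    m = f2_marker_means(a, d, c)
    contrast1 = (m["M1/M1"] - m["M2/M2"]) / 2
    contrast2 = m["M1/M2"] - (m["M1/M1"] + m["M2/M2"]) / 2
    return contrast1, contrast2


for c in (0.0, 0.1, 0.25, 0.5):
    est_a, est_d = contrasts(a=1.0, d=0.0, c=c)
    print(f"c = {c}: contrast 1 = {est_a:.2f}, contrast 2 = {est_d:.2f}")
```

Enumerating gametes rather than coding the closed-form means makes the check independent of the algebra in the table.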



In summary:

A significant difference in the mean value of a quantitative trait between homozygous marker genotype classes indicates linkage of a QTL and the marker locus.

Estimates of a and d/a from single marker analysis are confounded with recombination frequency, and will generally underestimate the true values by a factor of (1 – 2c).

Example: The true effect is a = 1, d = 0. Expected estimates for a as a function of c:

| c | Estimated a |
|---|---|
| 0 | 1 |
| 0.1 | 0.8 |
| 0.25 | 0.5 |
| 0.5 | 0 |

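Interval Mapping Analysis

Line Cross Analysis: Interval Mapping

M - - - A - - - N

c1 = recombination fraction between M and A
c2 = recombination fraction between A and N
c = recombination fraction between M and N

With complete cross-over interference: c = c1 + c2
(True for c < 0.1 = 10 cM)

| Generation | Genotype | Value |
|---|---|---|
| P1 | M1A1N1/M1A1N1 | a |
| P2 | M2A2N2/M2A2N2 | –a |
| F1 | M1A1N1/M2A2N2 | d |

F1 gametes:

| Genotype | Frequency | Type |
|---|---|---|
| M1A1N1 | (1–c)/2 | Non-recombinant |
| M2A2N2 | (1–c)/2 | Non-recombinant |
| M1A2N2 | c1/2 | Recombinant |
| M2A1N1 | c1/2 | Recombinant |
| M1A1N2 | c2/2 | Recombinant |
| M2A2N1 | c2/2 | Recombinant |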

Example: Backcross (BC) mapping population, with the F1 backcrossed to M1A1N1.

Tabulate BC genotypes, frequencies and means, assuming no double recombination. Calculate expected marker genotype means.

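| Gamete type | Freq. | Value | Marker class | Class freq. | Contribution to BC mean | Actual mean |
|---|---|---|---|---|---|---|
| M1A1N1 | (1–c)/2 | a | M1N1/M1N1 | (1–c)/2 | a(1–c)/2 | a |
| M1A1N2 | c2/2 | a | M1N1/M1N2 | c/2 | (ac2 + dc1)/2 | (ac2 + dc1)/c |
| M1A2N2 | c1/2 | d | | | | |
| M2A1N1 | c1/2 | a | M1N1/M2N1 | c/2 | (ac1 + dc2)/2 | (ac1 + dc2)/c |
| M2A2N1 | c2/2 | d | | | | |
| M2A2N2 | (1–c)/2 | d | M1N1/M2N2 | (1–c)/2 | d(1–c)/2 | d |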

BC Genotypes With Two Linked Markers, M and N, and a linked QTL, A

In a manner similar to the single marker example, contrasts between backcross marker class means (γ and δ below) estimate the effects of the QTL.

In contrast to the single marker example, the map position relative to the flanking markers can also be estimated:

M1N1/M1N1 – M1N1/M2N2 = a – d = γ

M1N1/M1N2 – M1N1/M2N1 = (a – d)(c2 – c1)/c = δ


The estimate of a is unbiased only if d = 0, so recessive QTLs may not be detected.

This problem can be overcome by backcrossing to both parental lines, or by using an F2 design.

Note: c is assumed to be known, so c1 and c2 can be estimated:

δ/γ = (c2 – c1)/c = (c – 2c1)/c

and solve for c1.
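Solving δ/γ = (c – 2c1)/c for c1 gives c1 = c(1 – δ/γ)/2. The sketch below, with hypothetical function names `bc_marker_means` and `locate_qtl`, builds the expected backcross class means from chosen values of a, d, c1 and c2 and then recovers the QTL position from the two contrasts. It assumes complete interference (c = c1 + c2) and expected rather than sampled means.

```python
def bc_marker_means(a, d, c1, c2):
    """Expected backcross marker-class means, ignoring double recombinants."""
    c = c1 + c2  # complete cross-over interference
    return {
        "M1N1/M1N1": a,
        "M1N1/M1N2": (a * c2 + d * c1) / c,
        "M1N1/M2N1": (a * c1 + d * c2) / c,
        "M1N1/M2N2": d,
    }


def locate_qtl(means, c):
    """Return (gamma, delta, c1, c2) from the four marker-class means."""
    gamma = means["M1N1/M1N1"] - means["M1N1/M2N2"]   # a - d
    delta = means["M1N1/M1N2"] - means["M1N1/M2N1"]   # (a - d)(c2 - c1)/c
    if gamma == 0:
        raise ValueError("gamma = a - d is zero; the QTL cannot be located")
    c1 = c * (1 - delta / gamma) / 2  # from delta/gamma = (c - 2*c1)/c
    return gamma, delta, c1, c - c1


# Illustrative values only: a QTL 3 cM from M in a 10 cM interval.
means = bc_marker_means(a=2.0, d=0.5, c1=0.03, c2=0.07)
print(locate_qtl(means, c=0.10))
```

The `gamma == 0` guard corresponds to the case noted above in which a – d vanishes and the backcross carries no information about the QTL.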


Association Mapping: Collect Population Phenotypes and Genotypes

[Figure: four panels with axes labelled Starvation Resistance (hours), Locomotor Reactivity (seconds), Chill Coma Recovery (minutes) and Lifespan (hours). Heritabilities shown on the panels: H2 = 0.56, H2 = 0.23, H2 = 0.58, H2 = 0.54.]

Association Mapping

Association mapping utilizes historical recombination in random mating populations to identify QTLs, measured by linkage disequilibrium (LD)

LD is a measure of the correlation in gene frequencies between two loci.

Consider locus A with alleles A1 and A2 at frequencies p1 and p2 respectively, and locus B with alleles B1 and B2 at frequencies q1 and q2 respectively.

If the gene frequencies at these loci are uncorrelated, the expected frequency of each gamete type is the product of the allele frequencies at each locus separately.

The gamete types are called HAPLOTYPES because we describe the genetic constitution of a haploid gamete.

For two loci there are only 4 gamete types: A1B1, A1B2, A2B1 and A2B2.

Linkage Disequilibrium (LD)

| Gamete type (haplotype) | Expected frequency | Observed frequency |
|---|---|---|
| A1B1 | p1q1 | = P11 |
| A1B2 | p1q2 | = P12 |
| A2B1 | p2q1 | = P21 |
| A2B2 | p2q2 | = P22 |

Where p1 + p2 = 1 and q1 + q2 = 1


If allele frequencies are uncorrelated, the population is in ‘linkage equilibrium’, and P11P22 - P12P21 = 0

If allele frequencies are non-randomly associated, the gamete frequencies are not the simple product of the allele frequencies, but depart from this by amount D

D is the coefficient of linkage disequilibrium


| Gamete type (haplotype) | Expected frequency (disequilibrium) | Observed frequency |
|---|---|---|
| A1B1 | p1q1 + D | = P11 |
| A1B2 | p1q2 – D | = P12 |
| A2B1 | p2q1 – D | = P21 |
| A2B2 | p2q2 + D | = P22 |

and P11P22 – P12P21 = D
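As a small worked example, D can be computed directly from a sample of haplotypes using D = P11P22 – P12P21. The function name `ld_coefficient` and the two samples are hypothetical; the samples are chosen to represent complete disequilibrium and equilibrium at allele frequencies of 0.5.

```python
from collections import Counter


def ld_coefficient(haplotypes):
    """Return (p1, q1, D) from a list of haplotype labels such as 'A1B1'."""
    n = len(haplotypes)
    counts = Counter(haplotypes)  # missing haplotypes count as zero
    P11 = counts["A1B1"] / n
    P12 = counts["A1B2"] / n
    P21 = counts["A2B1"] / n
    P22 = counts["A2B2"] / n
    p1 = P11 + P12  # frequency of allele A1
    q1 = P11 + P21  # frequency of allele B1
    D = P11 * P22 - P12 * P21
    return p1, q1, D


# Hypothetical samples of 16 haplotypes each.
complete_ld = ["A1B1"] * 8 + ["A2B2"] * 8
equilibrium = ["A1B1"] * 4 + ["A1B2"] * 4 + ["A2B1"] * 4 + ["A2B2"] * 4

print(ld_coefficient(complete_ld))  # only A1B1 and A2B2 present
print(ld_coefficient(equilibrium))  # all four haplotypes equally frequent
```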

[Figure: two example populations of haplotypes. Linkage Disequilibrium: only A1B1 and A2B2 haplotypes are present. Linkage Equilibrium: all four haplotypes (A1B1, A1B2, A2B1, A2B2) are present.]

Numerical value of D depends on gene frequencies at the two loci.

Sign of D is arbitrary for molecular markers; consider absolute value.

Highest value of D for p1 = p2 = q1 = q2 = 0.5, and gamete types A1B2 and A2B1 are missing (complete linkage disequilibrium); D is then 0.25.

Because of the dependence on gene frequency, values of D are typically scaled by the observed gene frequencies.


1. $D' = D/D_{max}$

Dmax is the smaller of p1q2 and p2q1 (for positive D). This is because:
P12 = p1q2 – D ≥ 0, so D ≤ p1q2
P21 = p2q1 – D ≥ 0, so D ≤ p2q1

The maximum value of D' is 1.

2. $r^2 = D^2 / (p_1 p_2 q_1 q_2)$

The expected value in an equilibrium population is $E(r^2) = 1/(1 + 4Nc)$, where N is the effective population size and c is the recombination fraction between the two loci.

In principle one can use this relationship to estimate c, but r2 has very large statistical and genetic sampling variances, so in practice this relationship is not very useful.
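The two scaled measures and the equilibrium expectation are simple to compute. The sketch below uses hypothetical function names; the bound used for negative D (the smaller of p1q1 and p2q2, which follows from P11 ≥ 0 and P22 ≥ 0) is the standard extension and is not stated in the slides.

```python
def d_prime(D, p1, q1):
    """Scale |D| by its maximum attainable value for these allele frequencies."""
    p2, q2 = 1 - p1, 1 - q1
    if D >= 0:
        d_max = min(p1 * q2, p2 * q1)  # from P12 >= 0 and P21 >= 0
    else:
        d_max = min(p1 * q1, p2 * q2)  # assumption: from P11 >= 0 and P22 >= 0
    return abs(D) / d_max


def r_squared(D, p1, q1):
    """Squared correlation between alleles at the two loci."""
    return D ** 2 / (p1 * (1 - p1) * q1 * (1 - q1))


def expected_r_squared(N, c):
    """Equilibrium expectation E(r^2) = 1 / (1 + 4Nc)."""
    return 1 / (1 + 4 * N * c)


# Illustrative values only.
print(d_prime(0.05, p1=0.3, q1=0.6), r_squared(0.05, p1=0.3, q1=0.6))
print(expected_r_squared(N=1000, c=0.001))
```

Note that D' can equal 1 while r2 is well below 1 when the allele frequencies at the two loci differ, which is one reason both measures are reported.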

Causes of LD:

Mutation (a new mutant allele is initially in complete linkage disequilibrium with all other loci in the genome)

Admixture between populations with different gene frequencies

Natural selection for particular combinations of alleles

Population bottlenecks (chance sampling of small number of haplotypes)


[Figure: Dt/D0 plotted against generation, t, for c = 0.001, 0.005, 0.01, 0.05, 0.1 and 0.5.]

D declines in successive generations in a random mating population by an amount which depends on the recombination fraction, c.

$D_t = D_0 (1 - c)^t$ after t generations of random mating.

With unlinked loci and free recombination (c = 0.5) D is halved by each generation of random mating; with linked loci D decays more slowly.
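The decay of D under random mating is easy to tabulate. This is a minimal sketch with hypothetical function names; it evaluates D0(1 – c)^t and the number of generations needed for D to fall to a given fraction of its starting value.

```python
import math


def ld_after(d0, c, t):
    """Disequilibrium remaining after t generations of random mating."""
    return d0 * (1 - c) ** t


def generations_to_fraction(c, fraction=0.5):
    """Smallest whole t with (1 - c)**t <= fraction."""
    return math.ceil(math.log(fraction) / math.log(1 - c))


for c in (0.001, 0.005, 0.01, 0.05, 0.1, 0.5):
    print(f"c = {c}: D30/D0 = {ld_after(1.0, c, 30):.3f}, "
          f"generations to halve D = {generations_to_fraction(c)}")
```

For free recombination (c = 0.5) a single generation halves D, matching the statement above; for tightly linked loci the half-life runs to tens or hundreds of generations.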




Disequilibrium between pairs of loci in random mating populations depends on population history, but is expected to be small unless the loci are very tightly linked.

Use molecular polymorphism and phenotypic information from samples of alleles from a random mating population to determine whether there is an association with the trait phenotype.

Can be done for candidate gene, QTL region, or whole genome.

Depending on the scale of LD, one can use LD for fine-mapping QTL, and even causal variants.

LD large in populations that have undergone recent bottlenecks in population size, from a founder event or artificial selection

LD small in large, near equilibrium outbred populations (e.g., Drosophila).

CAVEAT: Population admixture can cause false positive associations if marker frequencies and trait values are different between populations

Association Mapping

[Figure: frequency distributions of phenotype (axes labelled Frequency and Phenotype), and a comparison of Cases and Controls.]

Quantitative traits:
  Group data by genotype for each marker
  Assess whether the trait mean differs between the genotypes (or alleles) at the marker
  If so, the locus affecting the trait is in LD with the marker locus

Categorical traits:
  Group data according to whether individuals are affected or not affected
  Determine if there is a difference in genotype frequencies or allele frequencies between cases and controls
  If so, the locus affecting the trait is in LD with the marker locus


Association mapping underestimates QTL effects unless the molecular marker genotyped is the causal variant

Let the true effect be the effect attributable to the causal variant, and a the estimated effect.

True effect = [p(1 – p)/D]a, where p is the frequency of the polymorphic site and D is the LD between the causal QTN and the polymorphic site associated with it.

D ≤ p(1 – p) (maximum p(1 – p) = 0.25), so a ≤ true effect
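The relation between the estimated effect a and the effect of the causal variant can be rearranged as a = (true effect) × D / [p(1 – p)]. The sketch below evaluates that rearrangement for illustrative values; the function name `estimated_effect` is hypothetical, and D is taken as non-negative because its sign is arbitrary for molecular markers.

```python
def estimated_effect(true_effect, D, p):
    """Effect expected at a marker in LD (D) with the causal variant."""
    max_d = p * (1 - p)
    if not 0 <= D <= max_d:
        raise ValueError("D must lie between 0 and p(1 - p)")
    return true_effect * D / max_d


# Illustrative values: a causal effect of 1.0 at marker frequency p = 0.5.
for D in (0.25, 0.125, 0.05):
    print(f"D = {D}: estimated effect = {estimated_effect(1.0, D, 0.5):.2f}")
```

Only when D reaches its maximum, p(1 – p), does the marker recover the full effect.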

t-tests, ANOVA, marker regressions or more sophisticated maximum likelihood (ML) methods can be used to assess differences in trait phenotype between marker genotypes.

The parental lines will differ at many loci affecting the trait of interest; therefore QTLs unlinked to the markers under consideration will segregate in the F2 or backcross generation.

Methods for dealing with multiple QTL simultaneously (e.g., composite interval mapping) reduce the variance within marker genotype classes and improve estimates of map positions and of effects.

Linkage Mapping: Statistical Considerations

Many markers are tested for linkage to a QTL in a genome scan.

The number of false positives increases with the number of tests.

With n independent tests, the level for each should be set to α/n (a Bonferroni correction).

The number of independent tests will be less than the number of markers because of linked markers.

Permutation tests are typically used to determine appropriate experiment-wise significance levels, accounting for multiple tests and correlated markers.
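A permutation test of this kind can be sketched with the standard library alone. The example below reuses the ten individuals from the QTL mapping data table, shuffles the phenotypes relative to the genotypes, and records the largest test statistic across all markers in each shuffle. Two things are assumptions made for illustration: the statistic (the spread between marker-class means) is a simple stand-in for a t, F or likelihood ratio statistic, and the function names are hypothetical.

```python
import random
from statistics import mean

GENOTYPES = [
    [1, 1, 2, 2, 2, 3, 3, 1], [3, 3, 1, 1, 1, 1, 1, 3],
    [1, 1, 2, 2, 1, 1, 2, 1], [3, 2, 3, 2, 2, 2, 3, 2],
    [1, 3, 2, 2, 2, 2, 3, 3], [3, 3, 2, 3, 1, 3, 2, 1],
    [3, 2, 3, 1, 3, 1, 3, 3], [2, 3, 1, 3, 2, 2, 1, 2],
    [2, 1, 3, 3, 3, 3, 1, 3], [1, 2, 1, 3, 3, 3, 2, 2],
]
PHENOTYPES = [20, 8, 14, 24, 28, 18, 10, 22, 20, 17]


def max_class_spread(genotypes, phenotypes):
    """Largest spread between marker-class means over all markers."""
    best = 0.0
    for m in range(len(genotypes[0])):
        classes = {}
        for row, y in zip(genotypes, phenotypes):
            classes.setdefault(row[m], []).append(y)
        means = [mean(ys) for ys in classes.values()]
        best = max(best, max(means) - min(means))
    return best


def permutation_threshold(genotypes, phenotypes, n_perm=1000, alpha=0.05, seed=1):
    """Return (observed statistic, experiment-wise threshold, p-value)."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any genotype-phenotype association
        null.append(max_class_spread(genotypes, shuffled))
    null.sort()
    threshold = null[min(n_perm - 1, int((1 - alpha) * n_perm))]
    observed = max_class_spread(genotypes, phenotypes)
    p_value = sum(s >= observed for s in null) / n_perm
    return observed, threshold, p_value


# Bonferroni level for comparison: alpha / number of markers tested.
print("Bonferroni per-test level:", 0.05 / len(GENOTYPES[0]))
print(permutation_threshold(GENOTYPES, PHENOTYPES))
```

Because each shuffle keeps the genotype matrix intact, the correlation among linked markers is preserved in the null distribution, which is what the Bonferroni correction cannot account for.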


[Figure: likelihood ratio plotted against testing position (cM) along Chromosomes I, II and III.]

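Permutation Test

Marker genotypes (M1 to M8) with the observed phenotype (O) and four permutations of it (P1 to P4):

| Ind. | M1 | M2 | M3 | M4 | M5 | M6 | M7 | M8 | O | P1 | P2 | P3 | P4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 2 | 2 | 2 | 3 | 3 | 1 | 20 | 14 | 24 | 8 | 20 |
| 2 | 3 | 3 | 1 | 1 | 1 | 1 | 1 | 3 | 8 | 20 | 22 | 17 | 28 |
| 3 | 1 | 1 | 2 | 2 | 1 | 1 | 2 | 1 | 14 | 18 | 20 | 28 | 10 |
| 4 | 3 | 2 | 3 | 2 | 2 | 2 | 3 | 2 | 24 | 8 | 18 | 20 | 8 |
| 5 | 1 | 3 | 2 | 2 | 2 | 2 | 3 | 3 | 28 | 22 | 10 | 18 | 17 |
| 6 | 3 | 3 | 2 | 3 | 1 | 3 | 2 | 1 | 18 | 24 | 20 | 14 | 22 |
| 7 | 3 | 2 | 3 | 1 | 3 | 1 | 3 | 3 | 10 | 17 | 14 | 24 | 20 |
| 8 | 2 | 3 | 1 | 3 | 2 | 2 | 1 | 2 | 22 | 28 | 17 | 20 | 18 |
| 9 | 2 | 1 | 3 | 3 | 3 | 3 | 1 | 3 | 20 | 10 | 8 | 22 | 24 |
| 10 | 1 | 2 | 1 | 3 | 3 | 3 | 2 | 2 | 17 | 20 | 28 | 10 | 14 |

[Figure: axis labelled –logP.]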

How large must the experiment be to detect a difference δ between the two homozygous marker genotypes?

For simplicity, assume the QTL is completely linked to the marker ( c = 0) and that a t-test is used to judge the significance of the difference of two marker class means.

$n \ge 2 (z_\alpha + z_{2\beta})^2 / (\delta/\sigma_P)^2$

σP = phenotypic standard deviation within marker classes
α = false positive (Type I) error rate (0.05)
β = false negative (Type II) error rate (0.1)
z = ordinate of the normal distribution corresponding to its subscript

zα = 1.96 and z2β = 1.28
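The sample size formula can be evaluated directly. The sketch below uses the rounded z values quoted above (1.96 and 1.28) as defaults; the function name `n_per_marker_class` is hypothetical. The conversion to a total F2 population size assumes that each homozygous marker class makes up ¼ of an F2, so N = 4n.

```python
import math


def n_per_marker_class(effect_sd, z_alpha=1.96, z_2beta=1.28):
    """Individuals needed in each homozygous marker class.

    effect_sd is delta / sigma_P, the difference between the two homozygous
    marker classes in within-class phenotypic standard deviations.
    """
    return math.ceil(2 * (z_alpha + z_2beta) ** 2 / effect_sd ** 2)


for effect in (1.0, 0.5, 0.25):
    n = n_per_marker_class(effect)
    print(f"delta/sigma_P = {effect}: n = {n} per class, F2 total N = {4 * n}")
```

Halving the standardized effect roughly quadruples the required sample, which is why small-effect QTLs demand very large mapping populations.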

Linkage Mapping: Power and Sample Size


n = number per marker class

N = total size of the mapping population

For strictly additive effects, $\sigma_A^2 = 2pq\,\delta^2 \sigma_P^2$

Easy to detect QTLs with large effects

Need large sample sizes to detect QTLs with moderate to small effects

The power to detect a difference in mean between two marker genotypes depends on δ/σP; strategies to reduce σP can increase power (e.g., progeny testing, RI lines).

[Panels: F2, BC]

Linkage Mapping: Recombination and Sample Size

Number of individuals needed to detect at least one recombinant in an interval of size c (c = 100cM)

Number of marker genotypes needed to localize QTLs per 100 cM

Linkage Mapping: Power, Recombination and Sample Size

Large numbers necessary to detect QTL AND estimate location.

For an F2 design, need 336 individuals to detect a QTL with large effect (δ/σP = 0.5) × 59 individuals to ensure the QTL is mapped to a 5 cM region = 19,824 individuals in total, and 416,304 marker genotypes per 100 cM.

QTL mapping is in practice an iterative procedure, where QTLs are first mapped to broad genomic regions in a genome scan, followed by high resolution mapping to localize genes within each QTL region.

Genotyping by sequencing is changing this strategy, facilitating rapid, fine mapping of QTLs.
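The arithmetic behind the F2 example can be reproduced under two assumptions that are not stated on the slide: that "ensure" means a 95% chance of observing at least one recombinant in the interval, and that markers are spaced every 5 cM across 100 cM, giving 21 markers. Both are one reading that happens to reproduce the quoted numbers, and the function name `n_for_recombinant` is hypothetical.

```python
import math


def n_for_recombinant(c, prob=0.95):
    """Smallest n with P(at least one recombinant among n meioses) >= prob."""
    # P(no recombinant in n meioses) = (1 - c)**n must not exceed 1 - prob.
    return math.ceil(math.log(1 - prob) / math.log(1 - c))


n_detect = 336                           # from the power calculation, delta/sigma_P = 0.5
n_resolve = n_for_recombinant(0.05)      # 5 cM interval
total_individuals = n_detect * n_resolve
markers_per_100cm = 100 // 5 + 1         # assumption: one marker every 5 cM
print(n_resolve, total_individuals, total_individuals * markers_per_100cm)
```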

Association Mapping: Power and Sample Size

[Figure: curves for q = 0.1, q = 0.25 and q = 0.5.]

q = frequency of rare allele

LD mapping has the same power as linkage mapping in an F2 population for intermediate gene frequencies, but much reduced power as the frequency of the rare allele decreases (the frequency of rare-allele homozygotes in the population is $q^2$)

This calculation assumes the marker is the causal variant; even larger samples are necessary if the marker is in LD with the causal variant

Easy to detect intermediate frequency variants with large effects

Hard to detect rare variants with small effects

Association Mapping: Recombination and Sample Size

Expected frequency of recombinants after t generations of recombination in a random mating population

Higher frequency of recombinants in random mating population means smaller sample sizes required for high resolution mapping than linkage studies

[Figure: curves for c = 0.001, c = 0.005 and c = 0.01.]


Number of markers depends on scale and pattern of LD

Small population size = large LD tracts = few markers required for QTL detection, but localization poor (dogs). Favorable situation for whole genome LD scan.

Large population size = small LD tracts = many markers required for QTL detection, but localization precise, maybe to level of QTN (Drosophila). Favorable situation for candidate gene re-sequencing.

LD patterns not constant across genome, but vary with local recombination rate, regions under natural selection

Knowing patterns of LD can guide experimental design

Strategies to Increase Power

Selective genotyping: Measure many individuals (several thousands), but only genotype the extreme tails

Selective genotyping and detect gene frequency differences between tails of distribution by pooling high and low samples (bulk segregant analysis) followed by next generation sequencing of pools

Strategies to Increase Genetic Diversity

Estimates of the number of QTL are minimum estimates:
  Experiments are limited in their power to separate closely linked loci
  There must always be other loci with effects too small to be detected by an experiment of a particular size
  The loci found are those differentiating the two strains compared
  Other loci would probably be found in other strains

Can increase genetic diversity by:
  Artificial selection for high and low trait values from large heterogeneous base population, then inbreeding to construct parental stocks for mapping
  Mapping population derived from crosses of several inbred strains, either RI lines or large outbred population maintained for many generations

Construction of near-isoallelic lines (NIL)
  backcross to one of parental strains
  select for markers flanking QTL and against markers flanking other QTL

Fine-scale recombination
  backcross NIL to one of parental strains
  select for recombinants within NIL interval using additional markers
  progeny test recombinant genotypes to map QTL to 2 cM or less

Deficiency mapping (in Drosophila)

Change strategy from linkage to association mapping

High Resolution Mapping

[Figure: likelihood ratio plotted against testing position (cM); diagram of generations G1, G2, G3 and Gt with effects labelled +a.]

QTL End Game: Proving QTL Corresponds to Candidate Gene

Supporting evidence:
  Potentially functional DNA polymorphisms
  Differences in mRNA expression between alleles
  Expression of RNA/protein in relevant tissues
  Replicated associations in different populations
  Quantitative complementation of QTL alleles and mutant allele

More concrete evidence:
  Create mutants in the candidate gene that affect the trait (transposon tagging)
  Transgenic rescue
  Demonstrate functional differences between alleles by knocking in alternate alleles by homologous recombination
