
Population and Quantitative Genetics – HS18

v1.0

Gleb Ebert

July 2, 2019

This document aims to summarize the lecture Population and Quantitative Genetics as it was taught in the autumn semester of 2018. It is heavily based on the slides and often contains passages verbatim. Unfortunately I cannot guarantee that it is complete or free of errors. You can contact me under [email protected] if you have any suggestions for improvement. The newest version of this summary can always be found on my website: http://www.glebsite.ch

Contents

1 Molecular Markers, HWE, Genetic Variation

2 Genetic Drift

3 Populations

4 Mutations

5 Linkage disequilibrium and recombination

6 Neutral theory and coalescent

7 Quantitative traits

8 Phenotypic variation

9 Heritability

10 Response to selection

11 Inbreeding and heterosis

12 Formulas useful in quantitative genetics


1 Molecular Markers, HWE, Genetic Variation

Population genetics applies Mendel's laws and other genetic principles to populations. It studies genetic variation within and between populations and species and the forces that result in evolutionary changes in populations and species through time. Population genetics is useful in studies of evolutionary processes, conservation, medicine, agriculture and other fields.

1.1 Useful Formulas

mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

variance: $V_x = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$

standard deviation: $sd = \sqrt{V_x}$
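The three formulas can be checked with a few lines of code. This is a minimal sketch using only the standard library; the function names and the sample values are hypothetical.

```python
import math

def mean(xs):
    """Arithmetic mean: sum of the values divided by n."""
    return sum(xs) / len(xs)

def variance(xs):
    """Sample variance with the n - 1 denominator, as in V_x above."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def sd(xs):
    """Standard deviation: square root of the sample variance."""
    return math.sqrt(variance(xs))

values = [2.0, 4.0, 4.0, 5.0, 7.0]  # hypothetical measurements
print(mean(values), variance(values), sd(values))
```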

1.2 Genetic Variation

“Nothing in biology makes sense except in the light of evolution”

— Theodosius Dobzhansky

The modern synthesis brought Mendelian genetics together with Darwin's theory of natural selection and helped to quantify the genetic variation in natural populations.

The classical view of genome organization was that wild-type alleles made up the whole genome, with a few mutations in between. The balanced view, however, held that each gene has multiple alleles and that heterozygosity is common.

1.3 Genetic variation

Experimental methods for detecting genetic variation (in chronological order):

• Allozyme electrophoresis
• DNA (Sanger) sequencing
• Restriction fragment length polymorphism (RFLP)
• Simple sequence repeats (SSR or microsatellites)
• Amplified fragment length polymorphism (AFLP)
• Single nucleotide polymorphisms (SNPs)
• Next-generation sequencing (NGS)

1.3.1 Allozyme electrophoresis

Enzymes that differ in electrophoretic mobility as a result of allelic differences in a single gene are called allozymes. Allozyme electrophoresis separates these variants. Allozymes underestimate the level of DNA polymorphism because they detect only a subset of existing amino acid replacements and do not detect synonymous mutations. However, they may also overestimate polymorphism because they mostly represent group 1 enzymes (common in tissues and body fluids), and enzyme polymorphisms may not be neutral and therefore may not reflect polymorphism elsewhere in the genome. Allozyme electrophoresis was later replaced by DNA electrophoresis.

1.3.2 Microsatellites

Microsatellites are markers that are highly polymorphic in their number of repeats and are widely used in animals, plants and fungi to assess genetic variation. They are also known as short tandem repeats (STRs) or simple sequence repeats (SSRs). Together with their longer cousins, the minisatellites, they are classified as variable number tandem repeats (VNTRs). Like allozymes, they are codominant markers: homozygotes can be distinguished from heterozygotes. Example: . . . GATCGA(GC)$_7$TAGCCGAT . . .
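Scoring a microsatellite amounts to counting repeat units, and because both alleles of a diploid are reported, the marker is codominant. The sketch below assumes the motif is known in advance and that the allele is the longest uninterrupted run of that motif; the functions and the 9-repeat allele are hypothetical.

```python
import re

def repeat_count(sequence: str, motif: str) -> int:
    """Length (in motif units) of the longest uninterrupted tandem run of `motif`."""
    runs = re.finditer(f"(?:{re.escape(motif)})+", sequence)
    return max((len(m.group(0)) // len(motif) for m in runs), default=0)

def score_genotype(allele1: str, allele2: str, motif: str) -> tuple:
    """Codominant scoring: report both repeat counts, so heterozygotes are visible."""
    return tuple(sorted((repeat_count(allele1, motif), repeat_count(allele2, motif))))

allele_a = "GATCGA" + "GC" * 7 + "TAGCCGAT"  # the example allele, (GC)7
allele_b = "GATCGA" + "GC" * 9 + "TAGCCGAT"  # hypothetical allele with 9 repeats
print(repeat_count(allele_a, "GC"))              # 7
print(score_genotype(allele_a, allele_b, "GC"))  # (7, 9) -> heterozygote
```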

1.3.3 Amplified Fragment Length Polymorphisms (AFLPs)

In AFLP, DNA is first digested with one or more restriction enzymes. Adapters that are specific to the half-sites are then ligated to the fragments. A subset of these fragments is then selectively amplified with two primers, each complementary to the adapter and restriction-site sequence. The amplification products are visualized using gel electrophoresis. AFLPs are dominant markers: homozygotes cannot be distinguished from heterozygotes.

1.3.4 Types of Polymorphisms

A polymorphism that does not alter the amino acid sequence is called synonymous; synonymous polymorphisms are a consequence of the redundancy of the genetic code. Non-synonymous or replacement polymorphisms change the amino acid. A polymorphism that is noncoding or silent does not affect nucleotides in coding regions. Nucleotide polymorphisms can be divided into insertion/deletion polymorphisms (indels) and single-nucleotide polymorphisms (SNPs). A unique combination of linked genetic markers is often called a haplotype.
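Whether a SNP in a codon is synonymous can be decided by translating both versions of the codon. This sketch hard-codes the standard genetic code in its usual compact TCAG ordering; the helper names are hypothetical and other genetic codes (e.g. mitochondrial) are not covered.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def classify_snp(ref_codon: str, alt_codon: str) -> str:
    """Compare the amino acids encoded by the reference and the alternative codon."""
    if CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]:
        return "synonymous"
    return "non-synonymous"

print(classify_snp("CTT", "CTC"))  # Leu -> Leu: synonymous
print(classify_snp("GAA", "GAT"))  # Glu -> Asp: non-synonymous
```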

1.3.5 Genetic Variation in Natural Populations

A population is a group of interbreeding individuals of the same species that exist together in time and space. If it is random-mating (panmictic), the probability of mating between individuals of particular genotypes is equal to the product of their individual frequencies in the population.


1.4 The Hardy-Weinberg Principle

The Hardy-Weinberg principle describes the relationship between allele and genotype frequencies in diploid populations: after one generation of random mating, genotype frequencies reach the proportions below. The principle assumes that factors changing allele frequencies (natural selection, drift, etc.) are absent. When an allele is rare, most of its copies occur in heterozygous individuals.

$(p + q)^2 = p^2 + 2pq + q^2 = 1$
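The expansion of $(p + q)^2$ gives the expected genotype frequencies, and it also shows why rare alleles hide in heterozygotes: per gene copy, heterozygotes contribute $pq$ and homozygotes $q^2$ (which sum to $q$). A minimal sketch with hypothetical function names:

```python
def hw_genotype_frequencies(p: float) -> dict:
    """HW-expected frequencies of A1A1, A1A2 and A2A2 for allele frequency p of A1."""
    q = 1.0 - p
    return {"A1A1": p * p, "A1A2": 2 * p * q, "A2A2": q * q}

def share_in_heterozygotes(q: float) -> float:
    """Fraction of copies of allele A2 (frequency q) that are carried by heterozygotes."""
    p = 1.0 - q
    # Per gene copy: heterozygotes contribute pq, homozygotes q^2; together q.
    return (p * q) / (p * q + q * q)

freqs = hw_genotype_frequencies(0.7)
print(freqs, sum(freqs.values()))    # frequencies sum to ~1
print(share_in_heterozygotes(0.01))  # ~0.99 for a rare allele
```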

1.5 De Finetti Diagram

Below, the De Finetti diagram for one locus with two alleles is shown. Each corner corresponds to one genotype, and the distance of a point from the side opposite a corner gives the frequency of that genotype. The parabola describes the HW-expected genotype frequencies.

1.6 Estimating Allele Frequencies

Samples are taken from populations to estimate genotype and allele frequencies (maximum-likelihood approach). In a sample of $N$ individuals with $N_{11}$ individuals of genotype A1A1, $N_{12}$ of A1A2 and $N_{22}$ of A2A2 ($N = N_{11} + N_{12} + N_{22}$), the estimated genotype and allele frequencies are

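$P = \frac{N_{11}}{N}, \quad H = \frac{N_{12}}{N}, \quad Q = \frac{N_{22}}{N}$

$p = \frac{N_{11} + \frac{1}{2}N_{12}}{N}, \quad q = \frac{N_{22} + \frac{1}{2}N_{12}}{N}$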
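The estimators translate directly into code; each heterozygote contributes one copy of each allele, hence the factor $\frac{1}{2}$. The function name and the counts below are hypothetical.

```python
def estimate_frequencies(n11: int, n12: int, n22: int) -> dict:
    """Genotype (P, H, Q) and allele (p, q) frequency estimates from genotype counts."""
    n = n11 + n12 + n22
    return {
        "P": n11 / n,
        "H": n12 / n,
        "Q": n22 / n,
        "p": (n11 + 0.5 * n12) / n,  # each heterozygote carries one A1
        "q": (n22 + 0.5 * n12) / n,  # ... and one A2
    }

est = estimate_frequencies(30, 40, 30)  # hypothetical counts, N = 100
print(est)
```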

1.7 Dominance

Dominance refers to the effect of one allele on the phenotype relative to another, recessive allele. Because AA and Aa individuals share the same phenotype, only the aa class can be counted directly. To estimate the frequency of the recessive allele a, one therefore has to assume that the proportion of genotype aa in the population equals $\frac{N_{22}}{N} = q^2$ and thus $q = \sqrt{\frac{N_{22}}{N}}$. In other words, we have to assume that the population is in HWE.

Some rare (< 10%) recessive alleles cause human diseases like albinism, cystic fibrosis or sickle-cell anemia. The frequency of carriers of such an allele is $H = 2pq$.
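A short worked example of the two steps (estimate $q$ from the affected class, then the carrier frequency), assuming HWE. The incidence of 1 in 2,500 is a hypothetical input, not data from the lecture.

```python
import math

def recessive_allele_frequency(n_affected: int, n_total: int) -> float:
    """q estimated as sqrt(N22 / N); only valid if the population is in HWE."""
    return math.sqrt(n_affected / n_total)

def carrier_frequency(q: float) -> float:
    """Expected frequency of heterozygous carriers, H = 2pq."""
    return 2 * (1 - q) * q

q = recessive_allele_frequency(1, 2500)  # hypothetical: 1 affected in 2,500
print(q, carrier_frequency(q))           # q ~0.02, carriers ~0.039
```

Even though only 0.04% of individuals are affected, roughly 4% are carriers.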

1.8 Testing Hardy-Weinberg Proportions

The chi-squared test quantifies the quality of the fit between observed and expected genotype frequencies (with k = number of genotype classes):

$\chi^2 = \sum_{i=1}^{k} \frac{(\text{obs}_i - \text{exp}_i)^2}{\text{exp}_i}$

The resulting value, together with the degrees of freedom df = k − 1 − (number of parameters estimated from the data), gives the probability of a difference between observed and expected counts at least as large as the one found. For one locus with two alleles, k = 3 and one allele frequency is estimated, so df = 1.

The sample size should be over 50 and the expected number in every class should be greater than 5. Otherwise an exact test should be used.
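A sketch of the HW chi-squared test for two alleles. For df = 1 the upper-tail probability can be written with the complementary error function, since $\chi^2_1$ is the square of a standard normal variable; this avoids any dependency beyond `math`. The counts are hypothetical.

```python
import math

def hw_chi_square(n11: int, n12: int, n22: int):
    """Chi-squared test of HW proportions for one biallelic locus (df = 1)."""
    n = n11 + n12 + n22
    p = (n11 + 0.5 * n12) / n          # allele frequency estimated from the data
    q = 1 - p
    observed = [n11, n12, n22]
    expected = [n * p * p, n * 2 * p * q, n * q * q]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For df = 1: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

chi2, p_value = hw_chi_square(30, 40, 30)  # hypothetical counts, N = 100
print(chi2, p_value)                       # 4.0, ~0.046
```

With these counts the heterozygote deficit (40 observed vs 50 expected) would be significant at the 5% level.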

1.9 Measures of molecular genetic variation

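• HW-expected heterozygosity / gene diversity: $H_E = 1 - \sum_{i=1}^{n} p_i^2$
• Observed heterozygosity (no HW assumption): $H_O = \sum_{i<j} P_{ij}$, the summed frequencies of all heterozygous genotypes $A_iA_j$
• Effective number of alleles (a good measure of true allelic diversity): $n_e = \frac{1}{1 - H_E}$
• Average number of pairwise differences per site: $\pi = \frac{\sum p_{ij}}{\text{number of comparisons}}$, where $p_{ij}$ is the proportion of sites that differ between sequences i and j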
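The three frequency-based measures can be computed as follows. This is a sketch under the assumptions that allele frequencies are already known and that sequences are aligned and of equal length; all names and inputs are hypothetical.

```python
from itertools import combinations

def expected_heterozygosity(allele_freqs):
    """H_E = 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in allele_freqs)

def effective_number_of_alleles(allele_freqs):
    """n_e = 1 / (1 - H_E), which equals 1 / sum(p_i^2)."""
    return 1.0 / (1.0 - expected_heterozygosity(allele_freqs))

def nucleotide_diversity(sequences):
    """pi: mean proportion of differing sites over all pairs of aligned sequences."""
    pairs = list(combinations(sequences, 2))
    per_pair = [sum(a != b for a, b in zip(s1, s2)) / len(s1) for s1, s2 in pairs]
    return sum(per_pair) / len(pairs)

freqs = [0.5, 0.25, 0.25]                  # hypothetical allele frequencies
print(expected_heterozygosity(freqs))      # 0.625
print(effective_number_of_alleles(freqs))  # ~2.67
print(nucleotide_diversity(["AAAA", "AAAT", "AATT"]))  # ~0.333
```

Three alleles are present, but because one dominates, $n_e$ is only about 2.7.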

2 Genetic Drift

Genetic drift is the random alteration of allele frequencies that results from the sampling of gametes from generation to generation in finite populations. It has the same expected effect on all loci in the genome.

2.1 Wright-Fisher Model

The model is a simplified view of reproduction that samples 2N gametes from an infinite gamete pool. Major assumptions include:

• equal sex ratio
• non-overlapping generations
• equal fitness
• constant population size

2.2 Allele fixation

The probability that a neutral allele eventually becomes fixed equals its initial allele frequency. Across replicate populations, the mean allele frequency does not change, but the distribution of allele frequencies spreads out over the generations. In very large populations, random changes in allele frequency will be minor, but in small populations genetic drift may cause large fluctuations in allele frequencies across generations and can result in chance fixation or loss of alleles and increased autozygosity (IBD).
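The Wright-Fisher model and the fixation result can be illustrated by simulation. This minimal sketch assumes binomial sampling of 2N gametes each generation, implemented with Python's `random` module; function names, population size and replicate number are hypothetical choices.

```python
import random

def wright_fisher_fixes(n_diploid: int, p0: float, rng: random.Random) -> bool:
    """Run one Wright-Fisher population until the allele is lost or fixed."""
    two_n = 2 * n_diploid
    count = round(p0 * two_n)
    while 0 < count < two_n:
        p = count / two_n
        # Binomial sampling of 2N gametes from an infinite gamete pool
        count = sum(rng.random() < p for _ in range(two_n))
    return count == two_n

def fixation_probability(n_diploid: int, p0: float, replicates: int, seed: int = 1) -> float:
    """Fraction of replicate populations in which the allele reaches fixation."""
    rng = random.Random(seed)
    fixed = sum(wright_fisher_fixes(n_diploid, p0, rng) for _ in range(replicates))
    return fixed / replicates

print(fixation_probability(n_diploid=10, p0=0.25, replicates=1000))  # close to 0.25
```

Each replicate ends in loss or fixation, and the fraction of fixations approaches the starting frequency as the number of replicates grows.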


3 Populations

The census size is the total number of individuals in a population. The breeding population size is the number of sexually mature individuals. The effective population size is thought to be the appropriate measure for evolutionary studies. It is often quite different from (typically lower than) the breeding population size due to complicating factors such as variation in sex ratio, offspring number per individual (family size) and numbers of breeding individuals across generations.

$N_e$ refers to an ideal population of size N in which all parents have an equal expectation of being the parents of any progeny individual (Poisson-distributed family size). It can also be understood as the size of an idealized Wright-Fisher population that would produce the same amount of inbreeding, allele frequency variance, or heterozygosity loss as the (empirical) population under study.

With unequal numbers of breeding females ($N_f$) and males ($N_m$):

$N_e = \frac{4 N_f N_m}{N_f + N_m}$
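A quick numerical check of the sex-ratio formula; the breeder numbers are hypothetical.

```python
def ne_sex_ratio(n_females: int, n_males: int) -> float:
    """Effective size with unequal sex ratio: Ne = 4 Nf Nm / (Nf + Nm)."""
    return 4 * n_females * n_males / (n_females + n_males)

print(ne_sex_ratio(50, 50))  # 100.0: equal sex ratio, Ne equals N
print(ne_sex_ratio(90, 10))  # 36.0: 100 breeders, but far fewer effective ones
```

The rarer sex dominates the result, because half of all genes in the next generation must pass through it.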

3.1 Bottleneck & founder effect

A bottleneck is a period during which only a few individuals survive to continue the existence of the population. The founder effect describes a population that has grown from a few founder individuals. Populations descended from a small founder group may have low genetic variation or, by chance, a high or low frequency of particular alleles (consequences of low N).

$N_e$ across generations is determined by the harmonic mean of the per-generation sizes. For example, for three generations with sizes 100, 10 and 100:

$\frac{1}{N_e} = \frac{1}{3}\left(\frac{1}{100} + \frac{1}{10} + \frac{1}{100}\right) = 0.04 \Rightarrow N_e = 25$
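The same calculation with the standard library's `statistics.harmonic_mean`, contrasted with the arithmetic mean to show how strongly the single small generation dominates:

```python
from statistics import harmonic_mean

sizes = [100, 10, 100]           # population sizes per generation, as in the example
ne = harmonic_mean(sizes)
print(ne)                        # ~25, far below the arithmetic mean
print(sum(sizes) / len(sizes))   # 70.0
```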

3.2 Molecular data

In practice, information about long-term $N_e$ is available from estimates of the population mutation parameter $\theta = 4N_e\mu$. One of the two commonly used estimators of the parameter θ is π, the average pairwise nucleotide diversity. High nucleotide diversity implies high $N_e$ and low nucleotide diversity implies low $N_e$. Examples: $N_{e,\text{Human}} \sim 5 \times 10^4$, $N_{e,\text{D. melanogaster}} > 10^6$.
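Solving $\theta = 4N_e\mu$ for $N_e$ with π as the estimate of θ gives a one-line estimator. This sketch assumes both π and µ are expressed per site (and µ per generation); the input values are hypothetical, not the lecture's species estimates.

```python
def ne_from_diversity(pi_per_site: float, mu_per_site: float) -> float:
    """Solve theta = 4 Ne mu for Ne, using pi as the estimate of theta."""
    return pi_per_site / (4 * mu_per_site)

# Hypothetical inputs: pi = 0.001 per site, mu = 1.25e-8 per site per generation
print(ne_from_diversity(0.001, 1.25e-8))  # ~20000
```

The estimate is only as good as the mutation rate used, so uncertainty in µ carries over directly to $N_e$.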

The geographic spread of a species is inversely correlatedwith genetic diversity.

3.3 The Orthodox Paradigm

When populations are subdivided because of geographical, ecological, or behavioral factors, genetic connectivity among subpopulations is often reduced and depends on the amount of genetically effective gene flow. Gene flow indicates movement of individuals or gametes between groups that results in genetic exchange.

The upper graph shows alleles after mating with a random individual of the whole population (99 × 99 mating neighbourhood). The mating neighbourhood of the lower graph is only 3 × 3.

3.4 F-statistics

F-statistics are a measure of the deficit of heterozygotes relative to expected Hardy-Weinberg proportions in the specified base population. The F parameters are thus inbreeding coefficients for different specified base populations. $F_{ST}$ is also known as the fixation index and ranges from 0 (all subpopulations have equal allele frequencies) to 1 (all subpopulations are fixed for one or the other allele). $F_{ST}$ is also widely used as a measure of allelic differentiation between subpopulations, regardless of the number of alleles. However, two populations can have the same $F_{ST}$ while not having any alleles in common: $F_{ST}$ does not distinguish between fixation and differentiation.

3.5 Wahlund’s Principle

Population substructuring is not always obvious, and as a consequence a sample may sometimes consist of individuals from different subpopulations. If subpopulations are lumped together and there are differences in allele frequencies among these subsamples, there will be a deficiency of heterozygotes and an excess of homozygotes, even if HW proportions exist within each subsample. The difference between expectation and reality is the effect size; for each homozygote class it equals the variance in allele frequency among the subpopulations.

Quantity | Initial subpopulations | Fused population
Allele freq. q | 0.4 and 0.0 | $\frac{0.4 + 0.0}{2} = 0.2$
Var. in q | $\frac{(0.4 - 0.2)^2 + (0.0 - 0.2)^2}{2} = 0.040$ |
Freq. of aa | $q^2$: $\frac{0.16 + 0.0}{2} = 0.08$ | $0.2^2 = 0.04$
Freq. of AA | $p^2$: $\frac{0.36 + 1.0}{2} = 0.68$ | $0.8^2 = 0.64$
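The numbers in the table can be reproduced in a few lines, which also confirms that the excess of aa homozygotes in the lumped sample equals the variance in q. A sketch assuming equally sized subpopulations; the function name is hypothetical.

```python
def wahlund(q_subpops):
    """Compare the lumped sample with HW expectations for the fused population."""
    k = len(q_subpops)
    q_bar = sum(q_subpops) / k
    var_q = sum((q - q_bar) ** 2 for q in q_subpops) / k
    observed_aa = sum(q * q for q in q_subpops) / k   # aa in the lumped sample
    expected_aa = q_bar ** 2                          # HW expectation after fusion
    return q_bar, var_q, observed_aa, expected_aa

q_bar, var_q, obs_aa, exp_aa = wahlund([0.4, 0.0])  # values from the table
print(q_bar, var_q, obs_aa, exp_aa)  # ~0.2, ~0.04, ~0.08, ~0.04
print(obs_aa - exp_aa)               # excess homozygotes, equal to var_q
```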


$H_I = \frac{1}{n}\sum_{i=1}^{n} H_i$: the average observed heterozygosity within each subpopulation.

$H_S = \frac{1}{n}\sum_{i=1}^{n} 2p_iq_i$: the average expected heterozygosity of subpopulations, assuming random mating within each subpopulation.

$H_T = 2\bar{p}\bar{q}$: the expected heterozygosity of the total population, assuming random mating within subpopulations and no divergence of allele frequencies among subpopulations ($\bar{p}$ and $\bar{q}$ are the mean allele frequencies over all subpopulations).

Wright's fixation indices measure the consequences of population subdivision. For two levels of population organization they look as follows (IS = individual subpopulation, ST = subpopulation total, IT = individual total):

$F_{IS} = 1 - \frac{H_I}{H_S}$: the average difference between observed and HW-expected heterozygosity within subpopulations due to nonrandom mating.

$F_{ST} = 1 - \frac{H_S}{H_T}$: the difference between the average expected heterozygosity of subpopulations and the expected heterozygosity of the total population, i.e. the reduction in heterozygosity due to divergence in allele frequency among subpopulations.

$F_{IT} = 1 - \frac{H_I}{H_T}$: the average difference between observed heterozygosity within subpopulations and the expected heterozygosity of the total population, possibly due to both nonrandom mating and allele frequency divergence among subpopulations.
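All three indices follow from $H_I$, $H_S$ and $H_T$. This sketch assumes one biallelic locus and equally sized subpopulations; the function name and the observed heterozygosities are hypothetical.

```python
def f_statistics(p_subpops, h_observed):
    """Wright's F_IS, F_ST and F_IT for one biallelic locus (equal subpopulation sizes)."""
    n = len(p_subpops)
    h_i = sum(h_observed) / n
    h_s = sum(2 * p * (1 - p) for p in p_subpops) / n
    p_bar = sum(p_subpops) / n
    h_t = 2 * p_bar * (1 - p_bar)
    return 1 - h_i / h_s, 1 - h_s / h_t, 1 - h_i / h_t

# Hypothetical data: two subpopulations with p = 0.2 and 0.8
f_is, f_st, f_it = f_statistics([0.2, 0.8], h_observed=[0.24, 0.32])
print(f_is, f_st, f_it)  # ~0.125, ~0.36, ~0.44
```

The three indices are linked by $(1 - F_{IT}) = (1 - F_{IS})(1 - F_{ST})$, which the example reproduces: $0.875 \times 0.64 = 0.56$.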

3.6 General island model

The general island model describes interactions between multiple demes. A deme is a local group of individuals from the same taxon that interbreed with each other and share a gene pool. In the graph below, N is the number of individuals per deme and m is the proportion of immigrants.

Wright showed that at equilibrium between drift and gene flow (for small m):

$F_{ST} \approx \frac{1}{4Nm + 1}$
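The approximation depends only on the number of migrants per generation, Nm, and can be inverted to turn an observed $F_{ST}$ into an indirect estimate of Nm (valid only under the island-model assumptions). Function names and inputs are hypothetical.

```python
def fst_island(n: float, m: float) -> float:
    """Equilibrium F_ST in Wright's island model (small m): 1 / (4Nm + 1)."""
    return 1.0 / (4 * n * m + 1)

def nm_from_fst(fst: float) -> float:
    """Invert the same approximation to get the number of migrants per generation."""
    return (1.0 / fst - 1) / 4

print(fst_island(n=100, m=0.01))  # Nm = 1 migrant per generation -> F_ST = 0.2
print(nm_from_fst(0.2))           # ~1.0
```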

3.7 Continent island model

This model deals with unidirectional gene flow. Real-world examples include species on islands with nearby large land masses, or aquatic species in ponds with a nearby lake as the source of gene flow. An example is the hybridization between red wolves and coyotes: depending on the mating rate of red wolves with coyotes, red wolf ancestry will disappear sooner or later.

3.8 Stepping-stone model

For the (linear) stepping-stone model we assume that subpopulations are arranged in a one-dimensional spatial pattern and gene flow is restricted to adjacent subpopulations (m/2). A more generally applicable version is the two-dimensional stepping-stone model, with migrants being exchanged between the 4 adjacent demes (m/4). Stepping-stone structure leads to isolation by distance.

3.9 Jost’s D

DJost is a measure of relative differentiation.

$H_{ST} = \frac{H_T - H_S}{1 - H_S} \Longrightarrow D_{Jost} = \frac{H_T - H_S}{1 - H_S}\left(\frac{n}{n - 1}\right)$
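For a highly polymorphic marker the two kinds of measure can disagree strongly. The sketch below computes $D_{Jost}$ from the formula above and, for comparison, Nei's $G_{ST} = (H_T - H_S)/H_T$ (the standard definition, which these notes use in the table but do not spell out). The heterozygosities are hypothetical.

```python
def jost_d(h_s: float, h_t: float, n: int) -> float:
    """Jost's D = (H_T - H_S) / (1 - H_S) * n / (n - 1) for n subpopulations."""
    return (h_t - h_s) / (1 - h_s) * n / (n - 1)

def g_st(h_s: float, h_t: float) -> float:
    """Nei's G_ST = (H_T - H_S) / H_T, shown for comparison."""
    return (h_t - h_s) / h_t

# Hypothetical highly polymorphic marker: H_S = 0.8, H_T = 0.9, five demes
print(g_st(0.8, 0.9))       # ~0.11: looks like weak differentiation
print(jost_d(0.8, 0.9, 5))  # ~0.625: substantial allelic differentiation
```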

Applying $D_{Jost}$ to the finite-island model: $D_{Jost} \approx \frac{\mu n}{m}$ for moderate n.

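n | N | m | µ | $G_{ST}$ | D
5 | 100 | 0.01 | 0.001 | 0.127 | 0.282
5 | 1′000 | | | 0.014 | 0.282
5 | 10′000 | | | 0.001 | 0.282
10 | 10′000 | | | 0.002 | 0.469
20 | 10′000 | | | 0.002 | 0.651
40 | 10′000 | | | 0.002 | 0.793
80 | 10′000 | | | 0.002 | 0.886
160 | 10′000 | | | 0.002 | 0.940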

$D_{Jost} = 1 - \frac{J_{between}}{J_{within}}$

When does this really matter?

• SNPs in pairwise comparisons: possibly little
• Markers with high mutation rates: a lot
• Theory development is hampered by continued reliance on $F_{ST}$: what is the role of the degree of population subdivision (n demes) and of µ?


4 Mutations

Mutations are the original source of genetic variation. They may involve changes in a single nucleotide, part of a gene, part of a chromosome, a whole chromosome, or entire sets of chromosomes. Mutations can be induced by specific mutagens (UV light, chemicals, radiation), which typically cause certain types of mutations. For spontaneous mutations the immediate cause of the mutation is unknown.

4.1 Transposable elements

Transposable elements are pieces of DNA that are capable of moving and replicating themselves within the genome of an organism, often causing spontaneous visible mutants.

4.2 Fitness effects of mutations

The relative fitness of a given mutant depends on the environment and on the alleles at other loci, i.e. the genetic background. The distribution of fitness effects is approximately bimodal: most mutations are either very deleterious (i.e. they cause lethality or near lethality) or neutral / nearly neutral. Advantageous mutations are presumably very rare but important.

4.3 Fate of a single mutation

When a new mutation occurs, it is the only copy in the entire population, and a single individual is heterozygous for the mutation: A1A2. If the mutation is selectively neutral, the probability that the new allele becomes fixed equals its initial frequency, $\frac{1}{2N_e}$. Conversely, the probability that it is lost is $1 - \frac{1}{2N_e}$. The expected time to fixation of such a rare allele, given that it does become fixed, is $T_1(p) \sim 4N_e$ generations.

4.4 Infinite-alleles model

The IAM proposes that mutation will increase the number of alleles and genetic drift will reduce it. It assumes that each mutation creates a new, unique allele. The expected equilibrium heterozygosity for the infinite-alleles neutral model is

$H_e = \frac{4N_e\mu}{4N_e\mu + 1} = \frac{\theta}{\theta + 1}$

θ is the population mutation parameter and is defined as $\theta = 4N_e\mu$. At equilibrium, the distribution of allele frequencies remains constant, although the frequencies and identities of individual alleles change constantly.

4.5 Stepwise mutation model

The SMM assumes that mutation occurs only to adjacent states (e.g. the number of repeats increases or decreases by one with each mutation). In contrast to the IAM, mutation may produce alleles that are already present in the population. Thus both the generation of variation and the equilibrium level of heterozygosity should be lower.
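The claim that the SMM gives lower heterozygosity can be checked numerically. The IAM expression is the one given above; for the SMM the sketch uses the standard one-step result $H_e = 1 - 1/\sqrt{1 + 8N_e\mu}$, which is not stated in these notes. $N_e$ and µ are hypothetical values.

```python
import math

def he_infinite_alleles(theta: float) -> float:
    """IAM equilibrium heterozygosity: theta / (theta + 1)."""
    return theta / (theta + 1)

def he_stepwise(theta: float) -> float:
    """One-step SMM equilibrium heterozygosity: 1 - 1 / sqrt(1 + 8 Ne mu)."""
    return 1 - 1 / math.sqrt(1 + 2 * theta)  # 8 Ne mu = 2 * theta

ne, mu = 10_000, 1e-5   # hypothetical values
theta = 4 * ne * mu     # theta = 4 Ne mu = 0.4
print(he_infinite_alleles(theta), he_stepwise(theta))  # ~0.286 vs ~0.255
```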

4.6 Infinite-site model

The infinite-sites model is usually used when working with DNA sequences. Every nucleotide mutation is assumed to occur at a previously unmutated site. Thus every segregating (polymorphic) site carries only two of the four nucleotides.

4.7 Finite-site model

The finite-sites model allows mutations to occur at already mutated sites. Multiple mutations can obscure patterns of relatedness (leading to homoplasy).

5 Linkage disequilibrium and recombination

Gametic phase disequilibrium or linkage disequilibrium (LD) is the nonrandom association of alleles at different loci into gametes (haplotypes). Gamete frequencies and disequilibrium can be influenced by selection, inbreeding, genetic drift, gene flow and mutation. The level of recombination between loci and $N_e$ (population recombination parameter $\rho = 4N_ec$ with recombination rate c) strongly affect the extent of linkage disequilibrium. Linkage disequilibrium can occur between closely linked as well as unlinked loci (even across chromosomes).


5.1 Measuring linkage disequilibrium

Assume a large random-mating population with discrete generations segregating for two alleles each at loci A (alleles A1 and A2) and B (alleles B1 and B2). Gamete frequencies are given by $x_{ij}$ values. The frequencies of alleles A1 and B1 are given by $p_1$ and $q_1$ respectively. Additionally, $p_1 + p_2 = 1$, $q_1 + q_2 = 1$ and $\sum x_{ij} = 1$.

Gamete Frequency

A1B1 x11

A1B2 x12

A2B1 x21

A2B2 x22

Allele Frequency

A1 p1 = x11 + x12

A2 p2 = x21 + x22

B1 q1 = x11 + x21

B2 q2 = x12 + x22

If the association between alleles within gametes is random, the frequency of each gamete is equal to the product of the frequencies of the alleles it contains (left). Non-random association of alleles leads to a deviation D that is added to or subtracted from the expected frequencies (right). $x_{11}$ and $x_{22}$ are so-called coupling gametes, while $x_{12}$ and $x_{21}$ are repulsion gametes.

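Gamete | Random association | Non-random association
A1B1 | $x_{11} = p_1q_1$ | $x_{11} = p_1q_1 + D$
A1B2 | $x_{12} = p_1q_2$ | $x_{12} = p_1q_2 - D$
A2B1 | $x_{21} = p_2q_1$ | $x_{21} = p_2q_1 - D$
A2B2 | $x_{22} = p_2q_2$ | $x_{22} = p_2q_2 + D$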

D is called the linkage disequilibrium parameter and is a measure of the deviation from random association between alleles at different loci.

$D = x_{11} - p_1q_1$ (observed − expected)

$D = x_{11}x_{22} - x_{12}x_{21}$

D is thus the product of the frequencies of the coupling gametes minus the product of the frequencies of the repulsion gametes. D has a maximum value of 0.25 when there are only coupling gametes ($x_{11} = x_{22} = 0.5$) and a minimum value of −0.25 when there are only repulsion gametes ($x_{12} = x_{21} = 0.5$). The decay of LD is proportional to the population recombination parameter $\rho = 4N_ec$. Changes in gamete frequencies can take place only through recombination (with rate c; $c_{max} = 0.5$) in double heterozygotes.

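Genotypes | Gametes
A1B1/A1B1 | A1B1
A1B1/A1B2 | A1B1, A1B2 (50% each)
A1B1/A2B2 | A1B1, A2B2, A1B2, A2B1 (25% each if c = 0.5; in general (1 − c)/2 each for the parental and c/2 each for the recombinant gametes)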

$D_t = (1 - c)^t D_0 \qquad D' = \frac{D}{D_{max}} \qquad r^2 = \frac{D^2}{p_1p_2q_1q_2}$

where t is the number of generations. Other measures of linkage disequilibrium include D′, which scales D by its maximum possible value given the allele frequencies ($D_{max}$) and therefore allows comparisons of LD levels between loci with different allele frequencies, and r², which is the squared allele frequency correlation within gametes (range = [0, 1]) and takes allele frequency differences into account.
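The three LD measures and the decay formula in code. The sketch uses the usual convention for $D_{max}$ (for D > 0 the smaller of $p_1q_2$ and $p_2q_1$, for D < 0 the smaller of $p_1q_1$ and $p_2q_2$), so D′ keeps the sign of D; the gamete frequencies and the function names are hypothetical.

```python
def ld_measures(x11: float, x12: float, x21: float, x22: float) -> dict:
    """D, D' and r^2 from the four gamete frequencies of two biallelic loci."""
    p1, q1 = x11 + x12, x11 + x21
    p2, q2 = 1 - p1, 1 - q1
    d = x11 * x22 - x12 * x21
    # D_max depends on the sign of D and on the allele frequencies.
    d_max = min(p1 * q2, p2 * q1) if d > 0 else min(p1 * q1, p2 * q2)
    return {
        "D": d,
        "D_prime": d / d_max if d_max > 0 else 0.0,
        "r2": d * d / (p1 * p2 * q1 * q2),
    }

def ld_decay(d0: float, c: float, generations: int) -> float:
    """D_t = (1 - c)^t * D_0 under random mating."""
    return (1 - c) ** generations * d0

m = ld_measures(0.5, 0.1, 0.1, 0.3)  # hypothetical gamete frequencies
print(m)                             # D ~0.14, D' ~0.583, r2 ~0.340
print(ld_decay(m["D"], c=0.5, generations=5))  # unlinked loci: ~0.0044
```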

5.2 Population admixture

Population admixture can generate strong gametic disequilibrium when the allele frequencies of the source populations are divergent. This is also called the two-locus Wahlund effect. The two populations are assumed to be at their respective gametic equilibria, and the mixed population consists of equal numbers of gametes from the two source populations.

Gam. / D | Gam. freq. | Pop. 1 | Pop. 2 | Mix
A1B1 | $g_{11}$ | 0.01 | 0.81 | 0.41
A2B2 | $g_{22}$ | 0.81 | 0.01 | 0.41
A1B2 | $g_{12}$ | 0.09 | 0.09 | 0.09
A2B1 | $g_{21}$ | 0.09 | 0.09 | 0.09
D | | 0.0 | 0.0 | 0.16
D′ | | 0.0 | 0.0 | 0.16/0.25 = 0.64
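The table can be reproduced by averaging the gamete frequencies of the two source populations and recomputing D; each source is at equilibrium, yet the mixture is not. Function names are hypothetical.

```python
def mix_gametes(pop1: dict, pop2: dict) -> dict:
    """Pool equal numbers of gametes from two source populations."""
    return {g: (pop1[g] + pop2[g]) / 2 for g in pop1}

def linkage_d(g: dict) -> float:
    """D = x11 * x22 - x12 * x21."""
    return g["A1B1"] * g["A2B2"] - g["A1B2"] * g["A2B1"]

pop1 = {"A1B1": 0.01, "A2B2": 0.81, "A1B2": 0.09, "A2B1": 0.09}
pop2 = {"A1B1": 0.81, "A2B2": 0.01, "A1B2": 0.09, "A2B1": 0.09}
mix = mix_gametes(pop1, pop2)
print(linkage_d(pop1), linkage_d(pop2), linkage_d(mix))  # ~0, ~0, ~0.16
```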

6 Neutral theory and coalescent

6.1 Neutral theory of molecular evolution

The neutral theory of molecular evolution states that genetic variation is primarily influenced by mutation generating variation and genetic drift eliminating it. Different molecular variants have almost identical relative fitnesses, i.e. they are neutral with respect to each other. Whether a variant counts as selectively neutral depends on whether changes in its frequency are primarily determined by genetic drift, which is the case when $s < \frac{1}{2N}$ (with s being the selection coefficient). The neutral theory has provided the null hypothesis for examining the amount and pattern of molecular genetic variation. It was later generalized to form the nearly neutral theory, which focuses on mutations with $|2N_es| \approx 1$ (context-dependent "weak selection").

Selectively neutral mutations take on average $4N_e$ generations to become fixed, and the time between successive fixations is on average $\frac{1}{\mu}$ (µ is the mutation rate).

The dwell time of new mutations under directional selection(top) and balancing selection (bottom).


6.1.1 Selective Sweeps

Selective sweeps happen when a beneficial allele rises in frequency and carries linked neutral alleles along with it (hitchhiking). This results in reduced diversity in the vicinity of the beneficial allele.

6.2 Coalescent theory

Coalescent events mark the time point of the most recent common ancestor (MRCA) of two lineages (gene copies) in a population.

$E(T_k) = \frac{2N}{\binom{k}{2}} = \frac{2N}{k(k-1)/2}$

where $T_k$ is the time during which there are exactly k lineages.
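Summing $E(T_k)$ from k = n down to 2 gives the expected time to the MRCA of a sample; the sum telescopes to $4N(1 - 1/n)$, so the final two lineages account for about half of it. Function names and inputs are hypothetical.

```python
def expected_tk(n_diploid: int, k: int) -> float:
    """E(T_k) = 2N / (k(k-1)/2) generations while k lineages remain."""
    return 2 * n_diploid / (k * (k - 1) / 2)

def expected_tmrca(n_diploid: int, sample_size: int) -> float:
    """Sum of E(T_k) from k = n down to 2, which equals 4N(1 - 1/n)."""
    return sum(expected_tk(n_diploid, k) for k in range(2, sample_size + 1))

print(expected_tk(100, 2))      # 200.0: the last two lineages take longest
print(expected_tmrca(100, 10))  # ~360 = 4 * 100 * (1 - 1/10)
```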

6.2.1 Site frequency spectrum

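A singleton is a mutation that occurs only once in a sample; a doubleton occurs twice. The site frequency spectrum (SFS) counts how many segregating sites carry the derived allele in 1, 2, …, n − 1 of the n sampled sequences. The graph below is called a site frequency spectrum (SFS).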
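A sketch of computing an unfolded SFS, assuming haplotypes are already coded as strings of 0 (ancestral) and 1 (derived) at aligned sites; the data and function name are hypothetical.

```python
from collections import Counter

def site_frequency_spectrum(haplotypes):
    """Unfolded SFS: entry i-1 counts sites whose derived allele ('1') occurs i times."""
    n = len(haplotypes)
    counts = Counter(sum(h[site] == "1" for h in haplotypes)
                     for site in range(len(haplotypes[0])))
    return [counts[i] for i in range(1, n)]  # monomorphic sites (0 or n) are skipped

# Hypothetical sample of four haplotypes at four sites
sample = ["1000", "1100", "1110", "0000"]
print(site_frequency_spectrum(sample))  # [1, 1, 1]: one singleton, doubleton, tripleton
```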

6.2.2 Effects of exponential population growth and shrinkage

6.2.3 Watterson's estimator of the population mutation parameter

The population mutation parameter is $\theta = 4N_e\mu$. Watterson's estimator is defined as follows:

$\theta_W = \frac{S}{\sum_{i=1}^{n-1} \frac{1}{i}}$

$\theta_W$ ($\theta_S$) only depends on the number of segregating sites S, taking into account the number of sampled sequences n. Dividing by the total sequence length in bp yields the per-site $\theta_W$.
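Watterson's estimator needs only S, n and optionally the sequence length. The function name, the optional `length_bp` parameter and the inputs are hypothetical.

```python
def watterson_theta(segregating_sites: int, n_sequences: int, length_bp=None) -> float:
    """theta_W = S / a_n with a_n = sum_{i=1}^{n-1} 1/i; per site if length_bp is given."""
    a_n = sum(1 / i for i in range(1, n_sequences))
    theta = segregating_sites / a_n
    return theta / length_bp if length_bp else theta

print(watterson_theta(10, 5))                   # 10 / (25/12), ~4.8 per locus
print(watterson_theta(10, 5, length_bp=1_000))  # ~0.0048 per site
```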

6.2.4 Tajima’s D

Tajima's D statistic is a test of selection or non-constant $N_e$:

$d = \pi - \theta_W, \qquad D = \frac{d}{\mathrm{Var}(d)^{1/2}}$

• $\theta_W$ and π should estimate the same parameter
• E(D) ≈ 0 under neutrality and constant $N_e$
• excess of low-frequency polymorphism: $\theta_W > \pi$, D < 0
• excess of intermediate-frequency SNPs: $\theta_W < \pi$, D > 0

$\theta_\pi \approx \theta_W$ | $\theta_\pi < \theta_W$ | $\theta_\pi > \theta_W$
D ≈ 0 | D < 0 | D > 0

Differences in the shape of genealogies are the basis of Tajima's D test. Changes in $N_e$ over time also change the probability of coalescence over time, so the SFS depends on how $N_e$ has changed. For example, population growth favours singletons compared to a constant $N_e$. The SFS for non-synonymous SNPs is more skewed towards singletons than the one for synonymous SNPs, as the former are selected against. This selection against deleterious variants is called purifying selection.
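A self-contained sketch of Tajima's D from aligned sequences. The variance of d is estimated with the standard constants of Tajima's test (a1, a2, b1, b2, c1, c2, e1, e2), which these notes do not spell out; π and $\theta_W$ are per locus here, not per site. The toy alignments are hypothetical and far too small for a real test.

```python
import math
from itertools import combinations

def tajimas_d(sequences):
    """Tajima's D from aligned sequences (pi and theta_W per locus, not per site)."""
    n = len(sequences)
    s = sum(len(set(column)) > 1 for column in zip(*sequences))  # segregating sites
    if s == 0:
        raise ValueError("Tajima's D is undefined without segregating sites")
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    d = pi - s / a1                                    # d = pi - theta_W
    return d / math.sqrt(e1 * s + e2 * s * (s - 1))    # divide by sqrt(Var(d))

print(tajimas_d(["AAAAA", "AAAAA", "AAAAA", "AAAAT"]))  # singleton only: D < 0
print(tajimas_d(["AAAAA", "AAAAA", "AAAAT", "AAAAT"]))  # intermediate: D > 0
```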


7 Quantitative traits

“Quantitative characters are those differences between individuals that are of degree rather than of kind, that are quantitative rather than qualitative.”

— Falconer and MacKay

Nowadays the majority of improvements in the yield of agricultural products are based on breeding for quantitative traits. The three leading causes of mortality in industrialized nations (heart disease, cancer and diabetes) are all quantitative traits. The response to infectious diseases is a quantitative trait as well. Examples from evolutionary biology and ecology include beak size in Darwin's finches as well as the changing migration habits of the European blackcap in response to climate change.

The majority of traits under selection are quantitative, and the alleles at all loci contributing to the phenotypic variation act predominantly additively.

7.1 Categories of quantitative traits

• Continuous traits show an uninterrupted gradientfrom one phenotype to the next (e.g. height).

• Categorical traits have their phenotype determinedby counting (e.g. number of offspring).

• Threshold traits show only two or a few phenotypic classes (e.g. diabetes).

7.2 Genetic basis of quantitative traits

In Mendelian traits described by the dominance model, one allele A contributes the entire phenotypic difference (dominance effect). The additive model says that each allele contributes a part of the phenotypic variance (additive effect). This is also called the multiple factor hypothesis. If n is the number of genes (each with two additive alleles) involved in a quantitative trait, then 2n + 1 is the total number of phenotype classes.
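The 2n + 1 rule can be made concrete for the F2 of a cross between two inbred lines. This sketch assumes n unlinked loci with equal, purely additive effects, so the phenotype class is the number of "plus" alleles (0 to 2n) and follows a binomial distribution with frequencies $\binom{2n}{k}/4^n$; the function name is hypothetical.

```python
from math import comb

def f2_phenotype_classes(n_loci: int) -> list:
    """Class frequencies in an F2 with n equal additive loci (assumption).

    Class k = number of 'plus' alleles (0..2n); its frequency is C(2n, k) / 4^n.
    """
    total = 4 ** n_loci
    return [comb(2 * n_loci, k) / total for k in range(2 * n_loci + 1)]

for n in (1, 2, 3):
    classes = f2_phenotype_classes(n)
    print(n, len(classes), [round(f, 4) for f in classes])  # 2n + 1 classes
```

With more loci the classes become more numerous and the distribution approaches the bell shape discussed in the next section.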

7.3 Basic statistics for quantitative genetics

The central limit theorem states that the sum of many independent random variables, each with finite variance, is approximately normally distributed. Traits influenced by many small, independent genetic and environmental effects therefore tend to follow a normal distribution.

Variance is calculated as $\sigma^2 = \frac{\sum (x - \bar{x})^2}{n}$. Standard deviation is defined as $\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$.

68% of the distribution lies within 1 standard deviation of the mean, 95% within 2 and 99.7% within 3 standard deviations. Mean and standard deviation are enough to describe a normal distribution. The lower the variance, the narrower the bell-shaped normal curve.
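The 68/95/99.7 rule follows from the normal distribution function: the probability of lying within k standard deviations of the mean is $\mathrm{erf}(k/\sqrt{2})$. A short check with the standard library (function name hypothetical):

```python
import math

def within_k_sd(k: float) -> float:
    """Probability that a normal variable lies within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(within_k_sd(k), 4))  # 0.6827, 0.9545, 0.9973
```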

8 Phenotypic variation

The phenotype itself is the sum of a genetic and an environmental component. The environment contributes to phenotypic variation in quantitative traits.

$P = G + E \Longrightarrow V_P = V_G + V_E$

The environment does not, however, make a constant contribution to the phenotype at all times.

$V_P = V_G + V_E + V_{G \times E}$

The norm of reaction is a pattern of phenotypes under a variety of environmental conditions (the environmental distribution is transformed into the phenotypic distribution). Many actual norms of reaction are non-additive.

$V_P = V_G + V_E + V_{G \times E}$

8.1 Genotypic variation

Genotypic variation can be divided into components:

$V_G = V_A + V_D + V_I$

$\Rightarrow V_P = V_A + V_D + V_I + V_E + V_{G \times E}$

8.1.1 Additive genetic variance

$V_A$ is the proportion of the total genotypic variance $V_G$ caused by the sum of phenotypic effects of alleles when they are assembled into genotypes. When gene action is additive, $V_A$ of a population depends on allele frequencies: it is higher when alleles are at intermediate frequencies than when they are near fixation or loss.
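The dependence on allele frequency can be shown with the standard single-locus result for purely additive gene action (genotypic values +a, 0, −a, i.e. d = 0): $V_A = 2pqa^2$. This formula is a textbook result assumed here, not one stated in these notes.

```python
def additive_variance(p: float, a: float) -> float:
    """V_A = 2pq a^2 for one locus with purely additive gene action (d = 0)."""
    return 2 * p * (1 - p) * a ** 2

for p in (0.01, 0.1, 0.5, 0.9, 0.99):
    print(p, additive_variance(p, a=1.0))  # peaks at p = 0.5, near 0 close to fixation
```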

8.1.2 Dominance genetic variance

$V_D$ is the proportion of $V_G$ caused by the deviation of genotypic values from their values under additive gene action, caused by the combination of alleles assembled into a single-locus genotype.

Consider a single locus with two alleles, A1 and A2. Call the genotypic values as follows: A1A1 = +a, A2A2 = −a and A1A2 = d. The midpoint between +a and −a is 0.

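• No dominance (d = 0): A1A2 is at the midpoint.
• A1 is dominant to A2: d > 0
• A2 is dominant to A1: d < 0
• Dominance is complete: A1A2 = A1A1 or A1A2 = A2A2 (d = +a or d = −a)
• Over-dominance: A1A2 > A1A1 or A1A2 < A2A2, i.e. the heterozygote lies outside the range of the homozygotes (|d| > a)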

8.1.3 Population mean

The sum M is both the mean genotypic and phenotypicvalue for the population.

Genotype | Value | Freq. × Val.
A1A1 | +a | $p^2a$
A1A2 | d | $2pqd$
A2A2 | −a | $-q^2a$

$M = a(p - q) + 2dpq$
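Summing the last column of the table gives the same value as the closed form for M, since $p^2 - q^2 = (p - q)(p + q) = p - q$. A sketch with hypothetical function names and inputs, where p is the frequency of A1:

```python
def population_mean(p: float, a: float, d: float) -> float:
    """M = a(p - q) + 2dpq, with p the frequency of A1 (A1A1 = +a, A2A2 = -a)."""
    q = 1 - p
    return a * (p - q) + 2 * d * p * q

def population_mean_from_table(p: float, a: float, d: float) -> float:
    """Sum of frequency x value over the three genotypes, as in the table."""
    q = 1 - p
    return p * p * a + 2 * p * q * d + q * q * (-a)

print(population_mean(0.3, a=1.0, d=0.5))             # ~ -0.19
print(population_mean_from_table(0.3, a=1.0, d=0.5))  # same value
```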


8.1.4 Epistasis or interaction genetic variance

$V_I$ is the proportion of $V_G$ due to the deviation of genotypic values from their values under additive gene action caused by interactions between and among loci.

9 Heritability

Heritability is the proportion of phenotypic variance in a population that is due to genetic differences. It is not the same for a given trait in different environments. Heritability does not say anything about which genes influence a phenotype. It can only explain the amount of genetic variation that causes phenotypic variation.

9.1 Broad Sense Heritability

Values for broad sense heritability (BSH) range from 0 to 1:

$H^2 = V_G / V_P$

To calculate BSH, one can take two approaches:

1) Fix the genotype to estimate $V_E$ (selfing organisms, clones, monozygotic twins) ⇒ $V_P = V_E$. Measure $V_P$ in many genetically different individuals in the same environment, then obtain $V_G$ by subtraction: $V_G = V_P - V_E$.

2) Inbreed parents to homozygosity and calculate $V_P$ in the P1, P2, F1 and F2 generations. As inbred lines (and their F1) are genetically uniform, $V_P = V_E$ in these generations:

$V_{P1} = V_{P2} = V_{F1} = V_E$

$V_{F2} = V_P = V_G + V_E$

$V_G = V_{F2} - V_E$
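Approach 2 as a calculation. The sketch assumes $V_E$ is estimated as the simple average of the three genetically uniform generations (one possible choice; weighted averages are also used), and the variances are hypothetical.

```python
def broad_sense_h2(v_p1: float, v_p2: float, v_f1: float, v_f2: float) -> float:
    """H^2 from line-cross variances; V_E is taken as the mean of V_P1, V_P2, V_F1."""
    v_e = (v_p1 + v_p2 + v_f1) / 3  # genetically uniform generations: V_P = V_E
    v_g = v_f2 - v_e                # V_G = V_F2 - V_E
    return v_g / v_f2               # H^2 = V_G / V_P, with V_P = V_F2

# Hypothetical variances (same units for all generations)
print(broad_sense_h2(v_p1=2.0, v_p2=2.4, v_f1=1.6, v_f2=5.0))  # ~0.6
```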

9.2 Narrow Sense Heritability

The ratio VA/VP expresses the extent to which phenotypes are determined by the alleles transmitted from the parents and is called the heritability in the narrow sense, or simply the heritability.

$h^2 = \frac{V_A}{V_P} = \frac{V_A}{V_A + V_D + V_I + V_E}$

Regression analysis is used to quantify the relationship between correlated variables (e.g. the heights of fathers and sons). The regression line can be represented by the equation:

y = a+ bx

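where x and y are the two variables, b is the slope of the line, also called the regression coefficient, and a is the y-intercept. b can be calculated using the following equation:

$b_{xy} = \frac{\mathrm{Cov}(x, y)}{V(x)}$

When offspring values (y) are regressed on mid-parent values (x), b estimates h²; when they are regressed on the values of a single parent, b estimates h²/2.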

• If b = 1, VA is the only component that influences variation
• If b = 0, VA does not influence variation
• If b is between 0 and 1, VA and other components influence variation
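The slope b = Cov(x, y)/V(x) can be computed directly from paired observations. The sketch below assumes that x holds mid-parent values and y the corresponding offspring values, in which case b is an estimate of h². The data are hypothetical and were chosen to lie exactly on a line with slope 0.5.

```python
def regression_slope(x: list[float], y: list[float]) -> float:
    """b_xy = Cov(x, y) / V(x): slope of the least-squares regression of y on x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariance and variance; the (n - 1) denominators cancel in the ratio
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    var_x = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
    return cov_xy / var_x


# Hypothetical trait values: mid-parent (x) and offspring (y)
mid_parent = [1.0, 2.0, 3.0, 4.0, 5.0]
offspring = [10.5, 11.0, 11.5, 12.0, 12.5]
b = regression_slope(mid_parent, offspring)
print(b)  # estimate of h^2 because x is the mid-parent value
```

For these points the slope is 0.5, i.e. an estimated narrow sense heritability of 0.5; with single-parent values on the x axis, the slope would have to be doubled to estimate h².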

10 Response to selection

10.1 Phenotypic response to selection

10.1.1 Selection differential

The selection differential S describes the strength of selection. It is the difference between the mean of the selected parents µ∗ and the phenotypic mean of the initial population µ. The selection differential can be interpreted as the within-generation change in phenotypic mean due to selection.

S = µ∗ − µ

10.1.2 Selection intensity

The selection differential S is not particularly informative when trying to compare the strength of selection on different traits and/or in different populations. A much more useful measure is the selection intensity i, which is the selection differential expressed in units of phenotypic standard deviations.

i = S/σZ
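S and i can be computed from a sample of phenotypes and the subset chosen as parents. In this sketch the phenotypic standard deviation σZ is taken as the population standard deviation of the whole sample (`statistics.pstdev`), which is an assumption; the phenotype values and the choice of the top two individuals are hypothetical.

```python
from statistics import mean, pstdev


def selection_differential(population: list[float], selected: list[float]) -> float:
    """S = mu* - mu: mean of the selected parents minus the mean of the whole population."""
    return mean(selected) - mean(population)


def selection_intensity(population: list[float], selected: list[float]) -> float:
    """i = S / sigma_Z: the selection differential in units of phenotypic standard deviation."""
    # Assumption: sigma_Z is the population standard deviation of the full sample
    return selection_differential(population, selected) / pstdev(population)


# Hypothetical phenotypes; the two largest values are kept as parents
population = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
selected = sorted(population)[-2:]
print("S =", selection_differential(population, selected))
print("i =", selection_intensity(population, selected))
```

Here µ = 5, µ∗ = 8 and σZ = 2, so S = 3 and i = 1.5.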

10.1.3 Truncating selection

Under truncating selection, the upper- or lowermost fraction p of a population is selected to reproduce. Following from the properties of the normal distribution, a good approximation of the intensity i is:

$i \simeq 0.8 + 0.41 \ln\left(\frac{1}{p} - 1\right)$
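To see how good the approximation is, it can be compared with the exact intensity for truncation selection on a normally distributed trait, i = z/p, where z is the height of the standard normal density at the truncation point. This exact expression is a standard result that is not stated above; the sketch uses Python's `statistics.NormalDist` for the density and for the inverse cumulative distribution.

```python
import math
from statistics import NormalDist


def intensity_approx(p: float) -> float:
    """Approximation given in the text: i ~ 0.8 + 0.41 * ln(1/p - 1)."""
    return 0.8 + 0.41 * math.log(1.0 / p - 1.0)


def intensity_exact(p: float) -> float:
    """Truncation selection on a normal trait: i = z / p, with z the standard normal
    density at the truncation point above which the selected fraction p lies."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    truncation_point = std_normal.inv_cdf(1.0 - p)
    return std_normal.pdf(truncation_point) / p


for p in (0.01, 0.05, 0.1, 0.3, 0.5, 0.9):
    print(f"p = {p:<4}  approx = {intensity_approx(p):6.3f}  exact = {intensity_exact(p):6.3f}")
```

For p = 0.5 both versions give roughly 0.8, and for p between 0.01 and 0.5 they stay close together. For large p the approximation breaks down: at p = 0.9 it even becomes negative, whereas the exact intensity is about 0.2.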


10.1.4 Response to selection

The response to selection R describes the difference between the mean phenotypic value of the original population µ and the mean µ0 of the next generation, which originated from the selected parents by random mating. This between-generation change in the mean due to the reproduction of the selected parents is:

(observed) R = µ0 − µ

10.2 Genetic response to selection

Selecting a fraction of the phenotypes means selecting a fraction of the genotypes/alleles of the population if the character is genetically determined. If the selected parents mate randomly and their offspring show a change in allele frequencies, evolution has taken place.

10.2.1 Breeders’ equation

The response to selection R depends on the strength of within-generation selection S and on the fraction of the offspring's phenotypic value that can be predicted from the parental value, i.e. the heritability of the character.

(expected) $R = h^2 S$ or $R = b_{OP} S$

This relationship is often called the breeders' equation and shows that the heritability of a character is the link between the within-generation change S and the between-generation response R.
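The breeders' equation translates directly into code. Since i = S/σZ, the same prediction can also be written as R = h² i σZ, which the second function uses. All numbers in the example are hypothetical; the case h² = 0 illustrates that without additive genetic variance no response is expected.

```python
def expected_response(h2: float, s: float) -> float:
    """Breeders' equation R = h^2 * S."""
    if not 0.0 <= h2 <= 1.0:
        raise ValueError("h2 must lie between 0 and 1")
    return h2 * s


def expected_response_from_intensity(h2: float, i: float, sigma_z: float) -> float:
    """Same prediction via the selection intensity, using S = i * sigma_Z."""
    return expected_response(h2, i * sigma_z)


# Hypothetical trait with h^2 = 0.4 and selection differential S = 3.0
print(expected_response(0.4, 3.0))
# Equivalent: i = 1.5 and sigma_Z = 2.0 give the same S = 3.0
print(expected_response_from_intensity(0.4, 1.5, 2.0))
# No additive genetic variance (h^2 = 0): no response, however strong the selection
print(expected_response(0.0, 3.0))
```

Comparing such an expected R with the observed R = µ0 − µ is one way to check whether a heritability estimate predicts the actual response.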

10.2.2 Fisher’s fundamental theorem of natural selection

“The rate of increase in fitness of any organism at any time is equal to its genetic variance in fitness at that time”

— Ronald Fisher

Populations will respond to selection as long as there is additive genetic variance on which to act. Since h² = VA/VP, VA = 0 implies h² = 0 and therefore R = h²S = 0.

10.3 Asymmetrical responses to selection

Drift can cause the cumulative response in one direction to be greater than in the other. However, drift can rarely explain a repeated bias in the same direction across replicate lines, so in that case it must be rejected as a null hypothesis. If natural selection on the trait is stronger in one direction than in the other, natural selection will aid artificial selection in one direction and hinder it in the other.

10.4 Long-term responses to selection

The outcome of selection over a long period is unpredictable for many reasons. First, the outcome depends on the properties of the individual genes contributing to the response, and these cannot be determined by observation at the outset. Second, mutation produces new variation whose nature cannot be predicted. Without the creation of new variation by mutation, the response to selection cannot continue indefinitely: eventually all segregating genes in a population will become fixed, either by selection or by the accompanying inbreeding. The response is expected to diminish slowly and eventually cease. At this point, the population is at its selection limit.

11 Inbreeding and heterosis

11.1 Inbreeding

Inbreeding is the mating of closely related individuals (e.g. siblings, cousins, self-fertilizing organisms). It increases the proportion of individuals in a population that are homozygous at a given locus and therefore makes recessive traits appear more often. The resulting overall decrease in fitness is called inbreeding depression. In other words, inbreeding is a form of non-random mating that results in deviations from Hardy-Weinberg expectations. Self-fertilization is the most extreme form of inbreeding. Complete self-fertilization results in only three mating types (one per genotype at a diploid locus with two alleles).

A sort of opposite of inbreeding is outcrossing, where two breeds are crossed. Neither inbreeding nor outcrossing changes allele frequencies by itself, but both change genotype frequencies.

Even though the loss of heterozygosity leads to a loss of genetic diversity and increases the risk of heritable diseases, inbreeding is used to amplify desired traits in plants and animals.

11.2 Inbreeding coefficient

The inbreeding coefficient F is the probability that the two alleles at a locus in an individual are identical by descent (IBD), in other words that both originate from the same ancestral allele. In a random-mating population F = 0, while in a completely inbred population F = 1.

11.3 Inbreeding vs. random mating

Take a population with alleles A (frequency 0.5) and a (frequency 0.5). If random mating occurs and all other influences on allele frequencies are ignored, the genotype frequencies stay at AA = 0.25, Aa = 0.5 and aa = 0.25. However, after only one generation of inbreeding (only individuals with the same genotype mate), the genotype frequencies change to AA = 0.375, Aa = 0.25 and aa = 0.375, while the allele frequencies remain at 0.5. With each further inbred generation the frequency of the Aa genotype halves, until it effectively disappears from the population.
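The example can be reproduced by iterating one generation at a time. The sketch assumes self-fertilization (or, equivalently for this purpose, mating only within genotypes), under which AA and aa breed true and Aa × Aa yields 1/4 AA, 1/2 Aa and 1/4 aa; `Fraction` keeps the numbers exact. The allele frequency of A is printed alongside to show that it does not change.

```python
from fractions import Fraction


def self_one_generation(f_AA: Fraction, f_Aa: Fraction, f_aa: Fraction) -> tuple[Fraction, Fraction, Fraction]:
    """One generation of selfing: homozygotes breed true, Aa splits 1/4 AA : 1/2 Aa : 1/4 aa."""
    return f_AA + f_Aa / 4, f_Aa / 2, f_aa + f_Aa / 4


# Start from the random-mating frequencies with p = q = 0.5
freqs = (Fraction(1, 4), Fraction(1, 2), Fraction(1, 4))
for generation in range(1, 5):
    freqs = self_one_generation(*freqs)
    p_A = freqs[0] + freqs[1] / 2  # allele frequency of A: all of AA plus half of Aa
    print(f"generation {generation}: AA = {freqs[0]}, Aa = {freqs[1]}, aa = {freqs[2]}, p(A) = {p_A}")
```

The first generation gives 3/8, 1/4 and 3/8 (0.375, 0.25, 0.375), and after four generations only 1/32 of the population is still heterozygous, while p(A) stays at 1/2 throughout.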


Genotype   With inbreeding coefficient F   With F = 0 (random mating)   With F = 1 (complete inbreeding)
AA         p²(1 − F) + pF                  p²                           p
Aa         2pq(1 − F)                      2pq                          0
aa         q²(1 − F) + qF                  q²                           q

The terms pF and qF are the autozygous homozygotes (both alleles identical by descent, i.e. copies of the same ancestral allele), while the remaining genotypes are allozygous (alleles not identical by descent).
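The table can be turned into a small function for the genotype frequencies at a biallelic locus: AA = p²(1 − F) + pF, Aa = 2pq(1 − F) and aa = q²(1 − F) + qF. As a consistency check, F = 0.5 with p = 0.5 reproduces AA = 0.375, Aa = 0.25 and aa = 0.375 from the single generation of inbreeding in the previous example, since one generation of selfing from a random-mating population gives F = 1/2. The function name is hypothetical.

```python
def genotype_frequencies(p: float, f: float) -> dict[str, float]:
    """Genotype frequencies at a biallelic locus with allele frequency p and inbreeding coefficient F."""
    q = 1.0 - p
    return {
        "AA": p * p * (1 - f) + p * f,  # allozygous homozygotes + autozygous homozygotes
        "Aa": 2 * p * q * (1 - f),      # heterozygotes only arise from non-IBD allele pairs
        "aa": q * q * (1 - f) + q * f,
    }


for f in (0.0, 0.5, 1.0):
    print(f"F = {f}: {genotype_frequencies(0.5, f)}")
```

The three frequencies always sum to 1, because (1 − F)(p + q)² + F(p + q) = 1.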

Possible effects of inbreeding include:

• Reduced fertility, both in litter size and in sperm viability
• Higher infant and child mortality
• Increased occurrence of genetic disorders
• Smaller adult size
• Fluctuating facial asymmetry
• Loss of immune system function
• Increased cardiovascular risks

11.4 Changes of mean trait values

Inbreeding changes the mean value of quantitative traits.

µF = µ0 − 2dpqF = a(p− q) + 2dpq − 2dpqF

The mean value only changes if d ≠ 0, i.e. if there is dominance. At a single locus, the mean decreases if d > 0 and increases if d < 0; in both cases it moves towards the value of the recessive homozygote. The magnitude of the change, 2dpqF, depends on the allele frequency and is greatest when p = q = 0.5.
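A short sketch of µF = µ0 − 2dpqF for a single locus, with hypothetical values a = 1 and d = 0.5, shows that the decrease caused by complete inbreeding (F = 1) is largest at p = 0.5 and that nothing changes when d = 0.

```python
def mean_with_inbreeding(p: float, a: float, d: float, f: float) -> float:
    """Single-locus mean mu_F = a(p - q) + 2dpq - 2dpqF."""
    q = 1.0 - p
    mu_0 = a * (p - q) + 2 * d * p * q  # random-mating mean M
    return mu_0 - 2 * d * p * q * f     # inbreeding removes the fraction F of the dominance term


# Hypothetical locus: a = 1.0, d = 0.5; compare random mating (F = 0) with complete inbreeding (F = 1)
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    drop = mean_with_inbreeding(p, 1.0, 0.5, 0.0) - mean_with_inbreeding(p, 1.0, 0.5, 1.0)
    print(f"p = {p}: decrease in the mean = {drop:.3f}")
```

The decrease equals 2dpq, i.e. 0.09 at p = 0.1 or 0.9, 0.21 at p = 0.3 or 0.7 and 0.25 at p = 0.5.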

11.5 Dominance vs. overdominance

[Figure: scenario A (dominance hypothesis) and scenario B (overdominance hypothesis). The size of the circles depicts the expression levels.]

Scenario A shows the dominance hypothesis. Allele A is dominant, while a is both recessive and deleterious. The superiority of hybrids, also called heterosis, is attributed to the masking of the undesirable a allele in heterozygotes. If this hypothesis is the main cause of the fitness advantage, fewer genes should be under-expressed in the heterozygous offspring than in their parents, and the expression level of any heterozygous gene should be comparable to that of the dominant homozygous ancestral gene. Inbreeding reduces genetic variability and increases the chance of an individual being homozygous for allele a. Under this hypothesis, the genetic variance for fitness is caused by rare deleterious alleles that are (partly) recessive. They persist in populations because of recurrent mutation, and most copies of these alleles in the base population occur in heterozygotes. Inbreeding then increases the frequency of homozygotes, causing inbreeding depression.

Aa = AA > aa

Scenario B shows the overdominance hypothesis. Here heterosis manifests as the heterozygote Aa having increased fitness over both homozygous parents. This can happen when two inbred strains are crossed. If this hypothesis is the main cause of the fitness advantage, the heterozygous offspring should show over-expression of certain genes compared to their homozygous parents. Since some inbred lines have means for fitness traits equal to that of the base population, the overdominance hypothesis cannot be generally true.

Aa > AA > aa

In both scenarios the descendants of the original parents tend to have higher heterozygosity due to selection. The main difference is that, if single-gene overdominance is important for inbreeding depression, it is impossible to obtain homozygotes as vigorous as the heterozygotes.

11.6 Heterosis

To recap, heterosis is the increase in fitness of hybrids between two inbred lineages. It is the reverse of inbreeding depression. Heterosis works against fixation and leads to more genetic variability. Populations that exhibit heterosis are better suited to adaptation. Breeders often exploit it to enhance the productivity of plants or animals. Some plants even carry alleles that are lethal when homozygous, so that only heterozygous individuals survive.

$H_{F_1} = \sum_{i=1}^{n} (\delta p_i)^2 d_i$

Here δp_i is the difference in allele frequency between the two parental populations at locus i and d_i is the dominance deviation at that locus. Heterosis in the F1 therefore depends on dominance: if d = 0, there is no inbreeding depression and thus no heterosis. Again, as with inbreeding depression, directional dominance is required for heterosis. If some loci are dominant in one direction and others in the opposite direction, their effects tend to cancel each other out and no heterosis may be observed. The absence of heterosis is therefore not sufficient to conclude that no dominance exists. H is proportional to the square of the difference in allele frequency between the populations. It is greatest when alleles are fixed in one population and absent in the other (so that |δp_i| = 1); in other words, heterosis increases with genetic distance. H = 0 if δp_i = 0 at every locus. H is specific to each particular cross and must be determined empirically, since neither the relevant loci nor their allele frequencies are known.
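The formula for H_F1 can be evaluated over a handful of loci to illustrate these points. The allele frequencies and dominance deviations in this sketch are invented, and the function name is hypothetical; the second example shows how loci with opposite directions of dominance cancel out even though every locus has d ≠ 0.

```python
def f1_heterosis(freqs_pop1: list[float], freqs_pop2: list[float], d: list[float]) -> float:
    """H_F1 = sum over loci of (delta p_i)^2 * d_i.

    freqs_pop1[i] and freqs_pop2[i] are the frequencies of the same allele at locus i
    in the two parental populations; d[i] is the dominance deviation at that locus.
    """
    if not len(freqs_pop1) == len(freqs_pop2) == len(d):
        raise ValueError("all inputs need one entry per locus")
    return sum((p1 - p2) ** 2 * d_i for p1, p2, d_i in zip(freqs_pop1, freqs_pop2, d))


# Three loci with alleles fixed in opposite populations (|delta p| = 1) and d = 0.5 each
print(f1_heterosis([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], [0.5, 0.5, 0.5]))
# Same differences, but dominance in opposite directions: the contributions cancel
print(f1_heterosis([1.0, 1.0], [0.0, 0.0], [0.5, -0.5]))
# No allele-frequency difference: no heterosis, whatever d is
print(f1_heterosis([0.4, 0.6], [0.4, 0.6], [0.5, 0.5]))
```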


11.6.1 Maximising and maintaining heterosis

To maximally exploit heterosis, F1 hybrids should be used, as the heterotic advantage decreases in F2 hybrids. Terminal crosses, which do not reproduce further, are often used in plant breeding. As this method is not practical for animals, two other strategies can be used to balance the cost of breeding F1 hybrids against the decrease in performance of F2 hybrids. The first is the use of synthetics, where n parental lines with superior combining ability are chosen and a random-mating population is formed by making all n(n − 1)/2 pairwise intercrosses between the lines. The second is the method of rotational crossbreeding. Here two, three or four (in theory there is no limit) different breeds can be used. Take a three-breed rotation as an example: a female with a father from breed A is bred with a male from breed B. Their female progeny is bred with males from breed C. The females of this generation are then bred with males from breed A. In other words, each generation of females is bred with a different breed than the one their mothers were bred with.
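The sketch below counts the n(n − 1)/2 crosses needed for a synthetic and tracks the breed composition of the females in a three-breed rotation. As simplifying assumptions, it starts from a purebred breed A female that is first mated to breed B, and every sire is purebred; the function names are hypothetical. Each generation inherits half its genes from the sire breed and half from its dam, so the composition converges to about 4/7 for the current sire breed, 2/7 for the previous one and 1/7 for the one before that.

```python
def n_pairwise_crosses(n: int) -> int:
    """Number of pairwise intercrosses n(n - 1)/2 needed to form a synthetic from n lines."""
    return n * (n - 1) // 2


def rotational_composition(breeds: list[str], start: str, generations: int) -> list[tuple[str, dict[str, float]]]:
    """Breed composition of the females produced in each generation of a rotation.
    Assumes a purebred female of breed `start` and purebred sires of the next breeds in turn."""
    composition = {b: 0.0 for b in breeds}
    composition[start] = 1.0
    sire_index = (breeds.index(start) + 1) % len(breeds)
    history = []
    for _ in range(generations):
        sire = breeds[sire_index]
        # Offspring get half their genes from the dam and half from the purebred sire
        composition = {b: 0.5 * share + (0.5 if b == sire else 0.0) for b, share in composition.items()}
        history.append((sire, composition))
        sire_index = (sire_index + 1) % len(breeds)
    return history


print(n_pairwise_crosses(4))  # crosses needed for a synthetic from four lines
for generation, (sire, comp) in enumerate(rotational_composition(["A", "B", "C"], "A", 8), start=1):
    shares = ", ".join(f"{b}: {s:.3f}" for b, s in comp.items())
    print(f"generation {generation}: sire breed {sire} -> {shares}")
```

Because the proportions stabilise rather than drift towards one breed, a rotation retains a large part of the heterozygosity of the F1 without having to maintain separate purebred lines for every cross.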


12 Formulas useful in quantitative genetics
