signatures of dobzhansky-muller incompatibilities in the ... · 14/12/2015  · classical dm model...

Post on 26-Aug-2020

0 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

GENETICS | INVESTIGATION

Signatures of Dobzhansky-Muller Incompatibilities inthe Genomes of Recombinant Inbred Lines

Maria Colomé-Tatché∗,1 and Frank Johannes§,†, 1

∗European Research Institute for the Biology of Ageing, University of Groningen, University Medical Center Groningen, A. Deusinglaan 1, NL-9713 AVGroningen, The Netherlands, §Population epigenetics and epigenomics, Department of Plant Sciences, Technical University Munich, Liesel-Beckmann-Str. 2,

85354 Freising, Germany, †Institute for Advanced Study, Technical University Munich, Lichtenbergstr. 2a, 85748 Garching, Germany

ABSTRACT In the construction of Recombinant Inbred Lines (RILs) from two divergent inbred parents certain genotype(or epigenotype) combinations may be functionally “incompatible” when brought together in the genomes of the progeny,thus resulting in sterility or lower fertility. Natural selection against these epistatic combinations during inbreeding canchange haplotype frequencies and distort linkage disequilibrium (LD) relations between loci on the same or on differentchromosomes. These LD distortions have received increased experimental attention, because they point to genomicregions that may drive Dobzhansky-Muller-type of reproductive isolation and, ultimately, speciation in the wild. Herewe study the selection signatures of two-locus epistatic incompatibility models and quantify their impact on the geneticcomposition of the genomes of 2-way RILs obtained by selfing. We also consider the biases introduced by breeders whentrying to counteract the loss of lines by selectively propagating only viable seeds. Building on our theoretical results, wedevelop model based maximum likelihood (ML) tests which can be applied to multi-locus RIL genotype data to infer theprecise mode of incompatibility as well as the relative fitness of incompatible loci. We illustrate this ML approach in thecontext of two published A.thaliana RIL panels. Our work lays the theoretical foundation for studying more complexsystems such as RILs obtained by sibling mating and/or from multi-parental crosses.

KEYWORDS Genetic incompatibility; RIL; long-range LD; selection; epistasis; recombination; complex traits; Dobzhansky-Muller; inbreeding

Hybrids from crosses between two divergent parental linessometimes display low fertility and phenotypic abnormal-

ities (Presgraves 2010). These effects are often attributable tocombinations of parental gentoypes (or epigenotypes) at twoor more loci that are functionally incompatible when broughttogether into a single genome. This form of negative epistasiswas originally invoked by Dobzhansky (Dobzhansky 1937) andMuller (Muller 1942) as a model for speciation. In the classicalDobzhansky-Muller (DM) model, a population splits into twosub-populations which become reproductively isolated through

Copyright © 2015 by the Genetics Society of Americadoi: 10.1534/genetics.XXX.XXXXXXManuscript compiled: Wednesday 16th December, 2015%1Corresponding authors: European Research Institute for the Biology of Ageing,University of Groningen, University Medical Center Groningen, A. Deusinglaan 1,NL-9713 AV Groningen, The Netherlands, E-mail: m.colome.tatche@umcg.nl andPopulation epigenetics and epigenomics, Department of Plant Sciences, TechnicalUniversity Munich, Liesel-Beckmann-Str. 2, 85354 Freising, Germany and Institute forAdvanced Study, Technical University Munich, Lichtenbergstr. 2a, 85748 Garching,Germany, E-mail: f.johannes@tum.de

geographic or temporal mechanisms (i.e. pre-zygotically). Onceseparated, the two sub-populations acquire independent muta-tions that are incompatible upon hybridization, thus resultingin sterility or reduced fertility among offspring. This processprevents further mixing and reinforces the existing (pre-zygotic)reproductive isolation genetically (i.e. post-zygotically). Ad-ditional independent mutations accumulate over time, caus-ing further divergence between sub-populations and ultimatelyspeciation. Empirical examples of inter-specific genetic incom-patibilities are well documented in the literature (Presgraves2010) and have motivated extensive theoretical work in evolu-tionary genetics (e.g. (Nei et al. 1983; Orr and Orr 1996; Turelliand Orr 2000; Orr and Turelli 2001; Turelli et al. 2001; Barton2001; Welch 2004; Fierst and Hansen 2010; Bank et al. 2012)).Interestingly, genetic incompatibilities with varying degrees ofpenetrance are often already visible in intra-specific experimen-tal crosses of inbred laboratory strains (Corbett-Detig et al. 2013;Chae et al. 2014). The detection and functional analysis of suchintra-specific incompatibilities could provide fundamental in-

Genetics, Vol. XXX, XXXX–XXXX December 2015 1

Genetics: Early Online, published on December 17, 2015 as 10.1534/genetics.115.179473

Copyright 2015.

sights into the mechanisms that drive post-zygotic reproductiveisolation in the wild, and thus represent a useful model for un-derstanding the molecular basis of speciation (Bomblies andWeigel 2007).

In plants, the clearest examples of intra-specific genetic in-compatibilities come from experimental crosses of A. thaliana(e.g. (Bomblies et al. 2007; Bikard et al. 2009; Durand et al. 2012;Chae et al. 2014)). Arguably the best studied case is the work ofBikard et al. (Bikard et al. 2009), who examined F2 progeny ofselfed hybrids derived from the Columbia (Col) and the CapeVerte (Cvi) accessions. The authors found that a subset of the F2shad severely compromised fitness, and demonstrated that thisfitness loss is caused by a genetic incompatibility involving areciprocal loss of duplicate genes on chromosome (chr) 1 and chr5 (Fig. 1A). Specifically, it was shown that Cvi carries a deletionof the gene on chr 5 and Col a non-functional version on chr 1,both of which act recessively. Hence, F2 individuals with the re-cessive epistatic combination Col|Col (chr 1) and Cvi|Cvi (chr 5)are (nearly) embryonic lethal. Interestingly, the genomic regionsthat are implicated in this epistatic incompatibility were firstidentified in a densely genotyped population of RecombinantInbred Lines (RIL) derived from the Col and Cvi accessions: atgeneration F6 of inbreeding, the authors noted strong long-rangelinkage disequilibrium (LD) between markers on chr 1 and 5(Fig. S1A) (Simon et al. 2008). Combinations of the Col|Colmarker genotype on chr 1 and the Cvi|Cvi marker genotype onchr 5 were completely absent, suggesting that these epistaticcombinations were subject to intense selection during inbreed-ing. Similar long-range LD patterns were identified in anotherRIL population originating from Sha and Col accessions, andinvolved epistatic interactions between a locus on chr 4 and chr5 (Fig. S1B). The two loci were subsequently fine-mapped, andfunctional studies revealed that this epistatic incompatibilityis due to stable epigenetic silencing of a paralogue (Fig. 1B)(Durand et al. 2012). This latter finding illustrates that - besidegenetic factors - also epigenetic factors can cause intra-specificincompatibilities in plants, although it remains to be seen howcommon such epigenetic phenomena are.

Short- or long-range LD distortions between loci on thesame or on different chromosomes is a common feature of RILsgenomes. From the standpoint of complex trait analysis, suchdistortions are typically undesirable because they affect the res-olution and power of quantitative trait locus (QTL) mapping.On the other hand, a systematic analysis of LD distortion pat-terns can provide insights into the epistatic architecture underly-ing genetic incompatibilities and yield targets for experimentalfollow-up. A decisive contribution to such efforts is a theoreticalanalysis of different incompatibility models and their selectionsignatures in the genomes of RILs. Most of the theoretical workdevoted to understanding the genomes of RILs has ignored therole of selection (e.g (Haldane and Wadington 1931; Broman2005; Martin 2006; Teuscher and Broman 2007; Johannes andColomé-Tatché 2011; Martin and Hospital 2011; Broman 2012;Zheng et al. 2015)). The exception is the early work by Haldane(Haldane 1956), Reeve (Reeve 1955) and Hyman and Mather(Hayman and Mather 1953), who examined cases of selectionagainst homozygotes at a single locus, and described the changesin genotype frequencies as a function of inbreeding and selec-tion. However, these earlier theoretical results are of limited usefor understanding the selection signatures of DM-type geneticincompatibilities as the latter require multi-locus models.

Here we provide the first theoretical analysis of two-locus in-

Figure 1 (A) Example genetic incompatibility in a cross be-tween A. thaliana accessions Col and Cvi. Locus At1g71920on chr 1 is expressed in Cvi but not in Col, while homologouslocus At5g10330 on chr 5 is expressed in Col but deleted inCvi. (B) Example genetic incompatibility in a cross betweenA. thaliana accessions Col and Sha. Locus AtFOLT2 on chr 4is expressed in Sha but deleted in Col, while homologous lo-cus AtFOLT1 on chr 5 is expressed in Col but epigeneticallysilenced in Sha through DNA methylation (black triangles).

compatibility models in the context of RIL construction. We con-sider three variants of the classical DM-model (the dominanceepistasis, the recessive epistasis, and the dominance-recessiveepistasis model) and quantify their respective effects on short-and long-range LD patterns as a function of inbreeding, fitnessand recombination. We also give theoretical expressions forthe total number of lines that are expected to be lost under dif-ferent incompatibility scenarios. Building on these results, wepresent model based maximum likelihood (ML) tests which canbe used for the detection of incompatible loci from multi-locusgenotype data collected at any inbreeding generation. We applythis ML method to two published A. thaliana RIL panels. Ourwork lays the theoretical foundation for studying more complexsystems such as RILs obtained by sibling mating and/or frommulti-parental crosses.

Overview of genetic incompatibility models

The simplest form of epistatic incompatibility involves the in-teraction between only two loci, say L1 and L2. Consider twodivergent inbred lines with diplotypes (i.e. two-point genotypes)AA|AA and BB|BB, where the “dot” superscript denotes a non-functional (i.e. mutant) allele. We use the notation IK|JL todistinguish genotypes I|J and K|L at the first (L1) and second(L2) locus, respectively, from haplotypes IK and JL on each ofthe two homologous chromosomes (Table 1). Hence, inbred lineAA|AA is homozygous for two mutant alleles at the first locusand homozygous wild type at the second locus, while inbredline BB|BB is homozygous mutant at the second locus and ho-mozygous wild-type at the first locus. There are three basicmodels of two-locus epistatic incompatibility, the dominanceepistasis model (M1), the recessive epistasis model (M2), andthe dominance-recessive epistasis model (M3). These modelsare summarized in (Table 2) and are further detailed below.

Dominance epistasis model (M1): In the classical DM-

2 Maria Colomé-Tatché and Frank Johannes et al.

model, individuals with diplotypes AA|AA and BB|BB are fullyviable, but their F1 hybrid progeny AA|BB is sterile or showsreduced fertility. The reduced fitness of the hybrid is the result ofdominance interactions of loci L1 and L2, meaning that allele Ais dominant over B at L1, while allele B is dominant over A at L2.When the loss of fertility is not fully penetrant, F1 hybrids canbe crossed (or selfed) to obtain a F2 population. Due to recombi-nation and/or independent segregation of alleles at loci L1 andL2, there are 16 possible diplotypes in the F2 (Table 1). One canassume that double heterozygote F2 individuals (AA|BB) expe-rience the same loss of fitness as in the F1. However, due to thedominance interactions, there are additional diplotypes in the F2or in subsequent generations that are phenotypically equivalentto AA|BB, and will therefore be subject to the same, or similar,fitness loss. These diplotypes, with their corresponding fitnessparameters wj, are summarized in Table 2.

Recessive epistasis model (M2): A basic requirement of theclassical DM model is that the incompatibility first appears inF1 hybrids. This may not always be the case. A less stringentversion of the DM model is the recessive epistasis model. In thismodel allele A is recessive to B at the first locus and allele B isrecessive to A at the second locus. This leads to selection againstAB|AB individuals, which do not appear in the F1 populationbut only at later breeding generations at low frequency (Table2).

Dominance-recessive epistasis model (M3): A combina-tion of the dominance and the recessive epistasis model is thedominance-recessive epistasis model. In this model, allele A isdominant over B at the first locus and B is recessive to A at thesecond locus. Selection is against individuals with diplotypeAB|BB, BB|AB and AB|AB (Tables 1 and 2). Similar to the reces-sive epistasis case, this model implies that incompatibility doesnot appear in F1 individuals but only at later breeding genera-tions. The reciprocal model where allele A is recessive to B atthe first locus and B is dominant over A at the second locus isequivalent and can be obtained by considering the symmetriesA↔ B and L1 ↔ L2.

Of course, the above three incompatibility models are justas valid had we assumed that the two inbred lines are insteadAA|AA and BB|BB, meaning that the mutant allele A is at thesecond locus and mutant allele B at the first locus. Various de-grees of partial-dominance are taken into account by attributingdifferent fitness parameters to deleterious diplotype (Table 2). Inthe following section we develop the necessary analytical frame-work to quantify the population-level consequences of thesethree incompatibility models during RIL construction. Readerswho are primarily interested in the biological insights may skipdirectly to the Results Section.

Theory

Markov chain modelConsider the construction of a 2-way RIL by selfing startingfrom a F2 base-population. There are 16 possible diplotypes inthe F2. Ignoring haplotype order, these can be grouped into 10diplotype classes (Table 1). Individuals from the F2 (time t = 1)are chosen to initiate an inbreeding process by repeated selfingfor many generations to obtain a final population of RILs. The in-breeding process can be modeled as an absorbing finite Markovchain, where the states of the chain are the different diplotypes{d1, . . . , d10} (Table 1). Assume that χt denotes the diplotypestate of an individual at generation t. Then {χt} forms a Markovchain, i.e., χt+1 is independent of χ0, χ1, . . . , χt−1 given χt. We

Table 1 List of the 16 diplotypes arising from the AA|AA ×BB|BB cross, where A and B denote wild-type alleles and Aand B denote non-functional (mutant) alleles. Ignoring hap-lotype order the 16 diplotypes can be grouped into only 10different classes

Diplotype class Prototype Equivalences

d1 AA|AA

d2 BB|BB

d3 AB|AB

d4 BA|BA

d5 AA|AB AA|AB, AB|AA

d6 AA|BA AA|BA, BA|AA

d7 BB|BA BB|BA, BA|BB

d8 BB|AB BB|AB, AB|BB

d9 AA|BB AA|BB, BB|AA

d10 AB|BA AB|BA, BA|AB

define the transition probability Tij = Pr(χt+1 = dj|χt = di)as a function of both r and {wj}, where r (0 ≤ r ≤ 0.5) is therecombination rate at meiosis, and wj (0 ≤ wj < 1) is the fitnesscorresponding to diplotype j. The transition matrix T is the col-lection of transition probabilities from one diplotype to anotherin one generation of inbreeding. For notational simplicity wewill omit the “dot” superscript in the following, and implicitlykeep track of the origin of the non-functional alleles. The generalform of T is shown in Appendix A. Following Reeve (Reeve1955), we augment the Markov chain with a pseudo-state “lost”,which accounts for the loss of diplotypes as a result of differ-ential survival. The column corresponding to the “lost” statein the new transition matrix T∗ is given by T∗i,11 = 1−∑10

j=1 Tij

for each line i = (1, · · · , 10), and by T∗11,11 = 1 for line 11. Thisaddition ensures that the rows of the new transition matrix T∗

sum to unity. The initial 1× 11 row vector of state probabilitiesis

π∗0 = (0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0), (1)

which corresponds to the hybrid diplotype AA|BB at F1 (timet = 0) where there are no other diplotypes and no selectionunless w9 = 0. Hence, the state probabilities at any generation tof inbreeding can be obtained from the general formula

π∗t = π∗0 (T∗)t (2)

whereπ∗t =

(π∗d1

(t), . . . , π∗d10(t), π∗lost(t)

). (3)

Note that the elements of π∗t are functions of r and the fitness wj.Since only the surviving lines are of interest, one may drop the“lost” state and work instead with the reduced 10× 10 submatrixof survivors, T, and the reduced 1× 10 state vector πt (Reeve1955). This leads to the recursion

πt = π0 Tt = π0 PVtP−1, (4)

where P is the eigenvector of T and V is a diagonal matrix ofthe distinct eigenvalues of T. We obtain the relative diplotypeproportions of surviving lines by normalizing the diplotypeproportions at any generation t of inbreeding by the mean fitness

DM incompatibilities in the Genomes of RILs 3

Table 2 Overview of the three incompatibility models M1, M2 and M3 and the model without selection M0. Shown are the fitnessparameters assigned to each diplotype j (Table 1) with w, w′ < 1.

Model Name Diplotype (j) fitness Description

w1 w2 w3 w4 w5 w6 w7 w8 w9 w10

M0 No selection 1 1 1 1 1 1 1 1 1 1 Reference model without selection

M1 Dominance epistasis 1 1 w 1 w′ 1 1 w′ w′ w′ A is dominant at the first locus and

B is dominant at the second locus

M2 Recessive epistasis 1 1 w 1 1 1 1 1 1 1 A is recessive at the first locus and

B is recessive at the second locus

M3 Dominance-Recessive 1 1 w 1 1 1 1 w′ 1 1 A is dominant at the first locus and

epistasis B is recessive at the second locus

in the population at time t, w(t) = ∑10j=1 πdj

(t). Let us define thenormalized diplotype frequencies by

πt = πt/w(t).

Using equation 4 we derive analytical expressions for the diplo-type probabilities at any inbreeding generation (Wolfram Re-search 2015). For models M1, M2, M3 and for the case with-out selection (model M0), we list the non-normalized diplotypeprobabilities at F∞ in Appendix B and D, and those for inter-mediate generations in Appendix C and E. The expected pro-portion of lost lines (lost) can be easily calculated from thesenon-normalized diplotype probabilities using

lost = 1−10

∑j=1

πj(t, r, wj),

which shows that the proportion of lost lines depends on theinbreeding generation t, the fitness wj and the meiotic recombi-nation rate r between the two incompatible loci.

Breeder BiasIn practical situations, the breeder would want to keep as manylines as possible, and therefore tries to counteract the loss oflines by implementing what may be called “biased single seeddescent” (BSSD) (Appendix F). That is, rather than selectingonly one seed at random to propagate a given line to the nextgeneration, the breeder plants many seeds from one line andchooses one that appears viable (Appendix F). This is equivalentto arguing that the breeder will not propagate a lost line. Thiscorrection process can be modeled by normalizing each rowelement (ij) of T by the row total:

TBSSDij = Tij/

10

∑j=1

Tij,

which has the effect that no lines are actually lost at intermedi-ate generations or at F∞. The only exception is when there iscomplete lethality (i.e. wj = 0). In this case, lines that have be-come fixed for a given incompatible homozygous diplotype willnot produce any viable seed at all, thus leaving no alternativeseeds to choose from. Although it is possible to find closed formsolutions for these re-normalized diplotype probabilities, theseexpressions have no easy form and are therefore omitted.

Time-dependent Linkage disequilibrium (LD)Changes in diplotype frequencies alter haplotype proportionsin the population. As we will see, all incompatibility modelsresult in a relative gain in non-recombinant diplotypes; or statedalternatively, in a loss of diplotypes carrying recombinant hap-lotypes. These haplotype distortions lead to increased linkagedisequilibrium (LD) within chromosomes (i.e. short-range LD)and also between chromosomes (i.e. long-range LD). To calcu-late LD between loci L1 and L2 we first obtain the haplotypeprobabilities for any time t as follows:

hk(t, r, wj) = πk|k(t, r, wj) + ∑12

πk|−(t, r, wj),

where k denotes the haplotype (i.e. k ∈ {AA, BB, AB, BA}) and− is any haplotype but k (e.g. hAB(t, r, wj) = πAB|AB(t, r, wj) +

∑ 1/2 πAB|−(t, r, wj), with − = {AA, BB, BA}). Explicit ana-lytical expression for these haplotype probabilities for modelsM1, M2, M3 at generation F∞ can be found in Appendix D, andthose for intermediate generations in Appendix E. As a referencewe also provide the results for the case without selection (M0)in Appendix B and C (at generations F∞ and at intermediategenerations, respectively). For the case of breeder bias analyticalsolutions are possible but have no easy form and are thereforeomitted. Using these haplotype probabilities, we define the ran-dom variables y1k and y2k which take values 1 or −1 accordingto whether locus 1 or 2 on haplotype k, respectively, carry allelesA or B. A time-dependent measure of LD can be obtained bycalculating:

LD(t, r, wj) = ∑k

y1ky2k hk(t, r, wj)− µ1µ2√σ2

1 + σ22

, (5)

where k ∈ {AA, BB, AB, BA} and µ1, µ2 and σ21 , σ2

2 are themeans and variances of y1 and y2, respectively.

Maximum Likelihood estimationThe analytical expressions for the diplotype probabilities (Ap-pendix E) can be employed in a Maximum Likelihood procedurefor the analysis of multi-locus RIL genotype data at any gen-eration of inbreeding. This procedure provides a method forestimating the most likely incompatibility model to have gener-ated the data as well as the fitness coefficients corresponding tothe different diplotypes.

4 Maria Colomé-Tatché and Frank Johannes et al.

Consider a sample of N RILs collected at any inbreeding gen-eration t, with one random sibling representing each line. LetYj (j = 1, . . . , 10) be a random variable denoting the number oflines with diplotype dj (or its equivalent class) at loci L1 andL2. Since the lines are independent, the probability mass func-tion of the observations y1, . . . , y10 is given by a multinomialdistribution:

Pr(Y1 = y1, . . . , Y10 = y10) =N!

y1! . . . y10!

10

∏j=1

πdj(t, r, wj)

yj , (6)

where y1 + · · ·+ y10 = N. Ignoring constant terms, we write thelog-likelihood function (`′) for a given incompatibility model Miand a fixed recombination fraction r as:

`′(θ′|y1, . . . , y10, t, r, Mi) =10

∑j=1

yj ln πdj(t, r, wj), (7)

where θ′ are the unknown fitness values. Maximization of (7)yields estimates of the fitness as well as the likelihood valueof a given incompatibility model. Competing incompatibilitymodels can be compared using standard model comparisoncriteria. We note that inferences regarding incompatible loci onthe same chromosome are difficult, because the parameters r andwj are partly confounded in the likelihood equations (i.e. r andwj often multiply each other, see Appendix). This is particularlyproblematic when L1 and L2 are in tight linkage and selection isweak (see Table S3).

Results

The following section highlights several important biologicalinsights that may be of practical relevance for experimentalistsworking on genetic incompatibility or with populations of RILsin general. Throughout we will present results for generationF8 (as this is a typical reference generation in the constructionof RILs by selfing) and generation F∞ (as this is the theoreticallimit), and for the three incompatibility models (M1, M2, M3)and the model without selection (M0). Results for any other in-breeding generation can be directly extracted from the analyticalformulas presented in the Theory Section and the Appendix. Tosimplify discussion we will consider the special case w′ = w(i.e. no partial-dominance). We note that the value w = 1 is adiscontinuity point in all models, as for w = 1 they all reduce toM0 (Table 2). For visual purposes this discontinuity point is notshown in the result Figures 2 and 3.

Genetic incompatibility leads to a loss of lines during inbreed-ingThe most obvious consequence of genetic incompatibilities isthat selection against certain diplotypes leads to the eventualloss of lines during inbreeding. The magnitude and rate of thisloss depends on the mode of incompatibility (i.e. models M1,M2 and M3), the meiotic recombination rate (r) as well as thefitness w. To illustrate this, we plot the expected proportion oflost lines for two different values of r (0.05, 0.5) against w atgeneration F8 and F∞ (Fig. 2A).

The loss of lines is most severe for the dominance-epistasismodel (M1). This is because the number of different diplotypesthat are selected against is largest under this model (Table 2).As the fitness of the incompatible diplotype approaches zero(w → 0) more than 50% of the lines are expected to be lost bygeneration F∞, and this percentage is not much influenced by r.

It is perhaps not surprising that the recessive-epistasis model(M2) is the most benign, with the loss of lines never exceeding25% as selection acts exclusively against the genetically fixedrecombinant diplotype AB|AB. Hence, the loss of lines at gener-ation F∞ depends only on r but not w. With larger r more linesacquire recombinant haplotype AB during inbreeding and thishaplotype can go on to fixation. By generation F∞ all AB|ABlines will have been purged from the population. Hence, givensufficient time this processes does not depend on the selectionintensity, but does require that w < 1.

For low fitness (w < 0.5) selection is generally quite efficientsuch that the proportion of lost lines converges rapidly to itslimiting value at F∞. However, for w & 0.5 the proportion of lostlines at generation F8 differs from what is expected at generationF∞, and this feature is common to all models. Another commonfeature of all three models is that the loss of lines is positivelyrelated to the recombination rate between the two incompatibleloci. This is because selection acts primarily against recombinantdiplotypes in all models (Table 2), so that the loss of lines isexpected to be most severe when the incompatibility is due tounlinked loci.

Differential survival of lines during inbreeding has other, lessobvious, population-level consequences: It affects genotype andhaplotype frequencies, which in turn can distort LD patternsin the genomes of RILs. We will discuss these effects in thesubsequent sections.

Genetic incompatibility changes genotype frequencies be-yond fixation

In the construction of RILs by selfing inbreeding is usually nottaken further than generation F8 as the lines are considerednearly fixed at that point. Indeed, in the absence of selection (M0)the F8 diplotype frequencies are close to their theoretical limit(F∞), with only about 3% of the lines still awaiting fixation. Thissituation is drastically different when genetic incompatibilitiesare present in the form of models M1, M2 and M3. In this case,certain diplotypes, many of which are already genetically fixedsuch as AB|AB, are under persistent selection and thus continueto change the relative genotype frequencies among RILs, even atvery advanced inbreeding generations. To illustrate this we plotthe difference between the diplotpye proportions at F∞ and atF8 (i.e. ∑i

(|πdi

(F8)− πdi(F∞)|

)) (Fig. 2B). For w & 0.75 all three

incompatibility models show a higher frequency of changingdiplotypes compared to the case without selection. One majorreason for this is that selection against specific recombinantdiplotypes (e.g. AB|AB) persists for much longer than the time ittakes to generate them through recombination and fixation. Thiseffect is clearest when the two incompatible loci are unlinked(r = 0.5).

With decreasing fitness the three incompatibility models be-gin to differ in subtle ways: for w . 0.5, the frequency of chang-ing diplotypes after generation F8 is actually smaller for modelM1 than it is for the case without selection (M0), for examplefor w = 0.3 and r = 0.05, the frequency of changing diplotypesafter generation F8 in M1 is 0.35%, compared to 3.1% in M0.This can be attributed to the fact that many genetically un-fixeddiplotypes (e.g. AB|AA) are purged at a faster rate than the rateat which they become fixed. A similar, albeit less drastic, situa-tion occurs in model M3 but requires much stronger selectionpressures and is dependent on r (Fig. 2B). By contrast, in modelM2 the frequency of changing diplotypes never drops below tothat without selection. This is due to selection being restricted

DM incompatibilities in the Genomes of RILs 5

Figure 2 Single seed descent(SSD) results: (A) Proportionof lost lines versus fitness of theincompatible diplotypes in theSSD model. For w = 1 the pro-portion of lost lines is 0, whilefor w = 0 in M1 the proportionof lost lines is 1 (discontinuitypoints not shown on the plot).(B) Proportion of diplotypes thathave not yet reached fixation forthe SSD model. For w = 1 theproportion of non-fixed lines inM1, M2 and M3 is the same asin M0 (discontinuity points notshown on the plot). (C) Linkagedisequilibrium versus fitness forthe SSD model. For w = 1 thelinkage disequilibrium in M1,M2 and M3 is the same as in M0(discontinuity points not shownon the plot).

to the recombinant diplotype AB|AB, so that fixation for thisdiplotype needs to occur first before it can get purged from thepopulation.

Taken together, these results raise important practical con-siderations: They imply that RILs that segregate incompatiblegenotypes cannot be viewed as an ‘eternal’ genetic resource,as their genotype frequencies continue to change upon furtherpropagation, particularly under weak selection. With plantsthis can be partly bypassed by stocking seeds from a referencegeneration which is then distributed to the community for phe-notyping experiments. However, with RILs derived by siblingmating, lines can only be maintained by continued crossing. Ex-perimental results obtained with genetic material from differentinbreeding generations may therefore not be comparable.

Genetic incompatibility increases LD within and across chro-mosomes

It is intuitively obvious that selection against certain diplo-types during inbreeding indirectly affects haplotype frequencies.Changes in the relative frequency of recombinant haplotypes dis-tort LD relations between loci within or across RIL chromosomes.To visualize this, we plot LD against w for different values ofthe meiotic recombination rate, r (0.05 and 0.5) (Fig. 2C). Prob-ably the most important observation is the strong induction oflong-range LD between genetically unlinked loci (r = 0.5) for allincompatibility models. Indeed, at generation F∞ the genotypesat the two incompatible loci are expected to be correlated in theorder of 0.5, whereas they are expected to be uncorrelated in theabsence of selection. For w < 0.5, all models show that long-range LD rapidly reaches its maximum with generation time:LD is already near its limiting value at generation F8. However,

for w & 0.5 long-range LD continues to increase beyond gen-eration F8 as the relative frequency of recombinant haplotypesslowly decreases as a result of differential survival of lines.

LD within chromosomes is of course already high due togametic linkage, and scales with the genetic distance betweenthe two incompatible loci. In this case, selection will reinforceLD even further, leading to (local) genetic map contractions inthe genomes of RILs.

One counter-intuitive observation in the LD patterns for mod-els M1 and M3 is the slight increase in LD at generation F∞ as afunction of fitness. To understand this it is necessary to discussthe fate of haplotypes during inbreeding under these two mod-els. In both cases the proportion of recombinants depends onthe fitness w, and both models show that low fitness values willlead to a higher proportion of recombinant diplotypes comparedto higher fitness values (Appendix F). However, recall that selec-tion in both incompatibility models is against several diplotypes(Table 2), many of which carry the non-recombinant parentalhaplotypes AA or BB. Hence, with strong selection (low fitness)more lines are lost, but among the survivors there is an over-representation of diplotypes carrying recombinant haplotypes.By contrast, with lower selection (higher fitness) there are moresurviving lines, but among these there is a higher proportion ofparental non-recombinant haplotypes.

Preventing the loss of lines introduces additional biases

It seems sensible that many of the adverse affects of genetic in-compatibility could be bypassed by preventing the loss of linesin the first place. However, preventing the loss of lines throughcounter-selection (BSSD, Appendix F) does not imply that thediplotype frequencies are also corrected as if no selection had

6 Maria Colomé-Tatché and Frank Johannes et al.

occurred. Selection against incompatible diplotypes persists, butthe breeder chooses to propagate a compatible individual in-stead of losing a line by trying to propagate an incompatible one(Appendix F). In this way the breeder introduces unexpectedbiases into the inbreeding dynamics, particularly with regardsto haplotype frequencies and LD patterns. This is clearly illus-trated in Figure 3, where we plot the proportion of recombinanthaplotypes among surviving lines. In general, we find that BSSDleads to higher proportion of recombinant haplotypes than inthe case of standard single seed descend (SSD). However, theseproportions are nowhere close to what would be expected inthe absence of genetic incompatibility. The most unexpectedobservation is that for unliked loci, when w . 0.2, BSSD canactually produce a lower proportion of recombinant individu-als among surviving lines. This means that even though morelines have been salvaged, the proportion of recombinant hap-lotypes in the final RILs is even lower than among survivinglines without breeder bias. The trade-off between the numberof surviving lines and the proportion of recombinant individu-als is important for complex trait mapping analysis where notonly the sample size but also the proportion of recombinantsare key determinants of mapping resolution. Making informeddecisions regarding the use of BSSD is difficult, as the presenceand/or severity of genetic incompatibilities is usually unknownprior to RIL construction. Be it as it may, the important obser-vation about BSSD is that it will lead to another set of biases inthe genomes of RILs (Fig. 3, Fig. S2 and Appendix F). Breedersshould be aware of these biases.

Application to RIL genotype dataSimon et al. (2008) presented genetic maps of two A. thalianaRIL populations derived from crosses between Cvi×Col andSha×Col accessions. Their analysis of the genotype data re-vealed several cases of long-range LD between pairs of mark-ers on different chromosomes (Fig. S1). In the Cvi×Col cross,long-range LD was detected between markers on chr 1 and 5and between markers on chr 1 and 3. In the Sha×Col cross,long-range LD was detected between markers on chr 4 and 5.The authors suggested that these LD patterns are the resultsof intense epistatic selection against certain parental genotypecombinations during inbreeding. We reanalyzed the genotypedata from both RIL crosses using our ML approach (Eq. 7).We focused on the two significant LD patterns originally de-scribed by Simon et al (Simon et al. 2008) and for which laterexperimental follow-up work established the precise mode andmolecular basis of the incompatibilities (Fig. 1). In each case,we performed ML estimation using our three incompatibilitymodels (M1, M2, M3) with and without breeder bias and, whenappropriate, considered the symmetries A ↔ B, L1 ↔ L2 and(AA|AA, BB|BB)↔ (AA|AA, BB|BB) (Table S1 and S2 and Ap-pendix F). Our goal is to infer the most likely incompatibilitymodel to have generated the observed genotype data, and toobtain estimates of the fitness values.

Cvi × Col cross: In their analysis of the Cvi×Col cross, Simonet al. (2008) noted that individual RILs that carried the Col|Colgenotype at a marker on chr 1 were much less likely to carrythe Cvi|Cvi genotype at a marker on chr 5, although these lociwere physically unlinked. In an impressive follow-up study(Bikard et al. 2009) it was later demonstrated that the chr 1 andchr 5 incompatibility was due to a reciprocal loss of a duplicatedgene (Fig. 1A). Specifically, it was shown that Cvi carried adeletion of the gene on chr 5 and Col a non-functional version of

it on chr 1, both of which acted recessively. F2 individuals withthe recessive epistatic combination Col|Col (chr 1) and Cvi|Cvi(chr 5) were found to be (nearly) embryonic lethal. Consistentwith their follow-up results in the F2s, application of our MLapproach to the original RIL genotype data correctly identifiedthe recessive epistatic incompatibility model (Model M2) as themost likely, with non-functional alleles at chr 1 for Col and atchr 5 for Cvi (Table S1). In addition, we estimated that epistaticselection against the double recessive during inbreeding wassubstantial (fitness w = 0.323) (Table S1), which is in line withthe (near) embryonic lethality observed among the F2s.

Sha × Col cross: The genetic incompatibility in the Sha×Colcross is more complex: Simon et al. (Simon et al. 2008) observedthat the combination Col|Col in chr 4 and Sha|Sha on chr 5was nearly absent in the RILs. Molecular analysis of the twointeracting genomic regions (Durand et al. 2012) revealed thatSha carries a duplicated gene on chr 4, which epigenetically si-lences its original copy on chr 5 in trans. Silencing is most likelyachieved via the generation of small interfering RNA (siRNA)that promote methylation at homologous sequences. Addingto this complexity, the authors showed that Sha has an activecopy of the gene in chr 4, where no homologous gene exists forCol, while Col has an active copy of the gene in chr 5, where thiscopy is epigenetically silenced in Sha. Application of our MLapproach to the genotype data revealed that the chr 4 and 5 in-compatibility is most consistent with a partial dominance model(Model M3), with strong selection against individuals with geno-types Col|Col on chr 4 and Sha|Sha on chr 5 (w = 0.124), andweak selection against individuals with genotypes Col|Sha onchr 4 and Sha|Sha on chr 5 (w′ = 0.738) (Table S2). These ratherlow fitness values underline the authors’ observation that in-compatible individuals experienced a reduction in seed yield ofabout 80%− 90%. Interestingly, our ML analysis also detectedevidence for breeder bias in these data. This finding is consistentwith the authors having made concerted efforts to counter-actthe loss of lines during RIL construction (Durand et al. 2012). In-deed, we estimate that without counter-selection, approximately30% of the lines would have been lost. However, these conclu-sions should be interepreted with caution as our simulationsshow that reliable detection of breeder bias in RIL data requiresmuch larger sample sizes than in the populations considered byDurand et al. (Durand et al. 2012) (see Table S3).

The predominance of recessive or (partial) dominance epista-sis as a source of genetic incompatibilities in the Cvi×Col andthe Sha×Col cross makes sense considering that other incom-patibility effects such as those associated with full dominanceepistasis would have led to an initial loss of F1 individuals,which may have prevented the construction of these RILs in thefirst place. We therefore suspect that the detection of long-rangeLD in multi-locus RIL genotype data will most often be traceableto recessive or partial dominance epistasis, or else to dominanceepistasis in combination with weak selection.

Discussion

Recombinant Inbred Lines (RILs) are a popular tool for studyingthe genetic basis of complex traits. Many populations of RILshave been created in variety of organisms. The genotypic prop-erties of RILs often diverge drastically from what is expectedfrom theory. Widespread allele frequency distortions and un-expected long-range LD patters are common. Such features areoften the result of differential survival (or fertility) of certain

DM incompatibilities in the Genomes of RILs 7

Figure 3 Recombinant haplo-types: Proportion of recombi-nant haplotypes versus fitness ofthe incompatible diplotypes atgeneration F∞ (A) and at gener-ation F8 (B), for the single seeddescent (SSD) and the biasedsingle seed descent (BSSD) mod-els. For w = 1 the proportion ofrecombinant haplotypes in M1,M2 and M3 is the same as in M0,while for w = 0 in M1 with SSDthe proportion of recombinanthaplotypes is 0 (discontinuitypoints not shown on the plot).

combinations of parental genotypes during inbreeding. This isperhaps nowhere clearer than in the genomes of recently cre-ated 8-way RILs in the mouse, which were derived from eightdifferent inbred parental strains (Collaborative Cross Consor-tium 2012). The construction of these RILs has been severelyhampered by high lethality and infertility rates during inbreed-ing. Genotyping data at intermediate generations showed thatcertain parental genotypes were nearly absent in some genomicregions, and surviving lines displayed complex long-range LDpatterns. These observations are consistent with selection havingacted on entire networks of interacting loci. High-dimensionalincompatibility networks can be viewed as multi-locus exten-sions of the classical DM model. While interesting from a dataanalysis standpoint, theoretical modeling of such multi-locussystems in the genomes of RILs is analytically not tractable,which makes this problem much less attractive from a theoreticalstandpoint. While the classical two-locus DM model representsa limiting case, it does give a plausible mechanistic descriptionof how genetic incompatibilities initially arise in diverging sub-populations. Theory as well as empirical evidence suggest that,once DM-type incompatibilities take hold, additional incompati-bilities accumulate exponentially (i.e. they “snowball”) (Orr andTurelli 2001; Matute et al. 2010; Moyle and Nakazato 2010). Thisexponential accumulation suggests that two-locus incompatibili-ties expand into multi-locus incompatibility networks over time,rather than accumulating independently in an additive manner.

In the present work we studied the selection signatures ofdifferent variants of the classical DM-model in genomes of RILsobtained by selfing. Our analysis showed that DM-type incom-patibilities can give rise to complex inbreeding dynamics. In ourview, the most troublesome situation is the presence of weakselection as it will continue to change genotype frequencies andLD patterns among RILs, even beyond genetic fixation. Hence,RILs that segregate incompatible genotypes do not, technically,present a reference population for the community, and pheno-typic results may not be comparable across studies. Our analysisalso showed that counter selection by breeders cannot rescuethe adverse affects of genetic incompatibility but introduce addi-tional biases in the form of LD and haplotype distortions. Whilethese issues can be concerning to breeders whose aim it is tocreate these populations for downstream complex trait analysis,

many existing RIL genotype datasets may present a largely un-explored resource for studying the basic principles underlyinggenetic incompatibilities. However, it is important to keep inmind that incompatibilities detected in RILs are not necessarilyrepresentative of natural populations, as the interacting allelesmay be, individually or jointly, so rare that incompatible hybridsarise at only very low frequencies. Nonetheless, we argue that adeeper understanding of the mechanisms that cause genetic in-compatibilities in experimental crosses may help us to establishthe sufficient (molecular) conditions for speciation in the wild.

Acknowledgments

We thank M. Shojaei Arani for discussions and for his con-tribution on the preparation of the formulas. This work wassupported by grants from the Netherlands Organization forScientific Research (M.C.-T) and by a University of GroningenRosalind Franklin Fellowship (to M.C.-T). F.J. acknowledgessupport from the Technische Universität München - Institute forAdvanced Study, funded by the German Excellence Initiativeand the European Union Seventh Framework Programme undergrant agreement nb. 291763.

Literature Cited

Bank, C., R. Bürger, and J. Hermisson, 2012 The limits to para-patric speciation: Dobzhansky-Muller incompatibilities in acontinent-island model. Genetics 191: 845–63.

Barton, N. H., 2001 The role of hybridization in evolution. Molec-ular Ecology 10: 551–568.

Bikard, D., D. Patel, C. Le Metté, V. Giorgi, C. Camilleri, M. J.Bennett, and O. Loudet, 2009 Divergent evolution of duplicategenes leads to genetic incompatibilities within {A}. thaliana.Science 323: 623–626.

Bomblies, K., J. Lempe, P. Epple, N. Warthmann, C. Lanz, J. L.Dangl, and D. Weigel, 2007 Autoimmune response as a mech-anism for a Dobzhansky-Muller-type incompatibility syn-drome in plants. PLoS biology 5: e236.

Bomblies, K. and D. Weigel, 2007 Arabidopsis—a model genusfor speciation. Current Opinion in Genetics & Development17: 500–504.

8 Maria Colomé-Tatché and Frank Johannes et al.

Broman, K. W., 2005 The genomes of recombinant inbred lines.Genetics 169: 1133–46.

Broman, K. W., 2012 Genotype probabilities at intermediategenerations in the construction of recombinant inbred lines.Genetics 190: 403–12.

Chae, E., K. Bomblies, S.-T. Kim, D. Karelina, M. Zaidem, S. Os-sowski, C. Martín-Pizarro, R. Laitinen, B. Rowan, H. Tenen-boim, S. Lechner, M. Demar, A. Habring-Müller, C. Lanz,G. Rätsch, and D. Weigel, 2014 Species-wide Genetic Incom-patibility Analysis Identifies Immune Genes as Hot Spots ofDeleterious Epistasis. Cell 159: 1341–51.

Collaborative Cross Consortium, 2012 The genome architectureof the collaborative cross mouse genetic reference population.Genetics 190: 389–401.

Corbett-Detig, R. B., J. Zhou, A. G. Clark, D. L. Hartl, and J. F.Ayroles, 2013 Genetic incompatibilities are widespread withinspecies. Nature 504: 135–7.

Dobzhansky, T., 1937 Genetics and the Origin of Species. ColumbiaUniversity Press, New York.

Durand, S., N. Bouché, E. P. Strand, O. Loudet, and C. Camilleri,2012 Rapid establishment of genetic incompatibility throughnatural epigenetic variation. Current Biology 22: 1–6.

Fierst, J. L. and T. F. Hansen, 2010 Genetic architectureand postzygotic reproductive isolation: Evolution of bate-son–dobzhansky–muller incompatibilities in a polygenicmodel. Evolution 64: 675–693.

Haldane, J. and C. Wadington, 1931 Inbreeding and linkage.Genetics 16: 357–374.

Haldane, J. B. S., 1956 The conflict between inbreeding andselection I. Self-fertilization. Journal of Genetics 54: 56–63.

Hayman, B. I. and K. Mather, 1953 The progress of inbreedingwhen homozygotes are at a disadvantage. Heredity 7: 165–183.

Johannes, F. and M. Colomé-Tatché, 2011 Quantitative epigenet-ics through epigenomic perturbation of isogenic lines. Genet-ics 188: 215–27.

Martin, O. C., 2006 Two- and Three-Locus Tests for LinkageAnalysis Using Recombinant Inbred Lines. Genetics 173: 451–459.

Martin, O. C. and F. Hospital, 2011 Distribution of ParentalGenome Blocks in Recombinant Inbred Lines. Genetics 189:645–654.

Matute, D. R., I. A. Butler, D. A. Turissini, and J. A. Coyne, 2010A test of the snowball theory for the rate of evolution of hybridincompatibilities. Science 329: 1518–1521.

Moyle, L. C. and T. Nakazato, 2010 Hybrid incompatibility"snowballs" between {Solanum} species. Science 329: 1521–1523.

Muller, H. J., 1942 Isolating mechanisms, evolution and temper-ature. Biol. Symp 6: 71–125.

Nei, M., T. Maruyama, and C.-I. Wu, 1983 Models of {Evolution}of {Reproductive} {Isolation}. Genetics 103: 557–579.

Orr, H. A. and L. H. Orr, 1996 Waiting for {Speciation}: {The}{Effect} of {Population} {Subdivision} on the {Time} to {Specia-tion}. Evolution 50: 1742–1749.

Orr, H. A. and M. Turelli, 2001 The evolution of postzygoticisolation: accumulating dobzhansky-muller incompatibilities.Evolution 55: 1085–1094.

Presgraves, D. C., 2010 The molecular evolutionary basis ofspecies formation. Nat. Rev. Genet. 11: 175–180.

Reeve, E. C. R., 1955 INBREEDING WITH THE HOMOZY-GOTES AT A DISADVANTAGE. Annals of Human Genetics

19: 332–346.Simon, M., O. Loudet, S. Durand, A. Bérard, D. Brunel, F.-X. Sen-

nesal, M. Durand-Tardif, G. Pelletier, and C. Camilleri, 2008Quantitative trait loci mapping in five new large recombinantinbred line populations of {Arabidopsis} thaliana genotypedwith consensus single-nucleotide polymorphism markers. Ge-netics 178: 2253–2264.

Teuscher, F. and K. W. Broman, 2007 Haplotype probabilitiesfor multiple-strain recombinant inbred lines. Genetics 175:1267–74.

Turelli, M., N. H. Barton, and J. A. Coyne, 2001 Theory andspeciation. Trends in Ecology & Evolution 16: 330–343.

Turelli, M. and H. A. Orr, 2000 Dominance, {Epistasis} andthe {Genetics} of {Postzygotic} {Isolation}. Genetics 154: 1663–1679.

Welch, J. J., 2004 Accumulating dobzhansky-muller incompati-bilities: Reconciling theory and data. Evolution 58: 1145–1156.

Wolfram Research, I., 2015 Mathematica Version 10.1. WolframResearch, Inc., Champaign, Illinois.

Zheng, C., M. P. Boer, and F. A. van Eeuwijk, 2015 Reconstruc-tion of genome ancestry blocks in multiparental populations.Genetics 200: 1073–87.

DM incompatibilities in the Genomes of RILs 9

Appendix

A. General transition matrix

General form for the transition matrix T in the Markov chain, where w and w′ represent the fitness of a genotype (Table 1):

d1 d2 d3 d4 d5 d6 d7 d8 d9 d10

d1 1 0 0 0 0 0 0 0 0 0

d2 0 1 0 0 0 0 0 0 0 0

d3 0 0 w 0 0 0 0 0 0 0

d4 0 0 0 1 0 0 0 0 0 0

T = d514 0 w

4 0 w52 0 0 0 0 0

d614 0 0 1

4 0 12 0 0 0 0

d7 0 14 0 1

4 0 0 12 0 0 0

d8 0 14

w4 0 0 0 0 w8

2 0 0

d9(1−r)2

4(1−r)2

4r2w

4r2

4r(1−r)w5

2r(1−r)

2r(1−r)

2r(1−r)w8

2(1−r)2w9

2r2w10

2

d10r2

4r2

4(1−r)2w

4(1−r)2

4r(1−r)w5

2r(1−r)

2r(1−r)

2r(1−r)w8

2r2w9

2(1−r)2w10

2

For M1, w5 = w8 = w9 = w10 = w′, for M2, w5 = w8 = w9 = w10 = 1 and for M3, w5 = w9 = w10 = 1 and w8 = w′ (Table 2).

B. Model M0 for F∞

The following equations show the diplotype and haplotype probabilities for the model without selection (M0) for F∞:

Model M0

Diplotypes

d1, d2 =1

4r + 2

d3, d4 =r

2r + 1d5, . . . , d10 = 0

Haplotypes

hAA, hBB =1

4r + 2

hAB, hBA =r

2r + 1(8)

10 Maria Colomé-Tatché and Frank Johannes et al.

C. Model M0 at intermediate inbreeding generations

The following equations show the diplotype and haplotype probabilities for the model without selection (M0) at intermediate inbreeding generations,where we used u = [(1− 2r + 2r2)/2]t and v = [(1− 2r)/2]t.

Model M0

Diplotypes

d1, d2 =14

u +2r− 1

4(2r + 1)v− 1

2t+1 +1

4r + 2

d3, d4 =14

u +1− 2r

4(2r + 1)v− 1

2t+1 +r

2r + 1

d5, . . . , d8 = − 12

u +1

2t+1

d9 =12

u +12

v

d10 =12

u− 12

v

Haplotypes

hAA, hBB =r

2r + 1v +

14r + 2

hAB, hBA = − r2r + 1

v +r

2r + 1

D. Models M1, M2, M3 for F∞

Non-normalized diplotype and haplotype probabilities with selection for incompatibility models M1, M2 and M3 for F∞, for 0 < w, w′ ≤ 1:

DiplotypesModel M1

d1, d2 =(1− 2r)w′ + 2(r− 1)

2(w′ − 2)[(2r− 1)w′ + 2]d3 = 0

d4 =r((

2r2 − 3r + 1)

w′ + r− 2)

[(2r− 1)w′ + 2] [(2r2 − 2r + 1)w′ − 2]d5, . . . , d10 = 0

Model M2

d1, d2 =1

4r + 2d3 = 0

d4 =r

2r + 1d5, . . . , d10 = 0

Model M3

d1 =1

4r + 2

d2 = − 2r(r− 1)[2r(w′ − 1) + 1] + w′ − 22(w′ − 2)(2r + 1)[2r(r− 1)− 1]

d3 = 0

d4 =r

2r + 1d5, . . . , d10 = 0

HaplotypesModel M1

hAA, hBB =(1− 2r)w′ + 2r− 2

2(w′ − 2)[(2r− 1)w′ + 2]hAB = 0

hBA =r[(2r2 − 3r + 1)w′ + r− 2

][(2r− 1)w′ + 2] [(2r2 − 2r + 1)w′ − 2]

DM incompatibilities in the Genomes of RILs 11

Model M2

hAA, hBB =1

4r + 2hAB = 0

hBA =r

2r + 1

Model M3

hAA =1

4r + 2

hBB = − 2r(r− 1)[2r(w′ − 1) + 1] + w′ − 22(w′ − 2)(2r + 1)[2r(r− 1)− 1]

hAB = 0

hBA =r

2r + 1

E. Models M1, M2, M3 at intermediate inbreeding generations

Non-normalized diplotype and haplotype probabilities with selection for incompatibility models M1, M2 and M3 at intermediate inbreeding generations,for 0 < w, w′ ≤ 1. Note that u = [(1− 2r + 2r2)/2]t, v = [(1− 2r)/2]t, u′ = [(1− 2r + 2r2)w′/2]t, v′ = [(1− 2r)w′/2]t, a = 2w− (1− 2r)w′ andb = 2w−

(1− 2r + 2r2)w′.

DiplotypesModel M1

d1, d2 =r(r− 1)

2 (2r2 − 2r + 1)w′ − 2u′ +

2r− 12(4r− 2)w′ + 8

v′ +1

2t+2w′t

w′ − 2− 1

2t+1r(r− 1)

(1− 2r + 2r2)w′ − 1+

(1− 2r)w′ + 2(r− 1)2(w′ − 2)((2r− 1)w′ + 2)

d3 =w(w′ − 2w− 4rw′)

2ab

(w′

2

)t

+rw′

abwt+1 − (2r− 1)w

4av′ +

(1 + 2r− 2r2)w4b

u′

+

r2w[

w′(

w′2

)t(2w + (2r− 3)w′) + wt(2w2 − 3ww′ − (2r− 3)w′2)

](2w− w′)ab

d4 =

((4r4 − 8r3 + 8r2 − 4r + 1

)w′ − 6r2 + 6r− 1

)4 ((2r2 − 2r + 1)w′ − 2) ((2r2 − 2r + 1)w′ − 1)

u′ +(1− 2r)

4[(2r− 1)w′ + 2]v′ − 1

2tr(r− 1)

(2r2 − 2r + 1)w′ − 1

+r((

2r2 − 3r + 1)

w′ + r− 2)

((2r− 1)w′ + 2) ((2r2 − 2r + 1)w′ − 2)

d5, d8 = − 12

u′ +w′t

2t+1

d6, d7 = − 12t

r(r− 1)(2tu′ − 1

)(2r(r− 1) + 1)w′ − 1

d9 =12

u′ +12

v′

d10 =12

u′ − 12

v′

Model M2

d1, d2 =14

u +2r− 1

4(2r + 1)v− 1

2t+1 +1

4r + 2

d3 =w(2r2 − 2r− 1

)4(2r2 − 2r− 2w + 1)

u +w(1− 2r)

4(2r + 2w− 1)v +

12t+1

w1− 2w

+r[2r2 +

(−2w2 + 3w− 3

)r− 2w + 1

](2w− 1)(2r + 2w− 1) (2r2 − 2r− 2w + 1)

wt+1

d4 =14

u +1− 2r

4(2r + 1)v− 1

2t+1 +r

2r + 1

d5, . . . , d8 = − 12

u +1

2t+1

d9 =12

u +12

v

d10 =12

u− 12

v

(9)

12 Maria Colomé-Tatché and Frank Johannes et al.

Model M3

d1 =(2r + 1)

(u− 21−t)+ (2r− 1)v + 2

4(2r + 1)

d2 =

(r2 − r

)(w′ − 2) (2r2 − 2r− w′ + 1)

(w′

2

)t+1

+

(2r2 − 2r

)[2r(w′ − 1) + 1] + w′ − 2

2(w′ − 2)(2r + 1) (−2r2 + 2r + 1)− 1

2t+2

+(r2 − r)(2r2 − 2r− 2w′ + 1)

2(2r2 − 2r− 1)(2r2 − 2r− w′ + 1)u +

(2r− 1)4(2r + 1)

v

d3 = rwt+1 w[(

2r2 − 6r + 3)

w′ + 2r2 − 3r + 1]+(−2r2 + 3r− 1

)w′ − 4rw3 + 2w2(2r− 1)(w′ + 1)

(2w− 1) (2r2 − 2r− 2w + 1) (2r + 2w− 1)(2w− w′)

+(1− r)rww′

2 (2r2 − 2r− w′ + 1) (2w− w′)

(w′

2

)t

+(r− 1)rw

(2r2 − 2r− 2w′ + 1

)2 (2r2 − 2r− 2w + 1) (2r2 − 2r− w′ + 1)

u

+(1− 2r)w

4(2r + 2w− 1)v− 1

2t+2w

2w− 1

d4 =14

u +1− 2r

4(2r + 1)v− 1

2t+1 +r

2r + 1

d5, . . . , d7 = − 12

u +1

2t+1

d8 =12t

rw′(r− 1)(w′t − 2tu)1− 2r + 2r2 − w′

d9 =12

u +12

v

d10 =12

u− 12

v

HaplotypesModel M1

hAA, hBB =(2r− 1)w′ + 2r + 14[(2r− 1)w′ + 2]

v′ +1

22+t(w′ − 1)w′t

w′ − 2+

(1− 2r)w′ + 2r− 22(w′ − 2)[(2r− 1)w′ + 2]

hAB =

(12+

w(w′ − 2w− 4rw′)2ab

)(w′

2

)t

+rw′

abwt+1 − (2r + 1)w + (2r− 1)w′

4av′ +

(1− 2r + 2r2)(w′ − w)

4bu′

+

r2w[

w′(

w′2

)t(2w + (2r− 3)w′) + wt(2w2 − 3ww′ − (2r− 3)w′2)

](2w− w′)ab

hBA =

(2r2 − 2r + 1

)(w′ − 1)

4 ((2r2 − 2r + 1)w′ − 2 )u′ +

(−2r + 1)w′ − 2r− 14[(2r− 1)w′ + 2]

v′

+r((2r2 − 3r + 1)w′ + r− 2

)((2r− 1)w′ + 2) ((2r2 − 2r + 1)w′ − 2)

Model M2

hAA, hBB =r

2r + 1v +

14r + 2

hAB =

(2r2 − 2r + 1

)(w− 1)

4(2r2 − 2r + 1− 2w)u− (2r + 1)w + 2r− 1

4(2w + 2r− 1)v +

12t

(w− 1)2(2w− 1)

+2r2 +

(−2w2 + 3w− 3

)r− 2w + 1

(2w− 1)(2r + 2w− 1) (2r2 − 2r− 2w + 1)rwt+1

hBA =r

2r + 1(1− v)

Model M3

hAA =2rv + 1

2(2r + 1)

hBB =2−t−1 (r2 − r

)(w′ − 1)w′t+1

(w′ − 2) (2r2 − 2r− w′ + 1)+

(r2 − r

) (2r2 − 2r + 1

)(w′ − 1)

2 (−2r2 + 2r + 1) (2r2 − 2r− w′ + 1)u +

r2r + 1

v +

(2r2 − 2r

)[2r(w′ − 1) + 1] + w′ − 2

2(w′ − 2)(2r + 1) (−2r2 + 2r + 1)

hAB = −r(r− 1)

(u−

(w′2

)t)

w′

2 (2r2 − 2r− w′ + 1)+

r(1− r)w(2r2 − 2r− w′ + 1) (2w− w′)

(w′

2

)t+1

+r(r− 1)

(2r2 − 2r− 2w′ + 1

)w

2 (2r2 − 2r− 2w + 1) (2r2 − 2r− w′ + 1)u +

(1− 2r)w4(2r + 2w− 1)

v +1

2t+2w

1− 2w+

12t+2 −

v4

+w[(

2r2 − 6r + 3)

w′ + 2r2 − 3r + 1]+(−2r2 + 3r− 1

)w′ − 4rw3 + 2(2r− 1)w2(w′ + 1)

(2w− 1) (2r2 − 2r− 2w + 1) (2r + 2w− 1)(2w− w′)rwt+1

hBA = − r2r + 1

v +r

2r + 1

DM incompatibilities in the Genomes of RILs 13

F. Biased Single Seed DescentSchematic difference between randomized single seed descent and biased single seed descent. (A) in a strict single seed descent design, the breederselects a single seed from generation t to propagate to generation t+1. (B) However, typically the breeder selects from a collection of seeds at generationt and after germination uses a random seedling to propagate to generation t+1. (C) In the presence of genetic incompatibilities, some seeds fromgeneration t will not germinate and therefore will not be selected to propagate to generation t+1. This is equivalent to saying that the breeder will notpropagate a lost line.

14 Maria Colomé-Tatché and Frank Johannes et al.

top related