

Supplementary information

(Jarne & David - Quantifying inbreeding in natural populations of hermaphroditic

organisms)

(all in Windows Word format)

Supplementary Appendix 1 (including 2 tables) – The three main molecular markers

(allozymes, microsatellites and AFLPs) used for estimating the selfing rate, and associated

technical problems.

Supplementary Appendix 2 (including 3 figures) – The sampling properties of an estimator

of the selfing rate in the single-locus case.

Supplementary Appendix 3 (including 2 figures) – Joint estimation of the selfing rate and

inbreeding depression.

Supplementary Appendix 4 (including 1 table) – Accounting for the bias due to partial

dominance when estimating the inbreeding coefficient: a general single-locus model.

Supplementary Appendix 5 (including 1 figure) – Estimating the selfing rate from linkage

disequilibrium data.

Supplementary Appendix 6 (including 1 table and 1 figure) – The progeny-arrays approach

(PAA): basic conditions and some pitfalls associated with technical problems.

References (to all appendices)


Supplementary Appendix 1 – The three main molecular markers (allozymes, microsatellites

and AFLPs) used for estimating the selfing rate, and associated technical problems.

Some general characteristics of allozymes, microsatellites and AFLPs are provided in Table 1. More details are given in Avise (2000) and Lowe et al. (2004, chapter 2). These characteristics should be thoroughly considered before launching a study on selfing rates. A critical step is to produce data that are as reliable as possible, which requires using controls at several steps (Lowe et al., 2004; Hoffman and Amos, 2005; Pompanon et al., 2005). The influence of a given marker's biological characteristics, as well as of the associated technical problems, on the estimation of selfing rates is also discussed in the main text. The technical problems are detailed in Table 2, and their influence on the estimation of the selfing rate is explained in the main text, in Supplementary Appendix 4 (estimates based on the inbreeding coefficient) and in Supplementary Appendix 6 (progeny-arrays analyses).


Table 1. Some general characteristics of allozymes, microsatellites and AFLPs. SAD = short allele dominance; * = low; ** = intermediate; *** = high. The financial costs include development and subsequent use. The table has been built with diploid organisms in mind, and the situation is generally more complex with polyploids. (a) depends on the number of primer pairs used; (b) the Esterase family is an example; (c) some technical problems are presumably more acute with dinucleotide motifs than with larger motifs; (d) refers to erroneous reading of an allele (band); (e) refers to the influence of environmental (e.g., room temperature) and technical (e.g., chemicals, machines) factors on result quality; (f) more automated practices lead to less direct access to primary data.


| Marker | Allozymes | Microsatellites | AFLPs |
|---|---|---|---|
| Dominance | codominant | codominant | dominant |
| Number of loci | up to a few tens | up to a few tens | up to a few hundreds (a) |
| Number of alleles / locus | often < 5 | often < 10 | 2 |
| Mendelian transmission | yes | yes | not always |
| Allelism (problems with) | with multi-gene families (b) | no | when more than 2 alleles / locus |
| Biological material: amount | g | ng to µg | ng to µg |
| Biological material: state / storing | fresh / frozen | fresh / frozen / alcohol | fresh / frozen / alcohol |
| Technical problems (c): null alleles | yes | yes | irrelevant |
| Technical problems (c): band stuttering | no | yes | no |
| Technical problems (c): fuzzy bands | yes | yes | yes |
| Technical problems (c): SAD | irrelevant | yes | irrelevant |
| Technical problems (c): misreading (d) | yes | yes | yes |
| Repeatability | ** | ** | * |
| Environmental influence (e) | * | ** | *** |
| Technical cost | * | ** | ** |
| Automatization (f) | * | ** / *** | ** / *** |
| Financial costs | * | ** / *** | ** |
| References | Richardson et al. (1986); Pasteur et al. (1987) | Jarne and Lagoda (1996); Estoup and Angers (1998); Ellegren (2004) | Vos et al. (1995) |


Table 2. Some technical problems encountered with the three molecular markers considered here, allozymes (Al), microsatellites (M) and AFLP.

For null alleles, SAD and band stuttering, appropriate methods can be used to analyse the source and magnitude of heterozygote deficiencies (e.g.

Van Oosterhout et al., 2004; David et al., 2007).

| Source | Definition | Marker | Solution |
|---|---|---|---|
| Null alleles | Alleles with no electrophoretic expression / phenotype, because of failed primer amplification at M loci or no enzymatic reaction at Al loci. Homozygous individuals (say 00) do not display any pattern, and heterozygous individuals (say B0) are read as homozygous for the other allele (BB). | Al, M | Design more appropriate PCR primers (M) |
| Short allele dominance | Preferential PCR amplification of short alleles in heterozygotes, such that heterozygotes are misscored as homozygotes for the shortest allele. | M | Manipulate PCR conditions |
| Band stuttering | Stuttered patterns at M loci result from additional PCR products which differ in size from the actual allele by even (and small) numbers of unit size (e.g., two base pairs for a dinucleotide). Heterozygotes may be misscored as homozygotes, most probably when alleles are separated by a single repeat unit. | M | Manipulate PCR conditions |
| Fuzzy bands | Bands (signals) of larger width than expected and with blurred outlines. Might be due to too much PCR product or too active enzymatic reactions. | Al, M, AFLP | Manipulate PCR conditions (M, AFLP) or stop enzymatic reactions (Al) |
| Misscoring | Erroneous reading of bands leading to the creation of new (imaginary) alleles or to misreading of an already-existing allele. | Al, M | Pool alleles with similar mobilities |


Supplementary Appendix 2 – The sampling properties of an estimator of the selfing rate in

the single-locus case.

Our goal is to examine the sampling properties, especially the variance, of estimators of the

selfing rate derived from the inbreeding coefficient (F). Several estimators of F are available

(see e.g., Curie-Cohen, 1982), but we focus on the ‘total heterozygosity’ estimator ( in

Curie-Cohen, 1982) which on the whole is the least biased and exhibits the lowest variance,

and is the one used in this review. Let us assume an inbred population (inbreeding coefficient

F) of infinite size with no mutation, migration or selection. Assume also a locus with k

codominant alleles Ai with frequency pi. n individuals are sampled.

| Genotype | AiAi | AiAj (i ≠ j) |
|---|---|---|
| Observed number | aii | 2 aij |
| Expected number | $n[p_i^2 + F p_i (1 - p_i)]$ | $2 n p_i p_j (1 - F)$ |

The ‘total heterozygosity’ estimator is defined as:

$$\hat{F} = 1 - \frac{H_o}{H_e} \qquad (1)$$

where $H_o$ is the observed frequency of heterozygotes and $H_e$ the expected frequency under random mating. Assuming that n is large enough, it is possible to derive an approximate expression of the variance of $\hat{F}$, based on the Delta method (see e.g. Appendix 1 in Lynch and Walsh, 1998) and the variances and covariance of the numerator and denominator of equation (1) (Curie-Cohen, 1982).

(2).
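As a minimal illustration of the ‘total heterozygosity’ estimator, the sketch below computes Ho, He and the resulting estimate of F from a list of diploid genotypes at one codominant locus, then converts F into a selfing rate with S = 2F / (1 + F), the relationship implied by the F and S values quoted in the caption of Figure 2. The function names and the toy sample are hypothetical. He is computed here as one minus the sum of squared sample allele frequencies, without any small-sample correction, which is an assumption and not necessarily the exact form used by Curie-Cohen (1982).

```python
from collections import Counter

def total_heterozygosity_F(genotypes):
    """Estimate F as 1 - Ho/He from (allele, allele) tuples at one locus."""
    n = len(genotypes)
    # Observed heterozygosity: share of individuals carrying two different alleles.
    Ho = sum(1 for a, b in genotypes if a != b) / n
    # Sample allele frequencies computed over the 2n gene copies.
    counts = Counter(allele for g in genotypes for allele in g)
    He = 1.0 - sum((c / (2 * n)) ** 2 for c in counts.values())
    return 1.0 - Ho / He

def selfing_rate_from_F(F):
    """Selfing rate at inbreeding equilibrium, S = 2F / (1 + F)."""
    return 2.0 * F / (1.0 + F)

# Hypothetical sample: 4 AA, 2 AB and 4 BB individuals.
sample = [("A", "A")] * 4 + [("A", "B")] * 2 + [("B", "B")] * 4
F_hat = total_heterozygosity_F(sample)
S_hat = selfing_rate_from_F(F_hat)
print(F_hat, S_hat)
```

The estimator is undefined for a monomorphic sample (He = 0), so such loci would have to be excluded before calling the function.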


When alleles are equifrequent ($p_i = 1/k$ for all $i$), this simplifies to:

(3).

Note that when F is small. It can be shown that:

(4)

which when alleles are equifrequent gives:

(5).

The variances of the inbreeding coefficient and the selfing rate are given in Figure 1 for a

three-allele locus in two contrasted situations with regard to allelic frequencies. An interesting

result is that the variance of S can be quite substantial when inbreeding is limited and allelic

frequencies are not balanced. The influence of gene diversity on the variance of the selfing

rate can be evaluated using equations (3) and (4) (equifrequent alleles) for various values of

the inbreeding coefficient (Figure 2). There is indeed a clear benefit to using polymorphic

loci, especially in rather outcrossing populations.
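The step from the variance of F to the variance of S can be sketched with the Delta method. Assuming S = 2F / (1 + F) (the relationship implied by the values in the caption of Figure 2), the derivative is dS/dF = 2 / (1 + F)^2, so a first-order approximation is Var(S) ≈ 4 Var(F) / (1 + F)^4. This is a generic first-order propagation and not necessarily the exact form of equation (4); the function name and the input variance are hypothetical.

```python
def var_S_delta(F, var_F):
    """First-order (Delta method) variance of S = 2F/(1+F) given Var(F)."""
    dS_dF = 2.0 / (1.0 + F) ** 2   # derivative of 2F/(1+F) with respect to F
    return dS_dF ** 2 * var_F

# The same Var(F) (a hypothetical 0.01) is amplified most when F is small.
for F in (0.0, 0.2, 0.8):
    print(F, var_S_delta(F, 0.01))
```

Under this approximation the multiplier applied to Var(F) falls from 4 at F = 0 to 0.25 at F = 1, in line with the remark that the variance of S can be substantial when inbreeding is limited.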

When several (L) loci are available, the inbreeding coefficient can be estimated as an

average value over loci. The sampling variance of F decreases with increasing L, and so does

the variance in S. However, the decrease in variance is less than linear with L when L and/or S

are high. This is of limited importance for large values of S because the single-locus variance

is already small (Figures 1 and 2). On the other hand, it might be asked whether the sampling

variance will be more efficiently minimized by increasing either n or L when S is small. In

such a situation, the population is essentially composed of two classes of individuals (selfed

and outcrossed), and the frequency of selfed individuals in a sample of n individuals is a

binomial variable with variance $S(1 - S)/n$. This variance does not depend on the number of


loci used. Another source of variance derives from determining the selfed versus outcrossed

status of each individual based on their genotype. This depends on the number and genetic

diversity of loci. The total variance is the sum of these two sources of variance. This is

illustrated in Figure 3 in which the two sources of variance are presented as a function of S (S

< 0.3) in the single-locus case. When He increases (compare the situation with He = 0.66 and

1), the part of total variance attributed to the Binomial variance increases, and it is worth

increasing n. When He is low, increasing L will provide more gain than when He is high. In

general it seems preferable to increase n because the total variance decreases in 1/n, while

only the non-Binomial component decreases when increasing L or He. However it is

sometimes less costly to score more loci than more individuals (e.g., when several loci are

scored in the same electrophoresis gel or co-amplified with the same PCR mix).
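The 1/n behaviour of the binomial component can be checked numerically. The sketch below assumes the usual variance of a binomial proportion, S(1 - S)/n, for the frequency of selfed individuals in a sample; the function name and the sample sizes are illustrative only.

```python
def binomial_variance(S, n):
    """Variance of the proportion of selfed individuals among n sampled."""
    return S * (1.0 - S) / n

# Doubling n halves this component, whatever the number of loci scored.
for n in (30, 60, 120):
    print(n, binomial_variance(0.2, n))
```

Because this component does not depend on the loci, adding loci can only reduce the remaining, genotype-based part of the variance.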


Figure 1. Variances of the inbreeding coefficient (empty squares) and the selfing rate (full

squares) as a function of the inbreeding coefficient using equations (2) and (4) in the three-

allele case. n = 100. A. p1 = 0.98, p2 = p3 = 0.01. B. p1 = p2 = 0.33, p3 = 0.34.

[Figure 1: panels 1A and 1B, each showing variance (y-axis) against F (x-axis, 0 to 1).]

Figure 2. Variance of the selfing rate (equation (4)) as a function of gene diversity (He) when

the inbreeding coefficient is 0.01 (triangles), 0.2 (squares) and 0.8 (circles) – corresponding to

selfing rates of 0.02, 0.33 and 0.89 respectively. n = 100.

[Figure 2: Var(S) (y-axis) against He (x-axis, 0 to 1).]

Figure 3. Variance of the selfing rate as a function of S (equations (3) and (4)) when alleles

are equifrequent. The total variance is given for He = 0.66 (black triangles) and 1 (black

circles), and the binomial variance (independent of He) is indicated by white squares.

Sampling size is 30.

[Figure 3: Var(S) (y-axis) against S (x-axis, 0 to 0.3).]

Supplementary Appendix 3 – Joint estimation of the selfing rate and inbreeding depression.

Ritland (1990) proposed to jointly estimate the selfing rate and inbreeding depression based

on an experimental design in which a classical progeny-arrays (PA) analysis (Gn parents and

their Gn+1 offspring) is associated with estimation of the inbreeding coefficient in Gn+1 adults.

This is illustrated in Figure 1 where inbreeding is given as a function of time for adults that

are partially selfing (generation n), their offspring and adults from the next generation

(generation n+1). Partial selfing in Gn adults increases the inbreeding coefficient at

fertilization (corresponding to the primary selfing rate). Inbreeding is then reduced by natural

selection (inbreeding depression). The effect is stronger in outcrossers than in selfers (lower and upper parts of the panel, respectively). The adult inbreeding coefficient might differ between successive generations, which constitutes a departure from inbreeding equilibrium (outcrossing situation in Figure 1).

Ritland (1990) derived a simple expression relating S, F and inbreeding depression (δ)

assuming mixed mating and inbreeding equilibrium:

(1).

With no inbreeding depression (δ = 0), this reduces to equation 1 of the main text. This formula can be used to depict the relationship between the selfing rate (S) and the inbreeding coefficient (F) in adults for various values of inbreeding depression (Figure 2). It suggests that progenies should be sampled as early as possible in the life cycle to approach the primary selfing rate, especially when inbreeding depression is strong.
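The relationship between S, F and δ can be explored numerically. The code below is a reconstruction under stated assumptions, not a transcription of equation (1): selfed offspring reach adulthood with relative fitness 1 - δ, so the selfing rate among adults is S(1 - δ) / (1 - Sδ), and at inbreeding equilibrium the adult F equals half that secondary rate times (1 + F). Solving gives F = S(1 - δ) / (2 - S - Sδ), which reduces to F = S / (2 - S) when δ = 0. The function names are hypothetical.

```python
def adult_F(S, delta):
    """Adult inbreeding coefficient at equilibrium for primary selfing rate S."""
    return S * (1.0 - delta) / (2.0 - S - S * delta)

def primary_S(F, delta):
    """Inverse relationship: primary selfing rate from adult F and delta."""
    return 2.0 * F / (1.0 + F - delta * (1.0 - F))

# Same adult F = 0.2, increasingly strong inbreeding depression.
for delta in (0.0, 0.5, 0.75, 0.9):
    print(delta, round(primary_S(0.2, delta), 3))
```

For a fixed adult F, the primary selfing rate implied by this relationship increases with δ, so ignoring inbreeding depression underestimates selfing at fertilization.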


Figure 1. Variation of the inbreeding coefficient F as a function of time in two successive

generations. Selfers and outcrossers are distinguished. Four-branch stars indicate the stage at

which progenies are generally genotyped in progeny-arrays (PA) analyses. Inbreeding

depression, indicated by arrows, measured through Ritland’s method (Ritland, 1990) covers

the period from these stars to Gn+1 adults. For the sake of clarity, a linear decline of inbreeding

depression is assumed.

[Figure 1: F (y-axis) against time (x-axis) over the stages Adults Gn, Juveniles Gn+1 and Adults Gn+1, with separate trajectories for outcrossers and selfers; Fn, Fn+1, fertilization and the sampling stage for PA are marked.]

Figure 2. Relationship between adult inbreeding coefficient (F) and the selfing rate (S) using

equation (1). The values of inbreeding depression are, from right to left, 0, 0.5, 0.75 and 0.9.

[Figure 2: S (y-axis, 0 to 1) against F (x-axis, 0 to 1).]

Supplementary Appendix 4 – Accounting for the bias due to partial dominance when

estimating the inbreeding coefficient: a general single-locus model.

Let us assume a locus with n alleles. The allelic frequency of allele Ai is pi. The population

considered is at inbreeding equilibrium with actual inbreeding coefficient F. The actual

observed heterozygosity and gene diversity at this locus are Ho and He. The observed values of these three parameters are F*, Ho* and He*, and they might differ from the actual values due to various technical reasons (see main text and Supplementary Appendix 1). For parameter X, the bias ΔX is defined as ΔX = X* − X. The relationship F = 1 − Ho / He holds for both actual and observed values, and can be used to derive the bias in F:

(1).
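The bias in F follows from applying F = 1 - Ho / He to both actual and observed values. The sketch below computes ΔF = F* - F in two ways: directly, and from the relative biases ΔHo / Ho and ΔHe / He through the algebraic identity ΔF = (1 - F)(ΔHe/He - ΔHo/Ho) / (1 + ΔHe/He). This identity is derived here from the definitions and may be written differently in equation (1); the bias convention ΔX = X* - X, the function names and the numbers are assumptions.

```python
def delta_F_direct(Ho, He, Ho_obs, He_obs):
    """Bias in F computed from actual and observed heterozygosities."""
    return (1.0 - Ho_obs / He_obs) - (1.0 - Ho / He)

def delta_F_relative(F, rel_Ho, rel_He):
    """Same bias from the relative biases rel_Ho = dHo/Ho and rel_He = dHe/He."""
    return (1.0 - F) * (rel_He - rel_Ho) / (1.0 + rel_He)

# Hypothetical locus: 10% of heterozygotes lost, gene diversity down by 2%.
Ho, He = 0.40, 0.50
Ho_obs, He_obs = 0.36, 0.49
F = 1.0 - Ho / He
direct = delta_F_direct(Ho, He, Ho_obs, He_obs)
relative = delta_F_relative(F, Ho_obs / Ho - 1.0, He_obs / He - 1.0)
print(direct, relative)
```

Both routes give the same positive bias in this example, because the relative loss in observed heterozygosity exceeds the relative loss in gene diversity.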

The relative biases in observed heterozygosity and gene diversity depend on

locus characteristics (number and frequencies of alleles) and on the kind of technical artefacts.

Such artefacts can be considered as various forms of dominance. We consider a general model

under which heterozygotes AiAj are read as homozygotes AiAi with probability ji and as

homozygotes AjAj with probability ij (0 <ji + ij < 1). Note that this means that the observed

heterozygosity will always be underestimated (negative bias). The actual and observed

frequencies of genotypes AiAj are Pij and Pij* respectively. Partial dominance decreases the

observed frequency of heterozygous genotypes, and . Summing over all

heterozygotes, it can be shown that:

(2)

where is the average apparent loss of heterozygotes due to partial dominance. Partial

dominance also modifies apparent allelic frequencies. For example, an allele i dominant over


most (or all) other alleles (ji >> ij for all j) will increase in apparent frequency. The

frequency variation is given by:

(3).

Equation (3) can be used to derive the bias in expected heterozygosity:

(4).

The first term of equation (4) represents a variance in dominance among alleles and will

always be positive, while the second term represents a covariance between the average

dominance level of an allele and its frequency. In most situations, this term is expected to be

near zero, and the bias on gene diversity will usually be negative. Equations (2) to (4) can be

used to solve equation (1), and find the deviation due to partial dominance. Because both

observed and expected heterozygosities are underestimated, the two effects oppose each other

when computing ΔF (equation (1)). However ΔHe < ΔHo, because (ji - ij)2 << (ji + ij), and

F is therefore overestimated (ΔF > 0). This general framework allows analysing situations

encountered by experimenters:

- Random heterozygote loss (ji = ij = / 2 for all i,j). This will happen for example

when some alleles are not amplified or lose enzymatic activity by chance.

- Hierarchical dominance series (ji = if i > j, and 0 if j > i). Short-allele dominance

is a slightly more general case: , with g(x) an increasing function of x verifying 0 ≤

g(x) ≤ 1 when 0 ≤ x ≤ n-1 (i < j). This allows for various shapes of curve (e.g., linear,

quadratic). Band stuttering at microsatellite loci can be modelled as: when j = i + 1, and

otherwise.

- Null alleles (for all j, ij = 0 if i < n and nj = 1). By convention, all null alleles will

be lumped together as allele n.
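The partial-dominance model can be sketched numerically for any set of misreading probabilities. In the code below, `loss[i][j]` is the probability that an AiAj heterozygote is scored as an AiAi homozygote; this indexing convention, the function name and the example values are hypothetical and are not the notation of the original equations. Actual genotype frequencies are assumed to follow inbreeding equilibrium, with heterozygote frequency 2 p_i p_j (1 - F).

```python
def observed_values(p, F, loss):
    """Return (Ho*, He*, F*) after heterozygotes are partly misread as homozygotes."""
    k = len(p)
    # Actual homozygote frequencies at inbreeding equilibrium.
    hom_obs = [p[i] ** 2 + F * p[i] * (1.0 - p[i]) for i in range(k)]
    het_kept = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            P_ij = 2.0 * p[i] * p[j] * (1.0 - F)
            hom_obs[i] += P_ij * loss[i][j]   # AiAj scored as AiAi
            hom_obs[j] += P_ij * loss[j][i]   # AiAj scored as AjAj
            het_kept[i][j] = P_ij * (1.0 - loss[i][j] - loss[j][i])
    Ho_obs = sum(sum(row) for row in het_kept)
    # Apparent allele frequencies: homozygotes plus half of the scored heterozygotes.
    p_obs = []
    for i in range(k):
        half_het = 0.5 * sum(het_kept[min(i, j)][max(i, j)] for j in range(k) if j != i)
        p_obs.append(hom_obs[i] + half_het)
    He_obs = 1.0 - sum(x * x for x in p_obs)
    return Ho_obs, He_obs, 1.0 - Ho_obs / He_obs

# Random heterozygote loss: each allele "wins" with the same probability 0.05.
random_loss = observed_values([0.5, 0.5], 0.2, [[0.0, 0.05], [0.05, 0.0]])
# Hierarchical dominance: allele 0 masks allele 1 with probability 0.1.
hierarchical = observed_values([0.5, 0.5], 0.2, [[0.0, 0.1], [0.0, 0.0]])
print(random_loss, hierarchical)
```

With symmetric loss the apparent allele frequencies, and hence gene diversity, are unchanged, so the whole bias in F comes from the lost heterozygotes; with a dominant allele, gene diversity also decreases slightly.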


Formulas for ΔHo / Ho, ΔHe / He, and ΔF are provided in Table 1 (exact formulas are

given together with first-order approximations). Note that for hierarchical dominance, we

consider the simple situation of equifrequent alleles. The expressions remain approximately

identical when this assumption is relaxed. For null alleles, we introduced a minor correction

to equations (1) to (4) to take into account the fact that null homozygotes will probably be

discarded from actual datasets. Note that these formulas do not account for sampling error on

allelic and genotypic frequencies.


Table 1. Formulas for ΔHo / Ho (always < 0), ΔHe / He (< 0, except for random heterozygote

loss), ΔF and ΔS (always > 0). pn is the frequency of null alleles. Note that the formula for kn

assumes equifrequent alleles.

Bias Random heterozygote loss

Hierarchical dominance

Null alleles

ΔHo / Ho - -

ΔHe / He 0 -kn Ho (1-F) 2

ΔF (1-F) (1-F) + O(2)

ΔS

with , and .


Supplementary Appendix 5 – Estimating the selfing rate from linkage disequilibrium data.

Cutter (2006) estimated the selfing rate from its long-term effect on recombination. Linkage

disequilibrium can be estimated from r², the squared correlation coefficient between pairs of nucleotide sites. In a population at drift / recombination equilibrium:

(1)

with Ne the effective population size and ce the effective recombination rate. In an inbreeding

population with inbreeding coefficient F, . At inbreeding equilibrium,

. This provides an estimate of the outcrossing rate as:

(2).

Note that this equation does not provide meaningful estimates of S when , and

negative values of 1 − S can easily be obtained. 1 − S is presented as a function of r² for various values of Ne in Figure 1. It is clear that the conditions of the model will not necessarily be fulfilled. It is also likely that r² has a large variance (Hudson, 2001). This method might nevertheless be useful in highly selfing species for which sequence data, as well as estimates of recombination and effective population size, are available. For example, Cutter (2006) used it in the nematode Caenorhabditis elegans, and obtained estimates at least an order of magnitude lower than direct estimates.
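The logic of this approach can be sketched as follows. The three relationships used are assumptions standing in for the equations of this appendix, which may differ in detail: the classical equilibrium expectation r² ≈ 1 / (1 + 4 Ne ce), an effective recombination rate ce = c(1 - F), and F = S / (2 - S) at inbreeding equilibrium. The function name and the default c = 0.5 (the value quoted in the caption of Figure 1) are illustrative.

```python
def outcrossing_rate_from_r2(r2, Ne, c=0.5):
    """Return 1 - S implied by r2, or None when the implied F is negative."""
    # Effective recombination rate obtained by inverting r2 = 1 / (1 + 4 Ne ce).
    ce = (1.0 / r2 - 1.0) / (4.0 * Ne)
    one_minus_F = ce / c               # from ce = c (1 - F)
    if one_minus_F > 1.0:
        return None                    # would require F < 0: no meaningful estimate
    # F = S / (2 - S) gives 1 - S = (1 - F) / (1 + F) = x / (2 - x) with x = 1 - F.
    return one_minus_F / (2.0 - one_minus_F)

for Ne in (50, 1000, 100000):
    print(Ne, outcrossing_rate_from_r2(0.5, Ne))
```

Under these assumptions, weak linkage disequilibrium combined with a small effective population size implies a negative F, which is why the function returns no estimate in that region.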


Figure 1. The outcrossing rate (log scale) as a function of r2 using equation (2) and assuming c

= 0.5. From top to bottom, Ne = 50, 1000 and 100000.

[Figure 1: 1 − S (y-axis, log scale from 10^-8 to 1) against r² (x-axis, 0 to 1).]

Supplementary Appendix 6 – The progeny-arrays approach (PAA): basic conditions and

some pitfalls associated with technical problems.

1. The basic model

A detailed presentation of the basic model and some of its early extensions is given in Brown

et al. (1989). We briefly review it, and present recent developments. The PAA is based on the

comparison of mother and offspring genotypes. Its logical underpinnings can be exposed

using the simple one-locus two-alleles case with alleles A1 and A2 in frequency p and q

respectively. The expected number of offspring of each genotype is derived from the mixed-mating model with selfing rate S. For example, an A1A1 mother will have A1A1 progeny with probability S + (1 − S)p and A1A2 progeny with probability (1 − S)q. The expected and observed numbers of offspring can then be used to build the likelihood of a given array. This

might be generalized over several loci. Ritland (2002) proposed the following general

formulation. Let us assume that is the probability of observing a progeny with genotype

AkAl given parental genotype AiAj and selfing rate S. Under a mixed-mating system, the

multilocus likelihood becomes:

(1).

The likelihood of family m given parent n (genotype AiAj) is with Nkl

progenies with genotype AkAl. The likelihood of the array given all parental genotypes is:

with fn the frequency of parent n in the population (which depends on allelic

frequencies). The likelihood over all arrays is given by the product over all families .

Parameters (here S and allelic frequencies) are estimated by maximizing L using classical

methods (inversion of the information matrix; see e.g. Appendix 4 in Lynch and Walsh (1998)


for a brief introduction to ML methods). This also allows building confidence intervals and

constructing tests based on likelihoods (e.g., likelihood ratio tests, Burnham and Anderson,

2002).
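The one-locus, two-allele case can be turned into a small likelihood sketch. The code below assumes known maternal genotypes and a known pollen-pool frequency p of allele A1, and finds the selfing rate by a simple grid search; this is a simplification, since the full method estimates allelic frequencies jointly and works over several loci. Genotype labels, function names and the example counts are hypothetical.

```python
import math

def offspring_probs(mother, S, p):
    """Offspring genotype probabilities under mixed mating (p = frequency of A1)."""
    q, t = 1.0 - p, 1.0 - S
    if mother == "11":
        return {"11": S + t * p, "12": t * q, "22": 0.0}
    if mother == "22":
        return {"11": 0.0, "12": t * p, "22": S + t * q}
    # Heterozygous mother: Mendelian selfing ratios, or a random ovule meeting outcross pollen.
    return {"11": 0.25 * S + 0.5 * t * p, "12": 0.5 * S + 0.5 * t, "22": 0.25 * S + 0.5 * t * q}

def log_likelihood(families, S, p):
    """Sum of log-probabilities over all offspring of all families."""
    total = 0.0
    for mother, counts in families:
        probs = offspring_probs(mother, S, p)
        for genotype, n_obs in counts.items():
            if n_obs == 0:
                continue
            if probs[genotype] == 0.0:
                return -math.inf      # observed offspring impossible under this S
            total += n_obs * math.log(probs[genotype])
    return total

def ml_selfing_rate(families, p, steps=1000):
    """Grid-search maximum-likelihood estimate of S between 0 and 1."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda S: log_likelihood(families, S, p))

# One A1A1 mother with 70 A1A1 and 30 A1A2 offspring, p = 0.5.
families = [("11", {"11": 70, "12": 30})]
S_hat = ml_selfing_rate(families, p=0.5)
print(S_hat)
```

A grid search is used only to keep the example short; it avoids the information-matrix machinery mentioned above and does not provide standard errors.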

An important point is that the expectations are derived based on several assumptions.

Of importance are (see also Table 1 in main text): (i) the expected values of both S and p are

uniform over mothers; (ii) segregation of alleles follows Mendelian rules, (iii) mothers and

offspring genotypes are known without errors (no technical problems leading to scoring bias),

and (iv) no selection occurs between fertilization and the stage at which offspring are

genotyped. The latter point implies that progenies should be genotyped as early as possible in order to access the primary selfing rate, because inbreeding depression has a strong early expression.

The mother genotype need not be known, but can be inferred together with the selfing rate, provided enough offspring are screened (e.g., 15 to 20). A particular case is that of gymnosperms, in which mother genotypes can be determined directly from the megagametophyte. Recognizing that seeds or ovules belong to a single progeny-array is straightforward in sessile, brooding organisms, which include plants, fungi, and several groups of animals (e.g., cnidarians). This is no longer true in mobile species in which newborns get away from their mother (e.g., snails). In such a situation, estimating the selfing rate in

natural populations is difficult and one has to resort to more or less artificial conditions. One

possibility is to collect mature individuals in natural populations, set them in controlled

conditions under which offspring can be attributed to a given mother and collect their

offspring (e.g., Henry et al., 2005). The inferred selfing rate is that at the stage of the life-

cycle at which offspring are genotyped (see point (iv) above). If the focus is on the evolution

of selfing, it might be of interest to come as close as possible to the primary selfing rate.

Seeds or ovules might therefore be preferred to seedlings or juveniles.


The basic model has been extended in several directions which are reported in main

text. The reader is referred to Ritland (2002) and Thompson and Ritland (2006), as well as to

MLTR documentation (http://www.genetics.forestry.ubc.ca/ritland/).

2. Markers

Individuals can be genotyped for various markers (see main text), but the most widely used

have been allozymes and microsatellites (Goodwillie et al., 2005; Jarne and Auld, 2006).

Dominant markers, such as AFLP, can be used, but require a much larger number of loci. The

reason is that fewer situations are favourable to the detection of outcrossing events: they can

be detected among the offspring of recessive homozygous mothers, while both homozygotes

can be used with a two-allele codominant locus. An important question is the number of

families, offspring per family and loci that should be studied. There is probably no single

answer to this question, and parameters such as the actual variance in S among families or

locus variability should be taken into account. The answer also depends on the model

considered (e.g., effective selfing, correlated matings; see main text) and the parameters to be

estimated. Ritland (1986) reports simulation results suggesting that there is little gain in using

more than eight to ten offspring per family when estimating S under the mixed-mating and

effective selfing models. Using highly polymorphic loci allows more precise estimates, since

outcrossed events are detected with less ambiguity, and the variance of various mating system

estimates decreases with the number of alleles per locus (Ritland, 1988). Such loci might, however, be associated with larger error rates due to technical problems (Hoffman and Amos, 2005). In the correlated-matings model, the variance of the main parameters (selfing,

correlation of selfing and correlation of paternity) decreases with both the number of loci and

the number of alleles per locus (Ritland, 1989, 2002). In general, K. Ritland’s simulations


suggest that there is little gain in using more than five to six loci when estimating mating

system parameters.

3. Some pitfalls: progeny-arrays and technical artefacts

The influence of partial dominance and the kind of technical artefacts mentioned in main text

(e.g., null alleles) have not been worked out, although K. Ritland mentions in the most recent

version of MLTR documentation (May 2004) that family estimates of selfing are sensitive to

scoring errors. We do not propose a general view on this problem, but as a first approach

consider a very simple situation in which S estimates might be biased by null alleles. Let us

assume a progeny-array analysis in which a large number of families are assessed, as well as a

large number of offspring per family (to avoid sampling variance). Individuals are genotyped

at a locus with three alleles (A1, A2 and A3; A3 is a null allele). We also assume that A1 and A2 have the same frequency (q / 2), and that the frequency of A3 is p. To remain close to experimental conditions, we consider that families with an A3A3 (null homozygous) mother are eliminated, as well as offspring which are either A3A3 or incompatible with their mother’s genotype. Although this might at first glance look like an unlikely situation, it should be remembered that the maternal genotype is in some studies inferred from offspring genotypes (and a null allele at low frequency might well be “invisible”), or even worse corrected to be consistent with those of offspring (K. Ritland, pers. comm.). This also means that the apparent

allelic frequency (estimated without taking the null allele into account) of both A1 and A2 is

½. The population has inbreeding coefficient F and selfing rate S. Let Z be the apparent

selfing rate.

Four situations can be distinguished with regard to the mother genotypes: null

homozygotes, null heterozygotes, homozygotes for A1 or A2, A1A2 heterozygotes. This

stratification can be used to derive , the likelihood of Z given the mother and


offspring genotypes. The likelihoods are given in Table 1, together with the expected

frequencies of mothers and their actual and scored genotypes. The log-likelihood of the whole

sample can be derived from Table 1 (removing a constant and taking into account that p* = 0

and q* = ½ with p* and q* the frequencies of A1 and A2 estimated on data):

(2).

This can be written more simply as: , where Q1 and Q2 correspond to the first and second terms in equation (2), and c is a constant with regard to Z (third term). The maximum likelihood value of Z can be found by differentiating this expression with respect to Z and setting the derivative to 0. This gives , or:

(3).

When S is small, . This can be compared with the situation in which the selfing rate is estimated from the inbreeding coefficient (Supplementary Appendix 4), where the bias is of order 4p. The difference between the actual and estimated selfing rates increases with the null allele frequency and decreases with the selfing rate. An illustration is provided in Figure 1. As mentioned in the main text, technical problems can be detected when genotyping a large enough number of progenies.
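The qualitative behaviour described above can be explored with a small numerical sketch. The Python function below is an independent reconstruction under explicit assumptions, not the authors' equations (2) and (3): it assumes (i) the standard single-locus mixed-mating likelihood with apparent allele frequencies of ½, under which a retained offspring of a homozygous-scored mother is the same homozygote with probability (1 + Z)/2 and a heterozygote with probability (1 − Z)/2, so that maximising the log-likelihood gives Z = (Q1 − Q2)/(Q1 + Q2) with Q1 and Q2 the expected frequencies of these two classes; (ii) q = 1 − p; and (iii) unless F is supplied, the equilibrium relation F = S/(2 − S). Offspring of A1A2 mothers are ignored because, with both apparent frequencies equal to ½, their genotype probabilities do not depend on Z. The name `apparent_selfing_rate` is hypothetical.

```python
def apparent_selfing_rate(S, p, F=None):
    """Expected apparent selfing rate Z when a null allele of frequency p is ignored.

    Sketch under assumptions: A1 and A2 each have frequency (1 - p)/2,
    null homozygous and mother-incompatible offspring are discarded, and
    F defaults to its equilibrium value S / (2 - S).
    """
    if F is None:
        F = S / (2 - S)  # assumption: inbreeding equilibrium under partial selfing
    a = (1 - p) / 2  # frequency of A1 (and of A2)

    # Mothers scored A1A1: actual A1A1, or actual A1A3 (null heterozygote).
    hom_mother = a ** 2 + F * a * (1 - a)
    null_het_mother = 2 * a * p * (1 - F)

    # Retained offspring scored as the same homozygote as the mother.
    same_hom = (
        hom_mother * (S + (1 - S) * (a + p))
        + null_het_mother * (0.75 * S + (1 - S) * (a + p / 2))
    )
    # Retained offspring scored as A1A2 heterozygotes.
    het = hom_mother * (1 - S) * a + null_het_mother * (1 - S) * a / 2

    # By symmetry, mothers scored A2A2 contribute the same amounts,
    # which leaves the ratio unchanged.
    return (same_hom - het) / (same_hom + het)


if __name__ == "__main__":
    for S in (0.0, 0.2, 0.5, 0.8):
        row = [apparent_selfing_rate(S, p) - S for p in (0.0, 0.1, 0.2, 0.3)]
        print(S, [round(d, 3) for d in row])
```

Under these assumptions the sketch returns Z = S when p = 0, and the difference Z − S grows with p and shrinks as S increases, in line with the trends stated in the text. Its numerical values should not be read as those of equation (3).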


Table 1. Mother and offspring genotypes at a locus with three alleles, together with frequencies and likelihoods ( ). A3 is a null allele, and the actual genotypes might differ from the scored genotypes. Null homozygous mother and offspring, as well as incompatible mother-offspring pairs, are discarded (grey overlay). Frequencies are given assuming that A3 is a regular allele, and likelihoods assuming that it is a null allele (denoted 0). The frequencies of A1, A2 and A3 are q/2, q/2 and p. S is the actual selfing rate, and Z the selfing rate to be estimated taking into account the occurrence of a null allele. p* and q* are frequency estimates from data, i.e. 0 and ½ respectively. , , , and . In the second and third rows, mother genotypes before (resp. after) “/” are associated to offspring genotypes before (resp. after) “/”. More details in text.


Mother                               Offspring
Actual genot.  Scored genot.  Freq.  Actual genot.  Scored genot.  Frequency  Likelihood
A3A3           00             Q      A3A3           00
                                     A1A3 or A2A3   A1A1 or A2A2
                                     A1A1 or A2A2   A1A1 or A2A2   0          0
                                     A1A2           A1A2           0          0
A1A1 / A2A2    A1A1 / A2A2    P1     A1A1 / A2A2    A1A1 / A2A2
                                     A2A2 / A1A1    A2A2 / A1A1    0          0
                                     A1A2 / A1A2    A1A2 / A1A2
                                     A1A3 / A2A3    A1A1 / A2A2
                                     A2A3 / A1A3    A2A2 / A1A1    0          0
                                     A3A3 / A3A3    00 / 00        0          0
A1A3 / A2A3    A1A1 / A2A2    P2     A3A3 / A3A3    00 / 00        excluded   0
                                     A1A3 / A2A3    A1A1 / A2A2
                                     A2A3 / A1A3    A2A2 / A1A1    excluded   0
                                     A1A2 / A1A2    A1A2 / A1A2
                                     A1A1 / A2A2    A1A1 / A2A2
                                     A2A2 / A1A1    A2A2 / A1A1    0          0
A1A2           A1A2           P3     A3A3           00             0          0
                                     A1A3 or A2A3   A1A1 or A2A2
                                     A1A1 or A2A2   A1A1 or A2A2
                                     A1A2           A1A2


Figure 1. Difference between the estimated and actual selfing rates (ΔS) as a function of the

null allele frequency (p) for various values of S (diamonds: 0; squares: 0.2; triangles: 0.5;

crosses: 0.8) in the single-locus PAA. The difference is given by equation (3).

[Figure 1: plot of ΔS (vertical axis, 0 to 0.5) against p (horizontal axis, 0 to 0.3)]


References to Supplementary information

Avise JC (2000). Phylogeography. Harvard University Press: Cambridge, Massachusetts.

Brown AHD, Burdon JJ, Jarosz AM (1989). Isozyme analysis of plant mating systems. In Soltis D, Soltis P (eds) Isozymes in plant biology, Dioscorides Press. Pp. 73-86.

Burnham KP, Anderson DR (2002). Model selection and multimodel inference: a practical information-theoretic approach. Springer-Verlag: New York.

Curie-Cohen M (1982). Estimates of inbreeding in a natural population: a comparison of sampling properties. Genetics 100: 339-358.

Cutter AD (2006). Nucleotide polymorphism and linkage disequilibrium in wild populations of the partial selfer Caenorhabditis elegans. Genetics 172: 171-184.

David P, Pujol B, Viard F, Castella E, Goudet J (2007). Reliable selfing rate estimates from imperfect population genetic data. Mol. Ecol. 16: 2474-2487.

Ellegren H (2004). Microsatellites: simple sequences with complex evolution. Nat. Rev. Genet. 5: 435-445.

Estoup A, Angers B (1998). Microsatellites and minisatellites for molecular ecology: theoretical and empirical considerations. In Carvalho G (ed) Advances in Molecular Ecology, NATO press: Amsterdam. Pp. 55-86.

Goodwillie C, Kalisz S, Eckert CG (2005). The evolutionary enigma of mixed mating in plants: occurrence, theoretical explanations, and empirical evidence. Ann. Rev. Ecol. Evol. Syst. 36: 47-79.

Henry P-Y, Bousset L, Sourrouille P, Jarne P (2005). Partial selfing, ecological disturbance and reproductive assurance in an invasive freshwater snail. Heredity 95: 428-436.

Hoffman JI, Amos W (2005). Microsatellite genotyping errors: detection approaches, common sources and consequences for paternal exclusion. Mol. Ecol. 14: 599-612.


Hudson RR (2001). Two-locus sampling distributions and their application. Genetics 159: 1805-1817.

Jarne P, Auld JR (2006). Animals mix it up too: the distribution of self-fertilization among hermaphroditic animals. Evolution 60: 1816-1824.

Jarne P, Lagoda PJL (1996). Microsatellites, from molecules to populations and back. Tr. Ecol. Evol. 11: 424-429.

Lowe A, Harris S, Ashton P (2004). Ecological genetics - Design, analysis and application. Blackwell.

Lynch M, Walsh B (1998). Genetics and analysis of quantitative traits. Sinauer: Sunderland, Massachusetts.

Pasteur N, Pasteur G, Bonhomme F, Catalan J, Britton-Davidian J (1987). Manuel technique de génétique par électrophorèse des protéines. Lavoisier: Paris.

Pompanon F, Bonin A, Bellemain E, Taberlet P (2005). Genotyping errors: causes, consequences and solutions. Nat. Rev. Genet. 6: 847-859.

Richardson BJ, Baverstock PR, Adams M (1986). Allozyme electrophoresis: a handbook for animal systematics and population studies. Academic Press: Sydney.

Ritland K (1986). Joint maximum-likelihood-estimation of genetic and mating structure using open-pollinated progenies. Biometrics 42: 25-43.

Ritland K (1988). The genetic-mating structure of subdivided populations. 2. Correlated mating models. Theor. Pop. Biol. 34: 320-346.

Ritland K (1989). Correlated matings in the partial selfer Mimulus guttatus. Evolution 43: 848-859.

Ritland K (1990). Inferences about inbreeding depression based on changes of the inbreeding coefficient. Evolution 44: 1230-1241.


Ritland K (2002). Extensions of models for the estimation of mating systems using n independent loci. Heredity 88: 221-228.

Thompson SL, Ritland K (2006). A novel mating system analysis for modes of self-oriented mating applied to diploid and polyploid arctic Easter daisies (Townsendia hookeri). Heredity 97: 119-126.

Van Oosterhout C, Hutchinson WF, Wills DPM, Shipley P (2004). MICRO-CHECKER: software for identifying and correcting genotyping errors in microsatellite data. Mol. Ecol. Notes 4: 535-538.

Vos P, Hogers R, Bleeker M, Reijans M, van de Lee T, Hornes M et al. (1995). AFLP: a new technique for DNA fingerprinting. Nucl. Ac. Res. 23: 4407-4414.
