Download - A general model for sample size determination for ...repository.ias.ac.in/24389/1/308.pdfFor a diallelic situation, no verification is necessary as the expression (2) itself contains

A general model for sample size determination for collecting

germplasm

R L SAPRAt, PREM NARAIN* and S V S CHAUHAN**NBPGR, New Delhi 110012, India

*50 Director's Office, Indian Agricultural Research Institute, Pusa, New Delhi 110012, India**Department of Botany, B R Ambedkar University, Agra, India

Corresponding author: C/o Dr Prem Narain, 50 Director's Office, Indian Agricultural Research Institute,Pusa, New Delhi 110012, India (Fax, 91-11-5731495; Email, [email protected]).

The paper develops a general model for determining the minimum sample size for collecting germplasm forgenetic conservation with an overall objective of retaining at least one copy of each allele with preassignedprobability. It considers sampling from a large heterogeneous 2k-ploid population under a broad range ofmating systems leading to a general formula applicable to a fairly large number of populations. It is foundthat the sample size decreases as ploidy levels increase, but increases with the increase in inbreeding. Underexclusive selfing the sample size is the same, irrespective of the ploidy level, when other parameters areheld constant. Minimum sample sizes obtained for diploids by this general formula agree with those alreadyreported by earlier workers. The model confirms the conservative characteristics of genetic variability ofpolysomic inheritance under chromosomal segregation.

1. Introduction

Plant explorers and\ conservationists are faced with theproblem of collecting genetic material (vegetative orseeds) from large populations with a view to conser-ving the germplasm with a certain degree of assu-rance. The number of plants needed to conserve thegermplasm has been discussed in many papers (Allard1970; Marshall and Brown 1975; Qualset 1975; Bogyoet at 1980; Greogorius 1980; Chapman 1984; Yonezawa1985; Namkoong 1988; Crossa 1989; Yonezawa andIchihashi 1989; Crossa et at 1993; Lawrence et at1995a,b) using probability models mainly for diploidspecies.

Lawrence et at (1995a) suggested a sample of about172 plants for conserving all or very nearly all of thepolymorphic

genes with high probability provided the.rfrequency is not less than 0.05, irrespective of whetherthe

individuals of the species set all of their seed byself- or by cross-fertilization or by a mixture of both. Theygeneralized

their model for multiple loci with 2 alleles

under extreme cases i.e., complete selfing or completerandom mating. Crossa et al (1993) while discussing theoptimal sample size for regeneration suggested samplesizes of 160-210 plants for capturing alleles at frequenciesof 0.05 or higher in each of 150 loci, with 90-95%probability. When allele frequencies are unknown anequation for estimating an optimal sample size for cap-turing (a -I) rare alleles having an identical frequencyof (Po) and the ath allele occurring at a frequency of{I -(a -1)po} at a number of independent loci was alsodeveloped by the same authors. However, they did notincorporate a parameter for association between geneswithin individuals (or inbreeding coefficient) in theirexpression.

The present study is concerned with the developmentof a comprehensive model giving the required minimumsample size when the sampling is done from a largediploid or autopolyploid (2k-ploid) population with alldegrees of inbreeding. We have also tried to isolate thegeneral effects of polysomic inheritance and inbreeding,on sample size for collecting vegetative samples. Our

Keywords.

Conservation; germplasm; polysomic; vegetative; inbreeding

Biosci., 23, No.5, December 1998, pp 647-652. @ Indian Academy of Sciences 647

648 R L Sapra. Prem Narain and S V S Chauhan

We can numerically evaluate equation (1) to obtainvalues of n for a given probability of conservation. Forsake of simplicity, if we denote this probability by(1 -a), then

treament extends models adopted by Lawrence et al(1995a) and Crossa et al (1993) for diploid species.

2. Polysomic modela = {p~ (1 -FJ + p\Fk}"+{ p~ (1- FJ + P2Fk)".

2.1

Diallelic and single locus modelWe can further simplify the above expression by assumingthat the allele A2 is rare in nature and occurs with afrequency (PJ of the order of 0.05 or less. Then theterm {p~ (I -F J + PoF k) n is almost negligible and can

be dropped. Taking the logarithm on both sides of theexpression, we get

Let us consider a 2k-auto-polyploid population with 2alleles AI and A2 at a single locus having frequenciesPI and P2 respectively, reproducing by constant proportionsof selfing (s) and random mating (1 -s), with no doublereduction or selection. Such a population at equilibriumcan be denoted as

n > log (a)/Iog (p) . (2)

z= where P = (1 -pJ2k (1 -F J + (1 -Po>FkA2k

Ipik(l-FJ+p,Fk' A2k-1 A1 2

2kC1p~-lp2(1-FJ

From expression (2), we can derive two important resultsfor the extreme conditions of complete outcrossing andcomplete selfing, as follow:A2k-i AiI 2 ,

2kCip~-'p2(1-FJ '

A2k, 2

, p~(1-Fk)+P2Fk

s=o; (3)

(4)The sum of the allelic frequencies and the sum of thegenotypic frequencies as given above is one. F k is thetheoretical populational inbreeding coefficient at equili-brium for a 2k-ploid organism and is related to theproportion of ~ selfing(s) by the following formula(McConnell and\ Fyfe 1975):

n > log (a)/2k log (1 -p()

n > log (a)/log (1 -pJ .s=

Expressions (3) and (4) indicate that the sample sizeunder complete selfing is almost 2k times of that underrandom m8;ting. As expected, with mixed matingsystems, the sample size lies between these two limitsand can be obtained numerically by evaluating either(1) or (2).Fk=s/{2k-(2k-l)s},

When k = 1, the population becomes a diploid and canbe represented as Multiallelic and multilocus model2.

A1A2

2PtP2 (l-FJz= AI2

~(1-F1)+PIFI

A22p~ (1 + FJ + P2F\

where F. =s/(2-s).Our objective of conservation means that a randomly

drawn sample from a 2k-ploid population should captureat least one copy of each allele. This can be achievedif the sample contains either one of the heterozygotesor one each of the homozygotes A~ and A~. Theprobability of capturing at least one copy of each allelecan be calculated simply by excluding the probabilityof selecting only one of the homozygotes in a sampleof size n, as suggested by Lawrence et al (.1995a) fordiploid models. Thus the probability that a randomlydrawn sample of size n contains at least one copy ofeach alleles at the said locus is

Let us consider again a 2k-auto-polyploid populationwith a number of alleles at each of). independent lociand producing a proportion of seed (s) by selfing and(1 -s) by random mating. The number of genotypes forsuch a population become too large for computationwhen k > 2 and a > 2. Therefore we will deduce ourresults on the basis of methodology and results alreadypublished by Crossa et at (1993) without a completealgebraic explanation. However, we present an empiricalverification of our results in an appendix.

Let us recall the results obtained by Crossa et at(1993) by assuming that (a -1) alleles have identicallow frequencies of Po and the ath allele has a frequencyof {1 -(a -l)po} for each of the). independent loci.Thus, for .,

Jog (a) -log (a -J) }/BAlleles (a), loci (1); n>(5)

P[AI'

AJ=l-{p~k(]-Fk)+p,Fk}n

-{p~(1-Fk)+P2Fk}n. Alleles(a), loci (A); n >A/B (6)(1)

A general model for sample size determination for collecting germplasm 649

term (1 -PO> by P in expression (2) to simplify thederivation of results without giving a mathematical proof.For a diallelic situation, no verification is necessary asthe expression (2) itself contains the said term. Howeverfor the multiallelic situation, we have verified it empi-rically by considering a single locus with 3 and 4 allelesfor diploids and single locus with 3 alleles for tetraploids(see appendix). We determined sample sizes exactly upto the first decimal place as described in the appendixfor 60 cases with varying levels of probability of con-servation, rare allele frequencies, and selfing rates. Theseexact values were then compared with those obtainedfrom our expression (8). To our expectation, in 22 cases,our sample sizes matched exactly to the first decimalplace, in 27 cases, the absolute difference was less thanor equal to 0.5, in 7 cases, it was 0.6 to 0.9, and, onlyin 4 cases, we observe absolute differences ranging from1.1 to 1.6. Of course actual sample size is a wholenumber. Rounding our calculated values to integers canat the most overestimate or underestimate the samplesize by a single plant (tables 1,2,3). The close agreementof exact values with those calculated by expression (8)over varied conditions justifies our elemination of theterm {p~ (I -F k) + POFk}n, thus, expression (8) can besafely applied for determining minmum sample sizeswith a high degree of accuracy for large populations.

where A = log {I -(I-a)I/A}-log (a-I) and B=log (1 -PO> .

Comparing our result (4) with (5) and (6) obtainedby Crossa et at (1993) we find that (4) is a specialcase of (5) and (6) when there are only 2 alleles at asingle locus. All these results contain the term (1 -pJwhich is nothing but the probability of capturing a rareallele in a sample of size one when the sampling isdone either from an inbred population or a populationcontaining only homozygous lines. However, when onesamples from a population which deviates from selfing,a term other than (1 -pJ must be incorporated to accountfor this deviation from selfing as well as for diploidyin the expressions (5) and (6). When we replace theterm (1 -pJ in expression (6) by P from expression (2)it can describe both multiallelic as well as multilocuspopulations. After replacement we get

n > A/log (/:1). (7)

Expression (7) evaluates sample sizes directly for agiven set of parameters. This can be further rewrittento isolate the effects of deviation from selfing anddiploidy present in our model on the sample sizes asfollows:

n >A/(B+ C), (8)3. Conclusion

In the present paper we attempt to determine a theoreticalminimum number of vegetative samples to capture allalleles from a population with a given probability ofconservation. We developed a general model by consi-dering a 2k-auto polyploid population under a broadrange of mating systems. Our work is primarily basedon the diploid models suggested by Yonezawa andIchihashi (1989), Crossa et al (1993) and Lawrence etal(1995a). Theoretically speaking, the required minimumsample size under our model is A/(B + C) which liesbetween the bounds A/2kB and A/B, attained under theextreme conditions of no selfing and no random mating.Crossa et al (1993) reported a similar conclusion, butfor a diploid model. They indicated that if there are noassociations between genes within individuals at anyloci, then the required sample size is exactly half thesample size of that under perfect association. If thedegree of association is unknown, then the requiredsample size is between n/2 and n. Our general modelyields the same results as given by the said authorswhen k= 1 and s =0 or 1.

Minimum sample sizes for given probabilities ofconservation, rare allele frequencies, and numbers ofalleles and loci, under our set of assumptions have aminimum value of A/B for all inbred populationsirrespective of ploidy level. Sample sizes are smallest

where C=log{{I-PIJ2k-l (I-FJ+F.}.Thus, in expression (8) we have minimal sample size

as n > AI(B + C) whereas, Crossa et at (1993) obtainedn > AI B, which is possible when C = 0 or the populationin consideration is completely inbred (s = 1). The termC involves the polyploidy parameter (k) and the corres-ponding inbreeding coefficient (FJ; hence, it accountsfor a reduction in sample size owing to deviations fromselfing and diploidy. For a given ploidy level, reductionin sample size continues until the state of completerandom mating, where sample size reaches a minimumvalue of AI2kB. Thus, for any given population, theminimum sample size lies between AI2kB and AIB. Theupper bound (AI B) is attained under the condition ofno random mating (s = 1) and is unaffected by the ploidylevel. As we deviate from complete inbreeding, the roleof ploidy in reducing minimal sample size increases untilthe minimum value (AI2kB) is reached. This occursunder the condition of no selfing (s = 0). Thus, thesample sizes under this condition for diploid, tetraploid,hexaploid and octaploid populations are almostnul2, nul4, nul6, and nu/8, respectively, where nu isAIB. With increasing ploidy, the curves displayed infigure 1 almost become parallel to the X-axis except atvery high rates of selfing where the minimum samplesize rises steeply.

One may debate our justification for replacing the

650 R L Sapra, Prem Narain and S V S Chauhan

200

180

160

140

120

G)N

.in

.P. 100a.E'"II)

80

60

40

20

00.0 0.1 0.2 0.3 0.4 0.6 0.7 0.8 0.9 1.00.5

Proportion of selling (5)

Figure 1. Relationship between sample size and proportion of selfing(s) at various ploidy levels for a population with 2 alleles,Po = 0.05 and probability of conservation, 99.99%.

Tables 1 and 2. Comparison of results of exact calculations (na) with those obtained by expression (8) nf for variousvalues of a, s, k, I -a, and Po'

Table 1. Diploid with 3 alleles. Table 2. Diploid with 4 alleles.

"a -n,33333333333333333333

0.00.20.50.81.00.00.20.50.81.00.00.20.50.81.00.00.20.50.81.0

99.9999.9999.9999.9999.9995.()()95.()()95.()()95.()()95.()()99.9999.9999.9999.9999.9995.()()95.()()95.()()95.()()95.()()

0.050.050.050.050.050.050.050.050.050.050.010.010.010.010.010.010.010.010.010.01

96.5102.4116.2145.4193.136-038.143.354.271.9

492.7521.8591.6739.7985.4183.5194.4220.4275.5367.0

96.6102.4116.3145.5193.136.038.343.554.472-2

492.7521.9591.7739.7985.4184.2195.0221.1276.4368.3

44444444444444444444

0.00.20.50.8].00.00.20.50.8] .00.00.20.50.8].00.00.20.50.8].0

1

99.9999.9999.9999.9999.9995.0095.0095.0095.0095.0099.9999.9999.9999.9999.9995.0095.0095.0095..0095.00

0.050.050.050.050.050.050.050.050.050.050.010.010.010.01().Ol0.010.010.010.010.01

100.5 100.5 0-0106.5 106-6 0-1121-0 121-1 0-1151-4 151.4 0-0201-0 201-0 0-0

39-9 39-8 -0-142-3 42-2 -0.148-1 47-9 -0-260.1 60-0 -0.179.8 79.6 -0.2

512.9 512-9 0.0543.2 543-2 0.0615.9 615.9 0.0769.9 770-0 0.1

1025-7 1025.8 0.1203.7 202.9 -0.8215.7 214.9 -0.8244.6 243- 7 -0.9305.8 304-6 -1.2407.4 405-8 -1.6

111

A general model for sample size determination for collecting germplasm 651

Table 3. Tetraploid with 3 alleles.Genotype

ka

33333333333333333333

s -a Po nj na n -nQ :r

0.00.00.10.00.00.00.00.00.0

-0.10.10.0o.}0.10.0

-0.3-0.3-0.3-0.5-1.1

Frequency

AlAI

G1

p~ (I-FJ +PIF1

A~2

G2

~ (l-FJ +p2F,0.00.20.50.8}.o0.00.20.50.8}.o0.00.20.50.8}.o0.00.20.50.8}.o

22222222222222222222

99.9999.9999.9999.9999.9995.0095.0095.0095.0095.0099.9999.9999.9999.9999.9995.0095.0095.0095.0095.00

0.050.050.050.050.050.050.050.050.050.050.010-010.010.010.010.010.010.010.010.01

48.350.757.479.]

]93.]18.018.921.429.47].9

246.3257.9290.4395.9985.49].896.]

108.2147.5367.0

48.350.757.579. ]]93.]18.0]8.92].429.47].8

246.4257.9290.5396.0985.49].595.8

]07.9]47.0365.9

Genotype

Frequency

A~3

G3

p; (1- F1) +p3FI

A.A2

G4

2P.P2 (1 -FJ

Genotype

Frequency

AzA3

Gj

2P:1P3 (1 -FJ)

A~I

G6

2PJ1J1 (1 -FJ

The expression for evaluating n can be formulated as

where P [Ai,~, A3] is the probablity of including all thealleles at least once in a sample of size n, P (Ai)C is theprobability of missing Ai, and P (Ai Aj)C is the probabilityof missing both Ai and Aj. Putting the values of proba-bilities in terms of genotypic frequencies in the aboveexpression, .we get a simpler expression for evaluating n:

under random mating equilibrium. Sample sizes in thisstate reduce further with increasing ploidy levels. Thebehaviour of our model confirms the conservative char-acteristics of genetic variability related to polysomicinheritance as reported by Bray (1983) in the case oftetraploids.

Notably ~r treatment only provides a model for therequired minimum sample size for collecting the vege-tative-materials. As mentioned by Lawrence et at (1995a)when seed is collected, sampling is done from the nextgeneration, because the plants raised from this seed arethe offspring of the plants from which the collectionshave been made. Potentially, questions of how manyseeds per plant should be sampled or whether we canachieve greater efficiency by collecting more seeds froma smaller number of plants have been investigated byYonezawa and Ichihashi (1989) using probability modelsand Lawrence et at (1995b) using the analytical proce-dures of quantitative genetics for diploid species. Thisproblem in relation to our model needs further investi-gation.

a = Co + Co + Co -Go -Go -Go1 2 3 I 2 3 . (a)

where

C1 = 1 -G. -G4 -G6 '

C2= I-G2-G4-Gs'

C3 = 1 -G3 -GS -G6 .

(ii) Diploid with 4 alleles

Similarly we can formulate the expression for 4 allelesas;

4

P[A.,A2,A3,A4] =(l-a)= l-L P(A;')

Appendix4 4

+L, P(A,Aj)C-L,l=i<jS4 1=i<j<kS4

As mentioned above, here we will outline proceduresfor determining the minimum sample size with givenprobablity of conserving at least a copy of each allelepresent in the population for three special cases.

(i) Diploid with 3 alleles

Let us denote a diploid population with 3 alleles AI' A2,and A3 as

P (A;AjAk)C

where P [AI, A2' AJ' A4] is the probability of including allalleles (AI, A2' AJ' A4) at least once in a sample of sizen, P (AJc is the probability of missing Aj' P (Ai Aj)C isthe probability of missing both Ai and Aj; andP(Aj Aj AJc is the probability of missing 3 alleles (Ai,

Aj' AJ at a time. After putting the values of probabilities

652 R L Sapra, Prem Narain and S V S Chauhan

in terms of genotypic frequencies, we get the finalexpression which can be numerically solved for n

References

a = cn + cn -cn + cn -cn -cn -cn -cn12345678

-c; -C:o+G7 + G~+G; + G~ (b)

G, = p~ (1 -F,) + p,F" G2 = p~ (1 -FJ + p2F, '

G3 =~ (I-F.) +p3F" G4 =p~ (1- F.) +p3F.,

Gs = 2p.P2 (1 -F,), G6 = 2P.P3 (1 -FJ,

G1 = 2p.P4 (1 -FJ, Gs = 2P1P3 (1 -F,),

G9 = 2p1P4 (1- F]), G.o= 2PR4 (1- FJ,

C. = 1 -G. -Gs -G6 -G1, C2 = 1 -G2 -Gs -Gs -G9

C3=I-G3-G6-G6-G]o' C4= I-G4-G1-G9-G]o

Cs= G3 +G4 + G.o' C6 = G2+G4+ G9,

C1= G2 +G3 +Gs' Cs= G] + G4 + G1,

C9=G.+G3+GS' C]o=G]+G2+GS'

(iii) Tetraploid with 3 alleles

We can also formulate the expression for a tetraploidpopulation with 3 alleles AI, A2, and A3' and 15 possiblegenotypes as

a=C"+C"+C"-G"-G"-G" (c)I 2 3 I 2 3'

"

G, =p~ (I-F,) +p,F" G2=p~ (1- FJ + p2F, ,

G3=p~(I-FJ+P3F" G4=4p~P2(I-FJ,Gs = 4PiP3 (1- FJ, G6 = 2~p, (1 -FJ,

G7=4~P3(I-FJ, G8=4~p, (I-FJ,G9 = 2~P2 (1 -FJ, G,o = 12Pip2p3 (1 -F,),

-2 -2G" -12P2p,p3 (1 -FJ, GI2 -12P3PtP3 (1 -FJ,G'3 = 6pip~ (1 -F,), G'4 = 6p~p~ (1 -FJ,

G,s= 6pip; (1- FJ,

C, =G2+G3+G7+G9+G'4'

C2=G, +G3+GS+G8+G,s'

C3=G, +G2+G4+G6+G'3°

Allard R W 1970 Population structure and sampling methods;in Genetic resources in plants-their exploration and conser-vation (oos) 0 H Frankel and E Bennett (Oxford and Edinburgh:Blackwell) pp 97-107

Bray R A 1983 Stratagies for gene maintenance; in Geneticresources of forage plants (eds) J G Mcivor and R A Bray(Melbourne: CSIRO)

Bogyo T P, Porecoodu E and Perrino P 1980 Analysis ofsampling stratagies for collecting genetic material; &on. Bot.34 160-174

Chapman C G D 1984 On the size of genebank and the geneticvariation it contains; in Crop genetic resources: conservationand evaluation (eds) J H W Hoden and J T Williams (London:Allen and Unwin) pp 102-108

Crossa J 1989 Methodologies for estimating the sample sizerequired for conservation of outbreooing crops; Theor. Appl.Genet. 77 153-161

Crossa J, Harnandez C M, Bretting P, Eberhart S A and TabaS 1993 Statistical genetic considerations for maintaininggermplasm collections; Theor. Appl. Genet. 86 673-678

Gregorious H R 1980 The probability of loosing an allele whendiploid genotypes are sampled; Biometrics 36 643-652

Lawrence M J, Marshall D F and Davies P 1995a Genetics ofgenetic conservation. I. Sample size when collecting germplasm;Euphytica 84 89-99

Lawrence M J, Marshall D F and Davies P 1995b Genetics ofgenetic conservation. II. Sample size when collecting seed ofcross pollinating species and the information that can beobtained from the evaluation of material held in gene banks;Euphytica 84 101-107

Marshall D R and Brown A H D 1975 Optimum samplingstratagies in genetic conservation; in Crop genetic resourcestoday and tomorrow (eds) 0 H Frankel and J G Hawkes(Cambridge: Cambridge University Press) pp 53-80

McConnell Gillian and Fyfe J L 1975 Mixoo selfing andrandom mating with polysomic inheritance; Heredity 34 271-m

Namkoong G 1988 Sampling for germplasm collections; Hortic.Sci. 2379-81

Qualset C 0 1975 Sampling germplasm in a center of diversity:an example of disease resistance in Ethiopian Barley; in Cropgenetic resources today and tomorrow (eds) 0 H Frankel andJ G Hawkes (Cambridge: Cambridge University Press) pp81-96

Yonezawa K 1985 A definition of the optimal allocation ofeffort in conservation of plant genetic resources-With appli-cation to sample size determination for field collectionEuphytica 34 345-354

Yonezawa K and Ichihashi H 1989 Sample size for collectinggermplasm from natural plant population in view of genotypicmultiplicity of seoo embryos borne on a single plant; Euphytica4191-97

MS

received 12 June 1998; accepted 16 September 1998

Corresponding editor: VIDY ANAND NANJUNDJAI

Download - A general model for sample size determination for ...repository.ias.ac.in/24389/1/308.pdfFor a diallelic situation, no verification is necessary as the expression (2) itself contains

Top Related