Module 8: Estimating Genetic Variances – Nested design –GCA, SCA – Diallel PBG 650 Advanced Plant Breeding

Upload: gwendoline-wiggins

Post on 31-Dec-2015


Module 8: Estimating Genetic Variances

– Nested design

– GCA, SCA

– Diallel

PBG 650 Advanced Plant Breeding

Nested design

• Also called

– North Carolina Design I

– Hierarchical design

• Two types of families

– Half sibs (male groups)

– Full sibs (females within males)

Males    Females
1        1  2  3  4
2        5  6  7  8
...      ...
m        ...  f

Nested design – one location

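Source          df             MS     Expected Mean Square
Blocks          r-1            MSR
Males           m-1            MSM    $\sigma^2_e + r\sigma^2_{F/M} + rf\sigma^2_M$
Females/males   m(f-1)         MSF    $\sigma^2_e + r\sigma^2_{F/M}$
Error           (r-1)(mf-1)    MSE    $\sigma^2_e$

Linear Model: $Y_{ijk} = \mu + B_i + M_j + F_{k(j)} + e_{ijk}$

Variance component estimates:

$\hat\sigma^2_M = \dfrac{MS_M - MS_F}{rf}$

$\hat\sigma^2_{F/M} = \dfrac{MS_F - MS_E}{r}$

F tests:

$F_{males} = \dfrac{MS_M}{MS_F}$

$F_{F/M} = \dfrac{MS_F}{MS_E}$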

• Might also have sets and multiple environments

See Bernardo, pg 164, for ANOVA with sets and environments
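
The expected mean squares for the one-location nested design (males: $\sigma^2_e + r\sigma^2_{F/M} + rf\sigma^2_M$; females/males: $\sigma^2_e + r\sigma^2_{F/M}$; error: $\sigma^2_e$) can be solved for the variance components by the method of moments. The following is a minimal sketch; the function name and the mean squares are hypothetical values chosen for illustration, not data from the course.

```python
def nested_design_components(ms_males, ms_females, ms_error, r, f):
    """Method-of-moments estimates for a one-location nested design.

    r = number of blocks, f = number of females mated to each male.
    """
    var_males = (ms_males - ms_females) / (r * f)   # sigma^2_M
    var_females = (ms_females - ms_error) / r       # sigma^2_F/M
    f_males = ms_males / ms_females                 # F test for males
    f_females = ms_females / ms_error               # F test for females/males
    return {
        "var_males": var_males,
        "var_females": var_females,
        "f_males": f_males,
        "f_females": f_females,
    }


# Hypothetical mean squares, for illustration only
est = nested_design_components(ms_males=60.0, ms_females=30.0,
                               ms_error=10.0, r=2, f=4)
print(est)
```

Note that the males mean square is tested against the females/males mean square, not against error, because its expectation contains $r\sigma^2_{F/M}$.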

Variance components from the nested design

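$\sigma^2_M = \sigma^2_{\text{Half sibs}} = \tfrac{1}{4}\sigma^2_A$ (if the parents are not inbred)

$\sigma^2_A = \dfrac{4}{1+F}\,\sigma^2_M$

$\sigma^2_{F/M} = \sigma^2_{\text{Full sibs}} - \sigma^2_{\text{Half sibs}} = \tfrac{1}{4}\sigma^2_A + \tfrac{1}{4}\sigma^2_D$ (if the parents are not inbred)

$\sigma^2_{\text{Full sibs}} = \sigma^2_M + \sigma^2_{F/M}$

$\sigma^2_D = \dfrac{4}{(1+F)^2}\left(\sigma^2_{F/M} - \sigma^2_M\right)$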
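
As a worked illustration, the nested-design components can be converted to genetic variances with $\sigma^2_A = \frac{4}{1+F}\sigma^2_M$ and $\sigma^2_D = \frac{4}{(1+F)^2}(\sigma^2_{F/M} - \sigma^2_M)$. This sketch assumes no epistasis; the function name and input values are hypothetical.

```python
def genetic_variances_nested(var_males, var_females_in_males, F=0.0):
    """Additive and dominance variance from nested-design components.

    F is the inbreeding coefficient of the parents (0 = not inbred).
    Assumes epistatic variance is absent.
    """
    var_a = 4.0 / (1.0 + F) * var_males
    var_d = 4.0 / (1.0 + F) ** 2 * (var_females_in_males - var_males)
    return var_a, var_d


# Hypothetical components: sigma^2_M = 3.75, sigma^2_F/M = 10.0
non_inbred = genetic_variances_nested(3.75, 10.0)        # F = 0
inbred = genetic_variances_nested(3.75, 10.0, F=1.0)     # fully inbred parents
print(non_inbred, inbred)
```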

Expected Mean Squares in SAS

Source          Type III Expected Mean Square
Loc             Var(Error) + 3 Var(Loc*Cultivar) + 7 Var(Rep(Loc)) + 21 Var(Loc)
Rep(Loc)        Var(Error) + 7 Var(Rep(Loc))
Cultivar        Var(Error) + 3 Var(Loc*Cultivar) + Q(Cultivar)
Loc*Cultivar    Var(Error) + 3 Var(Loc*Cultivar)

Proc GLM;
  Class Loc Rep Cultivar;
  Model Yield = Loc Rep(Loc) Cultivar Loc*Cultivar;
  Random Loc Rep(Loc) Loc*Cultivar / Test;
Run;

• Random statement generates expected mean squares

• Test option obtains appropriate F tests for the model specified

• In this example, cultivars are fixed; all other effects are random

• Proc Mixed may give better estimates of variance components

Callouts on the SAS output:

– Q(Cultivar): fixed effect

– Var(Loc): $\sigma^2_{Loc}$

– controversial (could be dropped)
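
The coefficients in the SAS expected mean squares (3 for Var(Loc*Cultivar), 7 for Var(Rep(Loc)), 21 for Var(Loc)) are what is needed to solve for the variance components by hand. The sketch below uses Python rather than SAS, with hypothetical mean squares; it only illustrates the arithmetic and is not output from Proc GLM.

```python
def components_from_ems(ms_loc, ms_rep, ms_loc_cultivar, ms_error):
    """Solve the Type III expected mean squares shown in the SAS output.

    Coefficients 3, 7 and 21 are taken from that output and are specific
    to that data set.
    """
    var_error = ms_error
    var_loc_cultivar = (ms_loc_cultivar - ms_error) / 3
    var_rep = (ms_rep - ms_error) / 7
    # E[MS_Loc] - E[MS_Loc*Cultivar] - E[MS_Rep(Loc)] + E[MS_Error] = 21 Var(Loc)
    var_loc = (ms_loc - ms_loc_cultivar - ms_rep + ms_error) / 21
    return var_error, var_loc_cultivar, var_rep, var_loc


# Hypothetical mean squares, for illustration only
result = components_from_ems(ms_loc=45.0, ms_rep=18.0,
                             ms_loc_cultivar=10.0, ms_error=4.0)
print(result)
```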

Combining Ability

• General combining ability (GCA) – the average of all F1 crosses from a line (or genotype), expressed as a deviation from the population mean

• The expected value of a cross is the sum of the combining abilities of its two parents

• Specific combining ability (SCA) – the deviation of a cross from its expected value

$X_{P1P2} = \bar{X} + GCA_{P1} + GCA_{P2} + SCA_{P1P2}$, where $X_{P1P2}$ is the performance of the cross

$\sigma^2_X = 2\sigma^2_{GCA} + \sigma^2_{SCA}$

$G_{P1 \times P2} = GCA_{P1} + GCA_{P2} + SCA_{P1 \times P2}$
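
To make the definitions concrete: the expected value of a cross is the population mean plus the GCA of each parent, and SCA is whatever the observed cross deviates from that expectation. A minimal sketch with made-up numbers:

```python
def expected_cross(pop_mean, gca_p1, gca_p2):
    """Expected performance of a cross from the GCA of its two parents."""
    return pop_mean + gca_p1 + gca_p2


def specific_combining_ability(observed, pop_mean, gca_p1, gca_p2):
    """SCA = observed cross performance minus its GCA-based expectation."""
    return observed - expected_cross(pop_mean, gca_p1, gca_p2)


# Hypothetical values: population mean 100, GCA of +5 and -2, observed cross 106
expected = expected_cross(100.0, 5.0, -2.0)
sca = specific_combining_ability(106.0, 100.0, 5.0, -2.0)
print(expected, sca)
```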

Estimation of combining ability

GCA

• polycross method - allow all lines to intermate naturally

• top crossing - a line is crossed to a random sample of plants from a reference population

GCA and SCA

• Factorial design (NC Design II) – a group of ‘male’ parents is crossed to a group of ‘female’ parents

– requires m x f crosses (e.g. 5 x 5 = 25)

– can be applied to two heterotic populations

• Diallel – all possible crosses among a set of parents

– n(n-1)/2 possible crosses without parents or reciprocals (e.g. 10 x 9/2 = 45)

Variations on the Diallel

• Type of cross-classified design

• With or without the parents

• With or without reciprocal crosses

– bulk seed from both parents if maternal effects are not important

• Genotypes may be random or fixed

– For a random model, many parents are needed to adequately sample the population

• Large number of crosses!

– Can be divided into sets

– Partial diallels can be conducted

• If parents are inbred, can make paired row crosses to obtain more seed

Hallauer, Carena, and Miranda (2010) pg 119-138

Griffing’s Methods (Diallels)

• Method 1 – all possible crosses, including selfs

• Method 2 – no reciprocals

• Method 3 – no parents

• Method 4 – no parents or reciprocals

– most common, because parents are often inbred and less vigorous

For each method, genotypes may be fixed (Model I) or random (Model II)
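
The four methods differ in how many entries must be grown. A small sketch of the standard entry counts for n parents (the function name is hypothetical):

```python
def griffing_entries(n):
    """Number of entries under each of Griffing's four diallel methods."""
    return {
        1: n * n,             # parents, F1s and reciprocals
        2: n * (n + 1) // 2,  # parents and F1s, no reciprocals
        3: n * (n - 1),       # F1s and reciprocals, no parents
        4: n * (n - 1) // 2,  # F1s only
    }


counts = griffing_entries(5)
print(counts)
```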

Diallel crossing

Parent   A     B     C     D     ...   N     Mean
A        a+a   a+b   a+c   a+d   ...   a+n   a
B              b+b   b+c   b+d   ...   b+n   b
C                    c+c   c+d   ...   c+n   c
D                          d+d   ...   d+n   d
...
N                                      n+n   n

Diallel analysis

Random model

• Usually does not include parents or reciprocals

• Can be divided into sets

Griffing (1956) is the classic reference

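Source     df                      MS      Expected Mean Square
Blocks     r-1
Crosses    [n(n-1)/2]-1            MS2     $\sigma^2_e + r\sigma^2_C$
  GCA      n-1                     MS21    $\sigma^2_e + r\sigma^2_{SCA} + r(n-2)\sigma^2_{GCA}$
  SCA      n(n-3)/2                MS22    $\sigma^2_e + r\sigma^2_{SCA}$
Error      (r-1){[n(n-1)/2]-1}     MS1     $\sigma^2_e$

$\hat\sigma^2_{GCA} = \dfrac{MS_{21} - MS_{22}}{r(n-2)}$

$\hat\sigma^2_{SCA} = \dfrac{MS_{22} - MS_1}{r}$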
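
The random-model diallel estimators, $\hat\sigma^2_{GCA} = (MS_{21} - MS_{22})/[r(n-2)]$ and $\hat\sigma^2_{SCA} = (MS_{22} - MS_1)/r$, can be sketched as below. The mean squares are hypothetical, and the function names are only for illustration.

```python
def diallel_df(n, r):
    """Degrees of freedom for a diallel without parents or reciprocals."""
    crosses = n * (n - 1) // 2 - 1
    return {
        "blocks": r - 1,
        "crosses": crosses,
        "gca": n - 1,
        "sca": n * (n - 3) // 2,
        "error": (r - 1) * crosses,
    }


def diallel_components(ms_gca, ms_sca, ms_error, n, r):
    """Method-of-moments estimates of GCA and SCA variance."""
    var_gca = (ms_gca - ms_sca) / (r * (n - 2))
    var_sca = (ms_sca - ms_error) / r
    return var_gca, var_sca


# Hypothetical experiment: 10 parents, 2 blocks
df = diallel_df(n=10, r=2)
var_gca, var_sca = diallel_components(ms_gca=41.0, ms_sca=9.0,
                                      ms_error=5.0, n=10, r=2)
print(df, var_gca, var_sca)
```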

Genetic variances from random model

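$\sigma^2_{GCA} = Cov_{HS} = \dfrac{1+F}{4}\,\sigma^2_A$

$\sigma^2_A = \dfrac{4}{1+F}\,\sigma^2_{GCA}$

$\sigma^2_{SCA} = Cov_{FS} - 2\,Cov_{HS} = \dfrac{(1+F)^2}{4}\,\sigma^2_D$

$\sigma^2_D = \dfrac{4}{(1+F)^2}\,\sigma^2_{SCA}$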
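
A short sketch of the conversion from combining-ability variances to genetic variances, using $\sigma^2_A = \frac{4}{1+F}\sigma^2_{GCA}$ and $\sigma^2_D = \frac{4}{(1+F)^2}\sigma^2_{SCA}$. It assumes no epistasis, and the inputs are hypothetical.

```python
def genetic_variances_diallel(var_gca, var_sca, F=0.0):
    """Additive and dominance variance from GCA and SCA variances.

    F is the inbreeding coefficient of the parents. Assumes no epistasis.
    """
    var_a = 4.0 / (1.0 + F) * var_gca
    var_d = 4.0 / (1.0 + F) ** 2 * var_sca
    return var_a, var_d


non_inbred_parents = genetic_variances_diallel(2.0, 2.0)         # F = 0
inbred_parents = genetic_variances_diallel(2.0, 2.0, F=1.0)      # F = 1
print(non_inbred_parents, inbred_parents)
```

With fully inbred parents the same GCA and SCA variances imply smaller $\sigma^2_A$ and $\sigma^2_D$ in the reference population, which is why F must be known before interpreting the estimates.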

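General form for the variance of a variance component:

$Var(\hat\sigma^2) = \dfrac{2}{k^2}\sum_g \dfrac{MS_g^2}{f_g + 2}$

k = coefficient of the variance component in the expected mean squares

$f_g$ = df of mean square $MS_g$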
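
The general form $Var(\hat\sigma^2) = \frac{2}{k^2}\sum_g MS_g^2/(f_g+2)$ is straightforward to compute once the mean squares entering the estimate are known. In this sketch the example values (a GCA variance estimated from two mean squares with k = r(n-2) = 16) are hypothetical.

```python
def variance_of_component(mean_squares, dfs, k):
    """Approximate sampling variance of a variance component estimate.

    mean_squares: the mean squares used in the estimate
    dfs:          their degrees of freedom, in the same order
    k:            coefficient of the component in the expected mean squares
    """
    total = sum(ms ** 2 / (f + 2) for ms, f in zip(mean_squares, dfs))
    return 2.0 / k ** 2 * total


# Hypothetical: MS_GCA = 41 (9 df), MS_SCA = 9 (35 df), k = 16
var_hat = variance_of_component([41.0, 9.0], [9, 35], k=16)
std_error = var_hat ** 0.5
print(var_hat, std_error)
```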

Fixed model

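• GCA effects

$\hat{g}_i = \dfrac{1}{n(n-2)}\left[nX_{i.} - 2X_{..}\right]$

$\sigma^2(\hat{g}_i) = \dfrac{n-1}{n(n-2)}\,\sigma^2_e$

• SCA effects

$\hat{s}_{ij} = X_{ij} - \dfrac{1}{n-2}\left(X_{i.} + X_{j.}\right) + \dfrac{2}{(n-1)(n-2)}\,X_{..}$

$\sigma^2(\hat{s}_{ij}) = \dfrac{n-3}{n-1}\,\sigma^2_e$

where $X_{i.}$ is the total of all crosses involving parent i and $X_{..}$ is the total of all crosses

Advantage: first-order effects (means) are estimated with greater precision than variances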

Lattice designs are useful
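
The fixed-model estimates $\hat g_i = [nX_{i.} - 2X_{..}]/[n(n-2)]$ and $\hat s_{ij} = X_{ij} - (X_{i.} + X_{j.})/(n-2) + 2X_{..}/[(n-1)(n-2)]$ can be checked on a small table of cross means. The sketch below assumes a diallel without parents or reciprocals; the data are constructed to be purely additive (mean 10, GCA effects 2, -1, 0, -1), so the SCA effects should all come out as zero.

```python
def diallel_fixed_effects(crosses, n):
    """GCA and SCA effects from cross means (no parents, no reciprocals).

    crosses: dict mapping (i, j) with i < j to the mean of cross i x j.
    """
    totals = [0.0] * n   # X_i. : total of all crosses involving parent i
    grand = 0.0          # X_.. : total of all crosses
    for (i, j), x in crosses.items():
        totals[i] += x
        totals[j] += x
        grand += x
    gca = [(n * totals[i] - 2 * grand) / (n * (n - 2)) for i in range(n)]
    sca = {
        (i, j): x - (totals[i] + totals[j]) / (n - 2)
        + 2 * grand / ((n - 1) * (n - 2))
        for (i, j), x in crosses.items()
    }
    return gca, sca


# Hypothetical additive data: X_ij = 10 + g_i + g_j with g = [2, -1, 0, -1]
crosses = {(0, 1): 11.0, (0, 2): 12.0, (0, 3): 11.0,
           (1, 2): 9.0, (1, 3): 8.0, (2, 3): 9.0}
gca, sca = diallel_fixed_effects(crosses, n=4)
print(gca, sca)
```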

Diallel analysis with parents

Source                   df
Blocks                   r-1
Entries                  [n(n+1)/2]-1
  Parents                n-1
  Parents vs crosses     1
  Crosses                [n(n-1)/2]-1
    GCA                  n-1
    SCA                  n(n-3)/2
Error                    (r-1){[n(n+1)/2]-1}

Gardner-Eberhart Analysis II

Source                   df
Blocks                   r-1
Entries                  [n(n+1)/2]-1
  Varieties              n-1
  Heterosis              n(n-1)/2
    Average              1
    Variety              n-1
    Specific             n(n-3)/2
Error                    (r-1){[n(n+1)/2]-1}

• Gardner-Eberhart partitioning of sums of squares is non-orthogonal

• Fit the model sequentially
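
Both partitions of the entries in a diallel with parents must account for the same [n(n+1)/2]-1 degrees of freedom. A quick sketch that checks the bookkeeping for any n and r (the function name is hypothetical):

```python
def diallel_with_parents_df(n, r):
    """Degrees of freedom for the two partitions of a diallel with parents."""
    entries = n * (n + 1) // 2 - 1
    combining_ability = {
        "parents": n - 1,
        "parents_vs_crosses": 1,
        "gca": n - 1,
        "sca": n * (n - 3) // 2,
    }
    gardner_eberhart = {
        "varieties": n - 1,
        "average_heterosis": 1,
        "variety_heterosis": n - 1,
        "specific_heterosis": n * (n - 3) // 2,
    }
    error = (r - 1) * entries
    return entries, combining_ability, gardner_eberhart, error


entries, ca, ge, error = diallel_with_parents_df(n=6, r=3)
print(entries, ca, ge, error)
```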

Factorial Mating Design

Diallel: parents 1-4 used as both females (rows) and males (columns)

     1     2     3     4
1    ...   X12   X13   X14
2          ...   X23   X24
3                ...   X34
4                      ...

Factorial (Design II): female parents 5-8 (rows), male parents 1-4 (columns)

     1     2     3     4
5    X15   X25   X35   X45
6    X16   X26   X36   X46
7    X17   X27   X37   X47
8    X18   X28   X38   X48

Number of crosses required

Parents   Diallel     Factorial
4         6           4
6         15          9
10        45          25
20        190         100
100       4950        2500
n         n(n-1)/2    n^2/4
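
The counts in the table follow from n(n-1)/2 for the diallel and (n/2) x (n/2) = n^2/4 for a factorial that splits the parents evenly into males and females. A minimal sketch, assuming an even split:

```python
def diallel_crosses(n):
    """Crosses in a diallel without parents or reciprocals."""
    return n * (n - 1) // 2


def factorial_crosses(n):
    """Crosses in a Design II with n parents split as evenly as possible."""
    males = n // 2
    females = n - males
    return males * females


for n in (4, 6, 10, 20, 100):
    print(n, diallel_crosses(n), factorial_crosses(n))
```

The factorial needs roughly half as many crosses as the diallel for the same number of parents, which is its main practical appeal when n is large.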

General formula for covariance of relatives

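$Cov = r\sigma^2_A + u\sigma^2_D$

Pedigree: X is the progeny of A x B; Y is the progeny of C x D

$r = 2\theta_{XY}$

$u = \theta_{AC}\theta_{BD} + \theta_{AD}\theta_{BC}$

Extended to include epistasis:

$Cov = r\sigma^2_A + u\sigma^2_D + r^2\sigma^2_{AA} + ru\,\sigma^2_{AD} + u^2\sigma^2_{DD} + \dots$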
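
The general formula can be checked against the familiar half-sib and full-sib results. In this sketch the coancestries among the four parents are supplied directly, and the coancestry of the relatives is taken as the average of the four parental coancestries (a standard rule that is assumed here rather than stated on the slide). Parents are assumed non-inbred and unrelated unless they are the same individual, so a parent's coancestry with itself is 1/2.

```python
def cov_relatives(theta_ac, theta_ad, theta_bc, theta_bd, var_a, var_d):
    """Covariance between X (from A x B) and Y (from C x D), no epistasis."""
    theta_xy = 0.25 * (theta_ac + theta_ad + theta_bc + theta_bd)
    r = 2 * theta_xy
    u = theta_ac * theta_bd + theta_ad * theta_bc
    return r * var_a + u * var_d


# Hypothetical variances
VAR_A, VAR_D = 8.0, 4.0

# Half sibs: A and C are the same non-inbred parent; B and D are unrelated
cov_hs = cov_relatives(0.5, 0.0, 0.0, 0.0, VAR_A, VAR_D)

# Full sibs: A = C and B = D
cov_fs = cov_relatives(0.5, 0.0, 0.0, 0.5, VAR_A, VAR_D)
print(cov_hs, cov_fs)
```

The results match the usual expectations: (1/4) x 8 = 2 for half sibs, and (1/2) x 8 + (1/4) x 4 = 5 for full sibs.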

Epistatic Variance

• Often assumed to be absent, but could bias estimates of $\sigma^2_A$ and $\sigma^2_D$ upwards

• Estimation requires more complex mating designs

• Expected to be smaller than $\sigma^2_A$ and $\sigma^2_D$, so larger experiments are needed for adequate precision

• Coefficients are correlated with those for $\sigma^2_A$ and $\sigma^2_D$, which leads to multicollinearity problems

• For most crops, experimental estimates of epistatic variance have been small

Example of mating design to estimate epistatic variance

• Design I experiment from ‘Jarvis’ and ‘Indian Chief’ maize populations

• Obtained random inbred lines from each population, which were used as parents in a Design II experiment

• A comparison of these values can be made to estimate epistatic variances

Eberhart et al., 1966

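Design I (non-inbred parents):

$\sigma^2_{G_0} = 4\sigma^2_{f/m} = \sigma^2_A + \sigma^2_D + \tfrac{3}{4}\sigma^2_{AA} + \tfrac{1}{2}\sigma^2_{AD} + \tfrac{1}{4}\sigma^2_{DD} + \dots$

Design II (inbred parents):

$\sigma^2_{G_1} = \sigma^2_m + \sigma^2_f + \sigma^2_{mf} = \sigma^2_A + \sigma^2_D + \sigma^2_{AA} + \sigma^2_{AD} + \sigma^2_{DD} + \dots$

Compare $\sigma^2_{G_1}$ with $\sigma^2_{G_0}$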
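
A brief worked step shows why the comparison isolates epistasis. It assumes the two expectations are $\sigma^2_{G_0} = \sigma^2_A + \sigma^2_D + \tfrac34\sigma^2_{AA} + \tfrac12\sigma^2_{AD} + \tfrac14\sigma^2_{DD} + \dots$ for Design I and $\sigma^2_{G_1} = \sigma^2_A + \sigma^2_D + \sigma^2_{AA} + \sigma^2_{AD} + \sigma^2_{DD} + \dots$ for Design II, and that both refer to the same reference population. Subtracting term by term:

```latex
\begin{aligned}
\sigma^2_{G_1} - \sigma^2_{G_0}
  &= \left(1 - \tfrac{3}{4}\right)\sigma^2_{AA}
   + \left(1 - \tfrac{1}{2}\right)\sigma^2_{AD}
   + \left(1 - \tfrac{1}{4}\right)\sigma^2_{DD} + \dots \\
  &= \tfrac{1}{4}\sigma^2_{AA} + \tfrac{1}{2}\sigma^2_{AD}
   + \tfrac{3}{4}\sigma^2_{DD} + \dots
\end{aligned}
```

The additive and dominance terms cancel, so under these assumptions a difference that is clearly greater than zero would point to epistatic variance.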

Precision of variance components

• Minimum of 50-100 progeny to adequately sample population (Bernardo’s advice, some would say more!)

• Large numbers of progeny do not guarantee precise estimates of variance

• Confidence intervals can be determined for estimates of variance (sets lower and upper bounds)

• Negative estimates of variance components can occur in practice, even though negative variances are theoretically impossible. They are more likely when

– the error variance is large

– the true value of the genetic variance is close to zero

– Report as zero? (may lead to bias when results are compiled across many experiments)

See Bernardo, pg 166, for further details on confidence intervals
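
The bias from reporting negative estimates as zero can be illustrated with a small simulation. This is only a sketch: it assumes, for simplicity, that estimates from many experiments are normally distributed around the true value, and the true value (0.5) and sampling standard deviation (1.0) are arbitrary.

```python
import random


def mean_of_estimates(true_value, sampling_sd, n_experiments=10000, seed=1):
    """Average of raw estimates vs. estimates with negatives set to zero."""
    rng = random.Random(seed)
    raw = [rng.gauss(true_value, sampling_sd) for _ in range(n_experiments)]
    truncated = [max(0.0, x) for x in raw]   # negative estimates reported as 0
    return sum(raw) / n_experiments, sum(truncated) / n_experiments


raw_mean, truncated_mean = mean_of_estimates(true_value=0.5, sampling_sd=1.0)
print(raw_mean, truncated_mean)
```

The average of the raw estimates stays close to the true value, while the average of the truncated estimates is pushed upward.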

Resampling methods

• Confidence interval calculations assume that the underlying distribution is normal. They work best for balanced data.

• Resampling methods are useful when

– underlying distributions are unknown or are not normal

– we don’t know how to estimate the confidence interval

• Examples

– Bootstrap – resampling with replacement

– Jackknife – systematically delete data points

– Permutation test – data scrambling (only works when there are two or more types of families)
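
A minimal sketch of the first two resampling methods, applied to the variance among a set of hypothetical family means. The percentile interval used here is one simple choice among several bootstrap interval methods; function names and data are made up for illustration.

```python
import random
import statistics


def bootstrap_ci(values, stat, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval: resample with replacement n_boot times."""
    rng = random.Random(seed)
    n = len(values)
    reps = sorted(
        stat([rng.choice(values) for _ in range(n)]) for _ in range(n_boot)
    )
    lower = reps[int(n_boot * alpha / 2)]
    upper = reps[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


def jackknife(values, stat):
    """Recompute the statistic with each data point deleted in turn."""
    return [stat(values[:i] + values[i + 1:]) for i in range(len(values))]


# Hypothetical family means
family_means = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.9, 4.4, 5.5, 4.9]
lower, upper = bootstrap_ci(family_means, statistics.variance)
leave_one_out = jackknife(family_means, statistics.variance)
print(lower, upper, leave_one_out)
```

In a real mating design the resampling unit would normally be the family rather than the individual plot, so that the family structure is preserved in each resample.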