
INVESTIGATION

Genetic Gain Increases by Applying the Usefulness Criterion with Improved Variance Prediction in Selection of Crosses

Christina Lehermeier,*,1 Simon Teyssèdre,† and Chris-Carolin Schön*

*Plant Breeding, Technical University of Munich, 85354 Freising, Germany and †RAGT 2n, Genetics & Analytics Unit, 12510 Druelle, France

ORCID ID: 0000-0001-7724-0887 (C.L.)

ABSTRACT A crucial step in plant breeding is the selection and combination of parents to form new crosses. Genome-based prediction guides the selection of high-performing parental lines in many crop breeding programs, which ensures a high mean performance of progeny. To warrant maximum selection progress, a new cross should also provide a large progeny variance. The usefulness concept as a measure of the gain that can be obtained from a specific cross accounts for variation in progeny variance. Here, it is shown that genetic gain can be considerably increased when crosses are selected based on their genomic usefulness criterion compared to selection based on mean genomic estimated breeding values. An efficient and improved method to predict the genetic variance of a cross based on Markov chain Monte Carlo samples of marker effects from a whole-genome regression model is suggested. In simulations representing selection procedures in crop breeding programs, the performance of this novel approach is compared with existing methods, like selection based on mean genomic estimated breeding values and optimal haploid values. In all cases, higher genetic gain was obtained compared with previously suggested methods. When 1% of progenies per cross were selected, the genetic gain based on the estimated usefulness criterion increased by 0.14 genetic standard deviation compared to a selection based on mean genomic estimated breeding values. Analytical derivations of the progeny genotypic variance-covariance matrix based on parental genotypes and genetic map information make simulations of progeny dispensable, and allow fast implementation in large-scale breeding programs.

KEYWORDS genomic selection; Bayesian statistics; plant breeding; progeny variance; usefulness criterion

Copyright © 2017 by the Genetics Society of America
doi: https://doi.org/10.1534/genetics.117.300403
Manuscript received June 30, 2017; accepted for publication October 10, 2017
Supplemental material is available online at www.genetics.org/lookup/suppl/doi:10.1534/genetics.117.300403/-/DC1.
1 Corresponding author: Technical University of Munich, TUM School of Life Sciences Weihenstephan, Plant Breeding, Liesel-Beckmann-Strasse 2, 85354 Freising, Germany. E-mail: [email protected]
Genetics, Vol. 207, 1651–1661 December 2017

In plant breeding, superior inbred lines are developed either for direct cultivar release, as hybrid components, or as potential parents in population improvement. Generally, high-yielding parental lines are crossed to secure a high mean performance of the progeny. To identify superior progeny, ensure genetic gain in the next selection cycle, and maintain long-term selection gain, it is important that the cross also generates a high genetic variance. Following Schnell and Utz (1975), the “usefulness” of a cross is defined as the trait mean of a defined upper fraction of its progeny, and can be derived as the expected cross mean plus the expected selection gain as a function of the selection intensity, square-root of the trait heritability, and the genetic standard deviation of the cross. With decreasing genotyping costs, selection intensity can be increased by the use of genome-based prediction methods, and, consequently, the importance of considering the progeny variance when deciding about future crosses increases (Zhong and Jannink 2007). Several endeavors have been made in the past to predict the progeny variance. Earlier attempts used the phenotypic distance (Utz et al. 2001), and since the availability of markers the molecular distance of parental lines, to predict progeny variance but both with limited success (Bohn et al. 1999; Hung et al. 2012). Recently, the potential of genomic selection has been investigated for many species, and in major crops such as maize and wheat it has been fully integrated in commercial breeding programs. The possibility to get dense marker genotypes allows an integration of genomic prediction in many steps of line and hybrid improvement programs (Heslot et al. 2015). This has also led to the suggestion to use genomic estimated breeding values (GEBVs) to predict progeny variance (Endelman 2011; Bernardo 2014; Mohammadi et al. 2015). An R package developed by Mohammadi et al. (2015) predicts the progeny variance by using an appropriate training population of genotyped and phenotyped inbred lines and marker effects estimated with a whole-genome regression model like ridge-regression best linear unbiased prediction (RR-BLUP, Meuwissen et al. 2001). Subsequently, progenies from two genotyped parental lines are simulated in silico using genetic map information. In a third step, GEBVs of the simulated progenies are predicted using marker effect estimates from the training population. The progeny variance is finally estimated as sample variance of the GEBVs. Concerns about using the sample variance of GEBVs as estimate of the genetic variance were raised as GEBVs are shrunken toward zero, and thus the approach underestimates the true genetic variance (Lian et al. 2015). Lehermeier et al. (2017) showed that a fully Bayesian estimate as proposed by Sorensen et al. (2001) improves the estimation of the genetic variance explained by markers in a given population compared to estimation based on RR-BLUP variance components by taking linkage disequilibrium (LD) between quantitative trait loci (QTL) into account. Here, we suggest to integrate this fully Bayesian estimate of the genetic variance into the usefulness criterion (UC).

Following a different rationale than the UC, Daetwyler et al. (2015) proposed the concept of selecting heterozygous lines based on their optimal haploid value (OHV). The goal of OHV is to predict the best fully homozygous line that can be produced from a heterozygous line or a cross. The latter authors showed that selection based on OHV increases long-term genetic gain compared to standard genomic selection based on GEBVs.

We hypothesize that an increase in genetic gain can be obtained when crosses are selected based on their estimated UC compared to their mean GEBV or their OHV. In addition, genetic variance prediction is assumed to be more accurate with a fully Bayesian estimate based on Markov chain Monte Carlo (MCMC) samples compared to the sample variance of the GEBVs. We investigate our hypotheses in simulation studies based on genotypic maize data under varying selection intensities, trait heritabilities, training population sizes, and model complexities. We show that the genetic variance of progenies from a cross can be derived analytically from the parental genotypes and genetic map information without the need for in silico simulations. We provide formulas for calculating the expected genetic variance for a given type of population to be created from a biparental cross taking into account the expected frequency of recombinants under different levels of inbreeding.

Materials and Methods

Derivation of genetic variance among progenies

In this section and supporting Supplemental Material, File S1, we show how to derive the genetic variance of a cross under the assumption of biallelic QTL, known homozygous parental genotypes at QTL, known QTL allele substitution effects, known recombination frequencies between QTL, and absence of dominance and epistasis. We first concentrate on doubled haploid (DH) lines derived from the F1 generation of a biparental cross, and then extend formulas to general forms holding also for DH lines generated from higher selfing generations than F1 and to the genotypic variance of recombinant inbred lines (RILs).

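Two fully homozygous parental lines, $P_A$ and $P_B$, are assumed with known QTL genotypes $\mathbf{x}_{P_A}$ and $\mathbf{x}_{P_B}$, each a vector of length $N_{QTL}$ counting the number of favorable QTL alleles at each of $N_{QTL}$ QTL. We define the $N_{QTL}$-dimensional vector of known allele substitution effects as $\boldsymbol{\alpha}$. The breeding values of $P_A$ and $P_B$ are then $\mathbf{x}'_{P_A}\boldsymbol{\alpha}$ and $\mathbf{x}'_{P_B}\boldsymbol{\alpha}$. Further, DH progenies generated from the F1 generation of a $P_A \times P_B$ cross are considered. The genotypes of the progenies can be defined as $\mathbf{X}_{P_A \times P_B}$, a matrix with progeny as rows and the $N_{QTL}$ QTL genotypes as columns. The mean breeding value of the progenies can be derived as the mean of their parental lines' breeding values:

$$
\mu_{P_A \times P_B} = \frac{1}{2}\left(\mathbf{x}'_{P_A}\boldsymbol{\alpha} + \mathbf{x}'_{P_B}\boldsymbol{\alpha}\right). \tag{1}
$$

The progeny variance can be derived as:

$$
\sigma^2_{P_A \times P_B} = \mathrm{var}\left(\mathbf{X}_{P_A \times P_B}\boldsymbol{\alpha}\right) = \boldsymbol{\alpha}'\,\mathrm{var}\left(\mathbf{X}_{P_A \times P_B}\right)\boldsymbol{\alpha}. \tag{2}
$$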

To obtain $\mathrm{var}(\mathbf{X}_{P_A \times P_B})$, Bernardo (2014) and Mohammadi et al. (2015) suggested to simulate progenies in silico using the parental genotypes and a genetic map in order to obtain $\mathbf{X}_{P_A \times P_B}$. If the approach is to be implemented in breeding programs to test the variance of a high number of potential crosses, the simulation of progeny becomes a computationally intensive task, as, per cross, a minimum of several 100 progenies need to be simulated for accurate variance estimation. In the following, we show how $\mathrm{var}(\mathbf{X}_{P_A \times P_B})$ can be derived from the parental genotypes and the recombination frequencies without the need to simulate $\mathbf{X}_{P_A \times P_B}$. For fully inbred lines, the following holds:

$$
\mathrm{var}\left(\mathbf{X}_{P_A \times P_B}\right) =
\begin{pmatrix}
4p_1(1 - p_1) & \cdots & 4D_{1 N_{QTL}} \\
\vdots & \ddots & \vdots \\
4D_{N_{QTL} 1} & \cdots & 4p_{N_{QTL}}\left(1 - p_{N_{QTL}}\right)
\end{pmatrix}, \tag{3}
$$

where the $j$-th diagonal entry corresponds to the variance at the $j$-th QTL locus, $\mathrm{var}(\mathbf{X}_{P_A \times P_B})_{jj} = (1 + F)\,2p_j(1 - p_j)$, with inbreeding coefficient $F = 1$ for fully inbred DH lines, and $p_j$ the allele frequency in the parental lines, which, by expectation, also holds for the progenies ($p_j \in \{0, 0.5, 1\}$). For DH lines, the variance at locus $j$ is then either $\mathrm{var}(\mathbf{X}_{P_A \times P_B})_{jj} = 1$ if the parental alleles differ at this locus, or 0 if both parental lines have the same allele, and progenies will not show segregation at this locus. The off-diagonal elements of $\mathrm{var}(\mathbf{X}_{P_A \times P_B})$ show the disequilibrium covariances between two loci: $\mathrm{var}(\mathbf{X}_{P_A \times P_B})_{jl} = 4D_{jl} = 4(p_{jl} - p_j p_l)$, where $p_{jl}$ denotes the haplotype frequency. The disequilibrium parameter $D_{jl}$ between loci $j$ and $l$ can be derived from the disequilibrium parameter among both parental lines $D^*_{jl}$ and the expected frequency of recombinants between both loci $c^{(1)}_{jl}$ as $D_{jl} = (1 - 2c^{(1)}_{jl})D^*_{jl}$. Depending on the parental haplotypes, $D^*_{jl}$ is either 0 if both parental lines show the same alleles at one or both loci, or 0.25 or −0.25 depending on the linkage phase of the parents. If DH lines are generated from a later generation than F1, it needs to be considered that the expected frequency of recombinants increases with increasing number of meioses. Depending on the generation $k$ when DH lines are generated ($k = 1$ for DH from F1), the expected frequency of recombinants increases to

$$
c^{(k)}_{jl} = \frac{2c^{(1)}_{jl}}{1 + 2c^{(1)}_{jl}}\left(1 - 0.5^{k}\left(1 - 2c^{(1)}_{jl}\right)^{k}\right). \tag{4}
$$

The general formula for DH lines generated from generation $k$ is then:

$$
\mathrm{var}\left(\mathbf{X}_{P_A \times P_B}\right)^{DH(k)}_{jl} = 4D^*_{jl}\left(1 - 2c^{(k)}_{jl}\right). \tag{5}
$$

We give a full derivation for arriving at $\mathrm{var}(\mathbf{X}_{P_A \times P_B})$, and an adjustment that also holds for RILs after different numbers of selfing generations, in File S1. Table 1 summarizes how the genotypic variance-covariance between two loci $j$ and $l$ can be derived for different populations, based on the LD parameter $D^*_{jl}$ in the parental gametes and the expected frequency of recombinants $c^{(1)}_{jl}$ between both loci in the first generation.
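The derivation above can be sketched numerically. The following is a minimal illustration, not the authors' implementation: it assumes parental genotypes coded 0/2, a user-supplied matrix of pairwise recombination frequencies with zeros on the diagonal, and hypothetical function names (`recombinant_freq`, `dh_covariance`, `cross_mean_and_variance`). It evaluates Equations (1), (2), (4), and (5) for a toy cross.

```python
import numpy as np


def recombinant_freq(c1, k):
    """Expected frequency of recombinants after k generations, Equation (4)."""
    c1 = np.asarray(c1, dtype=float)
    return 2.0 * c1 / (1.0 + 2.0 * c1) * (1.0 - (0.5 * (1.0 - 2.0 * c1)) ** k)


def dh_covariance(x_a, x_b, c1, k=1):
    """var(X) for DH lines derived from generation k, Equation (5).

    x_a, x_b: parental QTL genotypes coded 0/2 (homozygous parents).
    c1:       matrix of pairwise recombination frequencies, 0 on the diagonal.
    """
    # +0.5 / -0.5 where the parents differ, 0 where they carry the same allele;
    # the outer product yields D*_jl in {0, +0.25, -0.25}, including linkage phase.
    s = (np.asarray(x_a, dtype=float) - np.asarray(x_b, dtype=float)) / 4.0
    d_star = np.outer(s, s)
    return 4.0 * d_star * (1.0 - 2.0 * recombinant_freq(c1, k))


def cross_mean_and_variance(x_a, x_b, alpha, c1, k=1):
    """Progeny mean (Equation 1) and progeny variance (Equation 2)."""
    alpha = np.asarray(alpha, dtype=float)
    mean = 0.5 * (np.dot(x_a, alpha) + np.dot(x_b, alpha))
    variance = alpha @ dh_covariance(x_a, x_b, c1, k) @ alpha
    return mean, variance


# Toy cross (made-up values): loci 1 and 2 are linked and segregate, locus 3 is fixed.
x_a = [2, 2, 0]
x_b = [0, 0, 0]
alpha = [1.0, 1.0, 5.0]
c1 = np.array([[0.0, 0.1, 0.5],
               [0.1, 0.0, 0.5],
               [0.5, 0.5, 0.0]])
mean, variance = cross_mean_and_variance(x_a, x_b, alpha, c1)
print(mean, variance)
```

In this toy cross the two segregating loci are in coupling phase, so their covariance adds to the progeny variance; the fixed locus contributes nothing regardless of its effect size.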

Estimation of progeny variance based on whole-genome regression

For estimating the variance of progeny, allele substitution effects $\boldsymbol{\alpha}$ need to be known. QTL and their effects cannot be observed, but, with high marker density, strong LD between markers and QTL can be exploited, allowing QTL to be replaced with marker genotypes with only limited loss of information. We then can define the marker genotypes $\mathbf{M}_{P_A \times P_B}$ and their $N_{SNP}$-dimensional vector of allele substitution effects $\boldsymbol{\beta}$ accordingly. Following Bernardo (2014) and Mohammadi et al. (2015), we estimate marker effects in a phenotyped and genotyped training population with the linear regression model:

$$
\mathbf{y}_{TP} = \mathbf{1}_{N_{TP}}\beta_0 + \mathbf{M}_{TP}\boldsymbol{\beta} + \mathbf{e}, \tag{6}
$$

where $\mathbf{y}_{TP}$ is the vector of phenotypes of a training population, $\beta_0$ is an intercept, $\mathbf{M}_{TP}$ is a matrix of marker genotypes of the individuals in the training population, and $\boldsymbol{\beta}$ and $\mathbf{e}$ are the vectors of marker effects and residuals. We estimate marker effects in a fully Bayesian way, assigning independent and identical Gaussian prior distributions to the marker effects, $\boldsymbol{\beta} \sim N(\mathbf{0}, \mathbf{I}\sigma^2_{\beta})$. Residuals are also assumed to follow independent and identical Gaussian distributions: $\mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma^2_{e})$. Scaled inverse-$\chi^2$ prior distributions are assigned to the residual and marker effect variances $\sigma^2_{e}$ and $\sigma^2_{\beta}$. Samples from the posterior distribution are created using an MCMC algorithm as implemented in the R package BGLR (Pérez and de los Campos 2014). Hyperparameters for the scaled inverse-$\chi^2$ prior distributions were chosen according to default rules in BGLR corresponding to relatively uninformative priors, and an a priori assumption of 50% of the phenotypic variance explained by markers. We used 20,000 iterations, where the first 5000 samples were discarded as burn-in. From the post-burn-in samples, we saved only every fifth sample for posterior inference, corresponding to $L = 3000$ samples.

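To estimate progeny variance based on the whole-genome regression model given in (6), two alternative methods were used. The first method, denoted as the “variance of posterior means” (VPM), corresponds to calculating the sample variance of the GEBVs $\mathbf{g} = \mathbf{M}_{P_A \times P_B}\hat{\boldsymbol{\beta}}$ as described by Bernardo (2014) and Mohammadi et al. (2015), with

$$
\sigma^{2\,[VPM]}_{P_A \times P_B} = \mathrm{var}\left(\mathbf{M}_{P_A \times P_B}\hat{\boldsymbol{\beta}}\right) = \hat{\boldsymbol{\beta}}'\,\mathrm{var}\left(\mathbf{M}_{P_A \times P_B}\right)\hat{\boldsymbol{\beta}}, \tag{7}
$$

where $\hat{\boldsymbol{\beta}}$ is the vector of posterior means of marker effects obtained from model (6) using a training population.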

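Following Sorensen et al. (2001), the progeny variance can also be estimated by constructing a posterior distribution, calculating in each MCMC sample the progeny variance as:

$$
\mathrm{var}\left(\mathbf{g}^{(s)}\right) = \mathrm{var}\left(\mathbf{M}_{P_A \times P_B}\boldsymbol{\beta}^{(s)}\right) = \boldsymbol{\beta}^{(s)\prime}\,\mathrm{var}\left(\mathbf{M}_{P_A \times P_B}\right)\boldsymbol{\beta}^{(s)}, \tag{8}
$$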

where $\boldsymbol{\beta}^{(s)}$ is the $s$-th thinned post-burn-in sample from the MCMC algorithm. By using the posterior mean from all samples, we obtain the estimate according to method M2 of Lehermeier et al. (2017), which we denote here as the “posterior mean variance” (PMV):

$$
\sigma^{2\,[PMV]}_{P_A \times P_B} = \frac{1}{L}\sum_{s=1}^{L}\boldsymbol{\beta}^{(s)\prime}\,\mathrm{var}\left(\mathbf{M}_{P_A \times P_B}\right)\boldsymbol{\beta}^{(s)}. \tag{9}
$$

Table 1 Overview of genotypic covariance between loci $j$ and $l$ for different populations derived from two parental lines based on LD parameter in parental lines $D^*_{jl}$ and expected frequency of recombinants in generation 1 $c^{(1)}_{jl}$

| Population | Genotypic variance-covariance $\mathrm{var}(\mathbf{X}_{P_A \times P_B})_{jl}$ |
|---|---|
| DH F1 ($k = 1$) (a) | $4D^*_{jl}\left(1 - 2c^{(1)}_{jl}\right)$ |
| DH generation $k$ | $4D^*_{jl}\left[\sum_{r=1}^{k}\left(0.5\left(1 - 2c^{(1)}_{jl}\right)\right)^{r} + \left(0.5\left(1 - 2c^{(1)}_{jl}\right)\right)^{k}\right]$ |
| RILs generation $k$ (b) | $4D^*_{jl}\left[\sum_{r=1}^{k}\left(0.5\left(1 - 2c^{(1)}_{jl}\right)\right)^{r}\right]$ |
| DH and RILs generation $\infty$ | $4D^*_{jl}\left(1 - 4c^{(1)}_{jl}/\left(1 + 2c^{(1)}_{jl}\right)\right)$ |

(a) DH lines derived from F1 generation.
(b) RILs after $k - 1$ selfing generations ($k = 1$ equals F2 population).

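Equation (9) can also be formulated as:

$$
\begin{aligned}
E\left[\mathrm{var}\left(\mathbf{M}_{P_A \times P_B}\boldsymbol{\beta}\right) \mid \mathbf{y}_{TP}\right]
&= E\left[\mathrm{var}\left(\mathbf{M}_{P_A \times P_B}\boldsymbol{\beta} \mid \mathbf{M}_{P_A \times P_B}, \mathbf{y}_{TP}\right)\right] + \mathrm{var}\left(E\left[\mathbf{M}_{P_A \times P_B}\boldsymbol{\beta} \mid \mathbf{M}_{P_A \times P_B}, \mathbf{y}_{TP}\right]\right) \\
&= E\left[\mathbf{M}'_{P_A \times P_B}\,\mathrm{var}\left(\boldsymbol{\beta} \mid \mathbf{y}_{TP}\right)\mathbf{M}_{P_A \times P_B}\right] + \mathrm{var}\left(\mathbf{M}_{P_A \times P_B}\hat{\boldsymbol{\beta}}\right) \\
&= \mathrm{trace}\left(\mathrm{var}\left(\mathbf{M}_{P_A \times P_B}\right)\mathrm{var}\left(\boldsymbol{\beta} \mid \mathbf{y}_{TP}\right)\right) + \hat{\boldsymbol{\beta}}'\,\mathrm{var}\left(\mathbf{M}_{P_A \times P_B}\right)\hat{\boldsymbol{\beta}},
\end{aligned} \tag{10}
$$

where $\mathrm{var}(\boldsymbol{\beta} \mid \mathbf{y}_{TP})$ is the posterior variance-covariance matrix of marker effects, which is estimated based on $L$ MCMC samples: $\mathrm{var}(\boldsymbol{\beta} \mid \mathbf{y}_{TP}) = (1/L)\,\mathbf{B}'\mathbf{B}$, with $\mathbf{B}$ the $L \times N_{SNP}$-dimensional matrix of $L$ samples of the $N_{SNP}$ marker effects centered by their posterior means. The second part of Equation (10) corresponds to $\sigma^{2\,[VPM]}_{P_A \times P_B}$ (7), which can be interpreted as the variation among GEBVs. The first part of Equation (10) can be interpreted as variation of an individual's GEBV originating from variation of marker effect estimates. If there is no uncertainty in marker effect estimates, $\mathrm{var}(\boldsymbol{\beta} \mid \mathbf{y}_{TP}) = \mathbf{0}$ and the first part is zero. In this case, estimates from PMV and VPM are equal, and should approach the true genetic variance.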
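The relationship between VPM, PMV, and the decomposition in Equation (10) can be checked numerically. The sketch below is illustrative only: the "MCMC samples" are random draws standing in for thinned post-burn-in output of a sampler such as BGLR, and the matrix `sigma_m` is an arbitrary positive semi-definite stand-in for $\mathrm{var}(\mathbf{M}_{P_A \times P_B})$. All variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_snp, n_samples = 5, 3000

# Hypothetical stand-ins, not real sampler output:
# rows are posterior samples of marker effects, columns are markers.
beta_samples = rng.normal(loc=0.3, scale=0.2, size=(n_samples, n_snp))
a = rng.normal(size=(n_snp, n_snp))
sigma_m = a @ a.T  # symmetric positive semi-definite placeholder for var(M)

# VPM, Equation (7): quadratic form evaluated at the posterior means.
beta_hat = beta_samples.mean(axis=0)
vpm = beta_hat @ sigma_m @ beta_hat

# PMV, Equation (9): average of the quadratic form over all samples.
per_sample_var = np.einsum("si,ij,sj->s", beta_samples, sigma_m, beta_samples)
pmv = per_sample_var.mean()

# Equation (10): trace term from the posterior covariance plus VPM.
centered = beta_samples - beta_hat
post_cov = centered.T @ centered / n_samples  # (1/L) B'B
decomposed = np.trace(sigma_m @ post_cov) + vpm

print(vpm, pmv, decomposed)
```

Because the trace term is non-negative for a positive semi-definite covariance matrix, PMV computed this way is never smaller than VPM, which matches the interpretation given in the text.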

Usefulness and optimal haploid value of a cross

As the interest in breeding is typically both increasing the mean of a population and identifying superior lines, crosses can be selected based on their UC (Schnell and Utz 1975) or their superior progeny value as defined by Zhong and Jannink (2007), which is the mean of the upper fraction of the selected lines. For a normally distributed trait, the mean of the genotypic values of selected progenies from a cross is:

$$
UC = \mu + i\,\sigma_g, \tag{11}
$$

where $\mu$ is the genotypic mean of the cross, $i$ the selection intensity (Falconer and Mackay 1996), and $\sigma_g$ the genetic standard deviation. Under absence of dominance and epistasis as assumed here, the genotypic value of a line equals its breeding value. We predicted the usefulness of crosses by estimating $\mu$ from the mean parental GEBVs and $\sigma_g$ using the two alternative variance estimation methods VPM and PMV as described in the previous section. The genotypic variance-covariance matrix $\mathrm{var}(\mathbf{M}_{P_A \times P_B})$ entering VPM and PMV was derived from parental genotypes and genetic map information using the formula given in Table 1, line 1.
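As a worked illustration of Equation (11), the sketch below computes the selection intensity for truncation selection from a standard normal distribution and the resulting UC. It assumes the infinite-population formula $i = \phi(z)/p$, with $z$ the $(1 - p)$ quantile; any finite-sample adjustment for a limited number of progeny per cross is not included, and the function names and cross values are hypothetical.

```python
from statistics import NormalDist


def selection_intensity(p: float) -> float:
    """Selection intensity for selecting the upper fraction p of a normal
    distribution (infinite-population approximation)."""
    if not 0.0 < p < 1.0:
        raise ValueError("selected fraction p must lie strictly between 0 and 1")
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1.0 - p)  # truncation point
    return std_normal.pdf(z) / p


def usefulness(mu: float, sigma_g: float, p: float) -> float:
    """Usefulness criterion, Equation (11): UC = mu + i * sigma_g."""
    return mu + selection_intensity(p) * sigma_g


# Two made-up crosses: the second has a lower mean but a larger variance.
uc_cross_1 = usefulness(mu=10.0, sigma_g=1.0, p=0.10)
uc_cross_2 = usefulness(mu=9.5, sigma_g=2.0, p=0.10)
print(uc_cross_1, uc_cross_2)
```

With 10% selected, the cross with the lower mean obtains the higher UC here, which is the situation in which ranking by UC and ranking by mean GEBV disagree.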

For comparison, we also investigated the concept of optimal haploid value selection suggested by Daetwyler et al. (2015) to identify superior crosses. We predicted the optimal haploid value that can be generated from a cross by:

$$
\widehat{OHV} = 2\sum_{w=1}^{N_{Segments}}\max\left(\mathbf{H}_w\hat{\boldsymbol{\beta}}_w\right), \tag{12}
$$

where $N_{Segments}$ is the number of segments into which the genome is split to calculate the OHV, $\mathbf{H}_w$ defines a matrix with number of columns equal to the number of marker loci in segment $w$, and rows containing the four haplotype scores (0 or 1) of the two parental lines; and $\hat{\boldsymbol{\beta}}_w$ defines the vector containing the marker effects of segment $w$ estimated in model (6) using a training population. Note, for fully homozygous parental lines, $\mathbf{H}_w$ can be reduced to two rows by considering one gamete each. In our study, we split each chromosome into three segments corresponding to the default value as chosen by Daetwyler et al. (2015).
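Equation (12) can be sketched for two fully homozygous parents, each represented by a single gamete. This is an illustrative sketch under stated assumptions: the function name, the segment labelling via an integer vector, and the toy values are all hypothetical and not taken from the paper or from Daetwyler et al. (2015).

```python
import numpy as np


def optimal_haploid_value(hap_a, hap_b, beta_hat, segment_ids):
    """Predicted OHV of a cross, Equation (12), for homozygous parents.

    hap_a, hap_b: 0/1 haplotype scores of the two parents (one gamete each).
    beta_hat:     estimated marker effects.
    segment_ids:  integer label of the genome segment each marker belongs to.
    """
    haplotypes = np.vstack([hap_a, hap_b]).astype(float)
    beta_hat = np.asarray(beta_hat, dtype=float)
    segment_ids = np.asarray(segment_ids)
    total = 0.0
    for w in np.unique(segment_ids):
        in_segment = segment_ids == w
        # Value of each parental haplotype within segment w; keep the best one.
        total += np.max(haplotypes[:, in_segment] @ beta_hat[in_segment])
    return 2.0 * total


# Toy example with four markers in two segments (made-up numbers).
ohv = optimal_haploid_value(
    hap_a=[1, 0, 1, 0],
    hap_b=[0, 1, 0, 1],
    beta_hat=[1.0, 0.5, -0.2, 0.4],
    segment_ids=[0, 0, 1, 1],
)
print(ohv)
```

The best haplotype comes from the first parent in segment 0 and from the second parent in segment 1, so the predicted OHV combines segments from both parents.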

Simulations

Our simulation study consists of two main parts. In the first part, we investigate the two variance estimation methods (VPM and PMV), and in the second, we assess the genetic gain from selection based on UC and OHV compared to selection based on mean GEBVs. For both, we simulated a training population based on genotypic data from 10 multi-parental populations of maize DH lines from the dent heterotic group, which were published by Bauer et al. (2013). The 841 DH lines were genotyped with the Illumina MaizeSNP50 BeadChip. After quality control and imputation of missing values as described by Lehermeier et al. (2014), 32,801 high-quality polymorphic SNPs were available and formed the genotypic data of our training population. A genetic consensus map of the 10 biparental families was constructed by Giraud et al. (2014), and is available at Maize GDB (http://maizegdb.org/cgi-bin/displayrefrecord.cgi?id=9024747). Based on this genetic map information, recombination frequencies between marker pairs were derived as $c^{(1)} = 0.5\,(1 - \exp(-2x))$, with $x$ being the map distance between two marker loci in Morgan (Haldane 1919).
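The map function used above is simple to evaluate. A minimal sketch, with a hypothetical function name, converting a map distance in Morgan into a recombination frequency:

```python
import math


def haldane_recombination(x_morgan: float) -> float:
    """Recombination frequency from map distance (Morgan), Haldane (1919)."""
    if x_morgan < 0.0:
        raise ValueError("map distance must be non-negative")
    return 0.5 * (1.0 - math.exp(-2.0 * x_morgan))


for distance in (0.0, 0.1, 0.5, 5.0):
    print(distance, haldane_recombination(distance))
```

The function returns 0 for completely linked loci and approaches 0.5 for large distances, the value that also applies to loci on different chromosomes.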

Simulation part 1, investigation of variance prediction methods: We randomly sampled $N_{QTL} = 300$ loci from the marker data of the training population to be QTL, and randomly sampled QTL effects $\boldsymbol{\alpha}$ from independent and identical normal distributions with mean zero. True genotypic effects of the training population were then defined as $\mathbf{g}_{TP} = \mathbf{X}_{TP}\boldsymbol{\alpha}$, with $\mathbf{X}_{TP}$ the QTL genotypes of the training population. To obtain phenotypic values, random error terms were sampled from a normal distribution with mean zero and variance equal to $\mathrm{var}(\mathbf{g}_{TP})(1 - h^2)/h^2$ to obtain a heritability of $h^2$. Heritability values varied from 0.2 to 1, in steps of 0.2. To investigate different training population sizes, subsets varying in size from 100 to 600, in steps of 100, were sampled randomly from the full training population. An only-QTL scenario and an only-marker scenario were considered. In the only-QTL scenario, we exclusively included the 300 markers assigned a nonzero QTL effect in the whole-genome regression model. This can be considered as an ideal situation. In the only-marker scenario, we excluded the 300 markers assigned QTL effects, and included a random subset of 3000 remaining markers in the whole-genome regression model. Model parameters were then estimated using the specified training population. We generated 200 crosses among randomly sampled lines from the training population. In each cross, we calculated the true genetic variance as in Equation (2) using the formula given in Table 1 for DH lines derived from the F1 generation. Further, variances were estimated using VPM and PMV as described above, with their average bias calculated as true variance minus the estimated variance and standardized by the true variance. The predictive correlation of the variance estimates was calculated as the correlation between true variance and estimated variance among the 200 crosses. The whole simulation approach was replicated 10 times and results of bias and predictive correlation were averaged over the 10 replications.
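The phenotype simulation and the two evaluation statistics described above can be sketched as follows. This is an illustration under assumptions: genotypic values are random stand-ins for $\mathbf{X}_{TP}\boldsymbol{\alpha}$, the function names are hypothetical, and the bias follows the wording of the text (true minus estimated, divided by true).

```python
import numpy as np


def simulate_phenotypes(g, h2, rng):
    """Add normal errors with variance var(g) * (1 - h2) / h2 to genotypic values."""
    g = np.asarray(g, dtype=float)
    var_e = np.var(g) * (1.0 - h2) / h2
    return g + rng.normal(0.0, np.sqrt(var_e), size=g.shape)


def relative_bias(true_var, est_var):
    """Average of (true - estimated) / true over crosses, as stated in the text."""
    true_var = np.asarray(true_var, dtype=float)
    est_var = np.asarray(est_var, dtype=float)
    return np.mean((true_var - est_var) / true_var)


def predictive_correlation(true_var, est_var):
    """Correlation between true and estimated variances across crosses."""
    return np.corrcoef(true_var, est_var)[0, 1]


rng = np.random.default_rng(42)
g_tp = rng.normal(size=100_000)  # stand-in for the true genotypic values
y_tp = simulate_phenotypes(g_tp, h2=0.6, rng=rng)
realized_h2 = np.var(g_tp) / np.var(y_tp)
print(realized_h2)
```

With a large sample, the ratio of genotypic to phenotypic variance lands close to the target heritability; in small training populations the realized value will scatter around it.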

Simulation part 2, selection based on UC and OHV: Simulation part 2 was based on a training population as simulated in part 1 with training population size of $N_{TP} = 500$, $N_{QTL} = 300$, and two different heritability values ($h^2 = 0.2$ and 0.6). Using only 3000 non-QTL markers, as in the only-marker scenario of simulation part 1, we fitted a whole-genome regression model as described in Equation (6). From the model, we obtained GEBVs for all lines in the training population. Using the GEBVs, we selected the 100 best lines (showing largest GEBVs) to form parental lines for new crosses. We calculated the mean of the GEBVs for all 4950 potential crosses of the 100 best lines. Further, we calculated the Rogers' distance based on marker data between the parental lines of the 4950 crosses. To avoid crosses between closely related parents, and to ensure high means of the crosses, we selected those crosses where parental lines showed a minimum genetic distance of 0.2, and subsequently selected the 150 crosses with the highest mean parental GEBV. We used this approach of preselecting crosses on the one hand to reduce computing time for the further calculations, and, on the other hand, to best simulate a typical procedure in a breeding program. For comparison, we additionally show results where the 150 crosses were selected by mean parental GEBV alone without restriction on parental distance (minimum distance of 0.0). For the 150 crosses, we calculated the mean and the variance within each cross based on true QTL effects, and based on estimated marker effects. For each cross, we calculated the true and estimated UC with the two different variance estimation approaches. In addition, the OHV of each cross was estimated. The full simulation procedure was replicated 400 times. In each replication, we selected 25 crosses based either only on their mean GEBV, based on their true or estimated UC, or based on their estimated OHV. For each of the 25 crosses, we assumed a sample size of 100 progeny per cross, and, from those, selected the best lines per cross, applying different selection intensities corresponding to a selection of 1–100 lines per cross in steps of 1. To assess the genetic gain of the different approaches, we calculated the mean true genotypic value of the selected lines from the different selection strategies. We report results for the difference in gain between selection based on estimated UC and mean GEBV, as well as between estimated OHV and mean GEBV, in genetic standard deviations of the training population. An overview of the selection scheme applied in simulation part 2 is given in Figure 1.
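The preselection of crosses described above can be sketched as a small filter-and-rank routine. This is a hedged illustration: it assumes a precomputed symmetric matrix of pairwise parental distances (for example Rogers' distance, computed elsewhere), and the function name and toy inputs are hypothetical.

```python
from itertools import combinations
from math import comb

import numpy as np


def preselect_crosses(gebv, dist, n_parents=100, min_dist=0.2, n_crosses=150):
    """Pick parents by GEBV, drop closely related pairs, rank by mean parental GEBV."""
    gebv = np.asarray(gebv, dtype=float)
    best = [int(i) for i in np.argsort(gebv)[::-1][:n_parents]]
    # All pairwise crosses among the best lines that satisfy the distance threshold.
    pairs = [(a, b) for a, b in combinations(best, 2) if dist[a, b] >= min_dist]
    # Ranking by the sum of parental GEBVs is equivalent to ranking by their mean.
    pairs.sort(key=lambda pair: gebv[pair[0]] + gebv[pair[1]], reverse=True)
    return pairs[:n_crosses]


# Number of potential crosses among 100 parents, as stated in the text.
print(comb(100, 2))

# Toy example (made-up values): lines 0 and 1 are too closely related to be crossed.
toy_gebv = np.array([5.0, 4.0, 3.0, 1.5, 1.0])
toy_dist = np.ones((5, 5))
toy_dist[0, 1] = toy_dist[1, 0] = 0.1
selected = preselect_crosses(toy_gebv, toy_dist, n_parents=4, min_dist=0.2, n_crosses=2)
print(selected)
```

Setting `min_dist=0.0` reproduces the comparison scenario in which crosses are preselected by mean parental GEBV alone.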

Data availability

Simulations were based on genotypic maize data available under http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE50558. The genetic map used can be downloaded from http://maizegdb.org/cgi-bin/displayrefrecord.cgi?id=9024747.

Results

Simulation part 1

Figure 2 shows the bias and predictive correlation of VPM and PMV for prediction of the variance of new crosses in the only-QTL scenario. While VPM showed an underestimation of the true genetic variance with the bias approaching zero with increasing size of the training population (N) and heritability, PMV showed an overestimation with low N and heritability, and a slight underestimation with high heritability and low sample size. Except for a very low heritability of 0.2, PMV was considerably less biased than VPM. The correlation between true and estimated variance approached 1 with increasing N and heritability with estimation methods VPM and PMV, with PMV yielding a higher predictive correlation in all cases. Figure 3 shows the bias and predictive correlation for the scenario where only marker genotypes and no QTL were included in the whole-genome regression model. Compared to the only-QTL scenario, predictive correlations were reduced with both VPM and PMV, but only with VPM the bias increased. With heritability of 1 and training population size of 600, the predictive correlation was 0.91 when PMV was used and 0.90 when VPM was used. In all cases, PMV showed higher predictive correlations than VPM, and the superiority of PMV over VPM increased with decreasing heritability.

Figure 1 Scheme of simulation part 2.

Simulation part 2

The efficiency of selection based on the different criteria (UC, OHV, GEBV) depends on the variation of the genetic means and genetic standard deviations between crosses. Table 2 shows the sample average and variance of the genetic mean, genetic standard deviation, and UC of the 150 crosses, preselected by parental GEBVs based on simulated heritabilities of 0.2 and 0.6, and with a minimum distance between parents of 0.2 and 0.0. The average genetic mean, and, consequently, the average UC increased with higher heritability as the preselection by GEBVs selected high performing lines more reliably. The average within-cross genetic standard deviation was only marginally affected by the heritability. Similarly, the sample variance among-cross genetic means and the UC were smaller with higher heritability, while the sample variance of genetic standard deviations remained constant. Without restriction on parental distance, the average genetic mean increased, but the average genetic standard deviation and UC decreased. The sample variance of the genetic standard deviations and UC increased compared to the fraction of crosses preselected based on a minimal parental distance of 0.2.

Figure 4 shows the additional genetic gain that can be obtained when selection of crosses is based on the predicted UC or OHV compared to selection based on the mean GEBV of the parents as a function of the selection intensity within crosses for heritabilities of 0.2 (Figure 4A) and 0.6 (Figure 4B) when crosses were restricted to a minimum parental distance of 0.2. The additional gain of selection based on the UC increased with higher selection intensities. For all heritabilities and selection intensities, predicting UC based on PMV yielded the highest genetic gain. When selecting only 1% of lines per cross, and with heritability 0.6, the genetic gain increased up to $0.14\sigma_g$. With heritability of 0.6, estimating the variance with method PMV yielded up to $0.01\sigma_g$ more

Figure 2 Bias and predictive correlations of the genetic variance estimates for only-QTL scenario. Estimates for 200 randomly generated crosses were obtained with methods VPM and PMV for different $h^2$ and training population sizes N. Simulation scenario with 300 QTL genotypes coded in the marker matrix.


gain compared to using the VPM as estimate. With low heritability of 0.2, the superiority of PMV over VPM increased up to $0.03\sigma_g$. Selecting crosses based on their OHV led to an increase in genetic gain compared to selection based on the mean GEBV for high selection intensity. When >10% of progenies per cross were selected, selection based on OHV was inferior to selection based on mean GEBV. In general, selection based on estimated UC was greatly superior to selection based on estimated OHV. With training population heritability of 0.6, the additional genetic gain of selecting crosses based on the UC compared to the mean GEBV was higher than for a training population heritability of 0.2. The additional genetic gain of selection based on UC compared to selection based on mean GEBVs increased considerably when crosses were not restricted to a minimum parental distance of 0.2, and reached a maximum increase of $0.20\sigma_g$

for heritability 0.2 (Figure 4C) and $0.24\sigma_g$ for heritability 0.6 (Figure 4D).

Figure 5 shows the genetic gain when true UC and true OHV were used for selection of crosses compared to selection based on true mean genotypic values of parental lines considering selection among 150 potential crosses preselected by minimum genetic distance of 0.2 and heritability of 0.6 (comparable to Figure 4B). Here, QTL effects $\alpha$ were assumed to be known to calculate the UC and OHV. With true effects the genetic gain from selection with UC increased up to $0.18\sigma_g$. As with estimated marker effects, selection based on true OHV resulted in reduced gain

compared to selection based on true UC and was only superior to selection based on mean genotypic values when keeping <10% of progenies per cross. In addition to selection based on true UC, Figure 5 shows selection based on artificially biased UC. For this, the true genetic variance entering the UC was either divided by two to simulate a variance estimate that is biased but has a predictive correlation of 1, or a random error was added to the true genetic variance to simulate a variance estimate that is unbiased but shows a predictive correlation of 0.5. A biased genetic variance in the UC led to a small decrease in genetic gain of $0.01\sigma_g$, while a genetic variance with a predictive correlation of 0.5 decreased the additional genetic gain by around one half.

Discussion

Here, we showed that increased genetic gain can be obtained when selection decisions are based on the estimated progeny variance in addition to the estimated mean of a cross. For a typical scenario of a selected proportion of 10% per cross and a heritability of 0.6, selection gain increased by $0.065\sigma_g$

when the estimated UC with PMV was used for selection decisions. We assumed 100 derived progenies per selected cross. Selection intensities within crosses will be smaller when considerably fewer progenies are derived per cross, and, consequently, the additional genetic gain that can be obtained from selection based on UC will decrease. However, with decreasing genotyping costs and the implementation of genomic

Figure 3 Bias and predictive correlations of the genetic variance estimates for only-marker scenario. Estimates for 200 randomly generated crosses were obtained with methods VPM and PMV for different $h^2$ and training population sizes N. Simulation scenario with 3000 non-QTL marker genotypes coded in the marker matrix.


selection, selection intensities are likely to increase, which will make selection of crosses based on UC more advantageous. Selection gain also depends on the level of variation in the genetic means and standard deviations of crosses. The additional genetic gain of selection based on UC compared to selection based on mean GEBVs was higher for a heritability of 0.6 than for 0.2, because the more precise preselection of high performing parental lines based on GEBVs lowered the variation between the genetic means of the 150 potential crosses, leading to a more favorable ratio of sample variance of genetic standard deviations ($\mathrm{var}(\sigma_g)$) over means ($\mathrm{var}(\mu)$). Without a restriction on parental distance, the 150 potential crosses showed a larger variation in genetic standard deviations, and, consequently, the additional genetic gain of selection based on the UC considerably increased.
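The dependence of the UC on selection intensity can be made concrete with a small sketch. It uses the definition of Schnell and Utz (1975), cross mean plus selection intensity times square-root of heritability times genetic standard deviation, and assumes truncation selection on normally distributed values. The two crosses and all function names are hypothetical.

```python
from statistics import NormalDist

def selection_intensity(p):
    """Standardized selection differential when the best fraction p
    of a normally distributed population is selected."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1.0 - p)   # truncation point on the standard normal scale
    return std_normal.pdf(z) / p

def usefulness(mean, sd_g, p, h=1.0):
    """UC = cross mean + i * h * sigma_g, with h the square-root of heritability."""
    return mean + selection_intensity(p) * h * sd_g

# Hypothetical crosses given as (genetic mean, genetic standard deviation).
cross_a = (15.0, 1.5)   # higher mean, lower variance
cross_b = (14.5, 1.8)   # lower mean, higher variance

for p in (0.5, 0.1, 0.01):
    uc_a = usefulness(*cross_a, p)
    uc_b = usefulness(*cross_b, p)
    best = "A" if uc_a > uc_b else "B"
    print(f"selected fraction {p}: UC_A={uc_a:.2f}, UC_B={uc_b:.2f}, best={best}")
```

In this toy case the cross with the higher mean is preferred when half of the progeny are kept, while the cross with the larger standard deviation is preferred when only 1% are kept, which mirrors the argument that the variance term gains weight as selection intensity increases.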

Considering, as parental lines, RILs derived from one cross, Zhong and Jannink (2007) concluded that the impact of the progeny variance of crosses on the superior progeny value decreases rapidly with increasing number of QTL. We simulated 300 QTL to warrant normally distributed genotypic values, and did not see a decrease in gain when the number of QTL was increased (results not shown). The reduced selection gain when using VPM compared to PMV as variance estimate in the UC can mainly be explained by a lower predictive correlation, as a correct ranking of the crosses based on their variances is most important (Figure 5). However, an underestimated variance with perfect predictive correlation also slightly decreases the genetic gain, as less weight is given to the variance part in the UC and selection becomes similar to selecting based on the mean of the crosses alone. An unbiased estimate of the genetic variance is clearly advantageous, as it can be directly used for selecting among different crosses based on their UC. Thus, the methods suggested here are clearly superior to variance prediction methods based on phenotypic or genetic distance between parents that can, at best, rank the variance of different crosses (Lian et al. 2015).

Other approaches independent of the UC have been proposed to guide mating decisions. Daetwyler et al. (2015) suggested selecting crosses or individuals based on their optimal haploid value, a concept that has been proposed in animal breeding for investigating selection limits (Cole and VanRaden 2011). Selecting crosses based on their OHV corresponds to summing over the best haploid segments present in both parental lines. In our simulation, selection based on

the estimated OHV gave an increase in genetic gain compared to selection based on mean GEBVs for high selection intensities. This is expected as the OHV potentially identifies the best line that can be derived from a cross, assuming an infinite number of progeny per cross and an infinite selection intensity. With decreasing selection intensity, the genetic mean of the selected fraction moves away from the optimal value and approaches the mean GEBVs of the parents. Consequently, selection based on mean GEBVs was superior to selection based on OHV when >10% of lines per cross were selected. The OHV can be adapted by varying the number of segments in which absence of recombination is assumed. Thus, one might argue that a small number of segments per chromosome should be chosen if selection intensity is low and the focus is on short-term genetic gain (Goiffon et al. 2017). We observed very similar genetic gain with one segment per chromosome compared to using three segments for estimating the OHV (results not shown). As neither selection intensity nor recombination frequencies directly enter into the OHV, selection based on OHV does not take into account the probability of realizing a specific OHV. To alleviate this shortcoming, Han et al. (2017) suggested the predicted cross value for selection, which is defined as the probability that a gamete produced from a cross will only consist of desirable alleles. However, this approach does not differentiate between large and small allelic effects, and, similar to other approaches, would rely on knowledge or precise estimation of desirable alleles for use in practice. In contrast, selection based on UC and PMV takes into account all available information, including LD between loci as well as uncertainty of effect estimates, and provides an increase in genetic gain compared to selection based on mean GEBV for the entire range of selection intensities.
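The OHV of a cross, as described above, sums the better of the two parental haploid segments over all segments. The sketch below is a simplified illustration for two fully inbred parents. It assumes 0/1 allele coding, equally sized segments defined by marker index, and a doubling of the summed haploid value. These choices and all names are assumptions for illustration, not the implementation used in the study.

```python
def optimal_haploid_value(parent_a, parent_b, effects, n_segments):
    """OHV of a cross between two inbred parents.

    parent_a, parent_b: one haplotype per inbred line, coded 0/1 per marker.
    effects: additive effect of the allele coded 1 at each marker.
    n_segments: number of equally sized segments assumed to be free of recombination.
    """
    n_markers = len(effects)
    bounds = [round(k * n_markers / n_segments) for k in range(n_segments + 1)]
    total = 0.0
    for start, end in zip(bounds[:-1], bounds[1:]):
        value_a = sum(parent_a[j] * effects[j] for j in range(start, end))
        value_b = sum(parent_b[j] * effects[j] for j in range(start, end))
        total += max(value_a, value_b)   # keep the better parental segment
    return 2.0 * total                   # a doubled haploid carries each segment twice

# Hypothetical parents with complementary favorable segments.
parent_a = [1, 1, 0, 0]
parent_b = [0, 0, 1, 1]
effects = [1.0, 0.5, 0.75, 0.25]

print(optimal_haploid_value(parent_a, parent_b, effects, n_segments=2))
print(optimal_haploid_value(parent_a, parent_b, effects, n_segments=1))
```

With two segments the best segment of each parent can be combined, whereas with a single segment the OHV equals the value of the better parent. Neither recombination frequencies nor selection intensity enter the calculation, which is the limitation noted in the text.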

We showed that the variance of genotypes from a new cross can be derived analytically from parental genotypes and genetic map information, and, thus, no in silico simulations are needed for predicting the genetic variance of progenies under the formulated assumptions. The theoretical derivation corresponds to a simulation of an infinite number of progenies per cross, and is most precise (see supporting File S1). The simulation of progenies can become computationally intense if the variance of several thousands of crosses needs to be predicted, so we consider our approach highly advantageous for application in a breeding program. One limitation of our approach is that, for the derivation of the progeny variance,

Table 2 Average genetic mean ($\mu$), standard deviation ($\sigma_g$), and usefulness criterion (UC) as well as the sample variance of genetic means ($\mathrm{var}(\mu)$), standard deviations ($\mathrm{var}(\sigma_g)$), and usefulness criteria ($\mathrm{var}(\mathrm{UC})$) among the 150 potential crosses of simulation part 2

| Distance^a | $h^2$^b | $\mu$ | $\sigma_g$ | UC | $\mathrm{var}(\mu)$ | $\mathrm{var}(\sigma_g)$ | $\mathrm{var}(\mathrm{UC})$ |
|---|---|---|---|---|---|---|---|
| 0.2 | 0.2 | 14.03 ± 2.02^c | 1.69 ± 0.19 | 17.44 ± 2.21 | 1.14 ± 0.37 | 0.08 ± 0.04 | 1.35 ± 0.5 |
| 0.2 | 0.6 | 14.86 ± 1.98 | 1.69 ± 0.18 | 18.28 ± 2.17 | 0.63 ± 0.21 | 0.08 ± 0.04 | 0.88 ± 0.29 |
| 0.0 | 0.2 | 14.19 ± 2.08 | 1.54 ± 0.19 | 17.29 ± 2.24 | 1.15 ± 0.42 | 0.12 ± 0.05 | 1.43 ± 0.57 |
| 0.0 | 0.6 | 15.02 ± 2.02 | 1.56 ± 0.18 | 18.16 ± 2.16 | 0.64 ± 0.23 | 0.12 ± 0.05 | 0.97 ± 0.34 |

a Minimum Rogers' distance between parents.
b Simulated heritability.
c Means ± standard deviations across the 400 simulation runs.


assumptions regarding the expected recombination frequency need to be made. Our results are based on the assumption of known recombination frequencies by assuming the given genetic map as true, and the absence of interference (Haldane 1919). In practice, precision of estimated recombination frequencies might vary between species, depending on available mapping information and the presence of interference. Furthermore, recombination rates might vary among crosses (Bauer et al. 2013), which might reduce the accuracy of variance prediction, and, consequently, the superiority of the UC. All these limitations apply equally to the analytical derivation of progeny variance and to in silico progeny simulations. The expected genetic variance depends on the population type derived from a cross. Our results are focused on DH lines derived from the F1 generation of a

cross, but we also give comprehensive formulas for different generations of RILs and DH lines considering the respective expected frequency of recombinants and different levels of inbreeding of the derived population. The specific formulas quantify whether there is a gain in genetic variance of a cross when DH lines are derived from a later generation than F1 (Sleper and Bernardo 2016), without the need for additional time-consuming computations.
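The analytical route can be sketched for the simplest case discussed here, DH lines derived from the F1 of two fully inbred parents. The sketch assumes 0/2 genotype coding and Haldane's mapping function, under which the covariance between two loci on the same chromosome is proportional to $1 - 2r = e^{-2d}$ for a map distance $d$ in Morgan. It follows the standard textbook result and is not copied from File S1. All names and numbers are hypothetical.

```python
from math import exp

def dh_covariance(parent_a, parent_b, positions_morgan, chromosomes):
    """Expected variance-covariance matrix of marker genotypes among DH lines
    derived from the F1 of two inbred parents (genotypes coded 0/2).
    Assumes Haldane's mapping function, i.e., no interference."""
    n = len(parent_a)
    sigma = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(n):
            if chromosomes[j] != chromosomes[k]:
                continue  # unlinked loci: r = 0.5, so the covariance is zero
            d = abs(positions_morgan[j] - positions_morgan[k])
            one_minus_2r = exp(-2.0 * d)  # Haldane: r = (1 - exp(-2d)) / 2
            diff_j = parent_a[j] - parent_b[j]   # zero at monomorphic loci
            diff_k = parent_a[k] - parent_b[k]
            sigma[j][k] = 0.25 * diff_j * diff_k * one_minus_2r
    return sigma

def genetic_variance(effects, sigma):
    """Quadratic form beta' Sigma beta."""
    n = len(effects)
    return sum(effects[j] * sigma[j][k] * effects[k] for j in range(n) for k in range(n))

# Two linked loci, 0.1 Morgan apart, with equal effects (hypothetical values).
positions, chroms, effects = [0.0, 0.1], [1, 1], [1.0, 1.0]

coupling = dh_covariance([2, 2], [0, 0], positions, chroms)   # favorable alleles in one parent
repulsion = dh_covariance([2, 0], [0, 2], positions, chroms)  # favorable alleles split

print(genetic_variance(effects, coupling))
print(genetic_variance(effects, repulsion))
```

The coupling cross yields a much larger progeny variance than the repulsion cross even though both involve the same loci and effects, which is why parental distance alone is a poor predictor of progeny variance.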

We investigated two different methods for predicting the genetic variance of newly generated crosses. Method VPM, the sample variance of the GEBVs, has been used for this and similar purposes by other authors (Bernardo 2014; Segelke et al. 2014; Mohammadi et al. 2015; Tiede et al. 2015; Wittenburg et al. 2016). As VPM is known to underestimate the true genetic variance as it is based on shrunken marker effect estimates (Cole and VanRaden 2011; Lian et al. 2015), we investigated its performance for obtaining accurate and precise progeny variance estimates under different training population properties. In our study, VPM largely underestimated the true genetic variance with incomplete LD between markers and QTL (only-marker scenario). Only under ideal scenarios where markers were in perfect LD with the QTL, the training population size largely exceeded the number of markers in the model, and when heritability was high, did VPM provide a nearly unbiased estimate. The underestimation of VPM originates from the fact that uncertainty of the marker effect estimates is not taken into account. The posterior mean of the genetic variance calculated from MCMC samples (PMV) takes this uncertainty into account, and, consequently, provided an improved variance estimator. As shown in Equation (10), the estimated variance obtained from PMV can be split into the estimated variance obtained from VPM and a part that originates from the marker effect variances. Accordingly, it yielded consistently larger variance estimates than VPM, and showed only a slight deviation from the true genetic variance for heritability values >0.2 in the only-QTL and only-marker scenario. An overestimation was observed for very low heritability, which can be explained by model overfitting and large Monte Carlo errors due to the low signal-to-noise ratio in the training population data. As expected from theoretical considerations, variance estimates obtained from VPM and PMV converged with increasing heritability and training population size, and both yielded a predictive correlation of 1 in the more ideal scenarios (only QTL in the model, $h^2 = 1$, $N_{TP} = 600$). Zhong and Jannink (2007) made a similar observation, and found that a fully Bayesian treatment of the superior progeny value was better than using an approach based on the posterior means of marker effects (comparable to VPM). It has been shown that, for a given number of markers, increasing training population size and heritability increases the accuracy of marker effect estimates (Wimmer et al. 2013). Accordingly, as variance estimates obtained with VPM and PMV are based on marker effect estimates, they became more accurate with increasing heritability and training population size. In agreement with the results of the predictive correlations, using PMV always

Figure 4 Additional gain of selecting crosses based on estimated UC and OHV. Gain is given as the difference in mean genotypic values of the selected lines from selection based on UC or OHV compared to selecting based on mean GEBV alone, standardized by the true genotypic standard deviation $\sigma_g$ in the training population. Different variance estimates were used to estimate the UC (VPM and PMV). Gain is given for different fractions of selected lines per cross. Results are shown for preselected crosses based on minimum Rogers' distance between parents of 0.2 and a heritability of 0.2 (A) and 0.6 (B), as well as for crosses not preselected by parental distance, and a heritability of 0.2 (C) and 0.6 (D).


led to more genetic gain than using VPM, and this superiority increased with decreasing heritability.
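The relation between the two estimators can be illustrated with a few hypothetical MCMC samples. VPM plugs the posterior mean of the marker effects into the quadratic form $\beta'\Sigma\beta$, whereas PMV averages the quadratic form over samples. Their difference equals the trace of $\Sigma$ times the sample covariance matrix of the effects, which corresponds to the split described for Equation (10). The sketch assumes a given progeny genotype covariance matrix $\Sigma$. The samples, matrix, and function names are invented for illustration.

```python
def quad_form(beta, sigma):
    """Quadratic form beta' Sigma beta."""
    n = len(beta)
    return sum(beta[j] * sigma[j][k] * beta[k] for j in range(n) for k in range(n))

def posterior_mean(samples):
    """Marker-wise mean of the sampled effects."""
    return [sum(s[j] for s in samples) / len(samples) for j in range(len(samples[0]))]

def vpm(samples, sigma):
    """Variance of posterior means: quadratic form of the posterior mean effects."""
    return quad_form(posterior_mean(samples), sigma)

def pmv(samples, sigma):
    """Posterior mean variance: quadratic form averaged over MCMC samples."""
    return sum(quad_form(s, sigma) for s in samples) / len(samples)

def effect_variance_part(samples, sigma):
    """trace(Sigma C), with C the covariance of sampled effects (divisor = number of samples)."""
    mean = posterior_mean(samples)
    n, m = len(samples), len(mean)
    return sum(
        sigma[j][k] * sum((s[j] - mean[j]) * (s[k] - mean[k]) for s in samples) / n
        for j in range(m) for k in range(m)
    )

# Hypothetical post-burn-in samples of two marker effects and a progeny covariance matrix.
samples = [[0.9, 0.2], [1.1, -0.1], [1.0, 0.3], [1.2, 0.0], [0.8, 0.1]]
sigma = [[1.0, 0.8], [0.8, 1.0]]

print(vpm(samples, sigma), pmv(samples, sigma), effect_variance_part(samples, sigma))
```

Because $\Sigma$ is positive semidefinite, the extra term is never negative, so PMV is at least as large as VPM. The gap shrinks as the posterior of the marker effects concentrates, consistent with the convergence of both estimators at high heritability and large training populations.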

In our study, we used an MCMC algorithm for obtaining both variance estimates (VPM and PMV). For VPM, MCMC would not have been necessary, and variance component estimates could have been obtained with restricted maximum likelihood (REML). For PMV, an estimate for the variance-covariance matrix of the marker effects is needed. A closed form estimate for this variance-covariance matrix can be derived only under an RR-BLUP model with known variance components $\sigma^2_\beta$ and $\sigma^2_e$ and, thus, given shrinkage parameter $\lambda = \sigma^2_e / \sigma^2_\beta$, which is then

$$\mathrm{var}(\beta \mid y_{TP}, \sigma^2_\beta, \sigma^2_e) = \sigma^2_e \, (M'M + \lambda I)^{-1} M'M \, (M'M + \lambda I)^{-T}.$$

However, when variance components are unknown, no closed form exists and MCMC can be used to obtain an estimate. Alternatively, estimated variance components (e.g., from REML) might be plugged in to obtain $\mathrm{var}(\beta \mid y_{TP})$, but then the uncertainty of the variance component estimates is not taken into account, which might underestimate $\mathrm{var}(\beta \mid y_{TP})$ (Sorensen et al. 2001). This approach would provide an alternative if one wants to avoid using MCMC to save computing time. As expected, in our simulations this approach was superior considering predictive correlation and bias compared to VPM, but inferior compared to PMV (results not shown).
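One possible reading of the sandwich form of the closed-form expression is sketched below. This is an assumption, since the derivation is not spelled out in this section: it treats the RR-BLUP solution as a linear function of the training phenotypes, ignores the intercept, and takes the residual variance as $\sigma^2_e I$. The matrix $A$ and the scalar $m$ are notation introduced only for this sketch.

```latex
% Ridge (RR-BLUP) solution as a linear map of the training phenotypes:
\hat{\beta} = A\, y_{TP}, \qquad A = (M'M + \lambda I)^{-1} M'

% With residual variance \sigma^2_e I, the variance of a linear map A y is A var(y) A':
\mathrm{var}(\hat{\beta})
  = A \, (\sigma^2_e I) \, A'
  = \sigma^2_e \, (M'M + \lambda I)^{-1} M'M \, (M'M + \lambda I)^{-T}

% Single-marker special case, with m = M'M a scalar:
\mathrm{var}(\hat{\beta}) = \sigma^2_e \, \frac{m}{(m + \lambda)^2}
```

In the single-marker case the expression shrinks toward zero as $\lambda$ grows, which illustrates how strong shrinkage reduces the variance attributed to the effect estimates.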

This study is based on simulations because a large number of crosses and a large number of progenies per cross are needed for inference. Such large numbers of unselected progenies are rarely available in experimental breeding programs. Further, true genetic variances are unknown in experimental data, and proper estimation requires replicated field trials to take genotype-by-environment interaction effects, which are typically large in maize, into account (Acosta-Pech et al. 2017). Our results might represent an upper limit for the application of selecting crosses based on UC in practical breeding. Nonadditive or multi-allelic QTL effects, which might affect the accuracy of variance prediction, were not considered. We conjecture that the prediction of progeny variance is affected by nonadditive and population-specific effects similar to the prediction of breeding values, where accuracies of methods are very similar irrespective of whether nonadditive effects are accounted for or not. In cases where nonadditive effects are considered important, the whole-genome regression model to predict the genetic variance could be readily extended to include epistatic (Jiang and Reif 2015) or population-specific effects (de los Campos et al. 2015; Lehermeier et al. 2015). In addition, Bayesian whole-genome regression models, including marker-specific shrinkage priors like Bayesian Lasso or BayesB, could be used if large QTL effects are assumed to be segregating for the traits under study.

Predictions of the genetic variance are not only of interest for selection of crosses in plant breeding but have also been studied in animal breeding for mating decisions (Segelke et al. 2014; Bonk et al. 2016). We concentrated here on improving a single trait in a directional selection approach. In general, knowledge of the genetic mean and variance of normally distributed breeding values allows estimating the probability that an offspring exceeds a specific threshold, or that it is within a specific range. Instead of increasing the genetic variance for fast selection progress, in specific situations, the goal might lie in a large mean combined with a low genetic variance, for example, in animal breeding to obtain a homogeneous population (Cole and VanRaden 2011; Segelke et al. 2014). For such inferences and subsequent mating optimizations, an unbiased and precise variance prediction as can be provided by PMV is important. The prediction of genetic variance can also be extended to the prediction of genetic covariances and correlations among multiple traits. To estimate genetic correlations with PMV, genetic sample correlations among breeding values of two traits can be calculated in each postburn-in MCMC sample, and, from those, posterior means

Figure 5 Additional gain of selecting crosses based on true UC, true OHV, or biased UC. For the biased UC, either an underestimated variance ($\hat{\sigma}^2_g = 0.5\sigma^2_g$) or a variance estimate with predictive correlation of $\mathrm{cor}(\sigma^2_g, \hat{\sigma}^2_g) = 0.5$ was considered. Gain is given in comparison to the genetic gain of selection based on mean parental true genotypic values, standardized by the genetic standard deviation of the training population.


can be formed. Further, the single-trait whole-genome regression model could be extended to a multi-trait model to profit from genetic correlations for the estimation of marker effects (Jia and Jannink 2012). We conjecture that a multivariate extension of PMV also provides a superior prediction of genetic correlations between traits compared to VPM, which warrants further investigations. Knowledge of the genetic variance of single traits and genetic correlations among multiple traits of future crosses allows breeders to optimize their allocation of resources. Further, by applying formulas for different generations of inbreeding, how selection gain changes with additional selfing steps can be deduced. Here, our work provides a good starting point for the optimization of a genome-based prediction guided breeding program.
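The use of a predicted mean and variance for threshold questions, mentioned above for normally distributed breeding values, can be sketched as follows. The cross parameters, thresholds, and function names are hypothetical.

```python
from statistics import NormalDist

def prob_exceeds(mean, sd_g, threshold):
    """Probability that a progeny's genetic value exceeds the threshold."""
    return 1.0 - NormalDist(mean, sd_g).cdf(threshold)

def prob_within(mean, sd_g, lower, upper):
    """Probability that a progeny's genetic value lies between lower and upper."""
    dist = NormalDist(mean, sd_g)
    return dist.cdf(upper) - dist.cdf(lower)

# Hypothetical cross with predicted genetic mean 15.0 and standard deviation 1.5.
print(prob_exceeds(15.0, 1.5, threshold=18.0))          # two standard deviations above the mean
print(prob_within(15.0, 1.5, lower=13.5, upper=16.5))   # within one standard deviation
```

A biased variance estimate shifts both probabilities directly, which is why an unbiased variance prediction matters more for such inferences than for merely ranking crosses.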

Literature Cited

Acosta-Pech, R., J. Crossa, G. de los Campos, S. Teyssèdre, B. Claustres et al., 2017 Genomic models with genotype x environment interaction for predicting hybrid performance: an application in maize hybrids. Theor. Appl. Genet. 130: 1431.

Bauer, E., M. Falque, H. Walter, C. Bauland, C. Camisan et al., 2013 Intraspecific variation of recombination rate in maize. Genome Biol. 14: R103.

Bernardo, R., 2014 Genomewide selection of parental inbreds: classes of loci and virtual biparental populations. Crop Sci. 54: 2586–2595.

Bohn, M., H. F. Utz, and A. E. Melchinger, 1999 Genetic similarities among winter wheat cultivars determined on the basis of RFLPs, AFLPs, and SSRs and their use for predicting progeny variance. Crop Sci. 39: 228–237.

Bonk, S., M. Reichelt, F. Teuscher, D. Segelke, and N. Reinsch, 2016 Mendelian sampling covariability of marker effects and genetic values. Genet. Sel. Evol. 48: 36.

Cole, J. B., and P. M. VanRaden, 2011 Use of haplotypes to estimate Mendelian sampling effects and selection limits. J. Anim. Breed. Genet. 128: 446–455.

Daetwyler, H. D., M. J. Hayden, G. C. Spangenberg, and B. J. Hayes, 2015 Selection on optimal haploid value increases genetic gain and preserves more genetic diversity relative to genomic selection. Genetics 200: 1341–1348.

de los Campos, G., Y. Veturi, A. I. Vazquez, C. Lehermeier, and P. Pérez-Rodríguez, 2015 Incorporating genetic heterogeneity in whole-genome regressions using interactions. J. Agric. Biol. Environ. Stat. 20: 467–490.

Endelman, J. B., 2011 Ridge regression and other kernels for genomic selection with R package rrBLUP. Plant Genome 4: 250–255.

Falconer, D. S., and T. F. C. Mackay, 1996 Introduction to Quantitative Genetics. Longman, Essex, England.

Giraud, H., C. Lehermeier, E. Bauer, M. Falque, V. Segura et al., 2014 Linkage disequilibrium with linkage analysis of multiline crosses reveals different multiallelic QTL for hybrid performance in the flint and dent heterotic groups of maize. Genetics 198: 1717–1734.

Goiffon, M., A. Kusmec, L. Wang, G. Hu, and P. Schnable, 2017 Improving response in genomic selection with a population-based selection strategy: optimal population value selection. Genetics 206: 1675–1682.

Haldane, J., 1919 The combination of linkage values and the calculation of distances between the loci of linked factors. J. Genet. 8: 299–309.

Han, Y., J. N. Cameron, L. Wang, and W. D. Beavis, 2017 The predicted cross value for genetic introgression of multiple alleles. Genetics 205: 1409–1423.

Heslot, N., J.-L. Jannink, and M. E. Sorrells, 2015 Perspectives for genomic selection applications and research in plants. Crop Sci. 55: 1–12.

Hung, H.-Y., C. Browne, K. Guill, N. Coles, M. Eller et al., 2012 The relationship between parental genetic or phenotypic divergence and progeny variation in the maize nested association mapping population. Heredity 108: 490–499.

Jia, Y., and J.-L. Jannink, 2012 Multiple-trait genomic selection methods increase genetic value prediction accuracy. Genetics 192: 1513–1522.

Jiang, Y., and J. C. Reif, 2015 Modeling epistasis in genomic selection. Genetics 201: 759–768.

Lehermeier, C., N. Krämer, E. Bauer, C. Bauland, C. Camisan et al., 2014 Usefulness of multiparental populations of maize (Zea mays L.) for genome-based prediction. Genetics 198: 3–16.

Lehermeier, C., C.-C. Schön, and G. de los Campos, 2015 Assessment of genetic heterogeneity in structured plant populations using multivariate whole-genome regression models. Genetics 201: 323–337.

Lehermeier, C., G. de los Campos, V. Wimmer, and C.-C. Schön, 2017 Genomic variance estimates: with or without disequilibrium covariances? J. Anim. Breed. Genet. 134: 232–241.

Lian, L., A. Jacobson, S. Zhong, and R. Bernardo, 2015 Prediction of genetic variance in biparental maize populations: genome-wide marker effects vs. mean genetic variance in prior populations. Crop Sci. 55: 1181–1188.

Meuwissen, T. H. E., B. J. Hayes, and M. E. Goddard, 2001 Prediction of total genetic value using genome-wide dense marker maps. Genetics 157: 1819–1829.

Mohammadi, M., T. Tiede, and K. P. Smith, 2015 PopVar: a genome-wide procedure for predicting genetic variance and correlated response in biparental breeding populations. Crop Sci. 55: 2068–2077.

Pérez, P., and G. de los Campos, 2014 Genome-wide regression & prediction with the BGLR statistical package. Genetics 198: 483–495.

Schnell, F. W., and H. F. Utz, 1975 F1-Leistung und Elternwahl in der Züchtung von Selbstbefruchtern, pp. 234–258 in Bericht über die Arbeitstagung der Vereinigung Österreichischer Pflanzenzüchter. Gumpenstein, Österreich.

Segelke, D., F. Reinhardt, Z. Liu, and G. Thaller, 2014 Prediction of expected genetic variation within groups of offspring for innovative mating schemes. Genet. Sel. Evol. 46: 42.

Sleper, J. A., and R. Bernardo, 2016 Recombination and genetic variance among maize doubled haploids induced from F1 and F2 plants. Theor. Appl. Genet. 129: 2429–2436.

Sorensen, D., R. Fernando, and D. Gianola, 2001 Inferring the trajectory of genetic variance in the course of artificial selection. Genet. Res. 77: 83–94.

Tiede, T., L. Kumar, M. Mohammadi, and K. P. Smith, 2015 Predicting genetic variance in bi-parental breeding populations is more accurate when explicitly modeling the segregation of informative genomewide markers. Mol. Breed. 35: 199.

Utz, H. F., M. Bohn, and A. E. Melchinger, 2001 Predicting progeny means and variances of winter wheat crosses from phenotypic values of their parents. Crop Sci. 41: 1470–1478.

Wimmer, V., C. Lehermeier, T. Albrecht, H.-J. Auinger, Y. Wang et al., 2013 Genome-wide prediction of traits with different genetic architecture through efficient variable selection. Genetics 195: 573–587.

Wittenburg, D., F. Teuscher, J. Klosa, and N. Reinsch, 2016 Covariance between genotypic effects and its use for genomic inference in half-sib families. G3 6: 2761–2772.

Zhong, S., and J.-L. Jannink, 2007 Using quantitative trait loci results to discriminate among crosses on the basis of their progeny mean and variance. Genetics 177: 567–576.

Communicating editor: F. Eeuwijk
