Genomic Prediction and Selection for Multi-Environments with Big Data using the BGLR statistical package

Biometrics and Statistics Unit, Global Maize and Wheat programs

June, 2015.

CIMMyT Genomic Prediction and Selection for Multi-Environments with Big Data using the BGLR statistical package1/26

Contents

1 BGLR

2 Prediction in multi-environments

3 Models

4 Cross validation

5 Application examples

BGLR

- A novel software package for whole-genome regression and prediction of continuous (censored or uncensored) and discrete traits.
- Suitable for large-p, small-n problems (many more predictors than observations).
- Many parametric and non-parametric models implemented in a consistent manner.
- A large collection of Bayesian models is included:
  - Bayesian ridge regression
  - Bayesian LASSO
  - BayesA, BayesB, BayesC-π
  - Reproducing kernel Hilbert spaces (RKHS)
  - RKHS with kernel averaging
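The models in this list differ mainly in the prior placed on the regression coefficients or random effects. The summary below is a simplified sketch for a single coefficient $\beta_k$ or a random-effect vector $\mathbf{u}$: the hyperparameterization BGLR actually uses is not given in these slides and may differ (for example, the Bayesian LASSO prior is often also scaled by the residual standard deviation). Here $\pi$ denotes the prior probability that a coefficient is exactly zero, and $\mathbf{K}$, $\mathbf{K}_l$ are assumed positive semi-definite kernel matrices.

```latex
\begin{aligned}
&\text{Bayesian ridge regression:} && \beta_k \sim N(0, \sigma^2_\beta), \text{ one variance shared by all } k\\
&\text{Bayesian LASSO:} && p(\beta_k \mid \lambda) \propto \exp(-\lambda \lvert \beta_k \rvert) \quad \text{(double exponential)}\\
&\text{BayesA:} && \beta_k \sim N(0, \sigma^2_{\beta_k}),\ \sigma^2_{\beta_k} \sim \text{scaled-inv-}\chi^2(\nu, S) \;\Rightarrow\; \beta_k \sim \text{scaled-}t\\
&\text{BayesB:} && \beta_k = 0 \text{ with prob. } \pi, \text{ otherwise as in BayesA}\\
&\text{BayesC-}\pi: && \beta_k = 0 \text{ with prob. } \pi, \text{ otherwise } \beta_k \sim N(0, \sigma^2_\beta), \ \pi \text{ estimated}\\
&\text{RKHS:} && \mathbf{u} \sim N(\mathbf{0}, \sigma^2_u \mathbf{K})\\
&\text{RKHS, kernel averaging:} && \mathbf{u} = \textstyle\sum_{l} \mathbf{u}_l,\ \ \mathbf{u}_l \sim N(\mathbf{0}, \sigma^2_{u_l} \mathbf{K}_l)
\end{aligned}
```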

BGLR in a nutshell

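Data equation: $y = \eta + \varepsilon$, where

$$\eta = \mathbf{1}\mu + \sum_{j=1}^{J} X_j \beta_j + \sum_{l=1}^{L} u_l ,$$

$\mathbf{1}$ is a vector of ones, $\mu$ is an intercept, $X_j$ are design matrices with regression coefficients $\beta_j$, and $u_l$ are random effects.

Priors: different priors can be assigned to the regression coefficients $\beta_j$ and the random effects $u_l$, which leads to different models.

Model fitting uses MCMC algorithms (Gibbs sampler and Metropolis-Hastings), implemented efficiently.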
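The Gibbs sampler mentioned above can be illustrated with the simplest model in the list, Bayesian ridge regression. The block below is a minimal single-site sampler in pure Python, not BGLR's implementation. It assumes a flat prior on µ, a common Gaussian prior on the coefficients, and scaled-inverse chi-square priors on both variances with hypothetical defaults (df0 = 5, scales chosen so that the prior mode of each component explains about half of var(y)). The function name `gibbs_brr`, its arguments, and the simulated data are all hypothetical.

```python
import random
import statistics


def gibbs_brr(y, X, n_iter=2000, burn_in=500, df0=5.0, seed=1):
    """Single-site Gibbs sampler for y = 1*mu + X*beta + e with beta_j ~ N(0, var_b).

    Assumed priors: flat on mu; var_b and var_e ~ scaled-inverse chi-square(df0, S).
    Returns the posterior means of mu and beta.
    """
    rng = random.Random(seed)
    n, p = len(y), len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]   # column-major copy of X
    xx = [sum(v * v for v in col) for col in cols]      # x_j'x_j for each column
    var_y = statistics.variance(y)
    ms_x = sum(xx) / n
    # Prior scales set so the prior modes give each component ~half of var(y) (assumed R2 = 0.5).
    s_e = 0.5 * var_y * (df0 + 2)
    s_b = 0.5 * var_y / ms_x * (df0 + 2)
    mu, beta = statistics.fmean(y), [0.0] * p
    var_e, var_b = 0.5 * var_y, 0.5 * var_y / ms_x
    resid = [yi - mu for yi in y]                       # e = y - mu - X beta (beta starts at 0)
    mu_sum, beta_sum, kept = 0.0, [0.0] * p, 0
    for it in range(n_iter):
        # Intercept: Gaussian full conditional given all other parameters.
        resid = [r + mu for r in resid]
        mu = rng.gauss(sum(resid) / n, (var_e / n) ** 0.5)
        resid = [r - mu for r in resid]
        # Coefficients, one at a time; lam = var_e / var_b acts as the ridge penalty.
        lam = var_e / var_b
        for j in range(p):
            col, b_old = cols[j], beta[j]
            rhs = sum(c * r for c, r in zip(col, resid)) + xx[j] * b_old
            c_j = xx[j] + lam
            b_new = rng.gauss(rhs / c_j, (var_e / c_j) ** 0.5)
            resid = [r - c * (b_new - b_old) for c, r in zip(col, resid)]
            beta[j] = b_new
        # Variances: (S + sum of squares) / chi-square draw; gammavariate(k/2, 2) ~ chi2(k).
        var_b = (s_b + sum(b * b for b in beta)) / rng.gammavariate((df0 + p) / 2, 2)
        var_e = (s_e + sum(r * r for r in resid)) / rng.gammavariate((df0 + n) / 2, 2)
        if it >= burn_in:
            mu_sum += mu
            beta_sum = [s + b for s, b in zip(beta_sum, beta)]
            kept += 1
    return mu_sum / kept, [s / kept for s in beta_sum]


# Hypothetical simulated data: 80 individuals, 10 covariates, intercept 5.
sim = random.Random(42)
true_beta = [sim.gauss(0, 1) for _ in range(10)]
X = [[sim.gauss(0, 1) for _ in range(10)] for _ in range(80)]
y = [5.0 + sum(x * b for x, b in zip(row, true_beta)) + sim.gauss(0, 1) for row in X]
mu_hat, beta_hat = gibbs_brr(y, X)
```

Updating one coefficient at a time against a running residual vector keeps each step to a single pass over one column, which is a common strategy for large-p problems. In BGLR itself (an R package), the comparable model is requested by setting `model = "BRR"` in the `ETA` list.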

Prediction in multi-environments

- In most agronomic traits, the effects of genes are modulated by environmental conditions, generating genotype × environment interaction (G×E).
- Researchers in plant breeding have developed multiple methods for accounting for, and exploiting, G×E in multi-environment trials.
- Genomic selection is gaining ground in plant breeding.
- Most applications so far are based on single-environment, single-trait models.
- Preliminary evidence (e.g., Burgueño et al., 2012) suggests that there is great scope for improving prediction accuracy using multi-environment models.
- These ideas can be taken one step further by incorporating information on environmental covariates.

Models

Model 1 (EL: environment + line, no pedigree)

$$y_{ij} = \mu + E_i + L_j + e_{ij}$$

Model 2 (EA: environment + line, with markers)

$$y_{ij} = \mu + E_i + g_j + e_{ij}$$

Model 3 (environment + line + marker × environment interaction)

$$y_{ij} = \mu + E_i + g_j + Eg_{ij} + e_{ij}$$

Here $y_{ij}$ is the phenotype of line $j$ in environment $i$.
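Stacking all records, the three models can be written in matrix form with incidence matrices $\mathbf{Z}_E$ (records to environments) and $\mathbf{Z}_L = \mathbf{Z}_g$ (records to lines). Under the distributional assumptions listed under Assumptions, plus an assumed residual $\mathbf{e} \sim N(\mathbf{0}, \sigma^2_e \mathbf{I})$ and an interaction variance $\sigma^2_{Eg}$ (neither written out in the slides), Model 3 implies the covariance on the last line.

```latex
\begin{aligned}
\text{Model 1:}\quad & \mathbf{y} = \mathbf{1}\mu + \mathbf{Z}_E \boldsymbol{E} + \mathbf{Z}_L \boldsymbol{L} + \mathbf{e}\\
\text{Model 2:}\quad & \mathbf{y} = \mathbf{1}\mu + \mathbf{Z}_E \boldsymbol{E} + \mathbf{Z}_g \mathbf{g} + \mathbf{e}\\
\text{Model 3:}\quad & \mathbf{y} = \mathbf{1}\mu + \mathbf{Z}_E \boldsymbol{E} + \mathbf{Z}_g \mathbf{g} + \mathbf{Eg} + \mathbf{e}\\[4pt]
\operatorname{Var}(\mathbf{y})_{\text{Model 3}} =\; & \sigma^2_E\, \mathbf{Z}_E \mathbf{Z}_E^T + \sigma^2_g\, \mathbf{Z}_g \mathbf{G} \mathbf{Z}_g^T + \sigma^2_{Eg} \left[(\mathbf{Z}_g \mathbf{G} \mathbf{Z}_g^T) \circ (\mathbf{Z}_E \mathbf{Z}_E^T)\right] + \sigma^2_e\, \mathbf{I}
\end{aligned}
```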

Assumptions

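It is assumed that $E_i \sim N(0, \sigma^2_E)$ and $\mathbf{g} \sim N(\mathbf{0}, \sigma^2_g \mathbf{G})$, with $\mathbf{G}$ the genomic relationship matrix. $Eg_{ij}$ is the interaction term between genotypes and environments, with

$$\mathbf{Eg} \sim N\left(\mathbf{0}, \left[(\mathbf{Z}_g \mathbf{G} \mathbf{Z}_g^T) \circ (\mathbf{Z}_E \mathbf{Z}_E^T)\right] \sigma^2_{Eg}\right),$$

where $\mathbf{Z}_g$ is the incidence matrix connecting phenotypes with genotypes, $\mathbf{Z}_E$ is the incidence matrix connecting phenotypes with environments, and $\circ$ denotes the Hadamard (element-wise) product of two matrices.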
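The interaction covariance above can be built directly from incidence matrices. This sketch assumes that G is computed from markers coded 0/1/2 with VanRaden's (2008) first method (the slides do not say how G was obtained) and uses hypothetical toy data and function names. Because $\mathbf{Z}_g$ and $\mathbf{Z}_E$ only select rows, entry (r, s) of $(\mathbf{Z}_g\mathbf{G}\mathbf{Z}_g^T) \circ (\mathbf{Z}_E\mathbf{Z}_E^T)$ equals G[line of r, line of s] when records r and s share an environment, and 0 otherwise. The function `ge_kernel_direct` uses that shortcut, which avoids forming the incidence matrices when the number of records is large.

```python
def vanraden_g(markers):
    """G = W W' / (2 * sum_k p_k (1 - p_k)), W = 0/1/2 codes centred by 2 * allele frequency."""
    n, m = len(markers), len(markers[0])
    freqs = [sum(row[k] for row in markers) / (2 * n) for k in range(m)]
    w = [[row[k] - 2 * freqs[k] for k in range(m)] for row in markers]
    denom = 2 * sum(f * (1 - f) for f in freqs)
    return [[sum(a * b for a, b in zip(w[i], w[j])) / denom for j in range(n)] for i in range(n)]


def incidence(labels, levels):
    """Z[r][c] = 1 if record r belongs to level c (a line or an environment), else 0."""
    return [[1.0 if lab == lev else 0.0 for lev in levels] for lab in labels]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def hadamard(a, b):
    """Element-wise product of two equally sized matrices."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def ge_kernel_direct(g, line_idx, env_labels):
    """Same (Z_g G Z_g') o (Z_E Z_E'), built by indexing instead of forming Z matrices."""
    n = len(line_idx)
    return [[g[line_idx[r]][line_idx[s]] if env_labels[r] == env_labels[s] else 0.0
             for s in range(n)] for r in range(n)]


# Hypothetical toy data: 3 lines, 5 markers, 5 phenotypic records in 2 environments.
markers = [[0, 1, 2, 0, 1], [2, 1, 0, 0, 1], [1, 1, 1, 2, 0]]
lines = ["L1", "L2", "L3"]
rec_lines = ["L1", "L2", "L3", "L1", "L2"]
rec_envs = ["E1", "E1", "E1", "E2", "E2"]

G = vanraden_g(markers)
Z_g = incidence(rec_lines, lines)
Z_E = incidence(rec_envs, ["E1", "E2"])
K_G = matmul(matmul(Z_g, G), transpose(Z_g))   # covariance pattern of g across records
K_E = matmul(Z_E, transpose(Z_E))              # 1 if two records share an environment
K_GE = hadamard(K_G, K_E)                      # covariance pattern of Eg
K_GE_fast = ge_kernel_direct(G, [lines.index(l) for l in rec_lines], rec_envs)
```

Kernels built this way can then serve as covariance matrices of random effects, for example through BGLR's RKHS option.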

Cross validation

1 CV1: Prediction of the performance of newly developed lines (i.e., lines that have not been evaluated in any field trial).

2 CV2: Prediction in incomplete field trials; here the aim is to predict the performance of lines that have been evaluated in some environments but not in others.

See Figure 1.
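The two schemes differ in what is assigned to folds. A minimal sketch, assuming each phenotypic record is a (line, environment) pair: CV1 assigns whole lines to folds, so a test line has no records in any environment, while CV2 spreads a line's records over different folds, so when one fold is masked the line is still observed elsewhere. The rotation used for CV2 (a random starting fold per line, then consecutive folds) is one simple choice assumed here; the cited studies may have randomized differently, and fold sizes are balanced only when every line has the same number of records. Function names are hypothetical.

```python
import random
from collections import defaultdict


def cv1_folds(records, k, seed=0):
    """CV1: every record of a line goes to the same fold (newly developed lines)."""
    rng = random.Random(seed)
    lines = sorted({line for line, _ in records})
    rng.shuffle(lines)
    fold_of_line = {line: i % k for i, line in enumerate(lines)}
    return [fold_of_line[line] for line, _ in records]


def cv2_folds(records, k, seed=0):
    """CV2: a line's records rotate through folds (incomplete field trials)."""
    rng = random.Random(seed)
    by_line = defaultdict(list)
    for idx, (line, _) in enumerate(records):
        by_line[line].append(idx)
    folds = [0] * len(records)
    for line in sorted(by_line):
        idxs = by_line[line]
        rng.shuffle(idxs)
        start = rng.randrange(k)          # random offset, then consecutive folds
        for t, idx in enumerate(idxs):
            folds[idx] = (start + t) % k
    return folds


# Five lines x five environments, matching the layout described for Figure 1.
records = [(f"Line{l}", f"E{e}") for l in range(1, 6) for e in range(1, 6)]
cv1 = cv1_folds(records, k=5)
cv2 = cv2_folds(records, k=5)
```

With five records per line and five folds, each CV2 fold masks exactly one environment per line, so every line remains observed in the other four.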

Figure 1: Two hypothetical cross-validation schemes (CV1 and CV2) for five lines (Lines 1-5) and five environments (E1-E5); source: Jarquín et al. (2014).

Application examples

Example 1: Wheat dataset (Ravi, Jessica et al.)

The phenotypic information consists of grain yield of wheat in five mega-environments.

Table 1. Number of lines evaluated in each environment

The problem is to predict the performance of 9,000 unobserved individuals in all environments.

Table 2. Phenotypic correlations between environments.

For model fitting we used the coefficient of parentage (COP) and markers (genotyping-by-sequencing, GBS).

1 COP: We computed a relationship matrix (A). The matrix has about 50k × 50k = 2,500,000,000 entries. We used BROWSE, and the program took several days to finish. We then used an ad hoc version of the R package pedigreemm and obtained the matrix in about 3 hours.

2 Markers: Information for about 21,000 individuals and 14,000 individuals was available.
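For scale, a dense matrix with 2.5 × 10^9 entries stored in double precision (8 bytes each) needs about 20 GB, or about 10 GB if only one triangle of the symmetric matrix is kept. The sketch below computes a relationship matrix with Henderson's textbook tabular method, whose O(n²) memory and work illustrate why 50,000 individuals is demanding. It is not the algorithm used by BROWSE or by the modified pedigreemm (the slides describe neither). It assumes parents are listed before their offspring and that founders are unrelated and non-inbred; COP calculations for self-pollinated crops such as wheat typically treat lines as inbred, which this sketch does not. The function name and the toy pedigree are hypothetical.

```python
def additive_relationship(pedigree):
    """Henderson's tabular method for the additive relationship matrix A.

    pedigree: (id, sire, dam) tuples, parents listed before their offspring,
    None for an unknown parent. Founders are assumed unrelated and non-inbred.
    """
    pos = {ind: i for i, (ind, _, _) in enumerate(pedigree)}
    n = len(pedigree)
    a = [[0.0] * n for _ in range(n)]
    for i, (_, sire, dam) in enumerate(pedigree):
        s = pos.get(sire)   # None when the parent is unknown
        d = pos.get(dam)
        for j in range(i):
            # a_ij = 0.5 * (a_j,sire + a_j,dam); unknown parents contribute 0.
            a_js = a[j][s] if s is not None else 0.0
            a_jd = a[j][d] if d is not None else 0.0
            a[i][j] = a[j][i] = 0.5 * (a_js + a_jd)
        # a_ii = 1 + F_i, with inbreeding coefficient F_i = 0.5 * a_sire,dam.
        inbreeding = 0.5 * a[s][d] if s is not None and d is not None else 0.0
        a[i][i] = 1.0 + inbreeding
    return a


# Hypothetical five-member pedigree with two founders (A, B).
ped = [("A", None, None), ("B", None, None), ("C", "A", "B"),
       ("D", "A", "C"), ("E", "C", "D")]
A = additive_relationship(ped)

# Storage estimate for a dense double-precision matrix of ~50,000 individuals.
n_ind = 50_000
dense_gb = n_ind ** 2 * 8 / 1e9                  # 2.5e9 entries x 8 bytes
packed_gb = n_ind * (n_ind + 1) // 2 * 8 / 1e9   # one triangle only
```

In the toy pedigree, D is the offspring of A and its own offspring C, so D is inbred (A[3][3] = 1.25), and E, the offspring of C and D, is more inbred still (A[4][4] = 1.375).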

Benchmark: Predicting 2014 using previous records

Figure 2: Predictions in testing
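The slides do not say which accuracy measure Figure 2 reports. Assuming the common choice in genomic-selection studies, Pearson's correlation between observed and predicted values within each environment, a forward-validation benchmark (train on earlier years, test on the target year) can be scored as sketched below. The record fields (`year`, `env`, `y`), the function names, and the toy values are hypothetical, and fitting the model itself is left out.

```python
import math
from collections import defaultdict


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def split_by_year(records, test_year):
    """Forward validation: train on earlier years, test on the target year."""
    train = [r for r in records if r["year"] < test_year]
    test = [r for r in records if r["year"] == test_year]
    return train, test


def accuracy_by_environment(test, predictions):
    """Correlation between observed and predicted values within each environment."""
    groups = defaultdict(lambda: ([], []))
    for rec, pred in zip(test, predictions):
        observed, predicted = groups[rec["env"]]
        observed.append(rec["y"])
        predicted.append(pred)
    return {env: pearson(obs, prd) for env, (obs, prd) in groups.items()}


# Hypothetical records; in practice the predictions come from a model fitted to `train`.
records = [
    {"year": 2013, "env": "E1", "y": 4.1},
    {"year": 2014, "env": "E1", "y": 1.0},
    {"year": 2014, "env": "E1", "y": 2.0},
    {"year": 2014, "env": "E1", "y": 3.0},
    {"year": 2014, "env": "E2", "y": 1.0},
    {"year": 2014, "env": "E2", "y": 2.0},
    {"year": 2014, "env": "E2", "y": 3.0},
]
train, test = split_by_year(records, test_year=2014)
acc = accuracy_by_environment(test, [2.0, 4.0, 6.0, 3.0, 2.0, 1.0])
```

Computing the correlation separately per environment avoids inflating accuracy with differences between environment means, which any model capturing the main effect of environment would predict easily.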

The real problem

Example 2: Biparental tropical maize populations (Xuecai et al.)

- Genotypic and phenotypic information for about 20 biparental populations.
- Low-density (about 200) and high-density (about 60,000) markers.
- Individuals evaluated in several environments.

Figure 3: Results from CV1

Figure 4: Results from CV2

Collaborators in this work

J. Crossa, Juan Burgueño, G. de los Campos, Jessica Rutkoski, Ravi Singh, Enrique Autrique, Jesse Poland, Juan Carlos Alarcón, Susanne Dreisigacker, Paulino Pérez, X. Zhang, K. Semagn, Y. Beyene, R. Babu, F. San Vicente, M. Olsen, Newman Samayoua

References

Burgueño, J., G. de los Campos, K. Weigel, and J. Crossa (2012). Genomic prediction of breeding values when modeling genotype × environment interaction using pedigree and dense molecular markers. Crop Science, 52(2): 707-719.

Jarquín, D., J. Crossa, X. Lacaze, P. Cheyron, J. Daucourt, J. Lorgeou, F. Piraux, et al. (2014). A reaction norm model for genomic selection using high-dimensional genomic and environmental data. Theoretical and Applied Genetics, 127(3): 595-607.
