Advances in gene-based crop modeling
Gene-Based Crop Modeling
J. W. Jones, M. J. Correll, K. J. Boote, S. Gezan, and C. E. Vallejos
CIAT, Aug 4, 2015
Source: Monica Ozores-Hampton
Outline: Our Work in Modeling
• Crop models can be considered as non-linear functions
• Estimate GSPs (genetic coefficients); fit a linear statistical model to estimate GSPs vs. QTLs
• Develop new statistical linear mixed effects models of G, E, and GxE for different processes
  • E.g., flowering date, node addition rate, leaf size, max number of MS nodes, …
• Integrate new relationships into the existing DSSAT CROPGRO-Bean model
• Develop component process modules using linear or nonlinear mixed effects models of traits vs. QTLs and environmental factors; combine them to demo the modular approach
• Future: compare genomic prediction for beans, similar to Technow et al. (PLoS ONE, 2015)
• Discussion
Dynamic Crop Models
• Dynamic: variables of interest change over time (state variables)
• Environment also changes over time
• System of equations, not just a single variable to predict
• Variables interact, typically in highly non-linear ways, varying over time
• There is not a single equation to calculate the response that one is interested in (e.g., final yield of a crop)
• Final yield (and other variables) may reach their final values in many different ways, depending on genetics and environment
General Form of a Dynamic System Model: Discrete Time/Difference Equation
Difference equation form, when the time step equals 1 (e.g., 1 day):
$$
\begin{aligned}
U_{1,t+1} &= U_{1,t} + g_1[U_t, X_t, \theta] \\
U_{2,t+1} &= U_{2,t} + g_2[U_t, X_t, \theta] \\
&\;\;\vdots \\
U_{S,t+1} &= U_{S,t} + g_S[U_t, X_t, \theta]
\end{aligned}
$$
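As a minimal sketch of the difference-equation form, the Python loop below advances a set of state variables one daily step at a time. The two state variables (`leaf`, `seed`), the rate functions, and every parameter value are hypothetical placeholders chosen only to make the example run; none of them come from DSSAT.

```python
from typing import Callable, Dict, List

State = Dict[str, float]
Rate = Callable[[State, float, Dict[str, float]], float]


def simulate(u0: State, x: List[float], theta: Dict[str, float],
             g: Dict[str, Rate]) -> List[State]:
    """Apply U[i, t+1] = U[i, t] + g_i(U_t, X_t, theta) once per daily step."""
    trajectory = [dict(u0)]
    for x_t in x:
        u_t = trajectory[-1]
        # Every rate is evaluated from the state at time t before any update.
        u_next = {name: u_t[name] + g[name](u_t, x_t, theta) for name in u_t}
        trajectory.append(u_next)
    return trajectory


# Hypothetical rates: leaf growth follows the daily driver, seed growth follows leaf mass.
rates = {
    "leaf": lambda u, x_t, th: th["leaf_rate"] * x_t,
    "seed": lambda u, x_t, th: th["seed_frac"] * u["leaf"],
}
theta = {"leaf_rate": 2.0, "seed_frac": 0.1}   # placeholder parameters
weather = [1.0, 0.5, 1.0]                      # placeholder daily driver X_t
path = simulate({"leaf": 10.0, "seed": 0.0}, weather, theta, rates)
```

Because each rate function receives the whole state `U_t`, the state variables can interact, which is why no single closed-form equation gives the final value.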
Dynamic System Model as a Response Model
Example: final yield response to all variables during a season
$$
Y = f(X; \theta)
$$
where
X represents all explanatory variables during a season,
θ represents all parameters of the dynamic model, and
f represents a function (typically an implicit function).
• We could write this as Y = simulated final grain biomass at harvest time, T, as affected by explanatory variables (e.g., irrigation applied during a season) and by all parameters
Example of Response Simulated by Crop Model
How Simulation Computes Responses
Figure 1.3. Computer program flow diagram showing how a simulation model is used as a function: any time a response is needed, the simulation is run to calculate the state variables for every time step, but it returns only the value of the selected state variable at the time of interest. In this case, we are interested in Y at time t = 140.
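The idea in the caption can be sketched in Python as a wrapper that runs a full simulation and returns a single value. The one-state toy model, the irrigation driver, and the `growth_per_mm` parameter are assumptions made for illustration, not part of any crop model named here.

```python
from typing import List, Sequence


def run_simulation(irrigation: Sequence[float], growth_per_mm: float,
                   n_days: int) -> List[float]:
    """Hypothetical one-state daily simulation; returns biomass for days 0..n_days."""
    biomass = [0.0]
    for day in range(n_days):
        # Toy rate equation: daily growth proportional to water applied that day.
        biomass.append(biomass[-1] + growth_per_mm * irrigation[day])
    return biomass


def response(irrigation: Sequence[float], growth_per_mm: float,
             t_of_interest: int = 140) -> float:
    """Y = f(X; theta): run the whole simulation, return one state value."""
    trajectory = run_simulation(irrigation, growth_per_mm, n_days=t_of_interest)
    return trajectory[t_of_interest]


x = [2.0] * 140                       # placeholder explanatory variable X
y = response(x, growth_per_mm=0.5)    # placeholder parameter theta
```

Every call to `response` recomputes all intermediate days, so the function is implicit: its value is defined by the simulation rather than by a formula.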
Genotype-Specific Parameters (GSPs)
• Quantities in the model that represent variations in crop performance across cultivars or lines
• GSPs are the same as the “cultivar coefficients” that have been used routinely in the models contained in DSSAT
• Examples
  • Phenology, e.g., duration to first flower under optimal conditions
  • Size of leaves on the main stem
  • Maximum rate of node appearance on the main stem under optimal conditions
  • Number of seeds per pod (or per ear in maize)
• Must be known for each cultivar to simulate its performance
Example, DSSAT CROPGRO-Bean Model using GSPs
[Figure: Leaf, Stem, or Seed Mass (axis 0 to 2000) vs. Days after Sowing (axis 20 to 80). Legend: leaf, stem, and seed series for Jatu-Rong and Porrillo S., with observed (Obs) leaf, stem, and seed points; Flw, Sd, and R7 are marked for each cultivar.]
Figure 6. Time course of leaf, stem, and seed mass accumulation of Jatu-Rong (Andean) and Porrillo Sintetico (Meso-American) cultivars relative to time of first flower (Flw), first seed (Sd), and beginning maturity (R7), grown at Palmira, Colombia (data from Sexton et al., 1994, 1997).
Application of Crop Models
[Flow diagram. Elements: Genotypes; GSPs; Environment, Management Data; Bean Crop Model; Sim Phenotypic Responses; G, M Selection for Optimal Responses; Iterative Exploration.]
GSP Example - TRIFL
• TRIFL is a GSP in the existing bean model
• TRIFL is the maximum rate of node appearance on the main stem, number per day
• Temperature has a major effect on how rapidly new nodes appear on the main stem
• The model* in the DSSAT common bean model is:
$$
NAR(t) = TRIFL \cdot \sum_{h=1}^{24} \left(\frac{1}{24}\right) \frac{T^*_h - T_{base}}{T_{opt1} - T_{base}}
$$
where
NAR(t) = rate of new node or leaf appearance on the main stem on day t, #/day,
TRIFL = maximum node/main stem leaf addition rate, number per day,
Tbase = base temperature, below which the rate is 0.0, °C,
Topt1 = temperature above which the node addition rate remains at its maximum value, °C,
Thour = hourly temperature in the field where the crop is growing, °C, and
T*h = hourly temperature Thour limited to the range from Tbase to Topt1, °C.
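A minimal Python sketch of this calculation follows, assuming the equation is read as TRIFL times the 24-hour average of a temperature factor, with the hourly temperature limited to the range between Tbase and Topt1. The function name and the numeric values in the usage lines are illustrative assumptions, not DSSAT code or calibrated parameters.

```python
from typing import Sequence


def node_addition_rate(hourly_temps_c: Sequence[float], trifl: float,
                       t_base: float, t_opt1: float) -> float:
    """NAR(t) = TRIFL * sum over 24 h of (1/24) * (T* - Tbase) / (Topt1 - Tbase)."""
    if len(hourly_temps_c) != 24:
        raise ValueError("expected 24 hourly temperatures")
    total = 0.0
    for t_hour in hourly_temps_c:
        # Assumed T*: hourly temperature limited to the range [Tbase, Topt1].
        t_star = min(max(t_hour, t_base), t_opt1)
        total += (t_star - t_base) / (t_opt1 - t_base) / 24.0
    return trifl * total


# Placeholder values for illustration only.
TRIFL, T_BASE, T_OPT1 = 0.4, 10.0, 30.0
nar_warm = node_addition_rate([32.0] * 24, TRIFL, T_BASE, T_OPT1)   # at the plateau
nar_mild = node_addition_rate([20.0] * 24, TRIFL, T_BASE, T_OPT1)   # halfway up
nar_cold = node_addition_rate([5.0] * 24, TRIFL, T_BASE, T_OPT1)    # below Tbase
```

With these placeholder values, a day spent entirely above Topt1 gives a rate equal to TRIFL, and a day spent entirely below Tbase gives zero.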
TRIFL Example (continued)
• TRIFL is a GSP
• Tbase and Topt1 are not GSPs, but are species-dependent parameters in the current bean model
• Also, TRIFL has been held fixed across cultivars in the past due to lack of information
• We now know that TRIFL varies significantly across lines/cultivars, based on our NSF study
• What about Tbase and Topt1?
• An example will be given later in the week on how this new information is affecting how we model beans
Estimating GSPs
• Data are needed for each cultivar or genotype
• In our NSF study, we had over 180 genotypes, and for each of them, we had observations in the field at 5 locations
• These data were used to estimate GSPs, as will be shown later in the workshop
• The basic idea is that we use the multi-location experiment phenotypic data:
  • Set initial GSPs as input to the simulation
  • Compare simulated and observed phenotypic data
  • Compute a measure of how close the simulated phenotypic data are to the observed data
  • Vary the GSPs and search the range of feasible values until a criterion is met, such as minimizing the sum of squared differences (errors) (e.g., MSE basis) or maximizing a likelihood function
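The search loop described in these steps can be sketched with a simple grid search on an MSE criterion. This is one possible search strategy, not necessarily the one used in the study; the toy model, the observations, and the feasible range are made up for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def mse(simulated: Sequence[float], observed: Sequence[float]) -> float:
    """Mean squared error between simulated and observed values."""
    return sum((s - o) ** 2 for s, o in zip(simulated, observed)) / len(observed)


def estimate_gsp(simulate: Callable[[float], List[float]],
                 observed: Sequence[float],
                 candidates: Sequence[float]) -> Tuple[float, float]:
    """Grid search: return the candidate GSP value with the smallest MSE."""
    best_value, best_error = candidates[0], float("inf")
    for gsp in candidates:
        error = mse(simulate(gsp), observed)
        if error < best_error:
            best_value, best_error = gsp, error
    return best_value, best_error


# Hypothetical stand-in for a crop model: node count after d days at rate gsp.
DAYS = [10, 20, 30]


def toy_model(gsp: float) -> List[float]:
    return [gsp * d for d in DAYS]


observed_nodes = [3.1, 5.9, 9.2]                        # made-up observations
grid = [round(0.20 + 0.01 * i, 2) for i in range(21)]   # feasible range 0.20 to 0.40
best_gsp, best_mse = estimate_gsp(toy_model, observed_nodes, grid)
```

Swapping `mse` for a negative log-likelihood would give the likelihood-based variant of the same loop; a real crop model would replace `toy_model`.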
GSP Estimation: Various Approaches, including Bayesian MCMC for Model Development, Genomic Prediction, etc.
[Flow diagram. Elements: RILs; QTLs (~traits); GSPs; Environment, Management Data; Bean Crop Model; Sim Phenotypic Responses; Multi-Location Experiments; Phenotypic Data; Error/Likelihood; Iterative Estimation; GSP* & QTL effects.]
Adding Genetic Information for Application of Crop Models (Ideotype Design, Selection of G, M for E, Genomic Prediction)
[Flow diagram. Elements: Genotypes; QTLs; GSPs; Environment, Management Data; Bean Crop Model; Sim Phenotypic Responses; G, M Selection for Optimal Responses; Iterative Exploration.]
Need for a new gene-based model
• Current approaches: develop relationships between GSPs and QTLs (e.g., White and Hoogenboom, 1996, 2003; Messina et al., 2006; etc.)
• Why not continue this?
  • Current models do not include GSPs for all processes and traits that we now know are under genetic control (examples from this study)
  • May need to modify environmental effects and interactions in the model
  • Current crop models are not ideally structured to make all of the changes that are needed
  • Major changes are likely needed in many places, although some code may be reusable
  • Although some existing crop models are modular, new modules are needed that are designed around what we are now learning about genetic control of processes, with fine granularity, so that they can be easily modified as more is learned
Example Results After Incorporating* Gene-Based Component in CROPGRO-Bean
[Figure, two panels, both plotted against Days after Planting (axis 0 to 100). Panel "Main Stem Node Number" (axis 0 to 18): "Leaf number (Jamapa QTLs (-1) 0.3 m ro)" and "Leaf number (Calima QTLs (+1) 0.3 m ro)". Panel "Biomass and Pod Mass, kg/ha" (axis 0 to 6000): "Grain wt kg/ha" and "Tops wt kg/ha" for Jamapa QTLs (-1) and Calima QTLs (+1), 0.3 m ro.]
* Incorporated NAR to compute TRIFL only
New Modular Approach
• Need to account for G x E x M interactions on processes
• Need to design for evolution as more knowledge about genetic effects on crop components is obtained
• Example: gene-based model of bean leaf area
• Design modules with QTL effects on CM processes
• Still a work in progress
Linear Mixed Effects Model for NAR(t)
Bng072 = marker for QTL found to influence NAR, equal to +1 for Calima and -1 for Jamapa parental lines,
Bng083 = marker for QTL found to influence NAR, equal to +1 for Calima and -1 for Jamapa parental lines,
DL = average daylength during the time when nodes were being added in genotype g at site s, h,
DLmean = average daylength across sites in the experiment during node addition, h,
Dim7-7 = gene or QTL found to influence NAR, equal to +1 for Calima and -1 for Jamapa parental lines,
FIN = gene or QTL found to influence NAR, equal to +1 for Calima and -1 for Jamapa parental lines,
NAR(t) = node addition rate, nodes per day added to the main stem for genotype g grown at site s,
SRAD = average SRAD across sites in the experiments, MJ m-2 d-1,
TEMP = average of daily mean temperature during the time when nodes were added, °C
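The fitted equation itself is not reproduced in the text, so the Python sketch below only illustrates how the fixed-effects part of a linear model with +1/-1 marker coding could be evaluated. The model structure, the interaction term, the centering values, and every coefficient are assumptions and placeholders, not estimates from the study; random effects and SRAD are left out for brevity.

```python
from typing import Dict

# Placeholder coefficients, NOT estimates from the study.
COEF = {
    "intercept": 0.25,
    "Bng072": 0.01, "Bng083": -0.01, "Dim7-7": 0.005, "FIN": 0.02,
    "TEMP": 0.008,       # per degree C away from TEMP_REF
    "DL": -0.01,         # per hour of daylength away from DL_MEAN
    "FIN:TEMP": 0.002,   # hypothetical G x E interaction term
}
TEMP_REF = 20.0   # assumed centering temperature, degrees C
DL_MEAN = 12.0    # assumed DLmean, h


def nar_linear(markers: Dict[str, int], temp_c: float, daylength_h: float) -> float:
    """Fixed-effects part only; markers coded +1 (Calima) or -1 (Jamapa)."""
    for name, allele in markers.items():
        if allele not in (-1, 1):
            raise ValueError(f"{name} must be coded +1 or -1")
    d_temp = temp_c - TEMP_REF
    nar = COEF["intercept"]
    nar += sum(COEF[name] * allele for name, allele in markers.items())
    nar += COEF["TEMP"] * d_temp
    nar += COEF["DL"] * (daylength_h - DL_MEAN)
    nar += COEF["FIN:TEMP"] * markers["FIN"] * d_temp
    return nar


calima = {"Bng072": 1, "Bng083": 1, "Dim7-7": 1, "FIN": 1}
jamapa = {name: -1 for name in calima}
nar_calima = nar_linear(calima, temp_c=20.0, daylength_h=12.0)
nar_jamapa = nar_linear(jamapa, temp_c=20.0, daylength_h=12.0)
```

Flipping a single marker, for example `dict(jamapa, FIN=1)`, gives the kind of "Jamapa with Calima FIN" comparison that a marker-based model makes possible.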
NAR vs. Temperature: Parent Lines
[Figure, two panels, each plotting Node Addition Rate, #/d (axis 0 to 0.6) against Temperature, C (axis 0 to 35). Panel (a): Jamapa (-1) and Calima (+1). Panel (b): Jamapa (-1), Jamapa with Calima FIN, Calima (+1), and Calima with Jamapa FIN.]
Modular Approach
Example of a module: model that computes node addition rate on day t (NAR(t))
But, is the Linear Model Adequate?
• We know that temperature effects on most crop growth processes are nonlinear
• Also, this linear model uses the mean temperature during the observation period, when we know that plants respond nonlinearly to temperature, so temperature should be considered hourly
• So, modules need to be dynamic and include nonlinear effects
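A small numerical sketch shows why the mean temperature is not enough when the response is nonlinear: the response evaluated at the mean differs from the mean of the hourly responses. The plateau-shaped response, its thresholds (10 and 30 °C), and the hourly temperatures are hypothetical values chosen for illustration.

```python
def relative_rate(temp_c: float, t_base: float = 10.0, t_opt1: float = 30.0) -> float:
    """Plateau response: 0 below t_base, 1 above t_opt1, linear in between."""
    t_star = min(max(temp_c, t_base), t_opt1)
    return (t_star - t_base) / (t_opt1 - t_base)


# Hypothetical day: 12 cool hours at 5 C and 12 warm hours at 25 C.
hourly = [5.0] * 12 + [25.0] * 12
mean_temp = sum(hourly) / len(hourly)

rate_from_mean = relative_rate(mean_temp)                               # response at the mean
rate_from_hours = sum(relative_rate(t) for t in hourly) / len(hourly)   # mean of responses
```

For this made-up day the two numbers are 0.25 and 0.375: the cool hours cannot push the rate below zero, so averaging the temperature first understates the daily rate.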
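Example Nonlinear Model Formulation
$$
T^*_h =
\begin{cases}
T_{base} & \text{if } T_{hour} < T_{base} \\
T_{hour} & \text{if } T_{base} < T_{hour} < T_{opt1} \\
T_{opt1} & \text{if } T_{opt1} < T_{hour}
\end{cases}
$$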
What About GSPs?
• What are the GSPs in the above equation? Are they constant across environments?
• Does this nonlinear formulation make sense relative to the physiological process and what we know?
• Is it sufficiently robust? How can we determine this?
• Will the GSPs in this equation remain fixed across genotypes? Environments? Management?
• Will “calibration” be needed after fitting these equations to field data? If so, how will this differ from what we now do?
• We should formulate nonlinear models based on mechanistic knowledge, then estimate parameters using data from a genetic family across diverse environments.
Simulated Mainstem Nodes vs. Days After Planting
Area Expansion of Leaves on Main Stem
Prediction of Main Stem Leaf Area
Advances in Genomic Prediction
1550 Doubled Haploid Lines
Synthetic Data Set, Maize
Champaign, IL; 2012, 2013
Technow et al., 2015
Model GSPs, which in turn are used in a function to predict yield (highly nonlinear)
Crop Model-Based Genomic Prediction outperforms GBLUP
QTLs estimate Yield via Crop Model function using GSPs
QTLs estimate Yield via GBLUP
Yield = f(4 GSPs, Env)
Discussion
• Demonstrated benefits of merging crop modeling and genetics
• Various methods are reasonable
• Need new G, E nonlinear functions estimated using mixed effects models, physiologically based, with G and E components (management also)
• Modularity is important, short and long term
• Paper in Special Issue
• Genomic Prediction with crop models is likely to perform better than other methods (GBLUP)