Advances in gene-based crop modeling

Gene-Based Crop Modeling

J. W. Jones, M. J. Correll, K. J. Boote, S. Gezan, and C. E. Vallejos

CIAT, Aug 4, 2015

Source: Monica Ozores-Hampton

Outline: Our Work in Modeling

• Crop models can be considered as non-linear functions
• Estimate GSPs (genetic coefficients), fit linear statistical model to estimate GSPs vs. QTLs
• Develop new statistical linear mixed effects models of G, E, and GxE for different processes
  • E.g., flowering date, node addition rate, leaf size, max number of MS nodes, …
• Integrate new relationships into existing DSSAT CROPGRO-Bean model
• Develop component process modules using linear or nonlinear mixed effects models of traits vs. QTLs and environmental factors, combine them to demo modular approach
• Future: compare Genomic Prediction for beans similar to Technow et al. (PLoS ONE, 2015)
• Discussion

Dynamic Crop Models

• Dynamic: variables of interest change over time (state variables)
• Environment also changes over time
• System of equations, not just a single variable to predict
• Variables interact, typically in highly non-linear ways, varying over time
• There is not a single equation to calculate the response that one is interested in (e.g., final yield of a crop)
• Final yield (and other variables) may reach their final values in many different ways, depending on genetics and environment

General Form of a Dynamic System Model: Discrete Time/Difference Equation

Difference equation form, when time step equals 1 (e.g., 1 day):

$$
\begin{aligned}
U_{1,t+1} &= U_{1,t} + g_1[U_t, X_t, \theta] \\
U_{2,t+1} &= U_{2,t} + g_2[U_t, X_t, \theta] \\
&\;\;\vdots \\
U_{S,t+1} &= U_{S,t} + g_S[U_t, X_t, \theta]
\end{aligned}
$$
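The difference-equation form can be sketched in a few lines of Python. This is only an illustrative sketch, not code from DSSAT: the function name `simulate`, the two toy rate functions, and the parameter names `node_rate` and `growth` are assumptions made up for the example.

```python
from typing import Callable, Dict, List, Sequence

# A rate function g_i takes the full state U_t, the explanatory
# variable X_t, and the parameters theta, and returns the daily change.
RateFn = Callable[[Sequence[float], float, Dict[str, float]], float]


def simulate(u0: Sequence[float], rates: Sequence[RateFn],
             x: Sequence[float], theta: Dict[str, float]) -> List[List[float]]:
    """Iterate U[i,t+1] = U[i,t] + g_i(U_t, X_t, theta) with a time step of 1."""
    trajectory = [list(u0)]
    for x_t in x:
        u_t = trajectory[-1]
        # Every state variable is updated from the same U_t (previous step).
        u_next = [u_i + g_i(u_t, x_t, theta) for u_i, g_i in zip(u_t, rates)]
        trajectory.append(u_next)
    return trajectory


# Hypothetical two-state example: state 0 = node number, state 1 = biomass.
def g_nodes(u, x_t, theta):
    return theta["node_rate"]                # constant daily node addition


def g_biomass(u, x_t, theta):
    return theta["growth"] * u[0] * x_t      # depends on nodes and environment


if __name__ == "__main__":
    theta = {"node_rate": 0.5, "growth": 1.0}   # placeholder values
    path = simulate([0.0, 0.0], [g_nodes, g_biomass], [2.0, 2.0], theta)
    print(path[-1])
```

The biomass rate depends on the node state, which is the kind of interaction between state variables that prevents writing a single closed-form equation for the final response.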


Dynamic System Model as a Response Model

Example: Final yield response to all variables during a season

Y = f(X; θ)

where
X represents all explanatory variables during a season,
θ represents all parameters of the dynamic model,
f represents a function (typically an implicit function)

• We could write this as Y = simulated final grain biomass at harvest time, T, as affected by explanatory variables (e.g., irrigation applied during a season) and by all parameters

Example of Response Simulated by Crop Model


How Simulation Computes Responses

Figure 1.3. Computer program flow diagram showing how a simulation model is used as a function: any time a response is needed, the simulation is run to calculate the state variables for every time step, but it returns only the value of the selected state variable at the time of interest. In this case, we are interested in Y at time t = 140.
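A minimal sketch of this "simulation as a function" idea follows. The toy one-state model, the parameter name `rue`, and the weather values are placeholders invented for illustration; only the pattern (run every time step, return one value) reflects the figure.

```python
from typing import Dict, List, Sequence


def run_simulation(theta: Dict[str, float], weather: Sequence[float]) -> List[float]:
    """Toy daily model: biomass[t+1] = biomass[t] + rue * weather[t].

    Returns the state for every day from 0 to len(weather).
    """
    biomass = [0.0]
    for x_t in weather:
        biomass.append(biomass[-1] + theta["rue"] * x_t)
    return biomass


def response(theta: Dict[str, float], weather: Sequence[float],
             t_of_interest: int = 140) -> float:
    """Y = f(X; theta): run the whole simulation, return one state value."""
    states = run_simulation(theta, weather)
    return states[t_of_interest]


if __name__ == "__main__":
    weather = [10.0] * 150           # hypothetical daily driver, 150 days
    print(response({"rue": 0.5}, weather))
```

Each call to `response` repeats the full simulation, which is why f is described as an implicit function rather than a single equation.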


Genotype-Specific Parameters (GSPs)

• Quantities in the model that represent variations in crop performance across cultivars or lines
• GSPs are the same as “cultivar coefficients” that have been used routinely in the models contained in DSSAT
• Examples
  • Phenology, e.g., duration to first flower under optimal conditions
  • Size of leaves on the main stem
  • Maximum rate of node appearance on the main stem under optimal conditions
  • Number of seeds per pod (or per ear in maize)
• Must be known for each cultivar to simulate its performance

Example, DSSAT CROPGRO-Bean Model using GSPs

[Figure: leaf, stem, or seed mass (0 to 2000) vs. days after sowing (20 to 80). Simulated curves and observed (Obs) points for leaf, stem, and seed of Jatu-Rong and Porrillo S., with Flw, Sd, and R7 marked for each cultivar.]

Figure 6. Time course of leaf, stem, and seed mass accumulation of Jatu-Rong (Andean) and Porrillo Sintetico (Meso-American) cultivars relative to time of first flower (Flw), first seed (Sd), and beginning maturity (R7), grown at Palmira, Colombia (data from Sexton et al., 1994, 1997).

Application of Crop Models

[Diagram. Labels: Genotypes; GSPs; Bean Crop Model; Environment, Management Data; Sim Phenotypic Responses; G, M Selection for Optimal Responses. Loop label: Iterative Exploration.]

GSP Example - TRIFL

• TRIFL is a GSP in the existing bean model
• TRIFL is the maximum rate of node appearance on the main stem, number per day
• Temperature has a major effect on how rapidly new nodes appear on the main stem
• The model* in the DSSAT common bean model is:

$$
NAR(t) = TRIFL \cdot \sum_{h=1}^{24} \left(\frac{1}{24}\right) \frac{T^*_h - T_{base}}{T_{opt1} - T_{base}}
$$

where
NAR(t) = rate of new node or leaf appearance on the main stem on day t, #/day,
TRIFL = maximum node/main stem leaf addition rate, number per day,
Tbase = base temperature, below which the rate is 0.0, °C,
Topt1 = temperature above which the node addition rate remains at its maximum value, °C,
Thour = hourly temperature in the field where the crop is growing, °C, and
T*h = hourly temperature Thour limited to the range from Tbase to Topt1, °C.
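The NAR(t) calculation can be sketched as below. This is an illustrative reading of the slide's equation, not the DSSAT source code: the function name `node_addition_rate` is made up, and the numeric values for TRIFL, Tbase, and Topt1 in the usage lines are placeholders, not calibrated parameters. The clamping of hourly temperature between Tbase and Topt1 follows the piecewise definition given later in the deck under "Example Nonlinear Model Formulation".

```python
from typing import Sequence


def node_addition_rate(trifl: float, hourly_temps: Sequence[float],
                       t_base: float, t_opt1: float) -> float:
    """Daily node addition rate from 24 hourly temperatures.

    NAR = TRIFL * sum over hours of (1/24) * (T* - Tbase) / (Topt1 - Tbase),
    where T* is the hourly temperature clamped to [Tbase, Topt1].
    """
    if len(hourly_temps) != 24:
        raise ValueError("expected 24 hourly temperatures")
    if t_opt1 <= t_base:
        raise ValueError("t_opt1 must be greater than t_base")
    total = 0.0
    for t_hour in hourly_temps:
        t_star = min(max(t_hour, t_base), t_opt1)   # clamp to [Tbase, Topt1]
        total += (t_star - t_base) / (t_opt1 - t_base) / 24.0
    return trifl * total


if __name__ == "__main__":
    # Placeholder values, chosen only to show the shape of the response.
    warm_day = [30.0] * 24    # every hour above Topt1: rate equals TRIFL
    cold_day = [5.0] * 24     # every hour below Tbase: rate is zero
    print(node_addition_rate(0.4, warm_day, t_base=10.0, t_opt1=25.0))
    print(node_addition_rate(0.4, cold_day, t_base=10.0, t_opt1=25.0))
```

Because the temperature response is applied hour by hour before averaging, two days with the same mean temperature but different diurnal ranges can give different rates.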

TRIFL Example (continued)

• TRIFL is a GSP
• Tbase and Topt1 are not GSPs, but are species-dependent parameters in the current bean model
• Also, TRIFL has been used as fixed across cultivars in the past due to lack of information
• We now know that TRIFL varies significantly across lines/cultivars, based on our NSF study
• What about Tbase and Topt1?
• Example will be given later in the week on how this new information is affecting how we model beans

Estimating GSPs

• Data are needed for each cultivar or genotype
• In our NSF study, we had over 180 genotypes, and for each of them, we had observations in the field at 5 locations
• These data were used to estimate GSPs, as will be shown later in the workshop
• The basic idea is that we use the multi-location experiment phenotypic data:
  • Set initial GSPs as input to the simulation,
  • Compare simulated and observed phenotypic data,
  • Compute a measure of how close the simulated phenotypic data are to the observed data,
  • Vary the GSPs and search the range of feasible values until a criterion is met, such as minimizing the sum of squared differences (errors) (e.g., MSE basis) or maximizing a likelihood function
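The estimation loop described on this slide can be sketched with a simple grid search. Everything here is an assumption for illustration: the one-parameter toy model `simulate_nodes`, the function names, and the "observed" values, which are synthetic numbers and not data from the NSF study. A real application would run the crop model in place of the toy model and might use a likelihood-based or Bayesian method instead of a grid.

```python
from typing import List, Sequence, Tuple


def simulate_nodes(trifl: float, days: Sequence[float]) -> List[float]:
    """Toy stand-in for the crop model: nodes = trifl * days after planting."""
    return [trifl * d for d in days]


def sum_squared_error(simulated: Sequence[float],
                      observed: Sequence[float]) -> float:
    """Measure of how close simulated values are to observed values."""
    return sum((s - o) ** 2 for s, o in zip(simulated, observed))


def estimate_gsp(days: Sequence[float], observed: Sequence[float],
                 lower: float, upper: float,
                 n_grid: int = 101) -> Tuple[float, float]:
    """Search the feasible range [lower, upper] for the GSP minimizing SSE."""
    best_value, best_error = lower, float("inf")
    for i in range(n_grid):
        candidate = lower + (upper - lower) * i / (n_grid - 1)
        error = sum_squared_error(simulate_nodes(candidate, days), observed)
        if error < best_error:
            best_value, best_error = candidate, error
    return best_value, best_error


if __name__ == "__main__":
    days = [10.0, 20.0, 30.0]
    observed = [3.0, 6.0, 9.0]    # synthetic values for illustration only
    value, error = estimate_gsp(days, observed, lower=0.0, upper=1.0)
    print(value, error)
```

A grid search is used only because it makes the "vary the GSPs and compare" loop explicit; it scales poorly once several GSPs are estimated together.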

GSP Estimation: Various Approaches, including Bayesian MCMC for Model Development, Genomic Prediction, etc.

[Diagram. Labels: RILs; Multi-Location Experiments; Phenotypic Data; Bean Crop Model; Error/Likelihood; GSPs; QTLs (~traits); GSP* & QTL effects. Loop label: Iterative Estimation.]

Adding Genetic Information for Application of Crop Models (Ideotype Design, Selection of G, M for E, Genomic Prediction)

[Diagram. Labels: Genotypes; QTLs; GSPs; Bean Crop Model; G, M Selection for Optimal Responses. Loop label: Iterative Exploration.]

Need for a new gene-based model

• Current approaches: develop relationships between GSPs and QTLs (e.g., White and Hoogenboom, 1996, 2003; Messina et al., 2006; etc.)
• Why not continue this?
  • Current models do not include GSPs for all processes and traits that we now know are under genetic control (examples from this study)
  • May need to modify environmental effects, interactions, in the model
  • Current crop models are not ideally structured to make all of the changes that are needed
  • Major changes are likely needed in many places, although some code may be reusable
  • Although some existing crop models are modular, new modules are needed that are designed based on what we are now learning about genetic control of processes, with fine granularity, so that they can be easily modified as more is learned

Example Results After Incorporating* Gene-Based Component in CROPGRO-Bean

[Figure, two panels; x-axis: Days after Planting (20 to 100).
Panel "Main Stem Node Number" (0 to 18): Leaf number (Jamapa QTLs (-1) 0.3 m ro); Leaf number (Calima QTLs (+1) 0.3 m ro).
Panel "Biomass and Pod Mass, kg/ha" (0 to 6000): Grain wt and Tops wt, kg/ha, for Jamapa QTLs (-1) 0.3 m ro and Calima QTLs (+1) 0.3 m ro.]

* Incorporated NAR to compute TRIFL only

New Modular Approach

• Need to account for G x E x M interactions on processes
• Need to design for evolution as more knowledge about genetic effects on crop components is obtained
• Example Gene-based Model of bean leaf area
• Design modules with QTL effects on CM processes
• Still a work in progress

Linear Mixed Effects Model for NAR(t)

Bng072: Marker for QTL found to influence NAR, equal to +1 for Calima and -1 for Jamapa parental lines
Bng083: Marker for QTL found to influence NAR, equal to +1 for Calima and -1 for Jamapa parental lines
DL: Average daylength during the time when nodes were being added in genotype g at site s, h
DLmean: Average daylength across sites in the experiment during node addition, h
Dim7-7: Gene or QTL found to influence NAR, equal to +1 for Calima and -1 for Jamapa parental lines
FIN: Gene or QTL found to influence NAR, equal to +1 for Calima and -1 for Jamapa parental lines
NAR(t): Node addition rate, nodes per day added to the main stem for genotype g grown at site s
SRAD: Average SRAD across sites in the experiments, MJ m-2 d-1
TEMP: Average of daily mean temperature during the time when nodes were added, °C

NAR vs. Temperature, Parent Lines

[Figure, two panels; x-axis: Temperature, °C (0 to 35); y-axis: Node Addition Rate, #/d (0 to 0.6).
(a) Jamapa (-1); Calima (+1).
(b) Jamapa (-1); Jamapa with Calima FIN; Calima (+1); Calima with Jamapa FIN.]

Modular Approach

Example of a module: model that computes node addition rate on day t (NAR(t))


But, is Linear Model Adequate?

• We know that temperature effects on most crop growth processes are nonlinear
• Also, this linear model uses mean temperature during the observation period, when we know that plants respond non-linearly to temperature, which should be considered hourly
• So, modules need to be dynamic and include nonlinear effects

Example Nonlinear Model Formulation

$$
T^*_h =
\begin{cases}
T_{base} & \text{if } T_{hour} < T_{base} \\
T_{hour} & \text{if } T_{base} < T_{hour} < T_{opt1} \\
T_{opt1} & \text{if } T_{opt1} < T_{hour}
\end{cases}
$$

What About GSPs?

• What are the GSPs in the above equation?
• Are they constant across environments?
• Does this nonlinear formulation make sense relative to physiological process and what we know?
• Is it sufficiently robust? How can we determine this?
• Will the GSPs in this equation remain fixed across genotypes? Environments? Management?
• Will “calibration” be needed after fitting these equations to field data? If so, how will this differ from what we now do?
• We should formulate nonlinear models based on mechanistic knowledge, then estimate parameters using data from a genetic family across diverse environments.

Simulated Mainstem Nodes vs. Days After Planting

Area Expansion of Leaves on Main Stem


Prediction of Main Stem Leaf Area


Advances in Genomic Prediction


• 1550 Doubled Haploid Lines
• Synthetic Data Set, Maize
• Champaign, IL, 2012, 2013
• Technow et al., 2015
• Model GSPs, which in turn are used in a function to predict yield (highly nonlinear)

Crop Model-Based Genomic Prediction outperforms GBLUP

• QTLs estimate Yield via Crop Model function using GSPs
• QTLs estimate Yield via GBLUP

Yield = f(4 GSPs, Env)

Discussion

• Demonstrated benefits of merging crop modeling and genetics
• Various methods are reasonable
• Need new G, E nonlinear functions estimated using mixed effects models, physiologically based with G and E components (management also)
• Modularity is important, short and long term
• Paper in Special Issue
• Genomic Prediction with crop models likely to perform better than other methods (GBLUP)
