Quality Change Modelling in Postharvest Biology and Technology
Maarten L.A.T.M. Hertog - 2004
5 Model calibration and uncertainty analysis
5.1 Introduction
One of the dilemmas in modelling in postharvest biology and technology is that
verification and validation of these models is impossible. As natural systems are
never closed they are subject to external influences not taken into account by the
model. Unless we would have a closed system, one can only falsify a model but
never verify it (Oreskes et al., 1994). Also, if one can not falsify a model (yet)
this does not imply that the model is true. One might always run into a situation
where the model does not hold. For this reason the aim of modelling in
postharvest biology and technology should not be to develop true models but to
develop valid models; models that are consistent with our current knowledge
level and that contain no known or detectable flaws of logic.
Once the concept of the model has been identified and the mathematical
equivalent has been formulated, values have to be assigned to the model
parameters. In some cases parameters are well defined physical or mathematical
constants (such as the gas constant R, the numbers π and e, and the molar mass of
water $M_{H_2O}$) or measurable quantities (such as initial fruit weight or colour,
product density, size dimensions, film permeability, fruit permeance and dry
matter content). However, in many cases the model parameters represent
properties of simplified lumped systems (rate constants of an overall enzyme
system, initial amount of substrate or of the lumped enzyme system) or derived
properties (such as the energies of activation) that can not be measured directly.
As the values for these parameters can not be obtained from tables or direct
measurements they need to be estimated based on experimental data. This is done
through an iterative process of fitting the model to experimental data, also known
as model calibration or parameter estimation.
When the developed model is relatively simple an analytical solution in the
form of an algebraic expression can often be derived. In these cases, standard
statistical packages (SAS, Genstat, SPSS, Statistica, etc.) can be used to perform
non linear regression to estimate the missing parameter values. When the model
contains parameters depending on independent model variables (such as T, RH,
$p_{O_2}$ and $p_{CO_2}$) that change during the experiment, the analytical solution is often
not available and the original ODE based model formulation needs to be used. To
estimate parameters using ODE based models the range of available software is
much more limited. Most of the available software is dedicated to a specific
research area, limiting the type of models that can be implemented.
MatLab (MatLab v. 6.5, 2002, The MathWorks, Inc., Natick, MA, USA)
offers the framework to allow implementation of a wide range of models and
contains all the optimisation routines required to estimate model parameters for
ODE based models. Besides implementing the model itself, additional
programming is required to organise the administration of model inputs, model
outputs and the graphical and statistical processing of the model identification
results. As part of this thesis a MatLab based generic software environment is
developed to estimate parameters of ODE based model systems.
The objective of this chapter is to outline some of the main features of the
developed optimisation tool, called OptiPa, emphasising the statistical properties
of the parameter estimation and their effect on model predictions. The MatLab
interface itself is outlined in appendix A. OptiPa, a MatLab interface.
Using the different case studies from this thesis and an additional example on
potato sweetening, the aspect of classifying model parameters into generic,
cultivar specific, batch specific and fruit variable parameters using OptiPa is
discussed (section 5.2). In this way OptiPa can be used to readily identify the
different sources of variation in a model, enhancing the interpretation of
experimental data and the subsequent application of the model to different
situations.
To acquire accurate estimates of the confidence intervals of the model
parameters, bootstrap techniques have been implemented. Based on these
confidence intervals, Monte Carlo simulations can be performed to study model
behaviour at different conditions taking into account the parameters’ accuracies
(section 5.3). Again this will be illustrated using tomato colour as an example.
The innovative aspect in the developed interface is the aspect of classifying
model parameters to account for the different sources of variation. Furthermore, a
new distribution function was developed to facilitate the generation of co-varying
non-Gaussian random parameter sets allowing for both skewness and kurtosis.
Parts of the material presented in this chapter were published in Hertog et al.
(1997c, 1997d, 1999b).
5.2 Variable and constant parameters
5.2.1 Classifying model parameters
In section 4.3.3.2 the data on tomato hue colour was analysed to identify the
different sources of variation in terms of which of the model parameters could be
treated as generic constants, which of them as cultivar dependent parameters and
which of them as fruit variable parameters. In chapter 4 the data on hue colour
was analysed by non linear regression in SAS, using the analytical solution from
Eq. 4.2. The exercise on classifying model parameters into generic, cultivar
specific, batch specific and fruit variable parameters can also be done with OptiPa
using the ODE model formulation from Eq. 4.1. After an initial optimisation run,
selecting all parameters to be estimated in common, one can add or remove, step
by step, parameters to be estimated either per cultivar or per individual fruit. As
the cultivar ‘Tradiro’ starts from a lower hue value on average (Fig. 4.5), a logical
first option is to select H0 to be estimated per cultivar. By doing so, separate
values are estimated for the three cultivars. Ultimately, all model parameters can
be estimated per individual tomato to see how much these parameters change in
between individual fruit. This would eventually lead to the same exercise as done
in section 4.3.3.2. This whole procedure of identifying variable and constant
model parameters can be done through the graphical user interface of OptiPa,
without any additional programming.
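The idea of estimating some parameters in common and others per group can be sketched in a few lines. The example below is only an illustration under strong assumptions: it uses a hypothetical exponential decay model (not the hue model of Eq. 4.1), fits it by log-linear least squares rather than by the ODE-based optimisation used in OptiPa, and all names and data are made up. The rate constant `k` is shared by all groups, while the initial value is estimated per group.

```python
import math

def fit_shared_rate(groups):
    """Fit log(y) = log(y0_g) - k*t with one shared k and one y0 per group.

    groups: dict mapping a group label to a list of (t, y) pairs with y > 0.
    Returns (k, {label: y0}).
    """
    sxy = 0.0
    sxx = 0.0
    means = {}
    for label, points in groups.items():
        t_mean = sum(t for t, _ in points) / len(points)
        ly_mean = sum(math.log(y) for _, y in points) / len(points)
        means[label] = (t_mean, ly_mean)
        # Centre within each group so the group intercepts drop out.
        for t, y in points:
            sxy += (t - t_mean) * (math.log(y) - ly_mean)
            sxx += (t - t_mean) ** 2
    slope = sxy / sxx  # common slope across all groups
    y0 = {label: math.exp(ly_mean - slope * t_mean)
          for label, (t_mean, ly_mean) in means.items()}
    return -slope, y0

# Synthetic, noise-free data: same k, different initial values per cultivar.
times = [0, 5, 10, 15, 20]
data = {
    "cultivar_A": [(t, 100.0 * math.exp(-0.1 * t)) for t in times],
    "cultivar_B": [(t, 80.0 * math.exp(-0.1 * t)) for t in times],
}
k_hat, y0_hat = fit_shared_rate(data)
```

Moving a parameter from "in common" to "per group" corresponds here to giving each group its own intercept while keeping a single slope; comparing the residuals of both variants indicates whether the extra group-specific parameter is warranted.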
What seems to be an arbitrary playing around with the model can be an
essential element in classifying model parameters to find out whether they are
generic constants or depending on factors like cultivar, batch, grower, season or
temperature. Based on the model formulation and the meaning of the parameter
within the context of the model, one can often already tell what the most likely
case is. True rate constants that are properties of an enzyme (system) should be
constant as long as the same enzyme (system) is involved. Apparent rate
constants can strongly vary as they rely on, for instance, the amount of enzyme
present. Good problem decomposition should therefore always, at least formally,
separate the true rate constant from the enzyme concentration so that effects
related to enzyme turnover can be separated from for instance a temperature
effect on the rate constant. Even when initially this might play no role, a clear
separation from the start might enhance the interpretation of unexpected effects
later on. Other parameters indicating the initial status of the fruit might be varying
from fruit to fruit or might be related to treatment effects. The ultimate example
of this is the biological age of tomato fruit determining colour at harvest (section
4.3) with the cultivar specific parameters $H_{+\infty}$, $k_c^{ref}$ and $Ea_{k_c}$ and the generic
parameter $H_{-\infty}$. In the case of spoilage of strawberry (section 3.3) all model
parameters were kept generic, except for the initial level of spoilage varying per
batch which was determining the batch specific behaviour. With the stem growth
of Belgian endive (section 3.4), the kinetic parameters were again kept in
common, with the initial stem length taken as a batch dependent variable and the
initial head mass as a variable that changed between chicory heads. With the
softening of kiwifruit (section 3.5) the kinetic parameters were kept in common
with only the initial and final firmness level made batch (in this case season)
dependent.
By classifying model parameters like this, one can identify both the most
important variable and constant parameters. If the generic model parameters
indeed prove to be constant, the models can easily be applied to new batches by
measuring only the batch dependent parameters and transferring the generic
parameters. In this way models can be reused for different batches without having
to reparameterise the whole model. In other words, the model might get a
predictive value.
5.2.2 The case of potato sweetening
This section will illustrate how a descriptive model can be upgraded to a
commercial predictive model by classifying model parameters into variable and
constant parameters using an example on the sweetening of potato tubers during
storage (see Hertog et al., 1997c, 1997d, 1999b for a detailed description of
materials and methods and complete results). This section will focus on the
concept of classifying model parameters.
In potatoes, reducing sugars are involved in the non-enzymatic browning
reaction, known as the Maillard reaction (Ellis, 1959), and thus the amount of
reducing sugars (glucose and fructose) determines the processing potential of
potatoes in terms of frying colour (Burton, 1989). The amount of reducing sugars
in potatoes at the time of processing depends on the conditions during the
preceding storage period (Burton 1965, 1989).
Hertog et al. (1997c) developed a dynamic mathematical model based on a
simplification of the underlying physiological processes to describe the storage
behaviour of potato (Solanum tuberosum L.) tubers in terms of accumulation of
reducing sugars. The data necessary for calibration and validation of the model
was gathered during long term storage experiments over a wide range of storage
temperatures (2 °C - 14 °C) for several seasons and cultivars. Although the model
is based on a considerable simplification of the occurring physiological processes,
it was capable of accounting for about 95% of the observed storage behaviour,
including both cold-induced and senescent sweetening.
5.2.2.1 Modelling approach
The storage potential of a certain batch of potatoes is largely determined by the
state of maturity of the tubers at harvest (Burton, 1965; Nelson and Shaw, 1976;
Iritani and Weller, 1980; Coffin et al., 1987; Pritchard and Adam, 1992). Growth
conditions and time of harvest are the most important factors affecting this state
of maturity. In spite of the apparent differences in storage behaviour between
different batches and cultivars, the actual processes leading to accumulation of
sugars are likely to be the same. The differences in degree of accumulation of
sugars will solely be the result of the different extent in which the separate
processes contribute. The two major processes involved are cold induced
sweetening and senescent sweetening. In both cases sugars are released from a
large pool of starch which is assumed constant.
Cold-induced sweetening. The sequence by which reducing sugars accumulate
is mobilisation of starch, followed by an increased synthesis of
sucrose and finally hydrolysis of sucrose to glucose and fructose. The overall
pathway from starch via sucrose to reducing sugars, can be simplified into one
step: a conversion from starch to reducing sugars. During storage over several
months, the level of accumulated sugars decreases in general before the onset of
senescent sweetening (Burton, 1989). Apparently the responsible enzyme system
is susceptible to an increasing malfunctioning during prolonged storage. This is
interpreted as a slow denaturation of the responsible enzyme system.
Senescent sweetening. Sugars mobilised during senescence are released for
the benefit of development and growth of sprouts (Burton, 1989). Consequently,
senescent sweetening may be initiated soon after the break of dormancy (Burton,
1977; Hughes and Fuller, 1984). Senescent sweetening is assumed to be induced
by a second enzyme system. Inhibition of sprout growth during senescence
stimulates the accumulation of reducing sugars (Isherwood and Burton, 1975).
Although the external features of sprouting are suppressed, the metabolic
processes are apparently not. The higher the storage temperature, the earlier
senescent sweetening starts (Barker, 1938). As starch is always abundantly
present, the initial amount of enzyme responsible for the conversion into sugar
has to be very low. To reach an enzyme activity eventually large enough to
generate senescent sweetening, the enzyme has to be formed. This increase is
modelled by an exponential formation. This is in agreement with the hypothesis
of Kumar and Knowles (1993) who suggest that increased starch hydrolysis
during senescence is the result of increasing peroxidative damage of the
amyloplast membrane resulting in increasing contact between enzymes and
substrate. This also invokes an exponential increase in enzyme activity.
During storage, sugars (S) are consumed by respiration. By simple mass
action, respiration is considered to be directly related to the amount of
accumulated sugars. As potatoes are mostly stored under well ventilated
atmospheric conditions, the amount of oxygen is considered to be constant (21
kPa) and not rate limiting.
It is thus assumed that two different enzyme systems are involved for the two
different processes. The enzyme system responsible for senescent sweetening
(Esene) is accumulating in time while the enzyme system responsible for cold
induced sweetening (Ecold) is susceptible to denaturation. This resulted in the
ODE formulation from Eq. 5.1.
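\[
\begin{aligned}
\frac{dE_{cold}}{dt} &= -k_1 \cdot E_{cold} \\
\frac{dE_{sene}}{dt} &= +k_2 \cdot E_{sene} \\
\frac{dS}{dt} &= \left( k_3 \cdot E_{cold} + k_4 \cdot E_{sene} \right) \cdot Starch - k_5 \cdot S \cdot O_2
\end{aligned}
\qquad (5.1)
\]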
The initial values of Ecold and Esene were set to a relative value of 1. All process
rates are assumed to depend on temperature according to Arrhenius' law. See Hertog et
al. (1997c, 1997d, 1999b) for further details.
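As an illustration of how the ODE formulation of Eq. 5.1 can be simulated, the following minimal sketch integrates the three state variables with a classical fourth-order Runge-Kutta scheme. It is not the OptiPa implementation. The Arrhenius form with a reference temperature, the reference temperature itself, and all rate constants and activation energies are assumptions chosen for illustration only.

```python
import math

R = 8.314  # universal gas constant, J mol^-1 K^-1

def arrhenius(k_ref, ea, temp_k, temp_ref_k=283.15):
    """Rate constant at temp_k, given its value k_ref at temp_ref_k (assumed form)."""
    return k_ref * math.exp(ea / R * (1.0 / temp_ref_k - 1.0 / temp_k))

def derivatives(state, rates, starch=1.0, o2=21.0):
    """Right-hand side of the three ODEs; rates = (k1, k2, k3, k4, k5)."""
    e_cold, e_sene, sugar = state
    k1, k2, k3, k4, k5 = rates
    return (
        -k1 * e_cold,                                            # dEcold/dt
        k2 * e_sene,                                             # dEsene/dt
        (k3 * e_cold + k4 * e_sene) * starch - k5 * sugar * o2,  # dS/dt
    )

def simulate(rates, days, state=(1.0, 1.0, 0.0), dt=0.1):
    """Integrate from t = 0 to t = days with the classical Runge-Kutta scheme."""
    def shifted(base, slope, h):
        return tuple(b + h * s for b, s in zip(base, slope))
    for _ in range(round(days / dt)):
        s1 = derivatives(state, rates)
        s2 = derivatives(shifted(state, s1, dt / 2), rates)
        s3 = derivatives(shifted(state, s2, dt / 2), rates)
        s4 = derivatives(shifted(state, s3, dt), rates)
        state = tuple(
            x + dt / 6 * (a + 2 * b + 2 * c + d)
            for x, a, b, c, d in zip(state, s1, s2, s3, s4)
        )
    return state

# Hypothetical reference rates (per day) and activation energies (J/mol).
K_REF = (0.02, 0.01, 0.5, 0.001, 0.002)
EA = (60e3, 90e3, 50e3, 80e3, 70e3)
rates_4c = tuple(arrhenius(k, e, 277.15) for k, e in zip(K_REF, EA))
e_cold, e_sene, sugar = simulate(rates_4c, days=30)
```

Because the rate constants are recomputed from temperature before each simulation, the same function can be called repeatedly for storage scenarios at different constant temperatures.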
For the largest part of storage (6 months - 8 months) and depending on storage
temperature, the process of senescent sweetening is not relevant. Even during
longer storage periods the occurrence of senescent sweetening appears to be
unpredictable. The main quality issue during regular storage is ruled by the
process of cold induced sweetening and therefore this is what is being focused on.
5.2.2.2 Model results
Analysing data from several cultivars over different years allowing all parameters
to vary per batch (season × cultivar combination), showed that the seasonal effect
on the cultivars was not limited to any particular parameter. However, given
seasonal differences and different harvest dates the state of maturity at harvest is
likely to be different between batches. During the development of tubers attached
to the mother plant the activity of several enzymes related to sugar metabolism
(such as sucrose synthase, ADP-glucose pyrophosphorylase and ATP
fructokinase) changes (Morrell and Ap Rees, 1986; Merlo et al., 1993).
Merlo et al. (1993) reported for most enzyme activities in developing tubers,
an optimum curve with the optimum for 8 week old plants. The observation from
Merlo et al. implies that the initial value of Ecold depends on the state of maturity
at harvest. Based on this reasoning, the combined data of all storage seasons were
re-analysed per cultivar, forcing all seasonal effects to the single model parameter
Ecold,0. The remaining kinetic parameters were estimated in common for the
successive seasons and became cultivar specific. As Ecold,0 is a relative value, and
up to now fixed at 1, some reference point had to be chosen. For this purpose the
value for the season most likely to generate the most immature tubers, was fixed
at 1 as a standardised reference for the other seasons.
The values estimated for Ecold,0 for the individual seasons appeared to depend
on the length of the period from planting to harvest (Fig. 5.1). This is in
agreement with the observations described by Merlo et al. (1993). Although the
maturity of tubers is not exclusively determined by the length of the period from
planting to harvest, it was a dominant factor influencing the state of maturity
during the successive seasons.
Fig. 5.1 Estimations of the season-specific parameter Ecold,0 for 'Bintje' and
'Saturna' tubers in relation to the length of the growing period.
[Figure: Ecold,0 (axis 0.0 to 1.2) against harvest date (150 to 210 days after
planting), with separate symbols for Bintje and Saturna.]
5.2.2.3 Model validation
The effect of different maturity stages on the accumulation of reducing sugars
during storage of 10 different potato varieties was studied by Putz (1993). Tubers
were harvested weekly (July 7 to August 29 1989) and were stored for 16 weeks
at 4 °C. The later the tubers were harvested, the lower the initial level of reducing
sugars and the lower the subsequent maximum accumulation reached. No
seasonal variation was included in the data of Putz, so the state of maturity at
harvest was directly related to the time of harvest.
This averaged data was analysed by Hertog et al. (1997d) using the approach
described above, attributing the differences between harvests to differences in
Ecold,0. The initial amount of Ecold (Ecold,0) was assumed batch specific and related to the
successive harvest dates, while the other model parameters were estimated in
common for the combined set of data. As a result, Ecold,0 decreased almost linearly
as a function of harvest time and thus of the related state of maturity (Fig. 5.2).
This analysis of an independent set of data confirmed the concept that the state of
maturity at time of harvest determines the storage behaviour through the initial
amount of enzyme (or enzyme system) present.
Hertog et al. (1997c) revisited the data from Putz (1993), this time using the
original data on the ten individual cultivars. The model
parameter Ecold,0 gave a satisfactory explanation for the harvest effects for all ten
cultivars studied, decreasing almost linearly as a function of harvest time and thus
of the related state of maturity (Hertog et al., 1997c). Ecold,0 may therefore be
considered as a general maturity dependent parameter determining the storage
potential of a given cultivar.
Fig. 5.2 Estimations of the season-specific parameter Ecold,0 for the data of
Putz (1993) in relation to time of harvest relative to the first harvest.
[Figure: Ecold,0 (axis 0.0 to 1.2) against harvest date (0 to 60 days after first
harvest).]
It is not very likely that Ecold,0 can ever be identified as a single specific
enzyme. Probably Ecold,0 must be seen as the reflection of a more complex part
of the metabolism. For practical purposes however, it is not necessary to
identify the model enzyme Ecold as a single enzyme as long as the model
parameter Ecold,0 can be correlated with a concrete physiological property of
maturing potato tubers.
5.2.2.4 Practical application
Hertog et al. (1999b) extended their work by studying the storage behaviour of 5
batches for both ‘Bintje’ and ‘Agria’ potatoes coming from 5 successive
harvests, each stored at 6 different constant temperatures. In addition, accelerated
shelf life testing was done by storing a subsample from each batch for 4 weeks at
2 °C to measure sugar accumulation. In this way the cold susceptibility of the
specific batch could be determined. Assuming the cultivar specific model
parameters are all known, only the parameter Ecold,0 had to be estimated on the
data from the accelerated shelf life test. This estimate for Ecold,0 based on the
accelerated shelf life test was compared against the estimate based on the normal
long-term storage data. For both ‘Agria’ and ‘Bintje’, the estimates for Ecold,0
were not significantly different (Table 5.1).
Now that the model parameters have been classified into batch and cultivar
specific parameters, the model can be used for predicting storage behaviour of
future batches of potato. The only prerequisite is that the cultivar specific
parameters are known. For every new batch of a known cultivar, the batch
specific parameter Ecold,0 can be estimated at the start of the storage season using
an accelerated shelf life test. Subsequently, the long-term storage behaviour of the
specific batch can be predicted using the available model.
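Estimating a single batch specific parameter while keeping the cultivar specific parameters fixed can be illustrated with a simplified version of the cold-induced sweetening part of the model. The sketch below assumes constant temperature, neglects Esene, and lumps the constant starch and oxygen levels into the rate constants, so that dE/dt = -k1·E and dS/dt = k3·E - k5·S. Under these assumptions the sugar level is linear in Ecold,0 and the least-squares estimate has a closed form. All numerical values are hypothetical.

```python
import math

def sugar_per_unit_enzyme(t, k1, k3, k5):
    """Sugar formed at time t per unit of Ecold,0, starting from S = 0.

    Analytical solution of dE/dt = -k1*E, dS/dt = k3*E - k5*S (k1 != k5).
    """
    return k3 * (math.exp(-k1 * t) - math.exp(-k5 * t)) / (k5 - k1)

def estimate_e0(times, sugars, k1, k3, k5, s0=0.0):
    """Least-squares estimate of Ecold,0 with k1, k3, k5 and s0 held fixed."""
    g = [sugar_per_unit_enzyme(t, k1, k3, k5) for t in times]
    # Remove the contribution of the initial sugar level before fitting.
    y = [s - s0 * math.exp(-k5 * t) for t, s in zip(times, sugars)]
    return sum(gi * yi for gi, yi in zip(g, y)) / sum(gi * gi for gi in g)

# Hypothetical cultivar specific constants (per day) and a 4-week test.
K1, K3, K5 = 0.02, 0.05, 0.01
test_days = [7, 14, 21, 28]
true_e0 = 0.77
observed = [true_e0 * sugar_per_unit_enzyme(t, K1, K3, K5) for t in test_days]
e0_hat = estimate_e0(test_days, observed, K1, K3, K5)
```

With the batch specific estimate `e0_hat` in hand, the same fixed constants can be used to predict sugar levels over a full storage season for that batch.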
Table 5.1 Estimates of Ecold,0 and their s.e. (in parentheses) based on either the
long-term storage data or the accelerated shelf life testing.

           ‘Bintje’                       ‘Agria’
Harvest    long-term      accelerated     long-term      accelerated
           storage        shelf life      storage        shelf life
1          1              1.02 (0.05)     1              1.20 (0.09)
2          0.77 (0.02)    0.78 (0.05)     0.72 (0.02)    0.64 (0.10)
3          0.57 (0.02)    0.52 (0.05)     0.76 (0.02)    0.80 (0.09)
4          0.58 (0.02)    0.57 (0.06)     0.66 (0.02)    0.53 (0.09)
5          0.65 (0.02)    0.66 (0.05)     0.66 (0.02)    0.55 (0.09)
Based on the described quality change model the warehouse management can
be optimised taking into account the demands of the specific batch under storage
(Verdijck et al., 1999a, 1999b, 2002; Verdijck and Van Straten, 2002). This
resulted in a commercially implemented model-based control system to
automatically manage and control the climate in potato stores taking into account
the quality of the potatoes stored. In this way the quality of raw material for the
frying industry can be maintained and controlled depending on the varying
demands of the market. This case of potato sweetening has clearly shown how the
classification of model parameters can contribute to upgrading a descriptive
model to the level of a predictive model.
5.3 Numerical approach to parameter variation
Once the model parameters have been classified into batch and cultivar specific
parameters the model can be used for prediction. An essential aspect in this is the
accuracy of the model parameter estimates. Given non linearity of the models,
using standard statistics, confidence intervals for the model parameters can only
be approximated. To acquire accurate estimates of the confidence intervals,
bootstrap techniques are required and have been implemented in OptiPa (section
5.3.1). They also allow the identification of asymmetric confidence intervals.
Once accurate parameter confidence intervals have been established Monte Carlo
simulations can be performed to study model behaviour at different conditions
taking into account the parameters’ inaccuracies. This can be done either using
the parameter values estimated during the bootstrap analyses or by generating
new random sets of parameters taking into account the correlation structure and
distributions of the parameters identified in the bootstrap analyses. A technique
has been implemented to generate these random correlated sets of parameters for
a large family of normal based, skewed and peaked distributions (section 5.3.2).
5.3.1 Bootstrap resampling
The bootstrap is a resampling method for statistical inference (Efron and
Tibshirani, 1993). It is commonly used to estimate confidence intervals, but can
also be used for sensitivity studies. In practical application, the bootstrap means
using some form of resampling with replacement from the actual data to generate
a large number of bootstrap samples. The exact nature of the resampling strategy
depends on the structure of the data. For each of these samples the statistical
quantities of interest can be computed. Based on these, 95 % confidence intervals
can simply be computed by sorting the estimates in ascending order and selecting
those values that cut off the upper and lower 2.5 percentiles.
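The percentile construction described above can be sketched as follows. The function name and the example values are illustrative only; the estimates would normally come from refitting the model to each bootstrap sample.

```python
def percentile_interval(estimates, level=0.95):
    """Sort the bootstrap estimates and cut off equal tails on both sides."""
    ordered = sorted(estimates)
    n = len(ordered)
    cut = int(n * (1.0 - level) / 2.0)  # number of values dropped per tail
    return ordered[cut], ordered[n - 1 - cut]

# 1000 hypothetical bootstrap estimates of one parameter.
estimates = [0.001 * i for i in range(1000)]
low, high = percentile_interval(estimates)
```

Because the limits are read directly from the sorted estimates, the interval does not need to be symmetric around the point estimate.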
Various types of resampling methods have been designed (Efron and
Tibshirani, 1993). In the area of postharvest biology and technology, most data
consist of time series in which the data at subsequent time steps are heavily
correlated and not necessarily equally distributed. As a consequence, simple
random resampling of the data with replacement is not appropriate as this
completely removes the original correlation between subsequent observations
(Bühlemann, 1999; Härdle et al., 2003). With the moving block bootstrap the data is
divided into a number of non-overlapping blocks of sequential observations, which
conserves at least some of the correlation between observations within the blocks.
The bootstrap sample is constructed by randomly sampling the blocks of data and
concatenating them into time series. This technique is suitable for regular
stationary data-generating processes but again not for the type of time series
generally encountered in postharvest.
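A minimal sketch of block resampling as described in this paragraph, using non-overlapping blocks, is given below. The function name and the toy series are assumptions for illustration.

```python
import random

def block_bootstrap(series, block_length, rng):
    """Resample non-overlapping blocks with replacement and concatenate them."""
    blocks = [series[i:i + block_length]
              for i in range(0, len(series), block_length)]
    sample = []
    while len(sample) < len(series):
        sample.extend(rng.choice(blocks))
    return sample[:len(series)]

series = list(range(12))  # stand-in for 12 sequential observations
sample = block_bootstrap(series, block_length=3, rng=random.Random(1))
```

Within each block the original order of the observations is preserved, while the order of the blocks themselves is random. For a series with a clear trend, such as a ripening curve, the concatenated sample therefore no longer resembles the original time course, which is why this scheme is of limited use for postharvest data.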
Assuming a valid model is available to describe the dependence structure of
the sequential observations from the time series, this information can be used for
the bootstrap. Using the model predictions, residuals are calculated for each of the
observations. Then a bootstrap sample of residuals is drawn with replacement
from the observed residuals. The final bootstrap sample of the observations can
be constructed by adding the randomly sampled residuals to the predicted model
values. One of the prerequisites for this approach is that the residuals are
homoscedastic. In the case of heteroscedastic variation, like in the case of tomato
colour with large variation at t = 0 d and small variation at t = 21 d (Fig. 4.5),
resampling the residuals of the raw data would completely distort the data
structure and result in non-representative bootstrap samples. In the case of
heteroscedastic variation bootstrapping should therefore be used in combination
with a Box-Cox transformation to correct for heteroscedasticity.
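The Box-Cox transformation and its inverse can be written compactly. The sketch below also shows how a bootstrap observation could be built on the transformed scale and returned to the original scale. The helper `bootstrap_observation` and the choice of the transformation parameter `lam` are assumptions for illustration; in practice `lam` has to be chosen so that the transformed residuals are approximately homoscedastic.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a positive value y."""
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam

def inverse_box_cox(w, lam):
    """Inverse transform; requires lam * w + 1 > 0 when lam != 0."""
    if lam == 0:
        return math.exp(w)
    return (lam * w + 1.0) ** (1.0 / lam)

def bootstrap_observation(predicted, residual_t, lam):
    """Add a resampled transformed-scale residual and return to the raw scale."""
    return inverse_box_cox(box_cox(predicted, lam) + residual_t, lam)

new_value = bootstrap_observation(predicted=50.0, residual_t=0.3, lam=0.5)
```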
This model based error resampling is the bootstrap technique implemented in
OptiPa. Once a model is implemented in OptiPa and experimental data is
available, parameters can be selected for the bootstrap analysis. After an initial
optimisation run to fit the model to the selected data, residuals are calculated.
Subsequently, by combining randomly resampled residuals with the predicted
model values, the requested number of bootstrap samples is generated and the
parameters selected for the bootstrap analysis are estimated for each of the
generated bootstrap samples keeping the remaining model parameters fixed.
Afterwards the 95 % confidence intervals of the bootstrapped model parameters
are reconstructed as outlined above.
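The sequence of steps (fit, compute residuals, resample residuals, add them to the predictions, refit) can be sketched with a deliberately simple stand-in model. The example below uses a straight line fitted by ordinary least squares instead of an ODE-based model, and it is not the OptiPa code; the data are synthetic and all names are illustrative.

```python
import random

def fit_line(ts, ys):
    """Ordinary least squares for y = intercept + slope * t."""
    n = len(ts)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    slope = (sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
             / sum((t - t_mean) ** 2 for t in ts))
    return y_mean - slope * t_mean, slope

def residual_bootstrap(ts, ys, n_boot, rng):
    """Model-based error resampling: refit the model to predictions + residuals."""
    intercept, slope = fit_line(ts, ys)
    predicted = [intercept + slope * t for t in ts]
    residuals = [y - p for y, p in zip(ys, predicted)]
    estimates = []
    for _ in range(n_boot):
        resampled = [p + rng.choice(residuals) for p in predicted]
        estimates.append(fit_line(ts, resampled))
    return estimates

# Synthetic data with homoscedastic noise (hypothetical values).
rng = random.Random(7)
ts = list(range(21))
ys = [2.0 + 0.5 * t + rng.gauss(0.0, 0.2) for t in ts]
estimates = residual_bootstrap(ts, ys, n_boot=500, rng=rng)
```

Each entry of `estimates` is one (intercept, slope) pair; sorting the slopes and cutting off the tails gives the percentile confidence interval for that parameter.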
5.3.2 Monte Carlo simulations
Monte Carlo simulation is a numerical stochastic technique used to solve
mathematical problems. A Monte Carlo simulation is based on some model
system that can be described as a function of random model parameters
characterised by their probability distribution functions. Monte Carlo simulation
simulates the model system after random sampling from these probability
distribution functions. Monte Carlo methods have been used since the late 1940’s,
but only since the availability of large computational power has the technique
gained the status of a numerical method capable of addressing large complex
applications. The technique is useful to obtain numerical solutions to problems
which are too complicated to solve analytically.
Each of the Monte Carlo model runs is thus based on a different set of random
model parameter values. These can either be the model parameter combinations
coming from the previously generated bootstrap data sets, or new sets of randomly
generated model parameters that take into account the covariance structure and
probability distributions of the model parameters as identified before. Gaussian random
parameter sets can be easily generated using the covariance decomposition
algorithm. Given a covariance matrix V and a vector ȳ containing the average
values of the Gaussian model parameters, the Cholesky decomposition can be
used to determine the Cholesky factor L and its transpose L^T so that V = L·L^T,
with L a lower triangular matrix. After generating a vector g containing standard
Gaussian random numbers (µ = 0, σ = 1) the required covariance structure can be
introduced by multiplying this vector with the Cholesky factor L. Finally, by
adding the vector ȳ to correct for non-zero averaged Gaussian parameters, the
final vector y is obtained, containing the required set of co-varying Gaussian
random parameters (y = ȳ + L·g).
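The covariance decomposition algorithm can be sketched in plain Python as follows. The covariance matrix and the mean vector are hypothetical values chosen for illustration, and the Cholesky routine is a textbook implementation that assumes a symmetric positive definite matrix.

```python
import math
import random

def cholesky(v):
    """Lower triangular factor with V = L * L^T (V must be positive definite)."""
    n = len(v)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(v[i][i] - s)
            else:
                low[i][j] = (v[i][j] - s) / low[j][j]
    return low

def correlated_gaussian(mean, low, rng):
    """One draw y = mean + L * g with g a vector of standard normal numbers."""
    g = [rng.gauss(0.0, 1.0) for _ in mean]
    return [m + sum(low[i][k] * g[k] for k in range(i + 1))
            for i, m in enumerate(mean)]

V = [[4.0, 2.0], [2.0, 3.0]]  # hypothetical covariance matrix
MEAN = [10.0, 5.0]            # hypothetical parameter averages
L_FACTOR = cholesky(V)
rng = random.Random(1)
samples = [correlated_gaussian(MEAN, L_FACTOR, rng) for _ in range(20000)]
```

The sample covariance of `samples` approaches V as the number of draws grows, which gives a simple check on the generated parameter sets.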
However, this technique is only applicable to generate co-varying Gaussian
random parameter sets. As could already be seen from the previous section,
model parameters are often not normally distributed and can either show
skewness or kurtosis. The only way to deal with this is to find a transformation of
the standard normal distribution that matches the observed parameter distributions, so that
the observed parameter distributions can be reshaped into standard normal distributions. If
such a transformation is available the covariance decomposition algorithm can be
applied in the Gaussian normal parameter space with the resulting parameter
values being back-transformed to the original non-normal parameter space.
The algorithm to do so is outlined in Fig. 5.3 and can be summarised as
follows:
• Fit the appropriate transformed normal density function to each of the
  observed parameter distributions coming from the bootstrap (A)
• Based on the original bootstrap data compile the cumulative distribution
  function (B)
• Generate a standard normal cumulative distribution function (C)
• Using the inverse transform method transform the original bootstrap data into
  their corresponding values in the normal parameter space (D)
• Calculate the covariance matrix V for the normal transformed bootstrap data
  and calculate the Cholesky factor L
• Using the Cholesky factor L generate a set of co-varying Gaussian random
  parameters (E)
• Based on the random samples generate cumulative distribution functions (F)
• Generate for each of the observed parameter distributions coming from the
  bootstrap, the cumulative transformed normal distribution function based on
  the fitted transformed normal density function (G)
• Using the inverse transform method transform random Gaussian samples into
  their corresponding values in the original non-normal parameter space (H)

Fig. 5.3 Algorithm to generate co-varying random parameter sets given the
appropriate transformed normal density function to fit to the original bootstrap
data. See text for explanation.
[Figure: panels A to H, leading from the original bootstrap sample via N(0,1) to
the new random sample.]
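The overall flow of this algorithm can be illustrated with a simplified sketch. Note the substitution: instead of fitting a transformed normal density function (steps A and G), the code below uses the empirical cumulative distribution of the bootstrap data in both directions, so it only approximates the approach described here. The function names and the synthetic "bootstrap" data are assumptions made for the example.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0,1)

def to_normal_scores(values):
    """Step D: empirical CDF -> uniform -> standard normal scores."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order):
        scores[i] = STD_NORMAL.inv_cdf((rank + 0.5) / n)
    return scores

def from_normal_score(z, sorted_values):
    """Step H: normal -> uniform -> interpolated empirical quantile."""
    n = len(sorted_values)
    pos = min(max(STD_NORMAL.cdf(z) * n - 0.5, 0.0), n - 1.0)
    lo = int(pos)
    hi = min(lo + 1, n - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])

def covariance(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def cholesky(v):
    """Lower triangular factor with V = L * L^T (V must be positive definite)."""
    n = len(v)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(v[i][i] - s)
            else:
                low[i][j] = (v[i][j] - s) / low[j][j]
    return low

def new_parameter_sets(columns, n_new, rng):
    """columns: one list of bootstrap estimates per model parameter."""
    scores = [to_normal_scores(c) for c in columns]
    low = cholesky([[covariance(a, b) for b in scores] for a in scores])
    ordered = [sorted(c) for c in columns]
    result = []
    for _ in range(n_new):
        g = [rng.gauss(0.0, 1.0) for _ in columns]
        # Step E: impose the covariance structure in the normal space.
        z = [sum(low[i][k] * g[k] for k in range(i + 1))
             for i in range(len(columns))]
        result.append([from_normal_score(zi, col) for zi, col in zip(z, ordered)])
    return result

# Hypothetical skewed, correlated "bootstrap" estimates of two parameters.
rng = random.Random(42)
base = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
p1 = [math.exp(0.5 * a) for a, _ in base]
p2 = [math.exp(0.5 * (0.8 * a + 0.6 * b)) for a, b in base]
new_sets = new_parameter_sets([p1, p2], n_new=2000, rng=rng)
```

A consequence of using the empirical distribution is that the new parameter values never fall outside the range of the original bootstrap estimates; a fitted density function, as used in the text, does not have this restriction.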
The inverse transform method used to go back and forwards between the
unknown parameter distribution and the normal parameter space (step D and step
H in Fig. 5.3) is generally used to transform a uniform deviate U into a random
variable X with cumulative probability $F_X(x)$ following $X = F_X^{-1}(U)$ (Rubinstein,
1981). In the algorithm outlined above the inverse transform method is always
used twice; in step D first to go from the non-normal distribution to a uniform
distribution and second to transform the obtained uniform distribution into a
normal distribution. In step H the procedure is repeated in the reversed order, first
going from the normal distribution to a uniform distribution and second
transforming the obtained uniform distribution into a non-normal distribution.
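The chain of steps D, E and H can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the OptiPa implementation: the empirical distribution of the original sample stands in for the fitted transformed normal distribution (steps A and G), the helper names (`to_normal_scores`, `generate_correlated`, `empirical_quantile`) are hypothetical, and the two input parameters are synthetic.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0,1), the normal parameter space


def to_normal_scores(values):
    """Step D: value -> empirical CDF (uniform) -> standard normal score."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order):
        u = (rank + 0.5) / n  # stays strictly inside (0, 1)
        scores[i] = STD_NORMAL.inv_cdf(u)
    return scores


def covariance(cols):
    """Sample covariance matrix V of equally long columns."""
    n = len(cols[0])
    means = [sum(c) / n for c in cols]
    return [[sum((a[k] - ma) * (b[k] - mb) for k in range(n)) / (n - 1)
             for b, mb in zip(cols, means)]
            for a, ma in zip(cols, means)]


def cholesky(v):
    """Lower-triangular factor L with L * L^T = V."""
    p = len(v)
    low = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = (v[i][i] - s) ** 0.5
            else:
                low[i][j] = (v[i][j] - s) / low[j][j]
    return low


def empirical_quantile(sorted_values, u):
    """Inverse empirical CDF by linear interpolation (stand-in for steps G and H)."""
    pos = u * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def generate_correlated(columns, n_new, rng):
    """columns: one list of estimates per model parameter; returns n_new new sets."""
    p = len(columns)
    v = covariance([to_normal_scores(c) for c in columns])
    low = cholesky(v)
    marginals = [NormalDist(0.0, v[i][i] ** 0.5) for i in range(p)]
    sorted_cols = [sorted(c) for c in columns]
    new_sets = []
    for _ in range(n_new):
        e = [rng.gauss(0.0, 1.0) for _ in range(p)]
        # Step E: co-varying Gaussian deviates y = L e
        y = [sum(low[i][k] * e[k] for k in range(i + 1)) for i in range(p)]
        # Step H: normal -> uniform -> original (non-normal) parameter space
        u = [marginals[i].cdf(y[i]) for i in range(p)]
        new_sets.append([empirical_quantile(sorted_cols[i], u[i]) for i in range(p)])
    return new_sets


def pearson(a, b):
    c = covariance([a, b])
    return c[0][1] / (c[0][0] * c[1][1]) ** 0.5


rng = random.Random(1)
# Synthetic skewed, correlated estimates of two hypothetical parameters
p1 = [math.exp(rng.gauss(0.0, 0.5)) for _ in range(500)]
p2 = [2.0 * a + rng.gauss(0.0, 0.3) for a in p1]
new = generate_correlated([p1, p2], 1000, rng)
q1, q2 = [s[0] for s in new], [s[1] for s in new]
print("correlation original:", round(pearson(p1, p2), 2))
print("correlation generated:", round(pearson(q1, q2), 2))
```

Because the back-transformation interpolates within the original sample, generated values cannot fall outside the observed range. A fitted parametric distribution, as used in the text, does not have this restriction.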
The whole technique stands or falls with the availability of a distribution
function fitting the observed parameter distributions. Azzalini and Dalla Valle
(1996) developed the skewed normal distribution $\mathrm{SN}(0,1,\alpha)$ with its density
function of the form $2\,\phi(z)\,\Phi(\alpha z)$, where $\phi$ and $\Phi$ are the N(0,1) normal
probability density and cumulative distribution function. The shape parameter $\alpha$
controls the skewness of the distribution with $\alpha = 0$ resulting in a standard normal
distribution, $\alpha < 0$ resulting in a distribution skewed to the left and $\alpha > 0$ resulting
in a distribution skewed to the right. Delianedis (2000) used a mixture of two
zero-mean normal distributions with unequal variances to introduce kurtosis. In
the current approach these two approaches were combined, introducing both
kurtosis and skewness. Instead of using a combination of only two normal
distributions with unequal variances to introduce kurtosis, a range of 6
distributions ($\eta = 6$) with increasing standard deviations was used, each balanced
by a standard normal distribution. In this way a smooth overall distribution can
be obtained. This resulted in:
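$$
\mathrm{SKN}(z) = 2\,\Phi(\alpha z, 0, 1)\cdot\frac{1}{2\eta}\sum_{i=1}^{\eta}\Bigl[\phi(z, 0, 1) + \phi(z, 0, \sigma_i)\Bigr] \qquad (5.2)
$$
(here $\sigma_i$, $i = 1, \ldots, \eta$, stands for the increasing standard deviations, which depend on the shape factor introduced next; their exact expression is not recoverable from this transcript)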
with $\beta \in \mathbb{R}^{+}$ the shape factor controlling the kurtosis of the distribution with $\beta = 0$
resulting in a standard normal distribution. An impression of the different faces of
the SKN distribution from Eq. 5.2 is represented in Fig. 5.4.
A numerical analysis of the SKN distribution from Eq. 5.2 has shown that the
area under the curve equals one, regardless of the values of $\alpha$ and $\beta$, making the
function a proper probability density function. It can be proven analytically as well that the
integral of the SKN distribution equals one (Scheerlinck, pers. comm.; appendix
B. Proof of PDF).
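The numerical check of the area under the curve can be imitated with a simple trapezoid rule. The sketch below assumes a density of the form 2·Φ(α·z) times an equal-weight mixture of η pairs, each pair being a standard normal and a normal with standard deviation σ_i. The rule `sigma_i = 1 + beta * i / eta` is a hypothetical placeholder chosen only so that β = 0 gives the standard normal; it is not the expression from Eq. 5.2, and the function names are illustrative.

```python
import math


def phi(z, mu=0.0, sigma=1.0):
    """Normal probability density."""
    return math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def big_phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def skn_pdf(z, alpha, beta, eta=6):
    """Skewed, kurtotic density; the sigma_i rule is an ASSUMED placeholder."""
    total = 0.0
    for i in range(1, eta + 1):
        sigma_i = 1.0 + beta * i / eta  # hypothetical, not taken from Eq. 5.2
        total += 0.5 * (phi(z) + phi(z, 0.0, sigma_i))
    return 2.0 * big_phi(alpha * z) * total / eta


def area(alpha, beta, lo=-60.0, hi=60.0, n=24000):
    """Trapezoid-rule integral of skn_pdf over [lo, hi]."""
    h = (hi - lo) / n
    s = 0.5 * (skn_pdf(lo, alpha, beta) + skn_pdf(hi, alpha, beta))
    s += sum(skn_pdf(lo + k * h, alpha, beta) for k in range(1, n))
    return s * h


for a, b in [(0.0, 0.0), (10.0, 0.0), (-10.0, 0.0), (0.0, 10.0), (3.0, 5.0)]:
    print(f"alpha = {a:5.1f}, beta = {b:4.1f}, area = {area(a, b):.6f}")
```

Under these assumptions the area stays at one for every combination tried, as expected for a density that is symmetric about zero multiplied by 2·Φ(α·z).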
Fig. 5.4 The different faces of the SKN distribution from Eq. 5.2, plotted as SKN(z) against z. The bold curve
is the standard normal distribution with $\alpha = 0$ and $\beta = 0$. The skewed curves are
the result of $\alpha$ ranging from -10 to 10 ($\beta = 0$). The peaked curves in the middle
are the result of $\beta$ ranging from 0 to 10 ($\alpha = 0$). By combining different values
of $\alpha$ and $\beta$ intermediate shapes can be obtained.
5.3.3 Tomato colour as an example
The data from the tomato experiment describing hue colour over time (section
4.3) is revisited once more to discuss some aspects of bootstrap resampling and
Monte Carlo simulations. In this case the focus will be on the data of the validation
experiment (section 4.3.3.5) describing the colour change of 120 ‘Tradiro’ tomato
fruit stored at 18 °C (Fig. 5.5).
Fig. 5.5 Hue colour change of 120 ‘Tradiro’ tomatoes stored at 18 °C.
The points are the experimental data with the lines connecting data points
measured on the same fruit. Axes: hue (°) against time (d).
As the data was measured non-destructively, following individual fruit over
time, information is available on which points belong together. However, at this
stage this information will be ignored, treating the data from Fig. 5.5 as if these
are just multiple measurements taken on different fruit as a function of time. As
only data at 18 °C is considered, only the parameters $H_0$, $k_c^{\mathrm{ref}}$, $H_{+\infty}$ and $H_{-\infty}$ are
relevant, ignoring the energy of activation $Ea_{k_c}$. Initially the data from Fig. 5.5 is analysed with the
simple model from Eq. 4.2 describing the averaged batch behaviour (ignoring the
aspect of biological age). To correct for heteroscedasticity the Box-Cox
transformation is applied. This results in the parameter estimates from Table 5.2
(column heading ‘non linear regression analysis’).
The standard errors returned by the non linear regression analysis are
approximate errors which can be used to approximate the confidence intervals.
By applying model based bootstrapping, multiple estimates for $H_0$, $k_c^{\mathrm{ref}}$ and
$H_{+\infty}$ can be obtained, based on which more accurate confidence intervals can be
established. If for instance 1000 bootstrap samples are generated and analysed,
detailed distributions of the model parameters can be generated (Fig. 5.6) and the
2.5th and 97.5th percentiles can be determined to obtain their 95 %
confidence intervals (Table 5.2, column heading ‘bootstrap analysis’).
Using the bootstrap results the covariance structure of the model parameters
can also be determined (Fig. 5.6). In this case the model parameters are all close
to normally distributed, resulting in almost symmetric confidence intervals (Table
5.2, column heading ‘bootstrap analysis’).
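Model based bootstrapping with percentile confidence intervals can be sketched as follows. To keep the example self-contained, a straight line fitted by ordinary least squares stands in for the non linear model of Eq. 4.2, which is not reproduced here. The data are synthetic and all names (`fit_line`, `residual_bootstrap`, `percentile`) are hypothetical. As in the unstructured analysis described above, residuals are resampled without regard to which fruit they came from.

```python
import random


def fit_line(t, y):
    """Ordinary least squares for y = a + b * t; returns (a, b)."""
    n = len(t)
    mt, my = sum(t) / n, sum(y) / n
    b = sum((ti - mt) * (yi - my) for ti, yi in zip(t, y)) / sum((ti - mt) ** 2 for ti in t)
    return my - b * mt, b


def percentile(sorted_vals, q):
    """Linear-interpolation percentile of an already sorted list, 0 <= q <= 1."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def residual_bootstrap(t, y, n_boot, rng):
    """Refit the model to n_boot data sets built from fitted values plus resampled residuals."""
    a, b = fit_line(t, y)
    fitted = [a + b * ti for ti in t]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    estimates = []
    for _ in range(n_boot):
        y_star = [fi + rng.choice(resid) for fi in fitted]
        estimates.append(fit_line(t, y_star))
    return (a, b), estimates


rng = random.Random(42)
t = [float(d) for d in range(11) for _ in range(12)]      # 12 synthetic readings per day
y = [60.0 - 1.5 * ti + rng.gauss(0.0, 2.0) for ti in t]   # synthetic "hue" values
(a_hat, b_hat), boot = residual_bootstrap(t, y, 1000, rng)
slopes = sorted(b for _, b in boot)
ci_slope = (percentile(slopes, 0.025), percentile(slopes, 0.975))
print(f"slope = {b_hat:.3f}, 95 % c.i. = {ci_slope[0]:.3f} to {ci_slope[1]:.3f}")
```

The list `boot` holds the full set of bootstrap estimates, so the same run also provides the joint distribution of the parameters from which a covariance structure can be computed.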
If one plots a sub sample of the 1000 bootstrap model fits, a striking result is
observed (Fig. 5.7). While the individual tomato fruits changed colour more or
less in parallel (Fig. 5.5) the bootstrap model fits show curves crossing each other
(Fig. 5.7). The reason for this is that the covariance information contained in the
knowledge of which points belonged to the same fruit was completely
ignored. Thus, by randomly resampling the residuals, bootstrap data sets can be
generated that combine relatively high starting hue values with relatively low ending
hue values or the other way round; combinations that do not occur in reality.

Table 5.2 Parameter estimates and their 95 % confidence interval (c.i.) resulting
from the different analyses of colour change data of ‘Tradiro’ tomato stored for
10 d at 18 °C using the model from Eq. 4.2.

                      non linear regression     bootstrap                 individual fruit
                      analysis                  analysis                  analysis
Parameter a)          Value    95 % c.i.        Value    95 % c.i.        Value    95 % c.i.
$H_0$                 54.9     53.0-56.7        53.5     52.6-54.7        59.3     45.6-85.4
$H_{+\infty}$         39.3     38.7-39.9        40.2     39.8-40.6        39.8     38.0-41.7
$k_c^{\mathrm{ref}}$  0.0022   0.0018-0.0026    0.0024   0.0021-0.0027    0.0028   0.0021-0.0040

a) $H_{+\infty}$ is the asymptotic colour value (in °) at plus infinite time; $H_0$ is the initial colour value at
harvest (in °); $k_c^{\mathrm{ref}}$ (in d-1) is the value of the rate constant $k_c$ at 18 °C; $H_{-\infty}$ is fixed at a value of 124°.
Fig. 5.6 Covariance structure and frequency distributions for the 1000 bootstrap
model parameters based on the unstructured colour change data of 120 ‘Tradiro’
tomatoes stored for 10 d at 18 °C using the model from Eq. 4.2.
These ‘incorrectly’ created bootstrap data sets adversely affect the model
parameter estimates and their correlation structure. When the experimental data
set contains data from fruit that already started from low hue values it can even
happen that a bootstrap data set is generated that results in estimates for $H_0$ below
the values of $H_{+\infty}$, resulting in the unrealistic product behaviour of hue colour
increasing with time.
As an alternative approach the model parameters estimated on the 120
individual fruits can themselves be interpreted as a representative sample of the
parameter space. Because of the low number of data points the distributions of the
parameters, their covariance structure (Fig. 5.8) and their confidence intervals
(Table 5.2, column heading ‘individual fruit analysis’) are statistically less
accurately defined than those based on the larger bootstrap data set but, in this particular
case, are probably closer to reality.
Fig. 5.7 Hue colour change of a sub sample of 10 bootstrap model fits
generated based on the unstructured data from Fig. 5.5 on ‘Tradiro’
tomato stored at 18 °C. Axes: hue (°) against time (d).
Comparing Fig. 5.6 to Fig. 5.8 reveals that the distributions of the model
parameters are not as close to normal as one would expect based on the
bootstrap results; they are both skewed and peaked. As a consequence the
resulting confidence intervals, particularly for $H_0$ and $k_c^{\mathrm{ref}}$, are not symmetric
(Table 5.2). In general, bootstrap analyses can contribute to improving the
characterisation of model parameter variation, but at the same time one should
not ignore the information available in fruit-to-fruit related variation.
Fig. 5.8 Blue data: Covariance structure and frequency distributions for the 120
parameter sets based on the individual colour change data of 120 ‘Tradiro’
tomatoes stored for 10 d at 18 °C using the model from Eq. 4.2. Red data: 150
newly generated random parameter sets based on the individual fruit parameters
(blue data). Panels are labelled kc, H0 and Hplus.
Using the technique outlined in section 5.3.2, random correlated parameter
sets were generated for the case of the tomato colour model, using the bootstrap
samples from Fig. 5.6 and Fig. 5.8 as a starting point. The first example shows
the results of 1000 sets of correlated random parameters generated from the
bootstrap parameter sets that were based on the unstructured colour change data
of 120 ‘Tradiro’ tomatoes stored for 10 d at 18 °C. The randomly generated data
are in good agreement with the original bootstrap data (Fig. 5.9). Also when
checking statistics such as variances, covariances, means, skewness
and kurtosis, the generated samples closely match the original data (data not
shown). From a theoretical point of view this is not unexpected as the covariance
decomposition algorithm is a proven technique to generate co-varying
Gaussian random parameters. So, as long as there is an accurate fit of the SKN
distribution to the bootstrap data the randomly generated parameter sets should
be correct.
Fig. 5.9 Covariance structure and frequency distributions of the original
bootstrap data (Fig. 5.6, blue data) and of 1000 newly generated random
parameter sets (red data). The model parameters are based on the unstructured
colour change data of 120 ‘Tradiro’ tomatoes stored for 10 d at 18 °C using the
model from Eq. 4.2.
The second example shows the results of 150 sets of correlated random
parameters that were generated from the 120 parameter sets based on the
individual colour change data of 120 ‘Tradiro’ tomatoes stored for 10 d at 18 °C.
Again, the randomly generated parameter sets showed good agreement with the
original bootstrap data (Fig. 5.8).
Fig. 5.10 Mean ($\mu$), standard deviation ($\sigma$), skewness ($\gamma_1$) and kurtosis ($\gamma_2$) and
their 95 % confidence intervals of 50 data sets, each containing 150 sets of
randomly generated model parameters. The model parameters are based on the
individual fruit colour change data of 120 ‘Tradiro’ tomatoes stored for 10 d at
18 °C using the model from Eq. 4.2. The first bar, emphasised by the dashed box,
indicates the original bootstrap data. Panel rows: $\mu$, $\sigma$, $\gamma_1$, $\gamma_2$; panel columns: kc, H0, H+.
By generating these random sets from Fig. 5.8 over and over again and
looking at the realised means, standard deviations, skewness and kurtosis of the
three model parameters, a slight but consistent discrepancy between the original
bootstrap data and the subsequent random parameter sets was observed (Fig.
5.10). Especially in the case of $H_0$ and $H_{+\infty}$ the standard deviation, kurtosis and
skewness tended to higher values in the resampled parameters as compared to the
original data. The differences were most of the time not significant but too
systematic to be ignored. The reason for this systematic deviation is the small
number of parameter sets in the original bootstrap set, resulting in a marginal fit of
the SKN distribution to the original bootstrap data and leaving relatively large residuals
(Fig. 5.8). For $k_c^{\mathrm{ref}}$ this fit was better (Fig. 5.8), resulting in more accurate
resampling results. This emphasises the importance of a good fit of the SKN
distribution to the original bootstrap data to enable reliable resampling of the
model parameters for the subsequent Monte Carlo simulations.
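The kind of check described here, repeatedly generating parameter sets and comparing their realised moments with those of the original sample, can be sketched as follows. The original sample is synthetic, and plain resampling with replacement is used as a stand-in generator rather than the SKN-based procedure. `moments` and `replicate_moments` are hypothetical helper names, with skewness reported as g1 and excess kurtosis as g2.

```python
import random


def moments(x):
    """Return mean, standard deviation, skewness g1 and excess kurtosis g2."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    sd = (m2 * n / (n - 1)) ** 0.5
    return mean, sd, m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def replicate_moments(generate, n_sets, set_size):
    """Moments of n_sets independently generated samples of size set_size."""
    return [moments(generate(set_size)) for _ in range(n_sets)]


rng = random.Random(7)
# Synthetic, right-skewed stand-in for 120 estimates of one model parameter
original = [rng.lognormvariate(4.0, 0.15) for _ in range(120)]
# Stand-in generator: resampling with replacement from the original values
reps = replicate_moments(lambda k: rng.choices(original, k=k), 50, 150)
ref = moments(original)
for j, name in enumerate(("mean", "sd", "g1", "g2")):
    vals = sorted(r[j] for r in reps)
    print(f"{name}: original {ref[j]:.3f}, replicates {vals[0]:.3f} to {vals[-1]:.3f}")
```

A systematic offset between the original value and the bulk of the replicate values would point to a bias in the generator, which is the pattern reported for Fig. 5.10.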
The randomly generated model parameters based on the individual fruit data
can subsequently be used to perform Monte-Carlo simulations to get a good idea
of the behaviour of a random batch of fruit through different postharvest
temperature regimes. Based on a large random sample of 1000 model parameter
sets 6 different temperature regimes were simulated to study the effect of
temperature using the energy of activation from Table 4.1 as estimated on the
original individual fruit data. At 12 °C (Fig. 5.11A) hue colour decreased slowly
as compared to storage at 18 °C (Fig. 5.11F) maintaining large levels of variation
throughout the storage period.
When a period of 2 d at 18 °C is introduced (Fig. 5.11B, C and D) hue colour
decreases during this period, at the same time reducing the level of variation. By
the end of the storage period, regardless of the timing of the warm period,
comparable levels of variation are reached. In the case of a constantly fluctuating
temperature (Fig. 5.11E) an oscillating decrease in hue and the related level of
variation is obtained.
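A Monte Carlo run of this kind can be sketched in Python under several assumptions that are not taken from this chapter: a logistic decline of hue towards the lower asymptote (a stand-in for Eq. 4.2), an Arrhenius temperature dependence relative to 18 °C with a hypothetical energy of activation `EA`, and independent normal draws for the fruit parameters with illustrative spreads centred on the individual fruit values of Table 5.2. A real run would use correlated parameter sets.

```python
import math
import random

R_GAS = 8.314           # J mol-1 K-1
T_REF = 18.0 + 273.15   # K, reference temperature of k_ref
EA = 80000.0            # J mol-1, HYPOTHETICAL value (Table 4.1 is not reproduced here)
H_MINUS = 124.0         # fixed upper asymptote (footnote of Table 5.2)


def rate_factor(temp_c):
    """Assumed Arrhenius factor of the rate constant relative to 18 C."""
    return math.exp(EA / R_GAS * (1.0 / T_REF - 1.0 / (temp_c + 273.15)))


def thermal_time(profile, t_end, dt=0.01):
    """Integrate the rate factor over a temperature profile (midpoint rule)."""
    steps = int(round(t_end / dt))
    return sum(rate_factor(profile((k + 0.5) * dt)) for k in range(steps)) * dt


def hue(h0, k_ref, h_plus, tau):
    """ASSUMED logistic decline from h0 towards h_plus, bounded above by H_MINUS."""
    span = H_MINUS - h_plus
    y0 = (h0 - h_plus) / span
    return h_plus + span / (1.0 + (1.0 - y0) / y0 * math.exp(k_ref * span * tau))


def draw_fruit(rng):
    """Independent draws with illustrative spreads (not fitted values)."""
    h_plus = rng.gauss(39.8, 0.9)
    h0 = max(h_plus + 1.0, rng.gauss(59.3, 6.0))
    k_ref = max(1e-4, rng.gauss(0.0028, 0.0004))
    return h0, k_ref, h_plus


def mean_sd(x):
    m = sum(x) / len(x)
    return m, (sum((v - m) ** 2 for v in x) / (len(x) - 1)) ** 0.5


regimes = {
    "A: constant 12 C": lambda t: 12.0,
    "C: 12 C, 18 C from day 2 to 4": lambda t: 18.0 if 2.0 <= t < 4.0 else 12.0,
    "F: constant 18 C": lambda t: 18.0,
}
rng = random.Random(3)
fruit = [draw_fruit(rng) for _ in range(1000)]
results = {}
for name, profile in regimes.items():
    tau = thermal_time(profile, 10.0)
    results[name] = mean_sd([hue(h0, k, hp, tau) for h0, k, hp in fruit])
    print(f"{name}: mean hue {results[name][0]:.1f}, sd {results[name][1]:.1f} after 10 d")
```

With these assumed inputs the warmer regime ends with both a lower average hue and a smaller spread, which mirrors the qualitative pattern described for Fig. 5.11. The numbers themselves carry no meaning beyond the illustration.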
Based on these results the following conflicting conclusions can be reached
for the case of tomato colour change. From the point of view of prolonging shelf
life, 12 °C storage gives the best results, slowing down ripening. From the point of
view of delivering a homogeneous batch of fruit to the market, higher
temperatures are preferable as they result in less variation. As long as no rots or
excessive weight loss occur, 4 d to 6 d storage at 18 °C would give the best
result in terms of producing a homogeneous batch of red coloured ready-to-eat
fruit, although it would result in a shorter shelf life.
The Monte-Carlo distribution clearly illustrates how temperature is affecting
the distribution of hue colour. As such, the colour distribution at any time during
postharvest contains information on the incurred temperatures and thus can be
used to judge temperature control throughout a logistic chain.
Fig. 5.11 Density plot of the result of 1000 Monte-Carlo simulations simulating
for 6 different postharvest temperature regimes (A-F). The colours represent the
ratio of fruit with a particular hue colour at a particular time (colour scale from 0.250 down to 0.002). A: constant 12 °C;
B: 12 °C, except for 18 °C from t = 0 d to t = 2 d; C: 12 °C, except for 18 °C from
t = 2 d to t = 4 d; D: 12 °C, except for 18 °C from t = 4 d to t = 6 d; E:
temperature varying with two day intervals between 12 °C and 18 °C; F: constant
18 °C. The Monte-Carlo model parameters were generated based on the
individual fruit colour change data of 120 ‘Tradiro’ tomatoes stored for 10 d at
18 °C.
If one closely examines the Monte-Carlo simulation results from Fig. 5.11,
one can also notice that, once a batch has reached a certain averaged hue colour,
the distribution at that given moment is always the same. This is illustrated by
Fig. 5.12 showing the frequency distributions for the batches from the different
postharvest temperature regimes in Fig. 5.11 when reaching an average hue
colour of 47°. The frequency distributions for the 6 batches are identical given the
averaged batch colour of 47° (Fig. 5.12) indicating that the shape of this
frequency distribution is independent of the prior temperature history. In other
words: regardless of the trajectory along which the tomatoes reached an average
colour of 47°, the corresponding distribution remains the same.
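One way to see why this can happen, offered as a sketch under assumptions this section does not state explicitly: suppose the hue of fruit i follows an autonomous rate equation whose rate constant depends on temperature through a single Arrhenius factor shared by all fruit (one common energy of activation), while the reference rate constant, initial hue and asymptote differ between fruit. The symbols f, g, τ, θ_i and T_ref are introduced here for illustration only.

```latex
\begin{aligned}
\frac{dH_i}{dt} &= k_{c,i}(T)\, f(H_i;\theta_i), \qquad
k_{c,i}(T) = k_{c,i}^{\mathrm{ref}}\, g(T), \qquad
g(T) = \exp\!\left[\frac{E_a}{R}\left(\frac{1}{T_{\mathrm{ref}}}-\frac{1}{T}\right)\right] \\
\tau(t) &= \int_0^t g\bigl(T(s)\bigr)\, ds
\;\Longrightarrow\;
\frac{dH_i}{d\tau} = k_{c,i}^{\mathrm{ref}}\, f(H_i;\theta_i)
\;\Longrightarrow\;
H_i(t) = h\bigl(H_{0,i},\, k_{c,i}^{\mathrm{ref}},\, \theta_i,\, \tau(t)\bigr)
\end{aligned}
```

Because τ(t) is the same for every fruit, the batch distribution depends on the temperature history only through τ. If the batch average decreases monotonically with τ, an average of 47° corresponds to a single value of τ and therefore to a single distribution, whatever regime produced it.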
Fig. 5.12 Relative frequency plots of hue colour, v(H) against hue (°). Data for the 6 curves was taken
from the 6 Monte-Carlo simulations from Fig. 5.11. For each of the Monte-Carlo
simulations the moment was selected at which the average batch reached a hue
value of 47°. The exact time for this depended on the postharvest temperature
regimes applied (Fig. 5.11; A: t = 10 d; B: t = 2.8 d; C: t = 3.8 d; D: t = 5.2 d; E:
t = 4.3 d; F: t = 1.7 d). For each of these times, the relative frequency
distributions for hue colour were reconstructed.
5.4 Conclusions
The aim of modelling in postharvest biology and technology is to develop valid
models that are consistent with our current knowledge and are free of flaws of
logic. To upgrade a conceptual model into a mathematical model the
mathematical equivalents of the concepts have to be formulated and last but not
least, values have to be assigned to the model parameters. In some cases
parameters are well defined physical constants or measurable quantities.
However, in many cases the model parameters represent properties of lumped
systems that can not be measured directly and need to be estimated based on
experimental data. For this purpose OptiPa, a dedicated optimisation tool, was
developed.
Special attention was paid to its ease of use to classify the model parameters
into generic, cultivar specific, batch specific and fruit variable parameters and
estimate them accordingly. By classifying model parameters into, for instance,
batch and cultivar specific parameters, turning a descriptive
model into a predictive model comes within reach, as was illustrated for the
potato sweetening model. Model parameters identified as cultivar specific can
readily be conserved and transferred from one situation to the other as long as the
same cultivar is involved. For parameters identified as batch specific, focused
measurements can be taken to determine these parameters for new batches to
allow predictive application of the model to describe the behaviour of future
batches of otherwise known cultivars. So, the analysis step of classifying model
parameters is an essential step in developing predictive models and is facilitated
by OptiPa.
Working in postharvest biology and technology involves coping with the
omnipresent biological variation. Therefore numerical tools were implemented to
generate accurate estimates of the confidence intervals of the model parameters
(bootstrap techniques) and to predict the propagation of parameter variation
(Monte Carlo simulations). For this purpose a new generic probability
distribution function was developed to allow the generation of random correlated
model parameters with non-Gaussian distributions allowing both skewness and
kurtosis.