
Quality Change Modelling in Postharvest Biology and Technology

Maarten L.A.T.M. Hertog - 2004

5 Model calibration and uncertainty analysis

5.1 Introduction

One of the dilemmas in modelling in postharvest biology and technology is that

verification and validation of these models is impossible. As natural systems are

never closed they are subject to external influences not taken into account by the

model. Unless we would have a closed system, one can only falsify a model but

never verify it (Oreskes et al., 1994). Also, if one can not falsify a model (yet)

this does not imply that the model is true. One might always run into a situation

where the model does not hold. For this reason the aim of modelling in

postharvest biology and technology should not be to develop true models but to

develop valid models; models that are consistent with our current knowledge

level and that contain no known or detectable flaws of logic.

Once the concept of the model has been identified and the mathematical

equivalent has been formulated, values have to be assigned to the model

parameters. In some cases parameters are well defined physical or mathematical


constants (such as the gas constant R, the numbers π and e, and the molar mass of

water $M_{H_2O}$) or measurable quantities (such as initial fruit weight or colour,

product density, size dimensions, film permeability, fruit permeance and dry

matter content). However, in many cases the model parameters represent

properties of simplified lumped systems (rate constants of an overall enzyme

system, initial amount of substrate or of the lumped enzyme system) or derived

properties (such as the energies of activation) that can not be measured directly.

As the values for these parameters can not be obtained from tables or direct

measurements they need to be estimated based on experimental data. This is done

through an iterative process of fitting the model to experimental data, also known

as model calibration or parameter estimation.

When the developed model is relatively simple an analytical solution in the

form of an algebraic expression can often be derived. In these cases, standard

statistical packages (SAS, Genstat, SPSS, Statistica, etc.) can be used to perform

non linear regression to estimate the missing parameter values. When the model

contains parameters depending on independent model variables (such as T, RH,

$p_{O_2}$ and $p_{CO_2}$) that change during the experiment the analytical solution is often

not available and the original ODE based model formulation needs to be used. To

estimate parameters using ODE based models the range of available software is

much more limited. Most of the available software is dedicated to a specific

research area, limiting the type of models that can be implemented.

MatLab (MatLab v. 6.5, 2002, The MathWorks, Inc., Natick, MA, USA)

offers the framework to allow implementation of a wide range of models and

contains all the optimisation routines required to estimate model parameters for

ODE based models. Besides implementing the model itself, additional

programming is required to organise the administration of model inputs, model

outputs and the graphical and statistical processing of the model identification

results. As part of this thesis a MatLab based generic software environment is

developed to estimate parameters of ODE based model systems.

The objective of this chapter is to outline some of the main features of the

developed optimisation tool, called OptiPa, emphasising the statistical properties

of the parameter estimation and their effect on model predictions. The MatLab

interface itself is outlined in appendix A. OptiPa, a MatLab interface.

Using the different case studies from this thesis and an additional example on

potato sweetening, the aspect of classifying model parameters into generic,

cultivar specific, batch specific and fruit variable parameters using OptiPa is


discussed (section 5.2). In this way OptiPa can be used to readily identify the

different sources of variation in a model, enhancing the interpretation of

experimental data and the subsequent application of the model to different

situations.

To acquire accurate estimates of the confidence intervals of the model

parameters, bootstrap techniques have been implemented. Based on these

confidence intervals, Monte Carlo simulations can be performed to study model

behaviour at different conditions taking into account the parameters’ accuracies

(section 5.3). Again this will be illustrated using tomato colour as an example.

The innovative aspect in the developed interface is the aspect of classifying

model parameters to account for the different sources of variation. Furthermore, a

new distribution function was developed to facilitate the generation of co-varying

non-Gaussian random parameter sets allowing for both skewness and kurtosis.

Parts of the material presented in this chapter were published in Hertog et al.

(1997c, 1997d, 1999b).

5.2 Variable and constant parameters

5.2.1 Classifying model parameters

In section 4.3.3.2 the data on tomato hue colour was analysed to identify the

different sources of variation in terms of which of the model parameters could be

treated as generic constants, which of them as cultivar dependent parameters and

which of them as fruit variable parameters. In chapter 4 the data on hue colour

was analysed by non linear regression in SAS, using the analytical solution from

Eq. 4.2. The exercise on classifying model parameters into generic, cultivar

specific, batch specific and fruit variable parameters can also be done with OptiPa

using the ODE model formulation from Eq. 4.1. After an initial optimisation run,

selecting all parameters to be estimated in common, one can add or remove, step by step,

parameters to be estimated either per cultivar or per individual fruit. As

the cultivar ‘Tradiro’ starts from an on average lower hue value (Fig. 4.5) a logical

first option is to select H0 to be estimated per cultivar. By doing so, separate

values are estimated for the three cultivars. Ultimately, all model parameters can

be estimated per individual tomato to see how much these parameters change in

between individual fruit. This would eventually lead to the same exercise as done

in section 4.3.3.2. This whole procedure of identifying variable and constant


model parameters can be done through the graphical user interface of OptiPa,

without any additional programming.

What seems to be an arbitrary playing around with the model can be an

essential element in classifying model parameters to find out whether they are

generic constants or depending on factors like cultivar, batch, grower, season or

temperature. Based on the model formulation and the meaning of the parameter

within the context of the model, one can often already tell what the most likely

case is. True rate constants that are properties of an enzyme (system) should be

constant as long as the same enzyme (system) is involved. Apparent rate

constants can strongly vary as they rely on, for instance, the amount of enzyme

present. Good problem decomposition should therefore always, at least formally,

separate the true rate constant from the enzyme concentration so that effects

related to enzyme turnover can be separated from for instance a temperature

effect on the rate constant. Even when initially this might play no role, a clear

separation from the start might enhance the interpretation of unexpected effects

later on. Other parameters indicating the initial status of the fruit might be varying

from fruit to fruit or might be related to treatment effects. The ultimate example

of this is the biological age of tomato fruit determining colour at harvest (section

4.3) with the cultivar specific parameters $H_{+\infty}$, $k_c^{ref}$ and $Ea_{k_c}$ and the generic

parameter $H_{-\infty}$. In the case of spoilage of strawberry (section 3.3) all model

parameters were kept generic, except for the initial level of spoilage varying per

batch which was determining the batch specific behaviour. With the stem growth

of Belgian endive (section 3.4), the kinetic parameters were again kept in

common, with the initial stem length taken as a batch dependent variable and the

initial head mass as a variable that changed between chicory heads. With the

softening of kiwifruit (section 3.5) the kinetic parameters were kept in common

with only the initial and final firmness level made batch (in this case season)

dependent.

By classifying model parameters like this, one can identify both the most

important variable and constant parameters. If the generic model parameters

indeed prove to be constant, the models can easily be applied to new batches by

measuring only the batch dependent parameters and transferring the generic

parameters. In this way models can be reused for different batches without having

to reparameterise the whole model. In other words, the model might get a

predictive value.


5.2.2 The case of potato sweetening

This section will illustrate how a descriptive model can be upgraded to a

commercial predictive model by classifying model parameters into variable and

constant parameters using an example on the sweetening of potato tubers during

storage (see Hertog et al., 1997c, 1997d, 1999b for a detailed description of

materials and methods and complete results). This section will focus on the

concept of classifying model parameters.

In potatoes, reducing sugars are involved in the non-enzymatic browning

reaction, known as the Maillard reaction (Ellis, 1959), and thus the amount of

reducing sugars (glucose and fructose) determines the processing potential of

potatoes in terms of frying colour (Burton, 1989). The amount of reducing sugars

in potatoes at the time of processing depends on the conditions during the

preceding storage period (Burton 1965, 1989).

Hertog et al. (1997c) developed a dynamic mathematical model based on a

simplification of the underlying physiological processes to describe the storage

behaviour of potato (Solanum tuberosum L.) tubers in terms of accumulation of

reducing sugars. The data necessary for calibration and validation of the model

was gathered during long term storage experiments over a wide range of storage

temperatures (2 °C - 14 °C) for several seasons and cultivars. Although the model

is based on a considerable simplification of the occurring physiological processes,

it was capable of accounting for about 95% of the observed storage behaviour,

including both cold-induced and senescent sweetening.

5.2.2.1 Modelling approach

The storage potential of a certain batch of potatoes is largely determined by the

state of maturity of the tubers at harvest (Burton, 1965; Nelson and Shaw, 1976;

Iritani and Weller, 1980; Coffin et al., 1987; Pritchard and Adam, 1992). Growth

conditions and time of harvest are the most important factors affecting this state

of maturity. In spite of the apparent differences in storage behaviour between

different batches and cultivars, the actual processes leading to accumulation of

sugars are likely to be the same. The differences in degree of accumulation of

sugars will solely be the result of the different extent in which the separate

processes contribute. The two major processes involved are cold induced

sweetening and senescent sweetening. In both cases sugars are released from a

large pool of starch which is assumed constant.


Cold-induced sweetening. The sequence by which accumulation of reducing

sugars occurs, is mobilisation of starch, followed by an increased synthesis of

sucrose and finally hydrolysis of sucrose to glucose and fructose. The overall

pathway from starch via sucrose to reducing sugars, can be simplified into one

step: a conversion from starch to reducing sugars. During storage over several

months, the level of accumulated sugars decreases in general before the onset of

senescent sweetening (Burton, 1989). Apparently the responsible enzyme system

is susceptible to an increasing malfunctioning during prolonged storage. This is

interpreted as a slow denaturation of the responsible enzyme system.

Senescent sweetening. Sugars mobilised during senescence are released for

the benefit of development and growth of sprouts (Burton, 1989). Consequently,

senescent sweetening may be initiated soon after the break of dormancy (Burton,

1977; Hughes and Fuller, 1984). Senescent sweetening, is assumed to be induced

by a second enzyme system. Inhibition of sprout growth during senescence

stimulates the accumulation of reducing sugars (Isherwood and Burton, 1975).

Although the external features of sprouting are suppressed, the metabolic

processes are apparently not. The higher the storage temperature, the earlier

senescent sweetening starts (Barker, 1938). As starch is always abundantly

present, the initial amount of enzyme responsible for the conversion into sugar

has to be very low. To reach an enzyme activity eventually large enough to

generate senescent sweetening, the enzyme has to be formed. This increase is

modelled by an exponential formation. This is in agreement with the hypothesis

of Kumar and Knowles (1993) who suggest that increased starch hydrolysis

during senescence is the result of increasing peroxidative damage of the

amyloplast membrane resulting in increasing contact between enzymes and

substrate. This also invokes an exponential increase in enzyme activity.

During storage, sugars (S) are consumed by respiration. By simple mass

action, respiration is considered to be directly related to the amount of

accumulated sugars. As potatoes are mostly stored under well ventilated

atmospheric conditions, the amount of oxygen is considered to be constant (21

kPa) and not rate limiting.

It is thus assumed that two different enzyme systems are involved for the two

different processes. The enzyme system responsible for senescent sweetening

(Esene) is accumulating in time while the enzyme system responsible for cold

induced sweetening (Ecold) is susceptible to denaturation. This resulted in the

ODE formulation from Eq. 5.1.


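$$
\begin{aligned}
\frac{dE_{cold}}{dt} &= -k_1 \cdot E_{cold} \\
\frac{dE_{sene}}{dt} &= +k_2 \cdot E_{sene} \\
\frac{dS}{dt} &= \left( k_3 \cdot E_{cold} + k_4 \cdot E_{sene} \right) \cdot Starch - k_5 \cdot S \cdot O_2
\end{aligned}
\tag{5.1}
$$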

The initial values of Ecold and Esene were set to a relative value of 1. All process

rates are assumed to depend on temperature according to Arrhenius' law. See Hertog et

al. (1997c, 1997d, 1999b) for further details.
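The behaviour of Eq. 5.1 at a constant storage temperature can be explored with a minimal Python sketch. It assumes the reference-temperature form of Arrhenius' law and a fixed-step Runge-Kutta integrator. All rate constants, activation energies, the reference temperature and the initial sugar level are hypothetical placeholders, not the estimates of Hertog et al. (1997c). The function names are likewise illustrative and are not part of OptiPa.

```python
import math

R_GAS = 8.314  # universal gas constant, J mol^-1 K^-1

def arrhenius(k_ref, ea, temp_k, t_ref_k=283.15):
    """Rate constant at temp_k, given its value k_ref at the reference temperature."""
    return k_ref * math.exp(ea / R_GAS * (1.0 / t_ref_k - 1.0 / temp_k))

def derivatives(state, k, starch=1.0, o2=21.0):
    """Right-hand side of Eq. 5.1; state = (E_cold, E_sene, S), k = (k1, ..., k5)."""
    e_cold, e_sene, sugar = state
    k1, k2, k3, k4, k5 = k
    return (-k1 * e_cold,
            k2 * e_sene,
            (k3 * e_cold + k4 * e_sene) * starch - k5 * sugar * o2)

def rk4_step(state, k, dt):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, h):
        return tuple(b + h * d for b, d in zip(base, slope))
    s1 = derivatives(state, k)
    s2 = derivatives(shifted(state, s1, dt / 2), k)
    s3 = derivatives(shifted(state, s2, dt / 2), k)
    s4 = derivatives(shifted(state, s3, dt), k)
    return tuple(y + dt / 6 * (a + 2 * b + 2 * c + d)
                 for y, a, b, c, d in zip(state, s1, s2, s3, s4))

def simulate(temp_c, days, e_cold0=1.0, e_sene0=1.0, s0=0.1, dt=0.1):
    """Integrate Eq. 5.1 at constant temperature; returns final state and rates."""
    # HYPOTHETICAL reference rates (per day) and activation energies (J/mol)
    k_ref = (0.01, 0.02, 0.05, 1e-4, 0.001)
    ea = (60e3, 90e3, 80e3, 70e3, 50e3)
    k = tuple(arrhenius(kr, e, temp_c + 273.15) for kr, e in zip(k_ref, ea))
    state = (e_cold0, e_sene0, s0)
    for _ in range(round(days / dt)):
        state = rk4_step(state, k, dt)
    return state, k

final_state, rates = simulate(temp_c=4.0, days=30.0)
```

Because the first two equations are linear, the simulated enzyme levels can be checked against exp(-k1·t) and exp(k2·t), which is a convenient sanity test for the integrator before any parameter estimation is attempted.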

For the largest part of storage (6 months - 8 months) and depending on storage

temperature, the process of senescent sweetening is not relevant. Even during

longer storage periods the occurrence of senescent sweetening appears to be

unpredictable. The main quality issue during regular storage is ruled by the

process of cold induced sweetening and therefore this is what is being focused on.

5.2.2.2 Model results

Analysing data from several cultivars over different years allowing all parameters

to vary per batch (season × cultivar combination), showed that the seasonal effect

on the cultivars was not limited to any particular parameter. However, given

seasonal differences and different harvest dates the state of maturity at harvest is

likely to be different between batches. During the development of tubers attached

to the mother plant the activity of several enzymes related to sugar metabolism

(such as sucrose synthase, ADP-glucose pyrophosphorylase and ATP

fructokinase) changes (Morrell and Ap Rees, 1986; Merlo et al., 1993).

Merlo et al. (1993) reported for most enzyme activities in developing tubers,

an optimum curve with the optimum for 8 week old plants. The observation from

Merlo et al. implies that the initial value of Ecold depends on the state of maturity

at harvest. Based on this reasoning, the combined data of all storage seasons were

re-analysed per cultivar, forcing all seasonal effects to the single model parameter

Ecold,0. The remaining kinetic parameters were estimated in common for the

successive seasons and became cultivar specific. As Ecold,0 is a relative value, and

up to now fixed at 1, some reference point had to be chosen. For this purpose the

value for the season most likely to generate the most immature tubers, was fixed

at 1 as a standardised reference for the other seasons.

The values estimated for Ecold,0 for the individual seasons appeared to depend

on the length of the period from planting to harvest (Fig. 5.1). This is in

agreement with the observations described by Merlo et al. (1993). Although the

maturity of tubers is not exclusively determined by the length of the period from


planting to harvest, it was a dominant factor influencing the state of maturity

during the successive seasons.

Fig. 5.1 Estimations of the season-specific parameter Ecold,0 for 'Bintje' and 'Saturna' tubers in relation to the length of the growing period. [Plot: Ecold,0 (0.0 to 1.2) against harvest (150 to 210 days after planting).]

5.2.2.3 Model validation

The effect of different maturity stages on the accumulation of reducing sugars

during storage of 10 different potato varieties was studied by Putz (1993). Tubers

were harvested weekly (July 7 to August 29 1989) and were stored for 16 weeks

at 4 °C. The later the tubers were harvested, the lower the initial level of reducing

sugars and the lower the subsequent maximum accumulation reached. No

seasonal variation was included in the data of Putz, so the state of maturity at

harvest was directly related to the time of harvest.

This averaged data was analysed by Hertog et al. (1997d) using the approach

described above, attributing the differences between harvests to differences in

Ecold,0. The initial amount of Ecold,0 was assumed batch specific and related to the

successive harvest dates, while the other model parameters were estimated in

common for the combined set of data. As a result, Ecold,0 decreased almost linearly

as a function of harvest time and thus of the related state of maturity (Fig. 5.2).

This analysis of an independent set of data confirmed the concept that the state of

maturity at time of harvest determines the storage behaviour through the initial

amount of enzyme (or enzyme system) present.

Hertog et al. (1997c) revisited the data from Putz (1993), this time using the

original data on the ten individual cultivars studied by Putz (1993). The model

parameter Ecold,0 gave a satisfactory explanation for the harvest effects for all ten

cultivars studied, decreasing almost linearly as a function of harvest time and thus


of the related state of maturity (Hertog et al., 1997c). Ecold,0 may therefore be

considered as a general maturity dependent parameter determining the storage

potential of a given cultivar.

Fig. 5.2 Estimations of the season-specific parameter Ecold,0 for the data of Putz (1993) in relation to time of harvest relative to the first harvest. [Plot: Ecold,0 (0.0 to 1.2) against harvest (0 to 60 days after first harvest).]

It is not very likely that Ecold,0 can ever be identified as a single specific

enzyme. Probably Ecold,0 must be seen as the reflection of a more complex part

of the metabolism. For practical purposes however, it is not necessary to

identify the model enzyme Ecold as a single enzyme as long as the model

parameter Ecold,0 can be correlated with a concrete physiological property of

maturing potato tubers.

5.2.2.4 Practical application

Hertog et al. (1999b) extended their work studying the storage behaviour of 5

batches for both ‘Bintje’ and ‘Agria’ potatoes coming from 5 subsequent

harvests, each stored at 6 different constant temperatures. In addition, accelerated

shelf life testing was done by storing from each batch a sub sample for 4 weeks at

2 °C to measure sugar accumulation. In this way the cold susceptibility of the

specific batch could be determined. Assuming the cultivar specific model

parameters are all known, only the parameter Ecold,0 had to be estimated on the

data from the accelerated shelf life test. This estimate for Ecold,0 based on the

accelerated shelf life test was compared against the estimate based on the normal

long-term storage data. For both ‘Agria’ and ‘Bintje’, the estimates for Ecold,0

were not significantly different (Table 5.1).

Now that the model parameters have been classified into batch and cultivar

specific parameters the model can be used for predicting storage behaviour of


future batches of potato. The only prerequisite is that the cultivar specific

parameters are known. For every new batch of a known cultivar, the batch

specific parameter Ecold,0 can be estimated at the start of the storage season using

an accelerated shelf life test. Subsequently, the long-term storage behaviour of the

specific batch can be predicted using the available model.
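The step of estimating only the batch specific parameter from an accelerated shelf life test can be sketched as follows. The sketch assumes constant temperature and negligible senescent sweetening, so that Eq. 5.1 has an analytical solution that is linear in Ecold,0. The lumped constants a = k3·Starch and b = k5·O2, the numerical values and the function names are hypothetical and serve only to illustrate the idea.

```python
import math

def sugar_cold_only(t, e_cold0, k1, a, b, s0):
    """Analytical S(t) of Eq. 5.1 without senescent sweetening (requires b != k1).

    a = k3*Starch and b = k5*O2 are lumped constants (an assumption of this sketch).
    """
    unit = a / (b - k1) * (math.exp(-k1 * t) - math.exp(-b * t))
    return s0 * math.exp(-b * t) + e_cold0 * unit

def estimate_e_cold0(times, sugars, k1, a, b, s0):
    """Least-squares estimate of E_cold,0 with all other parameters kept fixed.

    S(t) is linear in E_cold,0, so the estimate has a closed form.
    """
    num = den = 0.0
    for t, s in zip(times, sugars):
        gain = sugar_cold_only(t, 1.0, k1, a, b, 0.0)   # response per unit E_cold,0
        base = sugar_cold_only(t, 0.0, k1, a, b, s0)    # part independent of E_cold,0
        num += gain * (s - base)
        den += gain * gain
    return num / den

# HYPOTHETICAL cultivar specific parameters, assumed known from earlier seasons
K1, A, B, S0 = 0.01, 0.05, 0.02, 0.1
days = [7, 14, 21, 28]                      # a 4-week accelerated test
observed = [sugar_cold_only(t, 0.7, K1, A, B, S0) for t in days]  # synthetic data
e_cold0_hat = estimate_e_cold0(days, observed, K1, A, B, S0)
```

With real data the estimate would carry a standard error, as in Table 5.1. With the noise-free synthetic data used here the value that generated the data is recovered.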

Table 5.1 Estimates of Ecold,0 and their s.e. based on either the long-term storage data or the accelerated shelf life testing.

Harvest   ‘Bintje’                        ‘Agria’
          long-term      accelerated      long-term      accelerated
          storage        shelf life       storage        shelf life
1         1              1.02 (0.05)      1              1.20 (0.09)
2         0.77 (0.02)    0.78 (0.05)      0.72 (0.02)    0.64 (0.10)
3         0.57 (0.02)    0.52 (0.05)      0.76 (0.02)    0.80 (0.09)
4         0.58 (0.02)    0.57 (0.06)      0.66 (0.02)    0.53 (0.09)
5         0.65 (0.02)    0.66 (0.05)      0.66 (0.02)    0.55 (0.09)

Based on the described quality change model the warehouse management can

be optimised taking into account the demands of the specific batch under storage

(Verdijck et al., 1999a, 1999b, 2002; Verdijck and Van Straten, 2002). This

resulted in a commercially implemented model-based control system to

automatically manage and control the climate in potato stores taking into account

the quality of the potatoes stored. In this way the quality of raw material for the

frying industry can be maintained and controlled depending on the varying

demands of the market. This case of potato sweetening has clearly shown how the

classification of model parameters can contribute to upgrading a descriptive

model to the level of a predictive model.

5.3 Numerical approach to parameter variation

Once the model parameters have been classified into batch and cultivar specific

parameters the model can be used for prediction. An essential aspect in this is the

accuracy of the model parameter estimates. Given non linearity of the models,

using standard statistics, confidence intervals for the model parameters can only

be approximated. To acquire accurate estimates of the confidence intervals,

bootstrap techniques are required and have been implemented in OptiPa (section

5.3.1). They also allow the identification of asymmetric confidence intervals.


Once accurate parameter confidence intervals have been established Monte Carlo

simulations can be performed to study model behaviour at different conditions

taking into account the parameters’ inaccuracies. This can be done either using

the parameter values estimated during the bootstrap analyses or by generating

new random sets of parameters taking into account the correlation structure and

distributions of the parameters identified in the bootstrap analyses. A technique

has been implemented to generate these random correlated sets of parameters for

a large family of normal based, skewed and peaked distributions (section 5.3.2).

5.3.1 Bootstrap resampling

The bootstrap is a resampling method for statistical inference (Efron and

Tibshirani, 1993). It is commonly used to estimate confidence intervals, but can

also be used for sensitivity studies. In practical application, the bootstrap means

using some form of resampling with replacement from the actual data to generate

a large number of bootstrap samples. The exact nature of the resampling strategy

depends on the structure of the data. For each of these samples the statistical

quantities of interest can be computed. Based on these, 95 % confidence intervals

can simply be computed by sorting the estimates in ascending order and selecting

those values that cut off the upper and lower 2.5 percentiles.
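The percentile rule described above can be written in a few lines of Python. This is a minimal sketch; the function name is illustrative and the rounding convention for the cut-off index is an assumption, as several conventions are in use.

```python
def percentile_interval(estimates, level=0.95):
    """Percentile confidence interval from a list of bootstrap estimates."""
    ordered = sorted(estimates)
    n = len(ordered)
    # Number of estimates to cut off at each tail (2.5 % each for level = 0.95)
    cut = round((1.0 - level) / 2.0 * n)
    return ordered[cut], ordered[n - 1 - cut]
```

Because the interval is read directly from the sorted estimates, it is not forced to be symmetric around the point estimate.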

Various types of resampling methods have been designed (Efron and

Tibshirani, 1993). In the area of postharvest biology and technology, most data

consist of time series in which the data at subsequent time steps are heavily

correlated and not necessarily equally distributed. As a consequence, simple

random resampling of the data with replacement is not appropriate as this

completely removes the original correlation between subsequent observations

(Bühlemann, 1999; Härdle et al., 2003). With block bootstrap methods the data is

divided into a number of blocks of sequential observations (non-overlapping, or overlapping in the moving block bootstrap), which

conserves at least some of the correlation between observations within the blocks.

The bootstrap sample is constructed by randomly sampling the blocks of data and

concatenating them into time series. This technique is suitable for regular

stationary data-generating processes but again not for the type of time series

generally encountered in postharvest.

Assuming a valid model is available to describe the dependence structure of

the sequential observations from the time series, this information can be used for

the bootstrap. Using the model predictions, residuals are calculated for each of the

observations. Then a bootstrap sample of residuals is drawn with replacement


from the observed residuals. The final bootstrap sample of the observations can

be constructed by adding the randomly sampled residuals to the predicted model

values. One of the prerequisites for this approach is that the residuals are

homoscedastic. In the case of heteroscedastic variation, like in the case of tomato

colour with large variation at t = 0 d and small variation at t = 21 d (Fig. 4.5),

resampling the residuals of the raw data would completely distort the data

structure and result in non-representative bootstrap samples. In the case of

heteroscedastic variation bootstrapping should therefore be used in combination

with a Box-Cox transformation to correct for heteroscedasticity.

This model based error resampling is the bootstrap technique implemented in

OptiPa. Once a model is implemented in OptiPa and experimental data is

available, parameters can be selected for the bootstrap analysis. After an initial

optimisation run to fit the model to the selected data, residuals are calculated.

Subsequently, by combining randomly resampled residuals with the predicted

model values, the requested number of bootstrap samples is generated and the

parameters selected for the bootstrap analysis are estimated for each of the

generated bootstrap samples keeping the remaining model parameters fixed.

Afterwards the 95 % confidence intervals of the bootstrapped model parameters

are reconstructed as outlined above.
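The model based error resampling can be illustrated with a short Python sketch. It is not the OptiPa implementation. A straight line fitted by ordinary least squares stands in for the ODE based model, the data are invented for illustration, and all function names are hypothetical. The sketch assumes homoscedastic residuals, as required above.

```python
import random

def fit_line(times, values):
    """Ordinary least-squares fit; returns (intercept, slope)."""
    n = len(times)
    mt = sum(times) / n
    mv = sum(values) / n
    slope = (sum((t - mt) * (v - mv) for t, v in zip(times, values))
             / sum((t - mt) ** 2 for t in times))
    return mv - slope * mt, slope

def predict_line(params, times):
    intercept, slope = params
    return [intercept + slope * t for t in times]

def residual_bootstrap(times, values, fit, predict, n_boot=1000, seed=1):
    """Refit the model to predictions plus residuals resampled with replacement."""
    rng = random.Random(seed)
    best = fit(times, values)                 # initial optimisation run
    fitted = predict(best, times)
    residuals = [v - f for v, f in zip(values, fitted)]
    samples = []
    for _ in range(n_boot):
        pseudo = [f + rng.choice(residuals) for f in fitted]
        samples.append(fit(times, pseudo))    # re-estimate on the bootstrap sample
    return best, samples

# Invented example data: a quality attribute declining over 21 days
times = [0, 3, 6, 9, 12, 15, 18, 21]
values = [100.8, 93.1, 88.9, 81.2, 76.7, 69.4, 64.9, 57.6]
best, samples = residual_bootstrap(times, values, fit_line, predict_line)

# 95 % percentile interval for the slope
slopes = sorted(p[1] for p in samples)
cut = round(0.025 * len(slopes))
slope_ci = (slopes[cut], slopes[-1 - cut])
```

Passing `fit` and `predict` as arguments keeps the resampling loop independent of the model, so the same loop could wrap a numerical ODE solver and an optimiser.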

5.3.2 Monte Carlo simulations

Monte Carlo simulation is a numerical stochastic technique used to solve

mathematical problems. A Monte Carlo simulation is based on some model

system that can be described as a function of random model parameters

characterised by their probability distribution functions. Monte Carlo simulation

simulates the model system after random sampling from these probability

distribution functions. Monte Carlo methods have been used since the late 1940’s,

but only since the availability of large computational power has the technique

gained the status of a numerical method capable of addressing large complex

applications. The technique is useful to obtain numerical solutions to problems

which are too complicated to solve analytically.

So, each of the Monte Carlo model runs is based on a different set of random

model parameter values. Either model parameter combinations coming from the

previously generated bootstrap data sets can be used or new sets of randomly

generated model parameters taking into account the covariance structure and

probability of the model parameters as identified before. Gaussian random


parameter sets can be easily generated using the covariance decomposition

algorithm. Given a covariance matrix V and a vector ȳ containing the average

values of the Gaussian model parameters, the Cholesky decomposition can be

used to determine the Cholesky factor L and its transpose Lᵀ so that V = L·Lᵀ,

with L a lower triangular matrix. After generating a vector g containing standard

Gaussian random numbers (µ = 0, σ = 1) the required covariance structure can be

introduced by multiplying this vector with the Cholesky factor L. Finally, by

adding the vector ȳ to correct for non-zero averaged Gaussian parameters, the

final vector y is obtained, containing the required set of co-varying Gaussian

random parameters (y = ȳ + L·g).
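The covariance decomposition algorithm can be sketched in plain Python as follows. The covariance matrix and mean vector in the example are hypothetical, and the function names are illustrative only.

```python
import math
import random

def cholesky(v):
    """Lower triangular factor L of a symmetric positive definite matrix V = L·Lᵀ."""
    n = len(v)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = v[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(acc) if i == j else acc / low[j][j]
    return low

def correlated_gaussian(mean, low, rng):
    """One draw of co-varying Gaussian parameters: mean + L·g."""
    g = [rng.gauss(0.0, 1.0) for _ in mean]           # standard Gaussian numbers
    return [m + sum(low[i][k] * g[k] for k in range(i + 1))
            for i, m in enumerate(mean)]

# HYPOTHETICAL covariance matrix and parameter averages
V = [[4.0, 2.0], [2.0, 3.0]]
MEAN = [10.0, -5.0]
L = cholesky(V)
rng = random.Random(42)
draws = [correlated_gaussian(MEAN, L, rng) for _ in range(20000)]
```

The sample covariance of a large number of draws should approach V, which offers a simple check on the implementation.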

However, this technique is only applicable to generate co-varying Gaussian

random parameter sets. As could already be seen from the previous section,

model parameters are often not normally distributed and can either show

skewness or kurtosis. The only way to deal with this is to find a transformation of

the standard normal distribution that matches the observed parameter distributions, so that

the observed parameter distributions can be reshaped into standard normal distributions. If

such a transformation is available the covariance decomposition algorithm can be

applied in the Gaussian normal parameter space with the resulting parameter

values being back-transformed to the original non-normal parameter space.

The algorithm to do so is outlined in Fig. 5.3 and can be summarised as

follows:

• Fit the appropriate transformed normal density function to each of the observed parameter distributions coming from the bootstrap (A)

• Based on the original bootstrap data compile the cumulative distribution function (B)

• Generate a standard normal cumulative distribution function (C)

• Using the inverse transform method transform the original bootstrap data into their corresponding values in the normal parameter space (D)

• Calculate the covariance matrix V for the normal transformed bootstrap data and calculate the Cholesky factor L

• Using the Cholesky factor L generate a set of co-varying Gaussian random parameters (E)

• Based on the random samples generate cumulative distribution functions (F)

• Generate for each of the observed parameter distributions coming from the bootstrap, the cumulative transformed normal distribution function based on the fitted transformed normal density function (G)

• Using the inverse transform method transform random Gaussian samples into their corresponding values in the original non-normal parameter space (H)

Fig. 5.3 Algorithm to generate co-varying random parameter sets given the appropriate transformed normal density function to fit to the original bootstrap data. See text for explanation. [Flow diagram with panels A to H, leading from the original bootstrap sample via N(0,1) to a new random sample.]


The inverse transform method used to go back and forwards between the

unknown parameter distribution and the normal parameter space (step D and step


H in Fig. 5.3) is generally used to transform a uniform deviate U into a random

variable X with cumulative probability $F_X(x)$ following $X = F_X^{-1}(U)$ (Rubinstein,

1981). In the algorithm outlined above the inverse transform method is always

used twice; in step D first to go from the non-normal distribution to a uniform

distribution and second to transform the obtained uniform distribution into a

normal distribution. In step H the procedure is repeated in the reversed order, first

going from the normal distribution to a uniform distribution and second

transforming the obtained uniform distribution into a non-normal distribution.
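The double use of the inverse transform method can be sketched in Python for two co-varying parameters. As a simplification, this sketch uses the empirical cumulative distribution of the bootstrap data in both directions. It therefore does not fit a transformed normal density function (steps A and G), and new values are always drawn from the set of observed bootstrap values. The data and function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def to_normal_scores(values):
    """Step D: empirical CDF -> uniform -> standard normal quantile."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, idx in enumerate(order):
        u = (rank + 0.5) / n                  # strictly inside (0, 1)
        scores[idx] = STD_NORMAL.inv_cdf(u)
    return scores

def from_normal_score(z, sorted_values):
    """Step H: normal CDF -> uniform -> empirical quantile of the original data."""
    u = STD_NORMAL.cdf(z)
    idx = min(len(sorted_values) - 1, int(u * len(sorted_values)))
    return sorted_values[idx]

def covariance(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def generate_pairs(boot_a, boot_b, n_new, seed=1):
    """New co-varying parameter pairs with the marginals of the bootstrap data."""
    rng = random.Random(seed)
    za, zb = to_normal_scores(boot_a), to_normal_scores(boot_b)
    vaa, vab, vbb = covariance(za, za), covariance(za, zb), covariance(zb, zb)
    # Cholesky factor of the 2x2 covariance matrix in the normal parameter space
    l11 = math.sqrt(vaa)
    l21 = vab / l11
    l22 = math.sqrt(vbb - l21 * l21)
    sa, sb = sorted(boot_a), sorted(boot_b)
    pairs = []
    for _ in range(n_new):
        g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        pairs.append((from_normal_score(l11 * g1, sa),
                      from_normal_score(l21 * g1 + l22 * g2, sb)))
    return pairs

# HYPOTHETICAL skewed, positively correlated bootstrap estimates
data_rng = random.Random(0)
boot_a = [math.exp(data_rng.gauss(0.0, 0.5)) for _ in range(500)]
boot_b = [a + data_rng.expovariate(2.0) for a in boot_a]
new_pairs = generate_pairs(boot_a, boot_b, n_new=2000)
```

A fitted distribution function, as used in the algorithm of Fig. 5.3, has the advantage over this empirical shortcut that it can also generate values between and beyond the observed bootstrap estimates.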

The whole technique stands or falls with the availability of a distribution function that fits the observed parameter distributions. Azzalini and Dalla Valle (1996) developed the skewed normal distribution SN(0,1,α), with a density function of the form $2 \cdot \phi(z) \cdot \Phi(\alpha \cdot z)$, where φ and Φ are the N(0,1) normal probability density and cumulative distribution functions. The shape parameter α controls the skewness of the distribution, with α = 0 resulting in a standard normal distribution, α < 0 resulting in a distribution skewed to the left and α > 0 resulting in a distribution skewed to the right. Delianedis (2000) used a mixture of two zero-mean normal distributions with unequal variances to introduce kurtosis. In the current approach these two approaches were combined, introducing both kurtosis and skewness. Instead of using a combination of only two normal distributions with unequal variances to introduce kurtosis, a range of 6 distributions (η = 6) with increasing standard deviations was used, each balanced by a standard normal distribution. In this way a smooth overall distribution can be obtained. This resulted in:

( ) ( )1

1

2 1,0,1 ,0,1 ,0,

2 1 2 ii

SKN z z z

h

a h f fh b -

=

å õå õæ ö= ÖF Ö Ö Ö + æ öæ öæ öÖ + Öç ÷ç ÷

ä (5.2)

with β ∈ ℝ⁺ the shape factor controlling the kurtosis of the distribution, with β = 0 resulting in a standard normal distribution. An impression of the different faces of the SKN distribution from Eq. 5.2 is given in Fig. 5.4.

A numerical analysis of the SKN distribution from Eq. 5.2 has shown that the area under the curve equals one, regardless of the values of α and β, making the function a proper probability density function. It can also be proven analytically that the integral of the SKN distribution equals one (Scheerlinck, pers. comm.; appendix B. Proof of PDF).
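A numerical check of this kind can be sketched as follows. The sketch only covers the skewness factor, that is the density $2 \cdot \phi(z) \cdot \Phi(\alpha \cdot z)$ quoted in the text, and not the full Eq. 5.2. The integration limits, the step count and the function names are choices made for this example.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0,1)


def skew_normal_pdf(z, alpha):
    """Skewed normal density 2 * phi(z) * Phi(alpha * z)."""
    return 2.0 * STD_NORMAL.pdf(z) * STD_NORMAL.cdf(alpha * z)


def trapezoid_area(func, lower, upper, steps):
    """Composite trapezoid rule on an equally spaced grid."""
    width = (upper - lower) / steps
    total = 0.5 * (func(lower) + func(upper))
    for i in range(1, steps):
        total += func(lower + i * width)
    return total * width


# Area under the curve for a range of shape parameters.
areas = {alpha: trapezoid_area(lambda z, a=alpha: skew_normal_pdf(z, a), -10.0, 10.0, 20000)
         for alpha in (-10.0, -2.0, 0.0, 3.0, 10.0)}

# The mean moves to the right for alpha > 0; its closed form for SN(0,1,alpha)
# is alpha / sqrt(1 + alpha^2) * sqrt(2 / pi).
numeric_mean = trapezoid_area(lambda z: z * skew_normal_pdf(z, 3.0), -10.0, 10.0, 20000)
closed_form_mean = 3.0 / math.sqrt(1.0 + 3.0 ** 2) * math.sqrt(2.0 / math.pi)
```

The same construction, multiplying a density that is symmetric around zero by $2 \cdot \Phi(\alpha \cdot z)$, integrates to one for any such symmetric density, which is consistent with the area being independent of the shape parameters.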


Fig. 5.4 The different faces of the SKN distribution from Eq. 5.2. The bold curve is the standard normal distribution with α = 0 and β = 0. The skewed curves are the result of α ranging from -10 to 10 (β = 0). The peaked curves in the middle are the result of β ranging from 0 to 10 (α = 0). By combining different values of α and β, intermediate shapes can be obtained. [Axes: z, -2 to 2; SKN(z), 0.0 to 1.0.]

5.3.3 Tomato colour as an example

The data from the tomato experiment describing hue colour over time (section 4.3) are revisited once more to discuss some aspects of bootstrap resampling and Monte Carlo simulations. In this case the focus will be on the data of the validation experiment (section 4.3.3.5) describing the colour change of 120 ‘Tradiro’ tomato fruit stored at 18 °C (Fig. 5.5).

Fig. 5.5 Hue colour change of 120 ‘Tradiro’ tomatoes stored at 18 °C. The points are the experimental data with the lines connecting data points measured on the same fruit. [Axes: time (d), 0 to 10; hue (°), 40 to 90.]

As the data were measured non-destructively, following individual fruit over time, information is available on which points belong together. However, at this stage this information will be ignored, treating the data from Fig. 5.5 as if they were just multiple measurements taken on different fruit as a function of time. As only data at 18 °C are considered, only the parameters $H_0$, $k_c^{ref}$, $H_{+\infty}$ and $H_{-\infty}$ are relevant, ignoring $Ea_{k_c}$. Initially the data from Fig. 5.5 are analysed with the simple model from Eq. 4.2 describing the averaged batch behaviour (ignoring the aspect of biological age). To correct for heteroscedasticity the Box-Cox transformation is applied. This results in the parameter estimates from Table 5.2 (column heading ‘non linear regression analysis’).

The standard errors returned by the non linear regression analysis are approximate errors, which can be used to approximate the confidence intervals. By applying model based bootstrapping, multiple estimates for $H_0$, $k_c^{ref}$ and $H_{+\infty}$ can be obtained, based on which more accurate confidence intervals can be established. If, for instance, 1000 bootstrap samples are generated and analysed, detailed distributions of the model parameters can be generated (Fig. 5.6) and the 2.5th and 97.5th percentiles can be determined to obtain their 95 % confidence intervals (Table 5.2, column heading ‘bootstrap analysis’).
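The principle of model based bootstrapping with percentile confidence intervals can be sketched as follows. As Eq. 4.2 is not reproduced in this section, a straight line fitted by ordinary least squares is used here purely as a stand-in model, and the data are synthetic. The function names and all numerical values are assumptions made for this example.

```python
import random


def fit_line(t, y):
    """Ordinary least squares for the stand-in model y = a + b * t; returns (a, b)."""
    n = len(t)
    mean_t = sum(t) / n
    mean_y = sum(y) / n
    sxx = sum((ti - mean_t) ** 2 for ti in t)
    sxy = sum((ti - mean_t) * (yi - mean_y) for ti, yi in zip(t, y))
    slope = sxy / sxx
    return mean_y - slope * mean_t, slope


def residual_bootstrap(t, y, n_boot, rng):
    """Refit the model to data sets built from the fitted curve plus resampled residuals."""
    a, b = fit_line(t, y)
    fitted = [a + b * ti for ti in t]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    estimates = []
    for _ in range(n_boot):
        y_star = [fi + rng.choice(residuals) for fi in fitted]  # resample with replacement
        estimates.append(fit_line(t, y_star))
    return estimates


def percentile_interval(values, level=0.95):
    """Lower and upper percentile of the bootstrap estimates."""
    ordered = sorted(values)
    tail = (1.0 - level) / 2.0
    lower_index = int(round(tail * (len(ordered) - 1)))
    upper_index = int(round((1.0 - tail) * (len(ordered) - 1)))
    return ordered[lower_index], ordered[upper_index]


# Synthetic example: 12 measurements per day over 10 days.
rng = random.Random(7)
times = [float(day) for day in range(0, 11) for _ in range(12)]
hues = [55.0 - 1.5 * ti + rng.gauss(0.0, 2.0) for ti in times]

estimates = residual_bootstrap(times, hues, 1000, rng)
intercept_ci = percentile_interval([a for a, _ in estimates])
slope_ci = percentile_interval([b for _, b in estimates])
```

Note that the residuals are pooled over all measurements, so any information on which measurements belong to the same fruit is lost in the bootstrap data sets.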

Using the bootstrap results the covariance structure of the model parameters

can also be determined (Fig. 5.6). In this case the model parameters are all close

to normally distributed resulting in almost symmetric confidence intervals (Table

5.2, column heading ‘bootstrap analysis’).

If one plots a subsample of the 1000 bootstrap model fits, a striking result is observed (Fig. 5.7). While the individual tomato fruit changed colour more or less in parallel (Fig. 5.5), the bootstrap model fits show curves crossing each other (Fig. 5.7). The reason for this is that the covariance information contained in the knowledge of which points belong to the same fruit was completely ignored. Thus, by randomly resampling the residuals, bootstrap data sets can be

Table 5.2 Parameter estimates and their 95 % confidence interval (c.i.) resulting from the different analyses of colour change data of ‘Tradiro’ tomato stored for 10 d at 18 °C using the model from Eq. 4.2.

                  non linear regression analysis    bootstrap analysis          individual fruit analysis
Parameter a)      Value     95 % c.i.               Value     95 % c.i.         Value     95 % c.i.
$H_0$             54.9      53.0-56.7               53.5      52.6-54.7         59.3      45.6-85.4
$H_{+\infty}$     39.3      38.7-39.9               40.2      39.8-40.6         39.8      38.0-41.7
$k_c^{ref}$       0.0022    0.0018-0.0026           0.0024    0.0021-0.0027     0.0028    0.0021-0.0040

a) $H_{+\infty}$ is the asymptotic colour value (in °) at plus infinite time; $H_0$ is the initial colour value at harvest (in °); $k_c^{ref}$ (in d$^{-1}$) is the value of the rate constant $k_c$ at 18 °C; $H_{-\infty}$ is fixed at a value of 124°.


generated that combine relatively high starting hue values with relatively low ending hue values, or the other way round; combinations that do not occur in reality.

Fig. 5.6 Covariance structure and frequency distributions for the 1000 bootstrap

model parameters based on the unstructured colour change data of 120 ‘Tradiro’

tomatoes stored for 10 d at 18 °C using the model from Eq. 4.2.

These ‘incorrectly’ created bootstrap data sets adversely affect the model parameter estimates and their correlation structure. When the experimental data set contains data from fruit that already started from low hue values, it can even happen that a bootstrap data set is generated that results in estimates for $H_0$ below the values of $H_{+\infty}$, resulting in the unrealistic product behaviour of hue colour increasing with time.

As an alternative approach, the model parameters estimated on the 120 individual fruit can themselves be interpreted as a representative sample of the parameter space. Because of the low number of data points, the distributions of the parameters, their covariance structure (Fig. 5.8) and their confidence intervals (Table 5.2, column heading ‘individual fruit analysis’) are statistically less accurately defined than those based on the larger bootstrap data set but, in this particular case, are probably closer to reality.


Fig. 5.7 Hue colour change of a subsample of 10 bootstrap model fits generated based on the unstructured data from Fig. 5.5 on ‘Tradiro’ tomato stored at 18 °C. [Axes: time (d), 0 to 10; hue (°), 40 to 56.]

Comparing Fig. 5.6 to Fig. 5.8 reveals that the distributions of the model parameters are not as close to normal as one would expect based on the bootstrap results; they are both skewed and peaked. As a consequence, the resulting confidence intervals for particularly $H_0$ and $k_c^{ref}$ are not symmetric (Table 5.2). In general, bootstrap analyses can contribute to improving the characterisation of model parameter variation, but at the same time one should not ignore the information available in fruit-to-fruit variation.

Fig. 5.8 Blue data: Covariance structure and frequency distributions for the 120 parameter sets based on the individual colour change data of 120 ‘Tradiro’ tomatoes stored for 10 d at 18 °C using the model from Eq. 4.2. Red data: 150 newly generated random parameter sets based on the individual fruit parameters (blue data). [Panel labels: kc, H0, Hplus.]


Using the technique outlined in section 5.3.2, random correlated parameter sets were generated for the case of the tomato colour model, using the parameter samples from Fig. 5.6 and Fig. 5.8 as a starting point. The first example shows the results of 1000 sets of correlated random parameters generated from the bootstrap parameter sets that were based on the unstructured colour change data of 120 ‘Tradiro’ tomatoes stored for 10 d at 18 °C. The randomly generated data are in good agreement with the original bootstrap data (Fig. 5.9). Also when checking statistics such as variances, covariances, means, skewness and kurtosis, the generated samples closely match the original data (data not shown). From a theoretical point of view this is not unexpected, as the covariance decomposition algorithm is a proven technique to generate co-varying Gaussian random parameters. So, as long as the SKN distribution fits the bootstrap data accurately, the randomly generated parameter sets should be correct.

Fig. 5.9 Covariance structure and frequency distributions of the original

bootstrap data (Fig. 5.6, blue data) and of 1000 newly generated random

parameter sets (red data). The model parameters are based on the unstructured

colour change data of 120 ‘Tradiro’ tomatoes stored for 10 d at 18 °C using the

model from Eq. 4.2.


The second example shows the results of 150 sets of correlated random parameters that were generated from the 120 parameter sets based on the individual colour change data of 120 ‘Tradiro’ tomatoes stored for 10 d at 18 °C. Again, the randomly generated parameter sets showed good agreement with the original data (Fig. 5.8).

Fig. 5.10 Mean (µ), standard deviation (σ), skewness (γ1) and kurtosis (γ2) and their 95 % confidence intervals of 50 data sets, each containing 150 sets of randomly generated model parameters. The model parameters are based on the individual fruit colour change data of 120 ‘Tradiro’ tomatoes stored for 10 d at 18 °C using the model from Eq. 4.2. The first bar, emphasised by the dashed box, indicates the values for the original bootstrap data. [Panels: rows µ, σ, g1 and g2; columns kc, H0 and H+.]

By generating these random sets from Fig. 5.8 over and over again and looking at the realised means, standard deviations, skewness and kurtosis of the three model parameters, a slight but consistent discrepancy between the original bootstrap data and the subsequent random parameter sets was observed (Fig. 5.10). Especially in the case of $H_0$ and $H_{+\infty}$, the standard deviation, kurtosis and skewness tended towards higher values in the resampled parameters than in the original data. The differences were most of the time not significant but too systematic to be ignored. The reason for this systematic deviation is the small number of parameter sets in the original bootstrap set, resulting in a marginal fit of the SKN distribution to the original bootstrap data and leaving relatively large residuals (Fig. 5.8). For $k_c^{ref}$ this fit was better (Fig. 5.8), resulting in more accurate resampling results. This emphasises the importance of a good fit of the SKN distribution to the original bootstrap data to enable reliable resampling of the model parameters for the subsequent Monte Carlo simulations.

The randomly generated model parameters based on the individual fruit data

can subsequently be used to perform Monte-Carlo simulations to get a good idea

of the behaviour of a random batch of fruit through different postharvest

temperature regimes. Based on a large random sample of 1000 model parameter

sets 6 different temperature regimes were simulated to study the effect of

temperature using the energy of activation from Table 4.1 as estimated on the

original individual fruit data. At 12 °C (Fig. 5.11A) hue colour decreased slowly

as compared to storage at 18 °C (Fig. 5.11F) maintaining large levels of variation

throughout the storage period.

When a period of 2 d at 18 °C is introduced (Fig. 5.11B, C and D), hue colour decreases during this period, at the same time reducing the level of variation. By the end of the storage period, regardless of the timing of the warm period, comparable levels of variation are reached. In the case of a constantly fluctuating temperature (Fig. 5.11E), an oscillating decrease in hue and the related level of variation is obtained.

Based on these results, the following conflicting conclusions can be reached for the case of tomato colour change. From the point of view of prolonging shelf life, 12 °C storage gives the best results by slowing down ripening. From the point of view of delivering a homogeneous batch of fruit to the market, higher temperatures are preferable as they result in less variation. As long as no rots or excessive weight loss occur, 4 d to 6 d storage at 18 °C would give the best result in terms of producing a homogeneous batch of red coloured ready-to-eat fruit, although it would result in a shorter shelf life.

The Monte-Carlo distribution clearly illustrates how temperature affects the distribution of hue colour. As such, the colour distribution at any time during postharvest contains information on the incurred temperatures and thus can be used to judge temperature control throughout a logistic chain.


Fig. 5.11 Density plot of the results of 1000 Monte-Carlo simulations for 6 different postharvest temperature regimes (A-F). The colours represent the fraction of fruit with a particular hue colour at a particular time (colour scale from 0.250 down to 0.002). A: constant 12 °C; B: 12 °C, except for 18 °C from t = 0 d to t = 2 d; C: 12 °C, except for 18 °C from t = 2 d to t = 4 d; D: 12 °C, except for 18 °C from t = 4 d to t = 6 d; E: temperature varying with two day intervals between 12 °C and 18 °C; F: constant 18 °C. The Monte-Carlo model parameters were generated based on the individual fruit colour change data of 120 ‘Tradiro’ tomatoes stored for 10 d at 18 °C.


If one closely examines the Monte-Carlo simulation results from Fig. 5.11, one can also notice that, once a batch has reached a certain averaged hue colour, the distribution at that given moment is always the same. This is illustrated by Fig. 5.12, showing the frequency distributions for the batches from the different postharvest temperature regimes in Fig. 5.11 when reaching an average hue colour of 47°. The frequency distributions for the 6 batches are identical given the averaged batch colour of 47° (Fig. 5.12), indicating that the shape of this frequency distribution is independent of the prior temperature history. In other words: regardless of the trajectory along which the tomatoes reached an average colour of 47°, the corresponding distribution remains the same.
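A possible explanation for this history independence is that, when all fruit share the same energy of activation, temperature only rescales time by a common factor for every fruit. The Monte-Carlo sketch below illustrates this under explicit assumptions: a logistic hue curve is used as a stand-in for Eq. 4.2 (which is not reproduced in this section), the rate constant follows Arrhenius' law with a hypothetical energy of activation, and the parameter sets are drawn independently from uniform ranges loosely based on Table 5.2 rather than from the SKN based procedure. All names and values are illustrative only.

```python
import math
import random

GAS_CONSTANT = 8.314    # J mol^-1 K^-1
T_REF = 18.0 + 273.15   # reference temperature (K)
EA = 80000.0            # J mol^-1, hypothetical energy of activation
H_MINUS_INF = 124.0     # fixed asymptote (°), as in Table 5.2


def arrhenius_factor(temp_c):
    """Rate at temp_c relative to the rate at the 18 °C reference."""
    temp_k = temp_c + 273.15
    return math.exp(EA / GAS_CONSTANT * (1.0 / T_REF - 1.0 / temp_k))


def effective_time(regime, t):
    """Integral of the Arrhenius factor from 0 to t for a piecewise-constant
    regime given as (duration in d, temperature in °C) segments."""
    tau, start = 0.0, 0.0
    for duration, temp_c in regime:
        end = min(t, start + duration)
        if end > start:
            tau += (end - start) * arrhenius_factor(temp_c)
        start += duration
    return tau


def hue(h0, h_plus, k_ref, tau):
    """Assumed logistic hue curve evaluated at effective time tau."""
    span = H_MINUS_INF - h_plus
    c = (H_MINUS_INF - h0) / (h0 - h_plus)
    return h_plus + span / (1.0 + c * math.exp(k_ref * span * tau))


def batch_hues(batch, regime, t):
    tau = effective_time(regime, t)
    return [hue(h0, h_plus, k_ref, tau) for h0, h_plus, k_ref in batch]


def time_at_mean_hue(batch, regime, target, t_max=30.0):
    """Bisection on t; the batch mean decreases monotonically with time."""
    low, high = 0.0, t_max
    for _ in range(60):
        mid = 0.5 * (low + high)
        if sum(batch_hues(batch, regime, mid)) / len(batch) > target:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


rng = random.Random(42)
batch = [(rng.uniform(45.6, 85.4), rng.uniform(38.0, 41.7), rng.uniform(0.0021, 0.0040))
         for _ in range(1000)]
regimes = {
    "A": [(math.inf, 12.0)],                            # constant 12 °C
    "C": [(2.0, 12.0), (2.0, 18.0), (math.inf, 12.0)],  # 18 °C from t = 2 d to t = 4 d
    "F": [(math.inf, 18.0)],                            # constant 18 °C
}
results = {}
for name, regime in regimes.items():
    t_hit = time_at_mean_hue(batch, regime, 47.0)
    results[name] = (t_hit, batch_hues(batch, regime, t_hit))
    print(name, round(t_hit, 2))
```

In this sketch the three regimes reach the mean hue of 47° at different times, but the hue of every individual fruit at that moment is the same in all three regimes, so the frequency distributions coincide. This holds only as long as the temperature dependence is identical for all fruit.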

Fig. 5.12 Relative frequency plots of hue colour. Data for the 6 curves were taken from the 6 Monte-Carlo simulations from Fig. 5.11. For each of the Monte-Carlo simulations the moment was selected at which the average batch reached a hue value of 47°. The exact time for this depended on the postharvest temperature regime applied (Fig. 5.11; A: t = 10 d; B: t = 2.8 d; C: t = 3.8 d; D: t = 5.2 d; E: t = 4.3 d; F: t = 1.7 d). For each of these times, the relative frequency distributions for hue colour were reconstructed. [Axes: hue (°), 40 to 80; relative frequency v(H), 0.00 to 0.10.]

5.4 Conclusions

The aim of modelling in postharvest biology and technology is to develop valid

models that are consistent with our current knowledge and are free of flaws of

logic. To upgrade a conceptual model into a mathematical model the

mathematical equivalents of the concepts have to be formulated and last but not


least, values have to be assigned to the model parameters. In some cases

parameters are well defined physical constants or measurable quantities.

However, in many cases the model parameters represent properties of lumped

systems that can not be measured directly and need to be estimated based on

experimental data. For this purpose OptiPa, a dedicated optimisation tool, was

developed.

Special attention was paid to the ease with which model parameters can be classified into generic, cultivar specific, batch specific and fruit variable parameters and estimated accordingly. By classifying model parameters into, for instance, batch and cultivar specific parameters, turning a descriptive model into a predictive model comes within reach, as was illustrated for the potato sweetening model. Model parameters identified as cultivar specific can readily be conserved and transferred from one situation to the other as long as the same cultivar is involved. For parameters identified as batch specific, focused measurements can be taken to determine these parameters for new batches, allowing predictive application of the model to describe the behaviour of future batches of otherwise known cultivars. So, classifying the model parameters is an essential step in developing predictive models and is facilitated by OptiPa.

Working in postharvest biology and technology involves coping with the

omnipresent biological variation. Therefore numerical tools were implemented to

generate accurate estimates of the confidence intervals of the model parameters

(bootstrap techniques) and to predict the propagation of parameter variation

(Monte Carlo simulations). For this purpose a new generic probability

distribution function was developed to allow the generation of random correlated

model parameters with non-Gaussian distributions allowing both skewness and

kurtosis.
