the genetic programming and - pdfs.semanticscholar.org programming and its application in real-time...
TRANSCRIPT
Genetic Programming and
its Application in Real-time Runoff Forecasting
Khu, S. T., Liong, S. Y. , Babovic, V., Madsen, H. and Muttil, N.
ABSTRACT
Genetic programming (GP), a relatively new evolutionary technique, is demonstrated
in this study to evolve codes for the solution of problems. First, a simple example, in
the area of symbolic regression, is considered. GP is then applied to real-time runoff
forecasting for the Orgeval catchment in France. In this study, GP functions as an
error updating scheme to complement a rainfall-runoff model, NAM. Hourly runoff
forecasts of different updating intervals are performed for forecast horizons of up to
nine hours. The results showed that the proposed updating scheme is able to predict
the runoff quite accurately for all updating intervals considered and particularly for
updating intervals of not exceeding the time of concentration of the catchment. The
results are also compared with those of an earlier study, by the World Meteorological
Organization (WMO, 1992), at which auto-regression and Kalman filter were used as
the updating methods. Comparisons show that GP is a better updating tool for real-
time flow forecasting. Another important finding from this study is that non-
dimensionalizing the variables enhances the symbolic regression process significantly.
Key works: genetic programming, evolutionary algorithms, rainfall-runoff, real-time
forecasting, updating, regression.
1
Genetic Programming and Its Application in Real-time Runoff ForecastingKhu, S. T., Liong, S. Y. , Babovic, V., Madsen, H. and Muttil, N.Journal of American Water Resources Assoc., (JAWRA) Vol. 37, No. 2, 2001.
This is a Pre-Published Version.Published version available at Wiley-Blackwell: http://doi.org/10.1111/j.1752-1688.2001.tb00980.x
Genetic Programming and
Its Application in Real-time Runoff Forecasting
INTRODUCTION
One of the central challenges of computer science is to get a computer to
perform a task without telling it how to do it. In hydrologic engineering, the challenge
is to derive a model that relates two or more physical processes without knowing the
actual mechanics of conversion. Genetic Programming (GP) addresses the first
challenge by providing a method which automatically creates a working computer
program from a high-level statement of the problem. GP achieves this automatic
program discovery (also known as program synthesis or program induction) by
genetically breeding a population of computer programs using principles of
Darwinian natural selection and biologically inspired operations.
GP can also be applied to infer models in hydrologic engineering problems
such as rainfall-runoff modelling or runoff forecasting. In problems where complete
understanding of the physical process is lacking or the process is too complicated to
be modeled, GP may offer some assistance or insight. An application area of GP is
that of real-time runoff forecasting. In real-time runoff forecasting for example,
incorporating knowledge of prediction errors of the past forecast to forecasting
models of different horizons is can greatly improve the models’ performance.
In runoff forecasting, information on the immediate past and current states of
meteorological conditions and those of the catchment are essential to forecast the
catchment’s response for different forecast horizons. When applied in a real-time
mode, it is necessary to modify or update the forecast based on current information
such as observed discharges. There are four updating approaches that update either (1)
2
the input parameters, (2) the state variables, (3) the model parameters, or (4) the
output variables. The most commonly used scheme is updating the output variables or
error correction. This approach is adopted in this study.
This study is organized as follows: genetic programming is first introduced
followed by a detailed examination of the various biologically inspired operations to
form new solutions. A simple example is then used to illustrated the strength of GP.
Finally GP is used to improve the simulated runoff of a rainfall runoff model applied
to an actual catchment.
GENETIC PROGRAMMING
Genetic Programming (GP) is a relatively new domain-independent method
for evolving computer programs to solve, or approximately solve, problems (Koza,
1992). In engineering applications, GP is frequently applied to model structure
identification problems. In such applications, GP is used to infer the underlying
structure of either a natural or experimental process in order to model the process
numerically. GP inferred models have the advantages of (1) generating simple
parsimonious expressions and (2) offering some possible interpretations to the
underlying process.
GP began as an attempt to discover how computers could learn to solve
problems without being explicitly programmed to do so. One successful application of
GP in automatic program discovery is that of symbolic regression, instead of the
traditional numerical regression. This makes the application of GP even more relevant
in fields where large amounts of data are accumulating in machine-readable form. For
example, GP has been applied to (1) predict chaotic financial time series (Oakley and
Howard, 1994); (2) predict occurrence of sunspots (Lee and Suzuki, 1995); (3) solve
3
various hydraulics problems, such as rainfall-runoff relationship from synthetic data,
sediment transport modelling, salt intrusion in estuaries and flow over a flexible
vegetated bed (Babovic and Abbott, 1997); and (4) emulate the rainfall-runoff process
(Whigham and Crapper, 1999).
GP belongs to a class of probabilistic search procedures known as
Evolutionary Algorithms (EAs) which includes Genetic Algorithms (GA) (Holland,
1975), Evolutionary Programming (EP) (Fogel et al., 1966) and Evolutionary Strategy
(ES) (Schwefel, 1981). These techniques use computational models of natural
evolutionary process for the development of computer based problem-solving
systems. All evolutionary algorithms function by simulating the evolution of
individual structures via processes of reproductive variation and fitness based
selection. The techniques have become extremely popular due to their success at
searching complex non-linear spaces and their robustness in practical applications.
Basic Principles of Genetic Symbolic Regression
Genetic Symbolic Regression (GSR) is a special application of GP in the field
of symbolic regression. In traditional numerical regression, one pre-determines the
functional form, either linear or higher order, and the task is to determine the
coefficients. In symbolic regression, the task is to both find a suitable functional form
and determine the coefficients. Hence, GSR involves finding a mathematical
expression, in symbolic form, relating a finite sampling of values of a set of
independent variables (xi) and a set of dependent variables (yj).
GSR can be viewed as an extension of Genetic Algorithm (GA) in terms of the
basic principles of operations. Like GA, GSR works with a number of solution sets,
known collectively as a population, rather than a single solution at any one time. With
4
a large number of solution sets, it gives both techniques the advantage of avoiding the
possibility of getting trapped in the local optima. There are, however, two major
differences between GP and GA. They are:
(i) GSR works with two sets of variables, instead of one set of variables as in GA.
One set of variables, known as the terminal set, contains the independent
variables and constants, {xi}, similar to GA. The other set, known as the
functional set, contains the basic operators used to form the function, f( ). For
example, the function set may contain the following operators { +, -, *, /, ^,
log, sin, tanh, exp, ….} depending on the perceived degree of complexity of
the regression. Thus, the symbolic regression is performed using these two
variable sets and it is possible to derive a large number of possible functional
relationships to fit the data.
(ii) In most EAs, the length of the solution set is normally fixed. In GP, however,
the length is allowed to vary from one solution set to another. This variation in
length is due to the two genetic operators, crossover and mutation. The
flexibility in the structure length increases the search space significantly.
The solution sets in each iteration are collectively known as a generation. In GPs, the
size of a population does not have to be the same from one generation to the next. The
solutions of the very first generation are usually generated through a random process.
However, those of the subsequent generations are generated through genetic
operations. Each possible solution set can be represented and visualized in either parse
tree form or in Polish notation (Lukasiewicz, 1957) as shown in Fig. 1. As the
population evolves, new solution sets replace the older ones and are supposed to
perform better. The solution sets in a population associated with the best fit
5
individuals will, on average, be reproduced more often than the less-fit solution sets.
This is known as the Darwinian principle of the "survival of the fittest".
The basic procedure of GP, Fig. 2, can be described as follows:
1. generate the set of initial population;
2. evaluate each parse tree and assign the fitness;
3. form the temporary population by selecting candidates according to their fitness.
This temporary population is called the mating pool. Candidates with higher
fitness are given greater probabilities to mate and hence, to produce children or
offspring;
4. choose pairs of parse tree from the temporary mating pool randomly for mating
and apply the genetic operator called crossover. Crossover is the exchange of
genetic material (such as fitness, composition) between two selected candidates;
5. select a crossover site where the material will be exchanged randomly, thereby
resulting in the creation of offspring;
6. apply another genetic operator known as mutation which randomly changes the
genetic information of the candidate;
7. copy the resultant chromosomes into the new population;
8. evaluate the performance of the new population;
9. repeat steps 3-8 until a predetermined criterion is reached.
Selection, Crossover and Mutation
Selection is the process of altering the fitness of the individual with respect to
the whole generation's average performance. This is an important step because it
determines directly the individual's chances of its representation in the next
generation. A common selection method is fitness ranking. Individuals are sorted
6
according to the fitness values and ranked accordingly. Another selection method is
the tournament selection. This method fills the mating pool without the need of fitness
mapping. Pairs of individuals are picked at random from the population. The
individual with a higher fitness value is copied into the mating pool directly. This is
repeated until the pool is filled.
Crossover is the first process of producing new individuals from the selected
individuals in the mating pool. It takes two individuals and prunes their branch at
some randomly chosen position, into two segments (Figure 3a). Exchanging the
segments produces two new possible solution sets (Figure 3b). The two new solution
sets or children inherit some characteristics from their parents and genetic information
is thereby exchanged in the process. As the process continues, the fitness of the whole
population increases and converges to finding the near optimal solution set.
Mutation is the random alteration of the individual parse tree at the branch or
node level (Figure 4). Mutation is a mechanism that perturbs the parse tree structure.
It does not usually change the tree structure but the information content in the parse
tree. It therefore explores a new domain and serves to free the search from the
possibility of being trapped in local optima. It should be noted that mutation can be
destructive, causing rapid degradation of relatively "fit" solution sets if the probability
of mutation is set too high. Depending on the strategy adopted, mutation rate can be
low according GA-type mutation or high according to ES-type mutation.
The genetic programming introduced here is one of the simplest forms
available. Instead of Polish notations, GP solution sets can also be represented in other
forms such as: direct acylic graphs (Handley, 1994), linear representation (Perkis,
1994) and direct graphs (Poli, 1996). The evolution processes can also vary such as
using automatically defined functions (Koza, 1994), and adaptive representation
7
through learning (Rosca and Ballard, 1996), etc. These details can be found, for
example, in Koza (1992, 1994), Kinnear (1994), Angeline and Kinnear (1996) and
Langdon (1998).
AN EXAMPLE OF GENETIC SYMBOLIC REGRESSION
An example is shown here to illustrate the concept of GSR. The problem of
interest is to infer the Bernoulli equation for a steady, 1-dimensional fluid flow:
constg
vpzE =++=
2
2
γ(1)
where z = vertical distance above a datum (m)p = pressure (N/m2)v = velocity (m/s)g = Earth’s gravitational acceleration (9.81 m/s2)γ = specific gravity of water (9810 N/m3)
From Eq.(1), 1000 sets of different combinations of z, p and v are generated using a
standard random number generator. The values of the energy head, E, are then
computed correspondingly. It should be noted that this example was used in Maarten
and Babovic (1999) where problems involving variables of different dimensions are
discussed. For example, if variable z were to be multiplied with variable v, then the
resulting expression has a dimension of square-of-length/time which is inconsistent
with the dimension of variable E (length). In their paper, such occurrences of
dimensional inconsistency are penalized, to ensure dimensional consistency of the
resulting GP expression.
In this study, however, to circumvent the problem of dimension inconsistency,
each of the values of z, p and v is normalized or non-dimensionalized by using each
variable’s maximum value. By doing so, Eq.(1) is then transformed to a new form:
constg
vvppzzEE =++=
2
2max
2max
maxmax
γ(2)
8
or
constvCpCzCE =++= 2321
(3)
where the non-dimensionalized variables are given by:
maxE
EE =
(4a)
maxz
zz =
(4b)
maxp
pp =
(4c)
maxv
vv =
(4d)
and the coefficients are:
max
max1 E
zC = (5a)
max
max2 .E
pC
γ= (5b)
max
2max
3 .2 Eg
vC = (5c)
The terminal set used is the set of non-dimensionalized variables { }vpz
,,
while the functional set is {+, -, /, *}. Crossover is performed by randomly choosing
the sub-tree insertion location. The objective function is to find the minimum of the
root mean square error (RMSE) of the predicted energy head. The other GP relevant
parameters and their values are shown in Table 1. The genetic programming software
used in this study is GPkernel developed at the Danish Hydraulic Institute. The initial
population is generated by a random number generator. The size of the tree of the
initial population is constrained to a maximum of fifteen levels and the subsequent
tree size is constrained to forty-five levels. This restriction is necessary since GP has
the tendency to infer a Fourier expansion-type function if the tree size is not limited.
9
This type of expansion, although it may fit the data very well, does not add value in
the function interpretation.
Ten different runs are performed, each using a different seed, to generate the
random numbers. Each time GPkernel is run for 15 minutes. Fig. 5 shows the average
root mean square error (RMSE) of each generation in the GSR runs. It should be
noted that the average RMSE, for all 10 runs, decreases rapidly as the generation
progresses and reached values near to zero for most of the runs. As a result, exact
formulae (up to 3 significant figures) are produced from each run.
The above simple example illustrates the capability of GP, or GSR technique
to infer the correct functional relationship when there are no errors in the raw data.
The next section illustrates the use of GSR as a new updating procedure combined
with rainfall runoff simulation models.
GP APPLICATION IN REAL-TIME RUNOFF FORECASTING
A forecasting system is a system that takes information on the past and current
states of meteorological conditions and those of the catchment, as inputs to it, and
forecasts the catchment’s response into the future. In real-time forecasting, however,
the originally forecast values may be updated or modified as measured data become
available and, thus, prediction errors can be determined and used for forecasting. In
real-time runoff forecasting with rainfall runoff simulation models, rainfall time series
up to the desired runoff forecast horizon must be available. The required rainfall time
series within the runoff forecast horizon may be estimated with, for example, a non-
linear prediction method. In this study, the measured rainfall time series, at any runoff
forecast horizon, is made available to evaluate the performance of the proposed GSR
based error updating scheme.
10
The focus of this study is to: (1) compare the forecasts of a calibrated rainfall-
runoff model, e.g. NAM, with and without the GSR based error updating scheme; and
(2) suggest how far in the future, i.e. the maximum forecast horizon, the GSR based
error updating scheme can be used with confidence.
The catchment simulation model used in this study is a lumped rainfall-runoff
model called Nedbor-Afstomnings-Model (NAM) which is presently part of the
MIKE11 software. The NAM model has been developed by the Hydrological Section
of the Institute of Hydrodynamics and Hydraulic Engineering at the Technical
University of Denmark (Nielsen and Hansen, 1973). The model can be defined as a
deterministic, conceptual, lumped type model with moderate input data requirement.
The NAM model consists of a set of linked mathematical statements describing, in a
simplified quantitative form , the behaviour of the land phase of the hydrological
cycle. The input data requirements are the catchment size, precipitation, potential
evapo-transpiration and temperature. It operates by continuously accounting for the
moisture content in the snow zone, surface zone, sub-surface zone and groundwater
zone storage. The model can be calibrated using historical data by adjusting one or
more of the seventeen parameters.
Figure 6 shows the schematic diagram of the proposed error updating using
GSR. The NAM model is first used to simulate the discharge, QSIM, for the whole
period of interest based on the rainfall data, R. The proposed procedure is then used
to compute the prediction error, ε, by comparing the simulated discharge, QSIM, with
the observed discharge, QOBS, for time, t. The new simulated or improved discharge,
QIMPt, is computed by adjusting QSIMt for each forecast lead-time within the forecast
horizon.
11
Mathematically, the measured discharge, QOBSt, at time t, can be expressed
as:
QOBSt = QSIMt + εt (6a)
or
εt = QOBSt - QSIMt (6b)
GSR is used to infer the functional relationship, F( ), between the simulated
discharges and the past simulation errors, and the present simulation error. For lead
time of 1 hour, the functional relationship for GSR prediction error, 1ˆ +tε , may be
expressed as follows:
{ }41411 ....,,....,,ˆ −−−++ = ttttttt QSIMQSIMQSIMF εεεε (7a)
and the forecast improved discharge, QIMPt+1, can be obtained from:
111 ˆ +++ += ttt QSIMQIMP ε (7b)
For lead time of 2, 3,…, α hours, the recursive form of Eq. (7) can be written as:
{ }414 ˆ,....,ˆ,,....,ˆ −+−+−+++ = ααααα εεε ttttt QSIMQSIMF (8a)
ααα ε +++ += ttt QSIMQIMP ˆ (8b)
QSIM and ε of the immediate past 5 time steps are included in the functional set since
the catchment’s time of concentration varies up to a maximum of 5 hours, i.e. 5 time
steps (WMO, 1992).
It should be noted that the values of αε +tˆ in Eq. (8a) may be either the actual
errors at instances when measured data are available or GSR derived errors.
The real-time flood forecasting with updating procedure for a 1-hour lead-time could
be summarized as follows:
12
1. The NAM model is first calibrated using an automatic calibration routine, e.g.
Accelerated Convergence Genetic Algorithm, ACGA (Liong et al., 1998), on the
entire runoff period from 1972-1974;
2. The prediction errors, denoted by ε, between the NAM simulated and observed
runoff for each time interval, are computed;
3. Ten storm events, representing high flow regimes with minimum discharge of 4
m3/s, are selected from the calibration runoff period, for the symbolic regression
using GP. This minimum discharge criterion is also used in the selection of the
verification data set;
4. GSR is then used to derive the functional relationship between the present
prediction error, ε̂ , and the NAM simulated discharge, QSIM, and the past
prediction errors, ε, as given in Eq. (7a);
5. The improved simulated discharge, QIMP, is finally calculated, using Eq. (7b) and
compared with the measured discharge.
For α > 1, the above procedure is repeated following Eqs. (8a) and (8b).
In this study, the catchment under consideration is the Orgeval catchment, in France
(Fig. 7), which has been studied extensively in the World Meteorological
Organization’s inter-comparison project (WMO, 1992). The catchment is located
about 80 km east of Paris and the main river that drains the catchment runoff is the
Orgeval. The catchment has an area of about 104 km2. The catchment comprises
mainly rural area, with only 1 percent of the total area urban and 18 percent of the
total area forest.
In this study, a total of ten storm events (denoted as storms S1 – S10) from a
1972 – 1974 hourly flow record was selected for training the GSR (Fig. 8(a)) while a
total of six storm events (denoted as storms S11 – S16) between 1979 –1980 (Fig. 8
13
(b)) was selected and used for the verification of the updating procedure. Fig. 9(a) and
9(b) show the observed and NAM simulated hydrographs for two of the selected GSR
training events. These ten storm events represent high flow regimes which are the
focus of GSR training in this study. It should be noted that the maximum peak
discharge of the GSR training storms is 7.38 m3/s while those of the verification storm
events range from 10 m3/s to 29 m3/s.
Following the example in the previous section, both the dependent
variable, ε̂ , and the independent variables, QSIM and ε, are non-
dimensionalized using their respective maximum values. Hence, the terminal set
in GSR for the 1-hr lead time, for example, is given by the normalized values
{ }541 ...,,,...,,, −−− ttttt QSIMQSIMQSIM εε and the functional set is given
by the basic algebraic operators {+, -, *, / }. Henceforth, all variables used in the
study are normalized variables and the bar sign on each variable is therefore
suspended. The objective function searches the minimum of the root mean square
error (RMSE) of the predicted error, ε̂ . In this study, the GP program, GPKernel,
was run for 30 minutes on a Pentium II 300 PC. The other GP relevant parameters and
their values are shown in Table 2. The population size of the parent and children are
both set at 3000. It is to be noted that, from various runs attempted, it was difficult to
achieve good prediction accuracy with a smaller population size.
Discussion and Analysis of Results
The best functional form, with the minimum RMSE, resulting from GSR is as
follows:
( )12
11
087.0
644.0611.1009.0ˆ
−−
−+
−+−+=
ttt
ttt
QSIMQSIMεεεε
(9a)
14
or
( )12
111
087.0
644.0611.1009.0
−−
−++
−+−++=
ttt
tttt
QSIMQSIM
QSIMQIMP
εεε
(9b)
Eq. (9a) shows a certain degree of similarity with the commonly used auto-regressive
form with the exception of the fourth term, an interaction term between the simulated
discharges and a prediction error. This interaction term is significant in rectifying
under or over-prediction trends of the simulation model. Eq. (9) also shows that only
simulated discharges and/ or prediction errors of up to three previous time-steps are
important. This implies that, for data used to derive the above functional relationship,
the catchment’s time of concentration is more likely about three hours.
Table 3 shows the root mean square errors (RMSE) of the various prediction
horizons for each of the verification storm events. From this table, it can be seen that
the RMSE of each event is relatively better or of the same order of magnitude as that
of the simulation model (NAM). Up to 4-hour lead time for all storm events
considered, except for storm event S11, the RMSE values of NAM+GSR are
categorically better than those resulting from NAM only. Thus, the proposed updating
GSR can be used up to a lead time of 4 hours with high confidence. Figs. 10 and 11
showed the performance of the proposed procedure with different updating
frequencies of 2 hours, 4 hours and 6 hours for 2 verification storm events. It shows
clearly that the performance of the GSR error updating method is acceptable for all
the updating frequencies.
The World Meteorological Organization (WMO) conducted a workshop in
1988 and published a report entitled “Simulated Real-Time Inter-Comparison of
Hydrological Models” (WMO, 1992). The WMO study compared the performances
of 14 different updating procedures. The study found that the error updating procedure
NAMS11 (Havno et al., 1995) and the state updating procedure NAMKAL (Storm et
15
al., 1988) yielded best performance on the French Orgeval catchment. Briefly,
NAMS11 applies: (1) NAM rainfall-runoff simulation model and the MIKE11
hydrodynamic module; and (2) an error correction technique based on a first order
autoregressive model. NAMKAL is a modified NAM model, reformulated in state
space form and updated with an extended Kalman filtering algorithm.
In the WMO study, forecasts were updated at every 4th-hour. Thus, their
results are now compared with this study’s proposed GSR based updating scheme of
4-hour runoff forecast horizon. The choice of the updating interval coincides with the
earlier drawn conclusion from Table 3. Figure 12(a) and (b) shows the average
RMSE, of two verification storm events (storms S12 and S15) resulting from various
updating schemes, NAM+GSR, NAMS11 and NAMKAL. These two storm events
are the same as those chosen in Figs. 10 and 11. Fig. 12(a) shows that for storm event
S12, the proposed NAM+GSR performs better in the first 3 hours than NAMS11 and
NAMKAL while on the fourth hour, they all perform equally. Fig. 12(b), however,
shows clearly that for storm event S15, the NAM+GSR is categorically better than the
two other techniques.
16
CONCLUSIONS
A relatively new evolutionary technique, known as genetic programming (GP)
has been introduced. GP was used to evolve codes for the solution of problems. A
simple example of the Bernoulli equation was used to illustrate how GP symbolically
regresses or infers the relationship between the input and output variables. An
important conclusion from this study is that non-dimensionalizing the variables prior
to symbolic regression process significantly enhance the success of GSR.
GP was then applied to the problem of real-time runoff forecasting for the
Orgeval catchment in France. GP functioned as an error updating procedure
complementing the rainfall-runoff model, NAM. Ten storm events were used to infer
the relationship between the NAM simulated runoff and the corresponding prediction
error. That relationship was subsequently used for real-time forecasting of six storm
events.
The results indicate that the proposed methodology is able to forecast different
storm events with great accuracy for different updating intervals. The forecast
hydrograph performs well even for a long forecast horizon of up to nine hours.
However, for practical applications in real-time runoff forecasting, the updating
interval should be less than or equal to the time of concentration of the catchment.
The results were also compared with two known updating methods such as the auto-
regression and Kalman filter. Comparisons show that the proposed scheme,
NAM+GSR, is comparable to these methods for real-time runoff forecasting.
17
ACKNOWLEDGMENTS
The work is sponsored jointly by the National University of Singapore
research project, RP3972705 and the Talent Project No 9800463 “Data to Knowledge
- D2K” funded by the Danish Technical Research Council (STVF). Part of the work
was done by the first author during his study leave at Danish Hydraulic Institute
(DHI).
18
LITERATURE CITED:
Angeline, P. J. and Kinnear, K. E., (1996). Advances in Genetic Programming 2. MIT
Press, Cambridge, MA.
Babovic, V. and Abbott, M. B., (1997). The evolution of equations from hydraulic
data, part II: applications. Journal of Hydraulic Research, 35(3): 411-430.
Fogel, L. J., Owens, A. J. and Walsh, M. J., (1966). Artificial Intelligence through
Simulated Evolution. John Wiley, New York.
Handley, S., (1994). On the use of a directed acyclic graph to represent a population
of computer programs. In Proceedings of the 1994 IEEE World Congress on
Computational Intelligence. IEEE Press, pp. 154-159.
Holland, J. H., (1975). Adaptation in Natural and Artificial Systems, University of
Michigan Press, Ann Arbor, 1975.
Kinnear, K. E., (1994). Advances in Genetic Programming. The MIT Press,
Cambridge, MA.
Koza, J. R., (1992). Genetic Programming: On the Programming of Computers by
Means of Natural Selection. The MIT Press, Cambridge, MA.
Koza, J. R., (1994). Genetic Programming 2: Automatic Discovery of Reusable
Programs. The MIT Press, Cambridge, MA.
Langdon, W. B., (1998). Genetic Programming and Data Structures. Kluwer
Academic Publishers, Norwell, MA.
Lee, G. Y. and Suzuki, A., (1995). Genetic programming approach for time series
analysis and prediction. Journal of Graduate School and Faculty of Engineering,
University of Tokyo (B), 43(2): 201-220.
Liong, S. Y., Khu, S. T. and Chan, W. T., (1998). Derivation of Pareto front with
accelerated convergence genetic algorithm, ACGA. In Proceedings of the 3rd
19
International Conference on Hydroinformatics, Babovic, V. and Larsen, L. C.
(eds.), Volume 2, pp. 889 - 897.
Lukasiewicz, J., (1957). Aristotle's Syllogistic from the Standpoint of Modern Formal
Logic. Oxford , Clarendon Press.
Maarten, K. and Babovic, V., (1999). Dimensionally Aware Genetic Programming.
Proceedings of the Genetic and Evolutionary Computation Conference,
GECCO-99, Daida, W. B., Eiben, A. E., Garzon, M. H., Honavar, V., Jakiela,
M. and Smith, R. E. (eds.), Morgan Kaufmann, CA.
Nielsen, S. A. and Hansen, E., (1973). Numerical simulation of rainfall runoff process
on a daily basis. Nordic Hydrology, 4: 171-190.
Oakley, N. and Howard, E., (1994). The application of genetic programming to the
investigation of short, noisy, chaotic data series. In Evolutionary Programming,
Lecture Notes in Computer Sciences, Fogarty, T C. (ed.), No. 865, Springer-
Verlag, Germany, pp. 320-332.
Perkis, T., (1994). Stack-based genetic programming. In Proceedings of the 1994
IEEE World Congress on Computational Intelligence, Vol. 1, IEEE Press, pp.
148-153.
Poli, R., (1996). Parallel distributed genetic programming. Technical Report CSRP-
96-15, School of Computer Science, University of Birmingham.
Rosca, J. P. and Ballard, D. H., (1996). Discovery of subroutines in genetic
programming. In Advances in Genetic Programming 2, Angeline, P. J. and
Kinnear, K. E. (eds.), The MIT Press, Cambridge, MA., pp. 177-202.
Rungo, M., Refgaard, J. C. and Havno, K., (1991). The updating procedure in the
MIKE11 modelling systme for real-time forecasting. Proceeding of the
20
International Symposium on Hydrological Applications on Weather Radar, Ellis
Horword publication, University of Salford, 14-17 August 1989, pp. 497-508.
Schwefel, H. P., (1981). Numerical Optimization of Computer Models. John Wiley,
Chichester.
Storm, B., Jensen, K. H. and Refgaard, J. C., (1988). Estimation of catchment rainfall
uncertainty and its influence on runoff prediction. Nordic Hydrology, Vol. 19,
pp. 77-88.
Whigham, P. A. and Crapper, P. F., (1999). Modelling Rainfall-runoff Relationships
using Genetic Programming. Special Issue of Journal of Mathematical and
Computer Modelling (in press).
WMO, (1992). Simulated Real-Time Inter-Comparison of Hydrological Models.
WMO Operational Hydrology Report no. 38, WMO no. 779. World
Meteorological Organization, Geneva.
21
List of Tables:
Table 1 : Genetic Programming Parameters Used in Bernoulli Equation
ExampleTable 2 : Genetic Programming Parameters Used in Real-time Runoff
Forecasting ExampleTable 3 : Root Mean Square Error of Testing Storms for Different Prediction
Lead-times
22
Parameter ValueSize of parent 1000Size of children 1000Tournament size 3Crossover rate 1.0Mutation rate 0.3Maximum initial tree size 15Maximum tree size 45
23
Parameter ValueSize of parent 3000Size of children 3000Tournament size 3Crossover rate 1.0Mutation rate 0.3Maximum initial tree size 15Maximum tree size 30
24
average RMSE
Lead-time(hours)
Storm Storm Storm Storm Storm Storm
1
(NAM)
0.248
(0.878)
0.163
(2.675)
0.188
(1.810)
0.679
(4.515)
0.635
(3.199)
0.596
(1.657)
2 0.561 0.409 0.377 1.624 1.190 0.960
3 0.847 0.679 0.607 2.609 1.546 1.274
4 1.046 0.943 0.838 3.557 1.749 1.554
5 1.116 1.190 1.072 4.373 1.798 1.746
6 1.075 1.415 1.270 4.991 1.821 1.874
7 1.024 1.617 1.440 5.274 1.971 1.903
8 0.994 1.789 1.573 5.331 2.251 1.805
9 0.991 1.923 1.686 5.641 2.504 1.656
25
List of Figures
Figure 1 : Different Forms of Representation in Genetic ProgrammingFigure 2 : Basic Procedure of Genetic ProgrammingFigure 3 : Crossover in Genetic ProgrammingFigure 4 : Mutation in Genetic ProgrammingFigure 5 : Rapid Convergence of GSR for Bernoulli Equation ExampleFigure 6 : Schematic Diagram of Updating ProcedureFigure 7 : Location of Orgeval CatchmentFigure 8 : Hydrographs of (a)Training and (b)Verification Storm EventsFigure 9 : Comparison of Observed and Simulated Hydrographs for 2
Training Storm Events: (a) S2 and (b) S7Figure 10 : Updating Every α hours and Forecasting up to 6 hours for
Verification Storm Event S12: (a) α = 2 hours; (b) α = 4 hours;
and (c) α = 6 hoursFigure 11 : Updating Every α hours and Forecasting up to 6 hours for
Verification Storm Event S15: (a) α = 2 hours; (b) α = 4 hours;
and (c) α = 6 hoursFigure 12 : Average RMSE of 2 Verification Storm Events: (a) S12 and (b)
S15
26
(i) A simple expression: a + (b * c)
(ii) as Polish notation (prefix): + * b c a
(iii) as reverse Polish notation (postfix): b c a * +
(iv) as Parse tree:
+
/ \
a *
/ \
b c
27
28
START
Generation = 0
Generate initial random population
Termination criteria satisfied ?
NO
Evaluate fitness of each individual population
YESSTOP
Conduct genetic operations based on probability
Select 2 individuals based on fitness
Perform mutation
Select one individual
Perform crossover
Introduce newly generated individuals into new
population pool
Crossover Mutation
Parent 1 Parent 2
Direct algebraic form:. a+(b*c) (b+a)*((a*c)-d)
(a)
Child 1 Child 2
Direct algebraic form:. a+((a*c)-d) (b+a)*b*c
(b)
29
+
*
b
a
c
*
-
*
+
dab
a c
+
a *
b c
*
+
ab
-
* d
a c
32
Rainfall-runoff Simulation Model
(NAM)
Meteorological Data such as Rainfall
Updating Procedure(Genetic Symbolic
Regression)
ObservedRunoff (QOBS)
Output: Simulated
Runoff(QSIM)
Output:Improved Runoff
(QIMP)
The Grand Morin
Ru de
Rog
non
Ru
de B
ourg
ogne
Ru des A
vene
lles
Org
eval
The Grand Morin
Ru de
Rog
non
Ru
de B
ourg
ogne
Ru des A
vene
lles
Org
eval
The Grand Morin
Ru de
Rog
non
Ru
de B
ourg
ogne
Ru des A
vene
lles
Org
eval
The Grand Morin
Ru de
Rog
non
Ru
de B
ourg
ogne
Ru des A
vene
lles
Org
eval
The Grand Morin
Ru de
Rog
non
Ru
de B
ourg
ogne
Ru des A
vene
lles
The Grand Morin
Ru de
Rog
non
Ru
de B
ourg
ogne
Ru des A
vene
lles
Org
eval
33
(a)
0
1
2
3
4
5
6
7
8
20/07/72 05/02/73 24/08/73 12/03/74 28/09/74 16/04/75
date & time
Dis
cha
rge
(m
3/s
)
ObservedS2
S3
Storm S1
S6S4
S5
S8
S7
S9
S10
Threshold for storm events
Time
34
(b)
Validation data
0
5
10
15
20
25
30
35
11/06/78 02/04/79 05/05/79 08/03/79 11/01/79 01/30/80 04/29/80 07/28/80
Date
Dis
ch
arg
e (
m3
/s)
Threshold for storm event
Storm S11
S12S13
S14
S15
S16
Time
35
(a)
(b)
Storm event 2 (Calibration)
3
4
5
6
7
8
12/02/73 12:00 13/02/73 00:00 13/02/73 12:00 14/02/73 00:00 14/02/73 12:00
date & time
Dis
ch
arg
e (
m3
/s)
Observed
NAM
Storm event 7 (Calibration)
3
4
5
6
7
8
20/03/74 00:00 21/03/74 00:00 22/03/74 00:00 23/03/74 00:00
date & time
Dis
ch
arg
e (m
3/s)
Observed
NAM
Time Time
36
(a)
(b)
0
5
10
15
20
01/02/79 12:00 02/02/79 00:00 02/02/79 12:00 03/02/79 00:00 03/02/79 12:00date & time
Dis
char
ge
(m
3/s)
Observed
NAM+GSR
NAM simulation
updating begins
0
5
10
15
20
01/02/79 12:00 02/02/79 00:00 02/02/79 12:00 03/02/79 00:00 03/02/79 12:00date & time
Dis
char
ge
(m
3/s)
Observed
NAM+GSR
NAM simulation
updating begins
Time
Time
37
(c)
0
5
10
15
20
01/02/79 12:00 02/02/79 00:00 02/02/79 12:00 03/02/79 00:00 03/02/79 12:00date & time
Dis
char
ge
(m
3/s)
Observed
NAM+GSR
NAM simulation
updating begins
Time
38
(a)
(b)
0
5
10
15
20
13/07/80 12:00 14/07/80 00:00 14/07/80 12:00 15/07/80 00:00date & time
Dis
char
ge
(m
3/s
)
Observed
NAM+GSR
NAM simulation
updating begins
0
2
4
6
8
10
12
14
16
18
13/07/80 12:00 14/07/80 00:00 14/07/80 12:00 15/07/80 00:00date & time
Dis
cha
rge
(m
3/s
)
Observed
NAM+GSR
NAM simulation
updating begins
Time
Time
39
(c)
0
2
4
6
8
10
12
14
16
18
13/07/80 12:00 14/07/80 00:00 14/07/80 12:00 15/07/80 00:00date & time
Dis
char
ge
(m
3/s)
Observed
NAM+GSR
NAM simulation
updating begins
Time
40