

WATER RESOURCES RESEARCH, DOI:10.1029/2009WR008898

Tree-based reinforcement learning for optimal water reservoir operation

A. Castelletti,1 S. Galelli,1 M. Restelli,1 R. Soncini-Sessa1


Abstract. Although it is one of the most popular and extensively studied approaches to design water reservoir operations, Stochastic Dynamic Programming is plagued by a dual curse that makes it unsuitable to cope with large water systems: the computational requirement grows exponentially with the number of state variables considered (curse of dimensionality), and an explicit model must be available to describe every system transition and the associated rewards/costs (curse of modeling). A variety of simplifications and approximations have been devised in the past, which, in many cases, make the resulting operating policies inefficient and of scarce relevance in practical contexts. In this paper a reinforcement-learning approach called fitted Q-iteration ...


1. Introduction

Despite the great progress made in the last decades, optimal operation of water systems still remains a very active research area (see the recent review by Labadie [2004]). The combination of multiple, conflicting water uses, non-linearities in the model and the objectives, strong uncertainties in the inputs, and high dimensional state spaces makes the problem challenging and intriguing (Castelletti et al. [2008] and references therein).

Stochastic Dynamic Programming (SDP) is one of the most suitable methods for designing (Pareto) optimal reservoir operating policies (see, e.g., Soncini-Sessa et al. [2007] and references therein). SDP is based on the formulation of the operating policy design problem ...


... and Shane [1982]; Read [1989]; Hooper et al. [1991]; Piccardi and Soncini-Sessa [1991]; Vasiliadis and Karamouz [1994]; Castelletti and Soncini-Sessa [2007]).

Despite being studied so extensively in the literature, SDP suffers from a dual curse which, de facto, prevents its practical application to even reasonably complex water systems. (i) The computational complexity grows exponentially with the state, decision, and disturbance dimensions (Bellman's curse of dimensionality [Bellman, 1957]), so that SDP cannot be used with water systems where the number of reservoirs is greater than a few (2-3) units. (ii) An explicit model of each component of the water system is required (curse of modeling [Bertsekas and Tsitsiklis, 1996]) to anticipate the effects of system transitions. Any information included into the SDP framework can only ...


... Incremental Dynamic Programming [Larson, 1968], Differential Dynamic Programming [Jacobson and Mayne, 1970], and problem-specific heuristics [Wong and Luenberger; Luenberger, 1971]. However, these methods have been conceived mainly for deterministic problems and are of scarce interest for the optimal operation of reservoir networks, where the uncertainty associated with the underlying hydro-meteorological processes cannot be neglected. Alternative approaches can be classified in two main classes (see Castelletti et al. [2008] and references therein for further details) depending on the strategy they adopt to alleviate the dimensionality burden: methods based on the simplification of the water system model and methods based on the restriction of the degrees of freedom of the policy design problem.


Saad and Turgeon [1988] and Saad et al. [1992] proposed a method based on Principal Component Analysis to reduce the complexity in a five-reservoir hydropower system, rewritten to a four-state variable problem, which was then solvable by SDP. Better performance was obtained on the same system by Saad et al. [1994], who used a disaggregation technique based on neural networks. A major contribution to hierarchical multilevel decomposition comes from Haimes [Haimes, 1977]. The idea behind such approaches is that different decomposition levels are separately modeled and analyzed, but some information is transmitted from lower to higher levels in the decomposition hierarchy.

The second class of approaches to avert the curse of dimensionality is based on the introduction of some hypotheses on the regularity of the SDP optimal value function ...


... is the central idea of Reinforcement Learning (RL), a well-known framework for sequential decision-making (e.g., Barto and Sutton [1998]) that combines concepts of dynamic programming, stochastic approximation via simulation, and function approximation. The learning experience can be acquired on-line, by directly experimenting decisions on the real system, without any model, or generated off-line, either by using an external simulator or historical observations. While the first option is clearly impracticable on real reservoir systems, off-line learning has been already experimented in the operation of water systems: Castelletti et al. [2001] (see also Soncini-Sessa et al. [2007]) proposed a partially model-free version of classical Q-learning [Watkins and Dayan, 1992] to design the daily operation of a multi-purpose regulated lake. The storage dynamics was simulated via the mass conservation equation ...


Lately, a new approach, called fitted Q-iteration, which combines the RL concepts of off-line learning and functional approximation of the value function, has been proposed [Ernst et al., 2005]. Unlike traditional stochastic approximation algorithms [Bellman and Dreyfus, 1963; Bertsekas and Tsitsiklis, 1996; Tsitsiklis and Roy, 1996], which use parametric function approximators and thus require a time consuming parameter estimation at each iteration step, fitted Q-iteration uses tree-based approximation [Breiman et al., 1984]. The use of tree-based regressors offers a twofold advantage: first, a greater modeling flexibility, which is a paramount characteristic in the typical multi-objective context of water reservoir systems with multi-dimensional states, where the value functions to be approximated are unpredictable in shape; second, a higher computational efficiency ...


In this paper, the fitted Q-iteration is demonstrated on Lake Como, a multi-purpose regulated lake in Italy. As originally proposed in Ernst et al. [2005], fitted Q-iteration yields a stationary policy, which is perfectly suited for the artificial systems the algorithm has been conceived for, while it is less conforming to natural resources systems such as the one considered in this paper. An improved version is therefore proposed that includes non-stationary policies, which are more effective in adapting to the natural seasonal variability. The focus of the paper is first on studying the properties of the algorithm, with an analysis of the results' sensitivity to the tree-based method parameters. The potential advantages of the approach are then explored and evaluated against traditional SDP, which is the natural term of comparison.


... denotes the time instant at which such a variable assumes a deterministic value: the lake storage is measured at time t and thus is denoted with s_t, while the disturbance in the interval [t, t+1) is denoted with ε_{t+1}, since it can be deterministically known only at the end of the interval [Piccardi and Soncini-Sessa, 1991].

2.1. Model of the Water System

The reservoir dynamics is governed by the mass conservation equation:

    s_{t+1} = s_t + a_{t+1} - r_{t+1}

where a_{t+1} is the net inflow volume in the time interval [t, t+1), which includes evaporation and other losses, and r_{t+1} is the release volume over the same period ...
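To make the transition concrete, here is a minimal sketch of one step of the mass conservation equation in Python; the feasibility rule on the release is an illustrative placeholder, not the actual Lake Como release function:

    def storage_step(s, a, u, min_release=0.0):
        """One step of s_{t+1} = s_t + a_{t+1} - r_{t+1}.

        The actual release r is the decision u clipped to what is physically
        feasible (illustrative bounds, not a real rating curve)."""
        r = max(u, min_release)
        r = min(r, s + a)        # cannot release more water than is available
        return s + a - r, r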


... balance between compactness and accuracy (e.g., Young [2006]), and are generally preferred over the first in designing optimal reservoir operation. In the most general form, the inflow can be described as

    a_{t+1} = A_t(I_t, ε_{t+1})

where A_t(·) is a periodic function with period T. For example, a_{t+1} can be modeled as a cyclostationary, log-normal autoregressive process of order d (i.e., a log-PAR(d)):

    a_{t+1} = exp(y_{t+1} σ_t + μ_t)
    y_{t+1} = Σ_{i=1}^{d} α_{i,t} y_{t-i+1} + ε_{t+1}

where μ_t and σ_t are the periodic mean and standard deviation of the process ...
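As an illustration, the following sketch simulates a log-PAR(1) process (d = 1); the periodic parameters mu, sigma, and alpha passed to the function are assumptions of the example, not calibrated values:

    import numpy as np

    def simulate_log_par1(mu, sigma, alpha, horizon, seed=None):
        """Simulate a_{t+1} = exp(y_{t+1} * sigma_t + mu_t) with
        y_{t+1} = alpha_t * y_t + eps_{t+1} and period T = len(mu)."""
        rng = np.random.default_rng(seed)
        T, y, inflows = len(mu), 0.0, []
        for t in range(horizon):
            y = alpha[t % T] * y + rng.standard_normal()  # standardized log-inflow
            inflows.append(np.exp(y * sigma[t % T] + mu[t % T]))
        return np.array(inflows)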


... where P can be equal, smaller, or greater than N and n_x = N + L - P. The disturbance vector ε_{t+1} ∈ S_{ε_{t+1}} ⊆ R^{n_ε} is composed of P disturbances ε^l_{t+1} (i.e., n_ε = P) with pdf φ^l_t(·). Finally, the release decision vector u_t ∈ U_t(s_t) ⊆ S_{u_t} ⊆ R^{n_u}, whose components are the release decisions u^j_t from each reservoir j (with j = 1, ..., N and n_u = N), replaces the scalar decision u_t in equation (5).

The presence of multiple, say q, operating objectives, corresponding to different water users and other social and environmental interests, can be formalized by defining a periodic, with period T, step reward function g_{t+1} = g_t(x_t, u_t, ε_{t+1}) associated with the stochastic state transition from x_t to x_{t+1}. According to the multi-objective nature of the problem, this function can be obtained as a weighted sum (Weighting Method ...


... can be critical in most cases. Conversely, when an infinite time horizon is considered, a discount factor must be fixed to ensure the convergence of the policy design algorithm (Total Discounted Cost (TDC) formulation).

For a given value of the weights λ_i, with i = 1, ..., q, the total reward function associated with the operating policy p over an infinite time horizon can be defined as

    J(p) = lim_{h→∞} E_{ε_1,...,ε_h} [ Σ_{t=0}^{h-1} γ^t g_t(x_t, u_t, ε_{t+1}) ]

where 0 < γ < 1 and the expected value is used as the criterion to filter the stochastic disturbances (see Orlovski et al. [1984]; Nardini et al. [1992]; Soncini-Sessa et al. [2007] for details and alternative solutions). The optimal policy p* is obtained by ...
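In practice J(p) can be estimated by Monte Carlo simulation, truncating the infinite horizon where γ^t becomes negligible. A sketch under these assumptions, where step, reward, policy, and sample_disturbance are illustrative stand-ins for the model, the step reward, the operating rule, and the disturbance pdf:

    import numpy as np

    def estimate_J(step, reward, policy, sample_disturbance, x0,
                   gamma=0.99, horizon=2000, n_runs=100):
        """Monte Carlo estimate of J(p) = lim_h E[sum_t gamma^t g_t(...)]."""
        totals = []
        for _ in range(n_runs):
            x, total = x0, 0.0
            for t in range(horizon):       # gamma**horizon ~ 0 truncates the limit
                u = policy(t, x)
                eps = sample_disturbance(t)
                total += gamma**t * reward(t, x, u, eps)
                x = step(t, x, u, eps)
            totals.append(total)
        return float(np.mean(totals))      # the expectation filters the disturbances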


    x_{t+1} = f_t(x_t, u_t, ε_{t+1})    t = 0, ..., h-1
    m_t(x_t) = u_t ∈ U_t(x_t)           t = 0, ..., h-1
    ε_{t+1} ~ φ_t(·|x_t, u_t)           t = 0, ..., h-1
    x_0 given
    p^h ≜ {m_t(·); t = 0, ..., h-1}

By reformulating and solving the problem for some different values of the λ_i (i = 1, ..., q), a finite subset of the generally infinite Pareto optimal policy set is obtained. Since the system (equations (11b-d)) and the total reward function (11a) are ...


2. The disturbance vector is known (equation (11d)) and either the disturbances are independent in time or any dependency upon the past at time t can be accounted for by the value of the state at the same time.

3. The step reward functions are known and separable, i.e., g_t(·) only depends on variables defined for the time interval [t, t+1).

The solution to problems (9) and (11) is computed by recursively solving the Bellman equation formulated according to the TDC framework:

    Q_t(x_t, u_t) = E_{ε_{t+1}} [ g_t(x_t, u_t, ε_{t+1}) + γ max_{u_{t+1}} Q_{t+1}(x_{t+1}, u_{t+1}) ]    ∀(x_t, u_t) ∈ S_{x_t} × S_{u_t}    (12)

where Q_t(·,·) is the so-called Q-function or value function, i.e., the cumulative ...


To determine the right hand side of equation (12), the domains S_{x_t}, S_{u_t}, and S_{ε_{t+1}} of the state, release decision, and disturbance must be discretized and, at each iteration of the resolution process, explored exhaustively. The choice of the domain discretization is essential as it reflects on the algorithm complexity, which is combinatorial in the number of states, release decisions, and disturbances, and in their domain discretization. Let N_{x_t}, N_{u_t}, and N_{ε_{t+1}} be the number of elements in the discretized state, release decision, and disturbance sets S_{x_t} ⊆ R^{n_x}, S_{u_t} ⊆ R^{n_u}, and S_{ε_{t+1}} ⊆ R^{n_ε}: the recursive resolution of equation (12) for kT iteration steps (where k is usually lower than ten) requires

    kT · N_{x_t}^{n_x} · N_{u_t}^{n_u} · N_{ε_{t+1}}^{n_ε}    (15)

evaluations of the operator E[·] ...
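The combinatorial cost is easy to see in a direct sketch of the recursive resolution of equation (12): three nested loops touch every discretized state, decision, and disturbance value at each of the kT steps. The grid indexing and the step/reward helpers below are assumptions of the example:

    import numpy as np

    def sdp_backward(n_x, n_u, disturbances, probs, step, reward, T, k, gamma):
        """Successive approximation of eq. (12) for a periodic system.

        step(t, ix, iu, eps) is assumed to return the index of the successor
        state on the grid; each sweep costs N_x * N_u * N_eps evaluations."""
        Q = np.zeros((n_x, n_u))
        for it in range(k * T):
            t = (-it - 1) % T                   # backward sweep over the period
            Q_new = np.empty_like(Q)
            for ix in range(n_x):
                for iu in range(n_u):
                    value = 0.0
                    for eps, p in zip(disturbances, probs):
                        ix_next = step(t, ix, iu, eps)
                        value += p * (reward(t, ix, iu, eps)
                                      + gamma * Q[ix_next].max())
                    Q_new[ix, iu] = value
            Q = Q_new    # the last T tables would be stored for the periodic policy
        return Q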


... learning from experience with the concept of continuous approximation of the value function developed for large-scale dynamic programming (see for example Gordon [1995]; Tsitsiklis and Roy [1996]). This results in an improved reduction of the computational burden. Indeed, a continuous mapping of the state-decision pairs into the value function should ensure the same level of accuracy as a look-up table representation based on an extremely dense grid, but using a definitely coarser grid for the state-decision space. Further, the learning process is performed off-line, without the need for directly experimenting on the real system, which is a fundamental requirement when dealing with water resources, where experiments would lead to unsustainable costs in terms both of time and of social and economic losses ...


... from a finite set of transition samples, the policy generated by fitted Q-iteration is an approximation of the optimal policy p* that solves problem (11). Precisely, fitted Q-iteration yields an approximation of the optimal Q-functions of the TDC problem by iteratively extending the optimization horizon h, i.e. by iteratively solving problem (11).

The deterministic and stationary (T = 1) case is useful to describe the algorithm. Under these simplifying assumptions the state transition (11b) and the associated reward depend only on the state x_t and decision u_t. It can be shown [Ernst, 1999] that the following sequence of Q_h-functions, defined for all (x_t, u_t) ∈ S_x × S_u ...


In the stochastic case, the right hand side of equation (16b) is a realization of a random variable and Q_h(x_t, u_t) is redefined as its expectation. However, the expectation operator does not need to be operationally applied when Q_h(·) is approximated with a regression function based on the least squares method, because this latter generates an approximation of the conditional expectation of the output variables given the input. Its application to the data-set constructed considering stochastic transitions thus provides a continuous approximation of Q_h(·) over the whole state-decision set.

As originally proposed by Ernst et al. [2005], fitted Q-iteration generates a stationary policy, i.e., just one operating rule of the form u_t = m(x_t), which is the optimal policy for a stationary system. However, natural systems are not stationary and thus a periodic policy ...


Set Q_0(·) = 0 over the whole state-decision space S_x × S_u.

Iterations: repeat until the stopping conditions are met
  Set h = h + 1.
  Build the training set TS = {⟨i^l, o^l⟩, l = 1, ..., #F}, where i^l = ((t, x_t)^l, u^l_t) and o^l = g^l_{t+1} + γ max_{u_{t+1}} Q_{h-1}((t+1, x_{t+1})^l, u_{t+1}).
  Run the regression algorithm on TS to get Q_h(·), from which the policy p ...

Fitted Q-iteration is said to be a batch RL algorithm, because the whole data-set is processed in a batch mode, in contrast to traditional RL algorithms that perform an update of the value function sequentially, one sample at a time ...
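A compact sketch of the iteration above, using scikit-learn's Extra-Trees as the tree-based regressor (one possible implementation, not necessarily the authors' code); the data-set layout and parameter values are illustrative:

    import numpy as np
    from sklearn.ensemble import ExtraTreesRegressor

    def fitted_q_iteration(F, decisions, gamma=0.99, n_iterations=150):
        """F: list of four-tuples ((t, x_t), u_t, (t+1, x_{t+1}), g_{t+1}),
        with (t, x_t) flattened to a 1-D array; decisions: candidate u values."""
        X = np.array([np.append(tx, u) for tx, u, _, _ in F])
        g = np.array([r for _, _, _, r in F])
        tx_next = np.array([txn for _, _, txn, _ in F])

        Q = None                                # Q_0(.) = 0 everywhere
        for h in range(1, n_iterations + 1):
            if Q is None:
                targets = g
            else:                               # o = g + gamma * max_u' Q_{h-1}(., u')
                q_next = np.column_stack([
                    Q.predict(np.column_stack([tx_next, np.full(len(tx_next), u)]))
                    for u in decisions])
                targets = g + gamma * q_next.max(axis=1)
            Q = ExtraTreesRegressor(n_estimators=50, min_samples_split=15)
            Q.fit(X, targets)                   # trees are re-grown at every iteration
        return Q    # greedy policy: m(t, x) = argmax_u Q.predict([(t, x, u)])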


... improve this near-historical policy is to enlarge the exploration of the release decisions by sampling different values around the historical one (see Gaskett [2002]), for each past state (Figure 1a). This is, however, a risky approach: if the state-decision set has been poorly explored during the historical operation (typically, in poorly controlled systems [Tsitsiklis and Roy, 1996]), the informative content of the learning data-set can be low and the resulting policy is very likely to be quite far from optimality. Further, the approach is impracticable if the water system has never been operated before (e.g., in planning problems).

An alternative approach is to explore the behavior of the water system, via simulation, for different state values and under different operating policies, namely to adopt a model-based approach. However, the modeling effort does not need to involve the whole ...


... it would not take any advantage of the continuous approximation of the Q-function provided by fitted Q-iteration. Rather, a coarse grid can exponentially reduce the computing time, by linearly reducing N_{x_t} and N_{u_t} in equation (15). Such a coarse grid can be obtained as a uniform sub-sampling of the SDP dense grid (Figure 1c) or generated with ad hoc discretization methods (Figure 1d), such as orthogonal arrays, Latin hypercubes, and low-discrepancy sequences (see Cervellera et al. [2006] and references therein). Whatever the approach adopted to build the learning data-set, this might contain redundancies, which increase the computational requirements with no advantages in terms of policy performance. A way to reduce the size of the data-set is to adopt active learning techniques [Cohn et al., 1996], based on selecting the samples that mostly improve the performance of the learning algorithm (see ...
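For instance, a coarse state-decision grid could be drawn with a Latin hypercube via SciPy's quasi-Monte Carlo module (available from SciPy 1.7); the bounds below are placeholders, not the Lake Como ones:

    from scipy.stats import qmc

    sampler = qmc.LatinHypercube(d=2, seed=0)   # 2-D: (storage, release decision)
    unit_points = sampler.random(n=80)          # 80 points in [0, 1)^2
    lower, upper = [0.0, 0.0], [260e6, 5e7]     # illustrative domain bounds
    coarse_grid = qmc.scale(unit_points, lower, upper)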


... computational requirements. Further, no human tuning of the function approximator should be needed (fully automated approximation).

Some parametric function approximators can provide a great modeling flexibility: neural networks, for instance, are provably able to approximate any continuous function to any desired degree of accuracy. This modeling flexibility, however, comes at a cost, since it is often reflected in a large number of parameters requiring explicit estimation and strongly affecting the computational efficiency (see Castelletti et al. [2005]), with the risk of over-parameterization. As the problem size scales up, neural networks require more and more neurons, thus increasing the computational cost of the training phase. Non-parametric function approximators, particularly tree-based methods, ensure modeling flexibility ...


Tree-based methods include KD-Trees, Classification and Regression Trees [Breiman et al., 1984], Tree Bagging [Breiman, 1996], Totally Randomized Trees, and Extremely Randomized Trees (Extra-Trees) [Geurts et al., 2006]. These methods basically differ by the splitting criterion, the termination test they adopt, and the number of trees they grow. Extra-Trees (described later) were demonstrated to perform better than other tree-based methods combined with the fitted Q-iteration algorithm [Ernst et al., 2005] and are therefore adopted in this study. Particularly, they provide great scalability by adapting the trees' structure to the data-set at each iteration, thus resulting in a better accuracy of the final policy. The drawback of these continuous changes in the structure is that Extra-Trees do not ensure the convergence of the iteration, and so the algorithm cannot simply be stopped based on the distance between two successive approximations of the Q-function ...


Three parameters are thus associated to Extra-Trees, whose values can be set on the basis of empirical evaluations:

K, the number of alternative cut-directions, can be chosen in the interval [1, n], where n is the number of regressor inputs. When K is equal to n, the choice of the cut-direction is not randomized and the randomization acts only through the choice of the cut-point. On the contrary, low values of K increase the randomization of the trees and weaken the dependence of their structure on the output of the training data-set. Geurts et al. [2006] empirically demonstrated that, for regression problems, the optimal default value for K is n.

n_min, the minimum cardinality for splitting a node. Large values of n_min produce small trees (few leaves) with high bias and small variance. Conversely, low values of ...
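In scikit-learn's ExtraTreesRegressor, taken here only as one concrete implementation, the three parameters map roughly onto max_features (K), min_samples_split (n_min), and n_estimators (M):

    from sklearn.ensemble import ExtraTreesRegressor

    n = 3                          # regressor inputs, e.g. (t, s_t, u_t)
    model = ExtraTreesRegressor(
        n_estimators=50,           # M: number of trees in the ensemble
        max_features=n,            # K = n: only the cut-point is randomized
        min_samples_split=15,      # n_min: minimum cardinality for splitting a node
    )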


... algorithm to the Extra-Trees parameters and the stopping conditions. Second, the size of the system makes the control problem solvable with SDP, and this is key to performing a comparative evaluation of the algorithm. Based on this analysis, the advantages of the fitted Q-iteration over SDP can be easily extrapolated to more complex cases, where the SDP requirements become prohibitive for a comparison.

4.1. Description

Lake Como is the third biggest regulated lake in Italy, with a surface area of ... and an active storage of 260 Mm³. The 4500 km² lake catchment area produces an average inflow of 4.73 Gm³/year, with the typical two-peak (spring and autumn) subalpine regime ...


... release decision u_t is the volume to be released in the next 24 hours from the lake ...; finally, according to (6), the step reward function g_t(·) is a linear combination of penalties (negative rewards) accounting for flood damage and downstream water deficits. The data-set F of four-tuples ⟨(t, x_t), u_t, (t+1, x_{t+1}), g_{t+1}⟩ required by the fitted Q-iteration algorithm was built adopting a partially model-free approach. An 80-point grid was used for the state-decision space (Figure 6); precisely, 10 points for the storage s_t and 8 points for the release decision u_t, the first six of which correspond to downstream demand values, plus two greater values. For the inflow a_t at time t, which plays the role of an exogenous input to the system, 15 years of daily streamflow data (1965-1979) were directly used ... The state transitions were performed by running a one-step simulation of equation ...
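A sketch of this partially model-free construction of F: the storage transition is simulated with the mass conservation equation, the inflow is read directly from the historical series, and the grids, feasibility rule, and reward function are illustrative placeholders:

    def build_dataset(storage_grid, decision_grid, inflows, reward, period=365):
        """Build F = {<(t, x_t), u_t, (t+1, x_{t+1}), g_{t+1}>} by one-step simulations."""
        F = []
        for t in range(len(inflows) - 1):
            a = inflows[t]                  # historical inflow as exogenous input
            for s in storage_grid:
                for u in decision_grid:
                    r = min(u, s + a)       # illustrative feasibility rule
                    s_next = s + a - r      # mass conservation step
                    F.append(((t % period, s), u,
                              ((t + 1) % period, s_next), reward(t, s, u, r)))
        return F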


... where w_t is the aggregated agricultural and hydropower demand and r_{t+1} is the release from the lake given by equation (2).

The policy is designed by solving an equivalent to problem (11) where, according to the nature of the objectives, the operator max is substituted with min, and the aggregated reward g_t(·) in equation (8) is computed as

    g_t(·) = λ g^f_t(·) + (1 - λ) g^w_t(·)

with 0 ≤ λ ≤ 1.
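Operationally, the Weighting Method amounts to sweeping λ and re-solving the design problem, each weight yielding one point of the approximated Pareto front. A sketch, where design_policy stands for the solver (fitted Q-iteration or SDP), g_f and g_w for the two step rewards, and evaluate for a simulation-based assessment:

    import numpy as np

    def pareto_front(design_policy, g_f, g_w, evaluate, n_weights=11):
        front = []
        for lam in np.linspace(0.0, 1.0, n_weights):
            # aggregated step reward: g = lam * g_f + (1 - lam) * g_w
            def g(t, x, u, eps, lam=lam):
                return lam * g_f(t, x, u, eps) + (1 - lam) * g_w(t, x, u, eps)
            policy = design_policy(g)
            front.append((lam,) + tuple(evaluate(policy)))  # (lam, floods, deficit)
        return front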

    5. Analysis of Fitted Q-iteration Properties


... far from optimality, depending on the accuracy of the approximation. However, due to the randomization, as h increases the value of J_h fluctuates. The recursive nature of the algorithm filters the pure random fluctuations (high frequency) of the approximation and produces smooth fluctuations. For small h, the fluctuations are dominated by the performance improvement due to the policy learning process and are not evident. When increasing h adds no useful information for improving the policy, the oscillations become the dominant effect.

To choose the number h̄ of iterations at which to stop the algorithm, it is necessary to resort to some empirical criterion. In principle, h̄ should be the value of h at which the learning process is nearly over and the improvement in performance is so limited as to be hidden by the random fluctuations due to the Extra-Trees approximation. As far as ...


... beginning of the new irrigation season is completely independent from the operation in the previous season and, thus, for what concerns the irrigation component, J_h settles to a constant value. Similar reasoning applies to floods, as they occur at the end of ..., and the lake can be emptied in 15 days. Therefore, to design a receding horizon policy, as the fitted Q-iteration algorithm de facto does, 5 months (about 150 days) are enough, and the value of h̄ in Figure 3 is close to this value. This observation not only supports the empirical criterion proposed above, but also suggests another stopping criterion, to some extent more practical (it does not require computing J_h after each algorithm iteration): whenever the problem can be re-framed as a receding h̄-steps horizon problem, h̄ is the natural stopping limit for the algorithm, since the policy learning process does not improve anymore when the number of iterations exceeds h̄ ...
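The first, performance-based criterion can be written down directly: stop when the smoothed improvement of J_h is hidden by the amplitude of its fluctuations. A sketch, with the window size an arbitrary illustrative choice:

    import numpy as np

    def should_stop(J_history, window=10):
        """Empirical stopping test on the sequence J_1, ..., J_h."""
        if len(J_history) < 2 * window:
            return False
        recent = np.asarray(J_history[-window:])
        previous = np.asarray(J_history[-2 * window:-window])
        # stop when the mean improvement is hidden by the random fluctuations
        return abs(recent.mean() - previous.mean()) <= recent.std()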


The value n_min, the minimum sample size for splitting a node, determines the number of leaves in a tree and thus the ensemble's trade-off between bias and variance. By way of example, in Figure 4 (top panel) the experiment in Figure 3 is replicated for different values of n_min. Reducing n_min decreases the bias (in the average the performance is nearer to the optimum) but negatively affects the variance (higher amplitude of the fluctuations). As anticipated, when dealing with stochastic function approximation, the regression algorithm approximates the conditional expectation of the output given the input. n_min should therefore be comparable to the number of disturbance realizations available for each state-decision pair. Since the learning data-set F of the Lake Como water system was generated using a 15-year inflow series, the best performance is expected to be obtained for n_min ≈ 15. This is confirmed ...


The potential of the fitted Q-iteration was analyzed via comparison with a standard SDP formulation. The learning data-set F of fitted Q-iteration was generated using the partially model-free approach (see Section 4.2). As for SDP, according to the requirement of explicitly modeling the system components, the inflow a_{t+1} was described as a cyclostationary (with period T), log-normal, stochastic process, whose pdf is defined by the parameters μ_t and σ_t; a PAR(0) model was assumed (for more details see Pianosi and Soncini-Sessa ...):

    a_{t+1} = e^{σ_{t mod T} ε_{t+1} + μ_{t mod T}},    ε_{t+1} ~ N(0, 1)

where ε_{t+1} is a Gaussian white noise. The state-decision domain was discretized with a grid of 27,048 points (N_{s_t} = 161; N_{u_t} = 168, see Figure 6), while a 9-point grid ...


... grid, is more accurate than the SDP look-up table, even if this latter is based on a much denser state-decision discretization grid.

By way of demonstration, the policy associated with point A in Figure ..., obtained with fitted Q-iteration, dominates the corresponding policy A′, obtained with SDP, by ... floodings per year and nearly 3.5 × 10⁶ m³ of deficit per year. Both the policies satisfy outright the water demand (front flat area in panels (a) and (c) in Figure 8) over a relatively wide range of storage values and strongly increase the release rate during the flood seasons. In so doing they create a time-varying flood buffer zone, whose dimension is not a priori designed, as it is either learnt from the flood events and the associated effects in the data-set (fitted Q-iteration) or implicitly inferred from the stochastic inflow model (SDP) ...


... does. An example is provided in Figure 11. The same example shows that both policies significantly outperform the historical operation, which appears to be much more conservative.

By moving toward the left extreme of the Pareto front (points B and B′ in ...), i.e. increasing the relative importance of irrigation over floods, SDP performs better than fitted Q-iteration. This is basically due to the approximation error in the tree-based interpolation of the Q-functions. Indeed, as the importance of the irrigation increases, the conflict between the objectives becomes negligible and the optimal policy simply suggests to release the water demand. As anticipated, the water demand values belong to the release decision discretization grid of both the algorithms. However, while the release decision chosen by SDP is necessarily a point of that grid, and thus a water demand value, fitted Q-iteration uses a continuous approximation ...


6.1. Computational Requirements

A comparative analysis of the computational requirements of fitted Q-iteration and SDP can be empirically performed by inferring some general rules from the computing times of the Lake Como case study.

As anticipated, the time t_SDP required to design an operating policy with SDP is proportional to the number of evaluations of the operator E[·], which is given by equation (15). Splitting the state dimensionality (n_x) into n_s storages and n_I hydro-meteorological information, the time t_SDP can be expressed as

    t_SDP = a · kT · ( N_{s_t}^{n_s} · N_{I_t}^{n_I} · N_{u_t}^{n_u} · N_{ε_t}^{n_ε} )


... disturbances. Assuming that the fitted Q-iteration coarse grid is obtained by reducing the grid of an equally performing SDP by a factor r_s and r_u, respectively for state and decision,

    t_Q1 = b · T · (N_{s_t}/r_s)^{n_s} · (N_{u_t}/r_u)^{n_u} · N_a

where b is a constant, machine-dependent parameter, and N_a is the number of inflow realizations (i.e., the number of years in the historical data set used for the inflow and the other hydro-meteorological information). The time t_Q2 grows linearly in the time horizon T, the number of regressors k (i.e., n_s + n_I + n_u + 1) and of trees M, and superlinearly in the number ((N_{s_t}/r_s)^{n_s} · (N_{u_t}/r_u)^{n_u} · N_a · T) of four-tuples in the data-set. Precisely,

    t_Q2 = c · (N_{s_t}/r_s)^{n_s} · (N_{u_t}/r_u)^{n_u} · N_a · T · log( (N_{s_t}/r_s)^{n_s} · (N_{u_t}/r_u)^{n_u} · N_a · T ) · M · (n_s + n_I + n_u + 1)
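These expressions can be evaluated side by side for a given configuration. A sketch with the Lake Como grid sizes and otherwise arbitrary placeholder constants (a, b, c are machine-dependent; r_s and r_u are chosen so that 161/r_s ≈ 10 and 168/r_u = 8, the coarse grid of Section 4.2):

    import math

    a = b = c = 1e-7                   # machine-dependent constants (placeholders)
    k, T, M = 5, 365, 50               # SDP sweeps, period, number of trees
    Ns, NI, Nu, Ne, Na = 161, 1, 168, 9, 15
    ns, nI, nu, neps = 1, 0, 1, 1
    rs, ru = 16, 21                    # coarse-grid reduction factors

    t_sdp = a * k * T * Ns**ns * NI**nI * Nu**nu * Ne**neps
    n_tuples = (Ns / rs)**ns * (Nu / ru)**nu * Na * T
    t_q1 = b * T * (Ns / rs)**ns * (Nu / ru)**nu * Na
    t_q2 = c * n_tuples * math.log(n_tuples) * M * (ns + nI + nu + 1)
    print(t_sdp, t_q1 + t_q2)          # compare the two design times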


... lines & white circles) is evident from both the panels. Fitted Q-iteration goes beyond the computational limits of SDP (i.e., n_s ≤ 2) on complex networks including more reservoirs (top panel). The improvement is even more remarkable when the operating policy uses exogenous information (bottom panel), as the model-free (uncontrolled!) components require nearly no additional computational time for fitted Q-iteration: while SDP requires ... days for a configuration with 1 reservoir and 2 exogenous information, fitted Q-iteration requires hours.

7. Conclusions

One major technical challenge in expanding the scope of water resources ...


The application to the Lake Como water system was used to infer general guidelines on the appropriate setting for the algorithm parameters, to define an empirical stopping criterion, and to demonstrate the potential of the approach compared to traditional SDP. The policy obtained with fitted Q-iteration on an extremely coarse state-decision discretization grid was shown to generally outperform an equivalent SDP-derived policy computed on a very dense grid. The dominance is particularly remarkable on flood events (Figure 9), when the timing component of both the policies, which is key to anticipate and buffer floods when no inflow information is considered, is more effectively exploited by the fitted Q-iteration. Based on the experiments on Lake Como, a general rule was also derived to quantify the computational advantage of fitted Q-iteration over SDP in designing daily operating policies for large water systems ...


... 2007] combined with a policy reconstruction procedure [Schneegaß et al., 2007], where an operating policy is first identified and then iteratively improved by the algorithm.

While the Extra-Trees used by the fitted Q-iteration have been shown to provide a good accuracy/efficiency trade-off, this comes at the price of the lack of a well-defined and robust stopping condition, which, in turn, might negatively affect both the accuracy (the final policy may not be the best one explored) and the efficiency (a better policy could have been found by stopping the algorithm earlier). Strictly, this happens because of the refreshing of the tree structures at each iteration of the fitted Q-algorithm, which is key to building an accurate approximation along the iterations of the algorithm, but prevents the approximated Q-functions from stabilizing when the improvement in the policy performance is marginal. Further investigations ...


... control approaches. These also include the process-based modeling of rainfall-runoff processes to generate climate change scenarios and investigate adaptive management strategies; the combined use of fitted Q-iteration and model reduction techniques [Castelletti et al., 2009] is worth being explored for these purposes.

Acknowledgments. The work was completed while Andrea Castelletti, Stefano Galelli and Rodolfo Soncini-Sessa were on leave at the Centre for Water Research, University of Western Australia. This paper forms CWR reference 2329 AC.

    References



Bhattacharya, A., A. Lobbrecht, and D. Solomatine (2003), Neural networks and reinforcement learning in control of water systems, Journal of Water Resources Planning and Management - ASCE, 129(6), 458-465.

Bonarini, A., A. Lazaric, and M. Restelli (2007), Piecewise constant reinforcement learning for robotic applications, in Proceedings of the 4th International Conference on Informatics in Control, Automation and Robotics (ICINCO 2007).

Breiman, L. (1996), Bagging predictors, Machine Learning, 24(2), 123-140.

Breiman, L. (2001), Random forests, Machine Learning, 45(1), 5-32.

Breiman, L., J. Friedman, R. Olshen, and C. Stone (1984), Classification and Regression Trees, Wadsworth & Brooks, Pacific Grove, CA.


Castelletti, A., M. De Zaiacomo, S. Galelli, M. Restelli, P. Sanavia, R. Soncini-Sessa, and J. Antenucci (2009), An emulation modelling approach to reduce the complexity of a hydrodynamic-ecological model of a reservoir, in Proceedings of the International Symposium on Environmental Software Systems (ISESS 2009), October 2-9, Venice, I.

Cervellera, C., V. Chen, and A. Wen (2006), Optimization of a large-scale water reservoir network by stochastic dynamic programming with efficient state space discretization, European Journal of Operational Research, 171(3), 1139-1151.

Cohn, D., Z. Ghahramani, and M. Jordan (1996), Active learning with statistical models, Journal of Artificial Intelligence Research, 4, 129-145.

Ernst, D. (1999), Near optimal closed-loop control. Application to electric power systems ...


Galelli, S., and R. Soncini-Sessa (2010), Combining metamodelling and stochastic dynamic programming for the design of reservoir release policies, Environmental Modelling & Software, 25(2), 209-222.

Galelli, S., C. Gandolfi, R. Soncini-Sessa, and D. Agostani (2010), Building a metamodel of an irrigation district distributed-parameter model, Agricultural Water Management, ..., 187-200.

Gaskett, C. (2002), Q-learning for robot control, Ph.D. thesis, Australian National University, Canberra, AUS.

Geurts, P., D. Ernst, and L. Wehenkel (2006), Extremely randomized trees, Machine Learning, 63(1), 3-42.


Haimes, Y. (1977), Hierarchical Analyses of Water Resources Systems, McGraw-Hill, NY.

Hall, W., and N. Buras (1961), The dynamic programming approach to water resources development, Journal of Geophysical Research, 66(2), 510-520.

Hall, W., W. Butcher, and A. Esogbue (1968), Optimization of the operation of a multiple-purpose reservoir by dynamic programming, Water Resources Research, 4(3), 471-477.

Heidari, M., V. Chow, P. Kokotovic, and D. Meredith (1971), Discrete differential dynamic programming approach to water resources systems optimization, Water Resources Research, 7(2), 273-282.

Hejazi, M., X. Cai, and B. Ruddell (2008), The role of hydrologic information in reservoir operation - learning from historical releases, Advances in Water Resources, 31(12), 1636-1650.


Kelman, J., J. Stedinger, L. Cooper, E. Hsu, and S. Yuan (1990), Sampling stochastic dynamic programming applied to reservoir operation, Water Resources Research, 26(3), 447-454.

Labadie, J. (2004), Optimal operation of multireservoir systems: State-of-the-art review, Journal of Water Resources Planning and Management - ASCE, 130(2), 93-111.

Larson, R. (1968), State Increment Dynamic Programming, American Elsevier, New York.

Lee, J.-H., and J. W. Labadie (2007), Stochastic optimization of multireservoir systems via reinforcement learning, Water Resources Research, 43(11), 1-16.

Luenberger, D. (1971), Cyclic dynamic programming: a procedure for problems ..., Operations Research, 19(4), 1101-1110.

Nardini, A., C. Piccardi, and R. Soncini-Sessa (1992), On the integration ...


Piccardi, C., and R. Soncini-Sessa (1991), Stochastic dynamic programming for reservoir optimal control: dense discretization and inflow correlation assumption made possible by parallel computing, Water Resources Research, 27(5), 729-741.

Read, E. (1989), A dual approach to stochastic dynamic programming for reservoir release scheduling, in Dynamic Programming for Optimal Water Resources Systems Analysis, Prentice-Hall, Englewood Cliffs.

Saad, M., and A. Turgeon (1988), Application of principal component analysis to long-term reservoir management, Water Resources Research, 24(7), 907-912.

Saad, M., A. Turgeon, and J. Stedinger (1992), Censored-data correlation and principal component dynamic programming, Water Resources Research, 28(8), 2135-2140.
