WATER RESOURCES RESEARCH, VOL. ???, XXXX, DOI:10.1029/,
Tree-based reinforcement learning for optimal water reservoir operation
A. Castelletti,1 S. Galelli,1 M. Restelli,1 R. Soncini-Sessa1
Abstract. Although it is one of the most popular and extensively studied approaches to design water reservoir operations, Stochastic Dynamic Programming is plagued by a dual curse that makes it unsuitable to cope with large water systems: the computational requirement grows exponentially with the number of state variables considered (curse of dimensionality), and an explicit model must be available to describe every system transition and the associated rewards/costs (curse of modeling). A variety of simplifications and approximations have been devised in the past which, in many cases, make the resulting operating policies inefficient and of scarce relevance in practical contexts. In this paper, a reinforcement-learning approach called fitted Q-iteration is proposed …
1. Introduction
Despite the great progress made in the last decades, the optimal operation of water systems still remains a very active research area (see the recent review by Labadie [2004]). The combination of multiple, conflicting water uses, non-linearities in the model and the objectives, strong uncertainties in the inputs, and high-dimensional state spaces makes the problem challenging and intriguing (Castelletti et al. [2008] and references therein).
Stochastic Dynamic Programming (SDP) is one of the most suitable methods for designing (Pareto) optimal reservoir operating policies (see, e.g., Soncini-Sessa et al. [2007] and references therein). SDP is based on the formulation of the operating policy design problem as a sequential decision-making problem, and it has been applied to reservoir operation in a long series of studies (e.g.,
… and Shane [1982]; Read [1989]; Hooper et al. [1991]; Piccardi and Soncini-Sessa [1991]; Vasiliadis and Karamouz [1994]; Castelletti and Soncini-Sessa [2007]).
Despite being studied so extensively in the literature, SDP suffers from a dual curse which, de facto, prevents its practical application to even reasonably complex water systems. (i) The computational complexity grows exponentially with the state, decision, and disturbance dimensions (Bellman's curse of dimensionality [Bellman, 1957]), so that SDP cannot be used with water systems where the number of reservoirs is greater than a few (2-3) units. (ii) An explicit model of each component of the water system must be available (curse of modeling [Bertsekas and Tsitsiklis, 1996]) to anticipate the effects of the system transitions. Any information included in the SDP framework can only be exploited if an explicit model of its dynamics is available.
Several variants of dynamic programming have been proposed to mitigate these limitations, including Incremental Dynamic Programming [Larson, 1968], Differential Dynamic Programming [Jacobson and Mayne, 1970], and problem-specific heuristics [Wong and Luenberger, 1968; Luenberger, 1971]. However, these methods have been conceived mainly for deterministic problems and are of scarce interest for the optimal operation of reservoir networks, where the uncertainty associated with the underlying hydro-meteorological processes cannot be neglected. Alternative approaches can be classified in two main classes (see Castelletti et al. [2008] and references therein for further details), depending on the strategy they adopt to alleviate the dimensionality burden: methods based on the simplification of the water system model and methods based on the restriction of the degrees of freedom of the policy design problem.
Saad and Turgeon [1988] and Saad et al. [1992] proposed a method based on Principal Component Analysis to reduce the complexity of a five-reservoir hydropower system, which was rewritten as a four-state-variable problem then solvable by SDP. Better performance was obtained on the same system by Saad et al. [1994], who used a disaggregation technique based on neural networks. A major contribution to hierarchical multi-level decomposition comes from Haimes [Haimes, 1977]. The idea behind such approaches is that different decomposition levels are separately modeled and analyzed, but some information is transmitted from lower to higher levels in the decomposition hierarchy.
The second class of approaches to avert the curse of dimensionality is based on the introduction of some hypotheses on the regularity of the SDP optimal value function …
Learning directly from experience is the central idea of Reinforcement Learning (RL), a well-known framework for sequential decision-making (e.g., Barto and Sutton [1998]) that combines concepts from dynamic programming, stochastic approximation via simulation, and function approximation. The learning experience can be acquired on-line, by directly experimenting decisions on the real system without any model, or generated off-line, either by using an external simulator or historical observations. While the first option is clearly impracticable on real reservoir systems, off-line learning has already been experimented with in the operation of water systems: Castelletti et al. [2001] (see also Soncini-Sessa et al. [2007]) proposed a partially model-free version of classical Q-learning [Watkins and Dayan, 1992] to design the daily operation of a multi-purpose regulated lake. The storage dynamics was simulated via the mass balance equation of the lake …
Lately, a new approach, called fitted Q-iteration, which combines RL concepts with batch learning and functional approximation of the value function, has been proposed [Ernst et al., 2005]. Unlike traditional stochastic approximation algorithms [Bellman et al., 1963; Bertsekas and Tsitsiklis, 1996; Tsitsiklis and Roy, 1996], which use parametric function approximators and thus require a time-consuming parameter estimation process at each iteration step, fitted Q-iteration uses tree-based approximation [Breiman et al., 1984]. The use of tree-based regressors offers a twofold advantage: first, a great modeling flexibility, which is a paramount characteristic in the typical multi-objective context of water reservoir systems with multi-dimensional states, where the value functions to be approximated are unpredictable in shape; second, a higher computational efficiency …
In this paper, the fitted Q-iteration is demonstrated on Lake Como, a multi-purpose regulated lake in Italy. As originally proposed in Ernst et al. [2005], fitted Q-iteration yields a stationary policy, which is perfectly suited for the artificial systems the algorithm has been conceived for, while it is less suited to the natural resources systems considered in this paper. An improved version is therefore proposed that includes non-stationary policies, which are more effective in adapting to the natural seasonal variability. The focus of the paper is first on studying the properties of the algorithm, with an analysis of the sensitivity of the results to the tree-based method parameters. The potential and the limits of the approach are then explored and evaluated against traditional SDP, which is the natural term of comparison.
The time subscript of each variable denotes the time instant at which such variable assumes a deterministic value: the lake storage is measured at time t and thus is denoted with s_t, while the disturbance in the interval [t, t + 1) is denoted with ε_{t+1}, since it can be deterministically known only at the end of the interval [Piccardi and Soncini-Sessa, 1991].
2.1. Model of the Water System
The reservoir dynamics is governed by the mass conservation equation:
s_{t+1} = s_t + a_{t+1} − r_{t+1}
where a_{t+1} is the net inflow volume in the time interval [t, t + 1), which includes evaporation and other losses, and r_{t+1} is the release volume over the same period.
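A minimal Python sketch of this one-step mass balance is given below; the release function, which simply clips the decision to the physically feasible range, is an assumption made here for illustration only and is not the release function of the case study.

    # One-step mass-balance update s_{t+1} = s_t + a_{t+1} - r_{t+1}.
    # The clipping release rule below is an illustrative assumption, not the
    # actual release function of the water system described in the paper.

    def release(s_t, u_t, a_t1, s_min=0.0):
        """Feasible release: it cannot exceed the available water nor empty the reservoir below s_min."""
        available = s_t + a_t1 - s_min
        return max(0.0, min(u_t, available))

    def storage_update(s_t, u_t, a_t1):
        """One-step simulation of the mass conservation equation."""
        r_t1 = release(s_t, u_t, a_t1)
        return s_t + a_t1 - r_t1

    # Example: storage of 100 Mm3, decision to release 30 Mm3, inflow of 10 Mm3.
    print(storage_update(100.0, 30.0, 10.0))  # -> 80.0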
Data-driven models usually provide a better balance between compactness and accuracy (e.g., Young [2006]), and are generally preferred over physically based models in designing optimal reservoir operation. In the most general form, the inflow can be described as
a_{t+1} = A_t(I_t, ε_{t+1})
where A_t(·) is a periodic function with period T. For example, a_{t+1} can be modeled as a cyclostationary, log-normal autoregressive process of order d (i.e., a log-PAR(d) process):
a_{t+1} = exp(y_{t+1} σ_t + μ_t)
y_{t+1} = Σ_{i=1}^{d} α_{i,t} y_{t−i+1} + ε_{t+1}
where μ_t and σ_t are the periodic mean and standard deviation of the process.
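The following Python sketch shows how such a cyclostationary log-normal autoregressive process can be simulated; the periodic parameters used below are invented placeholders, not values estimated for the case study.

    import numpy as np

    # Illustrative simulation of a log-PAR(1) inflow process:
    #   y_{t+1} = alpha_t * y_t + eps_{t+1},  a_{t+1} = exp(y_{t+1} * sigma_t + mu_t)
    # The periodic parameters mu, sigma, alpha below are placeholders; in practice
    # they would be estimated from the historical inflow record.

    T = 365
    days = np.arange(T)
    mu = 2.0 + 0.5 * np.sin(2 * np.pi * days / T)     # periodic mean of the log-inflow
    sigma = 0.3 + 0.1 * np.cos(2 * np.pi * days / T)  # periodic standard deviation
    alpha = np.full(T, 0.7)                           # periodic lag-1 coefficient

    def simulate_inflow(n_days, seed=0):
        rng = np.random.default_rng(seed)
        a = np.empty(n_days)
        y = 0.0                                       # standardized log-inflow state
        for t in range(n_days):
            tau = t % T
            y = alpha[tau] * y + rng.standard_normal()
            a[t] = np.exp(y * sigma[tau] + mu[tau])
        return a

    inflows = simulate_inflow(3 * T)                  # three synthetic years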
where P can be equal to, smaller than, or greater than N, and n_x = N + L + P. The disturbance vector ε_{t+1} ∈ S_{ε_{t+1}} ⊆ R^{n_ε} is composed of P disturbances ε^l_{t+1} (i.e., n_ε = P) with pdf φ^l_t(·). Finally, the release decision vector u_t ∈ U_t(s_t) ⊆ S_{u_t} ⊆ R^{n_u}, whose components are the release decisions u^j_t from each reservoir j (with j = 1, . . . , N and n_u = N), replaces the scalar decision u_t in equation (5).
The presence of multiple, say q, operating objectives, corresponding to different water users and other social and environmental interests, can be formalized by defining a periodic (with period T) step reward function g_{t+1} = g_t(x_t, u_t, ε_{t+1}) associated with the stochastic state transition from x_t to x_{t+1}. According to the multi-objective nature of the problem, this function can be obtained as a weighted sum (Weighting Method) of the q objectives …
… can be critical in most cases. Conversely, when an infinite time horizon is adopted, a discount factor must be fixed to ensure the convergence of the policy design algorithm (Total Discounted Cost (TDC) formulation).
For a given value of the weights λ_i, with i = 1, . . . , q, the total reward function associated with the operating policy p over an infinite time horizon can be defined as
J(p) = lim_{h→∞} E_{ε_1,...,ε_h} [ Σ_{t=0}^{h−1} γ^t g_t(x_t, u_t, ε_{t+1}) ]
where 0 < γ < 1 and the expected value is used as the criterion to filter the effects of the stochastic disturbances (see Orlovski et al. [1984]; Nardini et al. [1992]; Soncini-Sessa et al. [2007] for details and alternative solutions). The optimal policy p* is obtained by solving the following problem:
max_{p^h} E_{ε_1,...,ε_h} [ Σ_{t=0}^{h−1} γ^t g_t(x_t, u_t, ε_{t+1}) ]    (11a)
subject to
x_{t+1} = f_t(x_t, u_t, ε_{t+1})    t = 0, . . . , h − 1    (11b)
m_t(x_t) = u_t ∈ U_t(x_t)    t = 0, . . . , h − 1    (11c)
ε_{t+1} ∼ φ_t(·|x_t, u_t)    t = 0, . . . , h − 1    (11d)
x_0 given
p^h ≜ {m_t(·); t = 0, . . . , h − 1}
By reformulating and solving the problem for different values of λ_i (i = 1, . . . , q), a finite subset of the generally infinite Pareto-optimal policy set is obtained. Since the system (equations (11b-d)) and the total reward function (11a) are periodic with period T, the resulting optimal policy is periodic as well …
2. The pdf of the disturbance vector is known (equation (11d)), and either the disturbances are independent in time or any dependency upon the past at time t can be accounted for by the value of the state at the same time.
3. The step reward functions are known and separable, i.e., g_t(·) only depends on variables defined for the time interval [t, t + 1).
The solution to problems (9) and (11) is computed by recursively solving the Bellman equation formulated according to the TDC framework:
Q_t(x_t, u_t) = E_{ε_{t+1}} [ g_t(x_t, u_t, ε_{t+1}) + γ max_{u_{t+1}} Q_{t+1}(x_{t+1}, u_{t+1}) ]    ∀(x_t, u_t) ∈ S_{x_t} × S_{u_t}    (12)
where Q_t(·, ·) is the so-called Q-function or value function, i.e., the cumulative expected reward obtained when decision u_t is taken in state x_t and the optimal policy is followed thereafter.
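As an illustration of how this recursion is evaluated on discretized domains, the Python sketch below performs one backward Bellman step; the transition function, step reward, grids, and discount factor are placeholders standing in for the water-system model.

    import numpy as np

    # One backward step of the Bellman recursion (equation (12)) on discretized
    # state, decision, and disturbance grids. f_t, g_t, and the grids are
    # placeholders; Q_next is the Q-function of stage t+1 tabulated on (x_grid, u_grid).

    def bellman_backup(x_grid, u_grid, eps_grid, eps_prob, f_t, g_t, Q_next, gamma=0.99):
        Q_t = np.zeros((len(x_grid), len(u_grid)))
        V_next = Q_next.max(axis=1)                      # V_{t+1}(x) = max_u Q_{t+1}(x, u)
        for i, x in enumerate(x_grid):
            for j, u in enumerate(u_grid):
                value = 0.0
                for e, p in zip(eps_grid, eps_prob):     # expectation over the discretized disturbance
                    x_next = f_t(x, u, e)
                    k = np.abs(x_grid - x_next).argmin() # nearest-neighbour look-up on the state grid
                    value += p * (g_t(x, u, e) + gamma * V_next[k])
                Q_t[i, j] = value
        return Q_t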
To determine the right-hand side of equation (12), the domains S_{x_t}, S_{u_t}, and S_{ε_{t+1}} of the state, release decision, and disturbance must be discretized and, at each iteration of the resolution process, explored exhaustively. The choice of the domain discretization is essential, as it reflects on the algorithm complexity, which is combinatorial in the number of states, release decisions, and disturbances, and in their domain discretization. Let N_{x_t}, N_{u_t}, and N_{ε_{t+1}} be the number of elements in the discretized state, release decision, and disturbance sets S_{x_t} ⊆ R^{n_x}, S_{u_t} ⊆ R^{n_u}, and S_{ε_{t+1}} ⊆ R^{n_ε}: the recursive resolution of equation (12) for kT iteration steps (where k is usually lower than ten) requires
kT · N_{x_t}^{n_x} · N_{u_t}^{n_u} · N_{ε_{t+1}}^{n_ε}    (15)
evaluations of the operator E[·].
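As a purely illustrative example (the figures are not those of the case study), consider n_x = 2 state variables discretized into N_{x_t} = 100 values each, N_{u_t} = 10 release decisions (n_u = 1), a scalar disturbance discretized into N_{ε_{t+1}} = 10 values, and kT = 10 · 365 iteration steps: the recursion then requires 3650 · 100² · 10 · 10 ≈ 3.7 · 10⁹ evaluations of E[·], which shows how quickly the exhaustive exploration becomes intractable as n_x grows.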
The idea is to combine learning from experience with the concept of continuous approximation of the value function developed for large-scale dynamic programming (see, for example, Gordon [1995]; Tsitsiklis and Roy [1996]). This results in a marked reduction of the computational burden. Indeed, a continuous mapping of state-decision pairs into the value function should ensure the same level of accuracy as a look-up table representation based on an extremely dense grid, but using a definitely coarser grid for the state-decision space. Further, the learning process is performed off-line, without the need for directly experimenting on the real system, which is a fundamental requirement when dealing with water resources systems, where experiments would lead to unsustainable costs in terms both of time and of social and economic losses.
Because it learns from a finite set of transition samples, the policy generated by fitted Q-iteration is an approximation of the optimal policy p* that solves problem (11). Precisely, fitted Q-iteration yields an approximation of the optimal Q-functions of the TDC problem by iteratively extending the optimization horizon h, i.e., by iteratively solving problem (11).
The deterministic and stationary (T = 1) case is useful to describe the rationale of the algorithm. Under these simplifying assumptions, the state transition (11b) and the associated reward depend only on the state x_t and decision u_t. It can be shown [Ernst, 1999] that the following sequence of Q_h-functions, defined for all (x_t, u_t) ∈ S_x × S_u, …
In the stochastic case, the right-hand side of equation (16b) is a realization of a random variable and Q_h(x_t, u_t) is redefined as its expectation. However, the expectation does not need to be explicitly computed when Q_h(·) is approximated with a regression function based on the least-squares method, because the latter generates an approximation of the conditional expectation of the output variables given the input. Its application to the data-set constructed considering stochastic transitions thus provides a continuous approximation of Q_h(·) over the whole state-decision set.
As originally proposed by Ernst et al. [2005], fitted Q-iteration generates a stationary policy, i.e., just one operating rule of the form u_t = m(x_t), which is the optimal policy for a stationary system. However, natural systems are not stationary, and thus a periodic policy is more appropriate …
Initialization: set Q̂_0(·) = 0 over the whole state-decision space S_x × S_u.
Iterations: repeat until the stopping conditions are met:
Set h = h + 1.
Build the training set TS = {⟨i^l, o^l⟩, l = 1, . . . , #F}, where i^l = ((t, x_t)^l, u_t^l) and o^l = g^l_{t+1} + γ max_{u_{t+1}} Q̂_{h−1}((t + 1, x_{t+1})^l, u_{t+1}).
Run the regression algorithm on TS to get Q̂_h(·), from which the policy p^h can be derived.
Fitted Q-iteration is said to be a batch RL algorithm, because the whole data-set F is processed in batch mode, in contrast to traditional RL algorithms that perform an update of the value function after each single interaction with the system.
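The sketch below illustrates this batch loop in Python, using scikit-learn's ExtraTreesRegressor in place of the tree-based regressor; the data-set F is assumed to be given as arrays of four-tuples and the candidate decisions are restricted to a finite grid. This is an illustrative re-implementation under those assumptions, not the authors' code.

    import numpy as np
    from sklearn.ensemble import ExtraTreesRegressor

    # Batch fitted Q-iteration sketch. tx = array of (t, x_t) pairs, u = decisions,
    # tx_next = (t+1, x_{t+1}) pairs, g = step rewards; u_grid = candidate decisions.

    def fitted_q_iteration(tx, u, tx_next, g, u_grid, gamma=0.99, n_iter=150,
                           n_trees=50, n_min=15):
        inputs = np.column_stack([tx, u])            # regressor inputs i = ((t, x_t), u_t)
        Q = None
        for h in range(1, n_iter + 1):
            if Q is None:
                targets = g                          # first iteration: one-step reward only
            else:
                # targets o = g + gamma * max_u' Q_{h-1}((t+1, x_{t+1}), u')
                q_next = np.column_stack([
                    Q.predict(np.column_stack([tx_next, np.full(len(g), uc)]))
                    for uc in u_grid
                ])
                targets = g + gamma * q_next.max(axis=1)
            Q = ExtraTreesRegressor(n_estimators=n_trees, min_samples_split=n_min)
            Q.fit(inputs, targets)
        return Q

    def greedy_decision(Q, t, x, u_grid):
        """Policy induced by the approximated Q-function: best candidate decision in (t, x)."""
        tx = np.tile([t, *np.atleast_1d(x)], (len(u_grid), 1))
        scores = Q.predict(np.column_stack([tx, u_grid]))
        return u_grid[int(np.argmax(scores))]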
A first way to improve this near-historical policy is to enlarge the exploration of release decisions, trying different values around the historical one (see Gaskett [2002]) for each visited state-decision pair (Figure 1a). This is, however, a risky approach: if the state-decision set has been poorly explored during the historical operation (typically, in poorly controlled systems [Tsitsiklis and Roy, 1996]), the informative content of the learning data-set can be low and the resulting policy is very likely to be quite far from optimality. Further, the approach is impracticable if the water system has never been operated before (e.g., in planning problems).
An alternative approach is to explore the behavior of the water system, via simulation, for different state values and under different operating policies, namely to adopt a model-based approach. However, the modeling effort does not need to involve the whole system …
A dense grid is not needed, since it would not take any advantage of the continuous approximation of the Q-function provided by fitted Q-iteration. Rather, a coarse grid can exponentially reduce the computational burden by linearly reducing N_{x_t} and N_{u_t} in equation (15). Such a coarse grid can be obtained as a uniform sub-sampling of the SDP dense grid (Figure 1c) or generated with ad hoc discretization methods (Figure 1d), such as orthogonal arrays, Latin hypercube sampling, or low-discrepancy sequences (see Cervellera et al. [2006] and references therein). Whatever the method adopted to build the learning data-set, the data-set might contain redundancies, which increase the computational requirements with no advantages in terms of policy performance. A way to reduce the size of the data-set is to adopt active learning techniques [Cohn et al., 1996], based on selecting the samples that most improve the performance of the learning algorithm.
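As an example of one of these discretization methods, the following sketch draws a coarse, space-filling sample of the (storage, decision) domain with Latin hypercube sampling via scipy; the variable bounds are placeholders, not the actual Lake Como figures.

    from scipy.stats import qmc

    # Latin hypercube sample of the two-dimensional state-decision space.
    # Bounds are illustrative placeholders.

    sampler = qmc.LatinHypercube(d=2, seed=1)      # dimensions: storage s_t and decision u_t
    unit_sample = sampler.random(n=80)             # e.g., 80 points for a coarse grid
    lower = [0.0, 0.0]                             # [s_min, u_min] (placeholders)
    upper = [260.0, 30.0]                          # [s_max, u_max] (placeholders)
    points = qmc.scale(unit_sample, lower, upper)  # (80, 2) array of (s_t, u_t) samples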
… computational requirements. Further, no human tuning of the function approximator should be required (fully automated approximation).
Some parametric function approximators can provide a great modeling flexibility: neural networks, for instance, are provably able to approximate any continuous function to any desired degree of accuracy. This modeling flexibility, however, comes at a cost, since it is often reflected in a large number of parameters requiring explicit estimation and strongly affecting the computational efficiency (see Castelletti et al. [2005]), with the risk of over-parameterization. As the problem size scales up, neural networks require more and more neurons, thus increasing the computational cost of the training phase. Non-parametric function approximators, particularly tree-based methods, ensure modeling flexibility …
Tree-based methods include KD-Trees, Classification and Regression Trees (CART) [Breiman et al., 1984], Tree Bagging [Breiman, 1996], Totally Randomized Trees, and Extremely Randomized Trees (Extra-Trees) [Geurts et al., 2006]. These methods basically differ in the way the splitting tests are chosen, in the termination test they adopt, and in the number of trees they grow. Extra-Trees (described later) were demonstrated to perform better than other tree-based methods when combined with the fitted Q-iteration algorithm [Ernst et al., 2005] and are therefore adopted in this study. Particularly, they provide great scalability by adapting the tree structure to the training data-set at each iteration, thus resulting in a better accuracy of the final policy. The drawback of these continuous changes in the structure is that Extra-Trees do not ensure the convergence of the Q-iteration, and so the algorithm cannot simply be stopped based on the distance between two successive approximations of the Q-function …
Three parameters are thus associated with Extra-Trees, whose values can be set on the basis of empirical evaluations:
K, the number of alternative cut-directions, can be chosen in the interval [1, n], where n is the number of regressor inputs. When K is equal to n, the choice of the cut-direction is not randomized, and the randomization acts only through the choice of the cut-point. On the contrary, low values of K increase the randomization of the trees and weaken the dependence of their structure on the output of the training data-set. Geurts et al. [2006] empirically demonstrated that, for regression problems, the optimal default value for K is n.
n_min, the minimum cardinality for splitting a node. Large values of n_min produce small trees (few leaves) with high bias and small variance. Conversely, low values of n_min produce fully developed trees with low bias but high variance …
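For reference, the sketch below shows how the three Extra-Trees parameters map onto an off-the-shelf implementation (scikit-learn's ExtraTreesRegressor); this is an analogous implementation, not the one used in the paper, and the value chosen for M is a placeholder.

    from sklearn.ensemble import ExtraTreesRegressor

    n = 3          # number of regressor inputs, e.g. (t, s_t, u_t)
    K = n          # number of alternative cut-directions (default for regression: K = n)
    n_min = 15     # minimum cardinality for splitting a node (about one sample per
                   # disturbance realization, i.e. per year of inflow data)
    M = 50         # number of trees in the ensemble (the third parameter; placeholder value)

    regressor = ExtraTreesRegressor(
        n_estimators=M,           # M trees
        max_features=K,           # K candidate cut-directions per node
        min_samples_split=n_min,  # n_min samples required to split a node
    )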
… the sensitivity of the algorithm to the Extra-Trees parameters and the stopping conditions. Second, the relatively small size of the system makes the control problem solvable with SDP, and this is key to perform a comparative evaluation of the algorithm. Based on this analysis, the advantages of the fitted Q-iteration over SDP can be easily extrapolated to more complex cases, where SDP requirements would be prohibitive for a comparison.
4.1. Description
Lake Como is the third biggest regulated lake in Italy, with a surface area of about 145 km² and an active storage of 260 Mm³. The 4500 km² catchment of the lake produces an average inflow of 4.73 Gm³/year, with the typical two-peak (spring and autumn) subalpine regime …
The release decision u_t is the volume to be released in the next 24 hours from the lake; finally, according to (6), the step reward function g_t(·) is a linear combination of two costs (negative rewards) accounting for flood damage and downstream water deficit. The learning data-set F of four-tuples ⟨(t, x_t), u_t, (t + 1, x_{t+1}), g_{t+1}⟩ required by the fitted Q-iteration algorithm was built adopting a partially model-free approach. An 80-point coarse grid was used for the state-decision space (Figure 6); precisely, 10 points for the storage s_t and 8 points for the release decision u_t, the first six of which correspond to downstream water demand values, plus two greater values. For the inflow a_t at time t, which plays the role of an exogenous input to the system, 15 years of daily streamflow data (1965-1979) were directly used, and the corresponding state transitions were performed by running a one-step simulation of the storage dynamics equation …
where w_t is the aggregated agricultural and hydropower water demand and r_{t+1} is the release from the lake given by equation (2).
The policy is designed by solving an equivalent of problem (11) where, according to the cost nature of the objectives, the operator max is replaced by min, and the aggregated step cost g_t(·) in equation (8) is computed as
g_t(·) = λ g_t^f(·) + (1 − λ) g_t^w(·)
with 0 ≤ λ ≤ 1.
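A hedged sketch of this aggregation is given below; the exact forms of the flood and water-deficit costs are not reported in the text above, so the threshold-excess flood cost and the unmet-demand deficit used here are assumptions for illustration only.

    # Aggregated step cost g_t = lambda * g_t^f + (1 - lambda) * g_t^w.
    # The individual cost forms are illustrative assumptions, not those of the paper.

    def flood_cost(level_t, flood_threshold):
        return max(0.0, level_t - flood_threshold)      # assumed: lake level excess above the flood threshold

    def deficit_cost(release_t1, demand_t):
        return max(0.0, demand_t - release_t1)          # assumed: unmet downstream water demand

    def step_cost(level_t, release_t1, demand_t, flood_threshold, lam):
        assert 0.0 <= lam <= 1.0
        return lam * flood_cost(level_t, flood_threshold) + \
               (1.0 - lam) * deficit_cost(release_t1, demand_t)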
5. Analysis of Fitted Q-iteration Properties
… far from optimality, depending on the accuracy of the approximation. However, because of the randomization, as h increases the value of J_h fluctuates. The recursive nature of the algorithm filters the purely random fluctuations (high frequency) of the approximation and produces smooth fluctuations. For small h, fluctuations are dominated by the performance improvement due to the policy learning process and are not evident. When increasing h no longer adds useful information for improving the policy, oscillations become the dominant effect.
To choose the number h̄ of iterations at which to stop the algorithm, it is necessary to resort to some empirical criterion. In principle, h̄ should be the value of h at which the learning process is nearly over and the improvement in performance is so little that it is masked by the random fluctuations due to the Extra-Trees approximation …
… the beginning of the new irrigation season is completely independent of the operation adopted in the previous season and, thus, as far as the irrigation component is concerned, the Q-function tends to a constant value. Similar reasoning applies to floods, as they occur at the end of the season and the lake can be emptied in 15 days. Therefore, to design a receding-horizon policy, as the fitted Q-iteration algorithm de facto does, 5 months (about 150 days) are enough, and the value of h̄ in Figure 3 is close to this value. This observation not only supports the empirical criterion proposed above, but also suggests another stopping criterion that is, to some extent, more practical (it does not require computing J_h after each algorithm iteration): whenever the problem can be re-framed as a receding h-step horizon problem, h is the natural stopping limit, since the policy learning process does not improve anymore once the number of iterations exceeds h.
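The two stopping rules can be combined in a simple check such as the following sketch; the window length and the use of moving averages are illustrative choices, not prescriptions from the paper.

    # Stop either (i) when the improvement of J_h over a window is smaller than the
    # amplitude of its recent fluctuations, or (ii) when h reaches the natural
    # receding-horizon limit h_bar (about 150 days for Lake Como).

    def should_stop(J_history, h_bar=150, window=10):
        h = len(J_history)
        if h >= h_bar:                                   # rule (ii)
            return True
        if h < 2 * window:
            return False
        recent = J_history[-window:]
        previous = J_history[-2 * window:-window]
        improvement = abs(sum(recent) / window - sum(previous) / window)
        fluctuation = max(recent) - min(recent)          # amplitude of the recent oscillations
        return improvement <= fluctuation                # rule (i)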
The value n_min, the minimum sample size for splitting a node, determines the number of leaves in a tree and thus the ensemble's trade-off between bias and variance. In Figure 4 (top panel), the experiment of Figure 3 is replicated for different values of n_min. Reducing n_min decreases the bias (on average the performance is nearer to the optimum) but negatively affects the variance (higher amplitude of the fluctuations). As anticipated, when dealing with stochastic function approximation, the regression algorithm approximates the conditional expectation of the output given the input. n_min should therefore be comparable to the number of disturbance realizations available for each state-decision pair. Since the learning data-set F of the Lake Como water system was generated using a 15-year inflow series, the best performance is expected to be obtained for n_min ≈ 15. This is confirmed by the experiments …
The potential of the fitted Q-iteration was analyzed via comparison with a traditional SDP formulation. The learning data-set F of fitted Q-iteration was generated using the partially model-free approach (see Section 4.2). As for SDP, according to the requirement of explicitly modeling all the system components, the inflow a_{t+1} was described as a cyclostationary (with period T), log-normal stochastic process whose pdf is defined by the parameters μ_t and σ_t, i.e., a log-PAR(0) model was assumed (for more details see Pianosi and Soncini-Sessa [2009]):
a_{t+1} = exp(σ_{t mod T} ε_{t+1} + μ_{t mod T}),    ε_{t+1} ∼ N(0, 1)
where ε_{t+1} is a Gaussian white noise. The state-decision domain was discretized with a dense grid of 27,048 points (N_{s_t} = 161; N_{u_t} = 168, see Figure 6), while a 9-point grid was used for the disturbance …
… grid, is more accurate than the SDP look-up table, even if the latter is based on a much denser state-decision discretization grid.
By way of demonstration, the policy associated with point A in Figure 7, obtained with fitted Q-iteration, dominates the corresponding policy A′, obtained with SDP, with a saving of … floodings per year and nearly 3.5 · 10⁶ m³ of deficit per year. Both the policies satisfy outright the water demand (front flat area in panels (a) and (c) in Figure 8) over a relatively wide range of storage values and strongly increase the release rate during the flood seasons. In so doing, they create a time-varying flood buffer zone, whose dimension is not explicitly designed, as it is either learnt from the flood events and the associated effects included in the data-set (fitted Q-iteration) or implicitly inferred from the stochastic inflow model (SDP) …
… does. An example is provided in Figure 11. The same example shows that both policies significantly outperform the historical operation, which appears to be much more conservative.
By moving toward the left extreme of the Pareto front (points B and B′ in Figure 7), i.e., by increasing the relative importance of irrigation over floods, SDP performs better than fitted Q-iteration. This is basically due to the approximation error in the tree-based interpolation of the Q-functions. Indeed, as the importance of the irrigation increases, the conflict between the objectives becomes negligible and the optimal policy simply suggests releasing the water demand. As anticipated, water demand values belong to the release decision discretization grid of both algorithms. However, while the release decision chosen by SDP is necessarily a point of the grid, and thus a water demand value, fitted Q-iteration uses a continuous approximation of the Q-function …
6.1. Computational Requirements
A comparative analysis of the computational requirements of fitted Q-iteration and SDP can be empirically performed by inferring some general rules from the computing times recorded for the Lake Como case study.
As anticipated, the time t_SDP required to design an operating policy with SDP is proportional to the number of evaluations of the operator E[·], which is given by equation (15). By splitting the state dimensionality n_x into n_s storages and n_I hydro-meteorological state variables, the time t_SDP can be expressed as
t_SDP = a · kT · ( N_{s_t}^{n_s} · N_{I_t}^{n_I} · N_{u_t}^{n_u} · N_{ε_t}^{n_ε} )
where a is a constant, machine-dependent parameter.
… disturbances. Assuming that the fitted Q-iteration coarse grid is obtained by reducing the dense grid of an equally performing SDP by factors r_s and r_u, respectively for the state and the release decision, the time t_Q1 required to generate the data-set can be expressed as
t_Q1 = b · T · (N_{s_t}/r_s)^{n_s} · (N_{u_t}/r_u)^{n_u} · N_a
where b is a constant, machine-dependent parameter, and N_a is the number of inflow realizations (i.e., the number of years in the historical data set used for the inflow and the other hydro-meteorological information). The time t_Q2 required to grow the trees increases linearly with the time horizon, with the number of regressors k (i.e., n_s + n_I + n_u + 1) and with the number of trees M, and superlinearly with the number n = (N_{s_t}/r_s)^{n_s} · (N_{u_t}/r_u)^{n_u} · N_a · T of four-tuples in the data-set. Precisely,
t_Q2 = c · n · log(n) · M · (n_s + n_I + n_u + 1)
where c is a constant, machine-dependent parameter.
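The following sketch evaluates these expressions for a given configuration; the machine-dependent constants a, b, and c are unknown, so only the relative growth with the problem size is meaningful, and the reduction factors and tree number used in the example are illustrative.

    import math

    def t_sdp(N_s, N_I, N_u, N_eps, n_s, n_I, n_u, n_eps, kT, a=1.0):
        # Number of evaluations of E[.] times the per-evaluation constant a.
        return a * kT * (N_s ** n_s) * (N_I ** n_I) * (N_u ** n_u) * (N_eps ** n_eps)

    def t_fqi(N_s, N_u, n_s, n_I, n_u, N_a, T, r_s, r_u, M, b=1.0, c=1.0):
        n_tuples = ((N_s / r_s) ** n_s) * ((N_u / r_u) ** n_u) * N_a * T
        t_q1 = b * T * ((N_s / r_s) ** n_s) * ((N_u / r_u) ** n_u) * N_a       # data-set generation
        t_q2 = c * n_tuples * math.log(n_tuples) * M * (n_s + n_I + n_u + 1)   # tree building
        return t_q1 + t_q2

    # Illustrative comparison for 1 and 2 reservoirs, no exogenous information,
    # reusing the Lake Como grid sizes (161 storages, 168 decisions, 9 disturbances,
    # 15 years of data) and a coarse grid of roughly 10 x 8 points per reservoir.
    for n_s in (1, 2):
        ratio = t_sdp(161, 1, 168, 9, n_s, 0, n_s, 1, kT=10 * 365) / \
                t_fqi(161, 168, n_s, 0, n_s, N_a=15, T=365, r_s=16, r_u=21, M=50)
        print(n_s, round(ratio, 1))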
… lines and white circles) is evident from both panels. Fitted Q-iteration goes beyond the computational limits of SDP (i.e., n_s ≤ 2) on complex networks including several reservoirs (top panel). The improvement is even more remarkable when the operating policy is conditioned upon exogenous information (bottom panel), as the model-free (uncontrolled!) components add nearly no additional computational time for fitted Q-iteration: while SDP requires … days for a configuration with 1 reservoir and 2 exogenous pieces of information, fitted Q-iteration requires only … hours.
7. Conclusions
One major technical challenge in expanding the scope of water resources …
The application to the Lake Como water system was used to infer general guidelines for the appropriate setting of the algorithm parameters, to define an empirical stopping criterion, and to demonstrate the potential of the approach compared to traditional SDP. The policies designed with fitted Q-iteration on an extremely coarse state-decision discretization grid were shown to generally outperform an equivalent SDP-derived policy computed on a very dense grid. The dominance is particularly remarkable on flood events (Figure 9), when the time-varying nature of both the policies, which is key to anticipating and buffering floods when no inflow information is considered, is more effectively exploited by the fitted Q-iteration. Based on the analysis of the Lake Como case study, a general rule was also derived to quantify the computational advantage of fitted Q-iteration over SDP in designing daily operating policies for large water systems.
… 2007] combined with a policy reconstruction procedure [Schneegaß et al., 2007], where an operating policy is first identified and then iteratively improved by the algorithm.
While the Extra-Trees used by the fitted Q-iteration have been shown to offer a good accuracy/efficiency trade-off, this comes at the price of the lack of a well-defined and robust stopping condition, which, in turn, might negatively affect both the accuracy (the policy finally adopted might not be the best one explored) and the efficiency (a better policy could have been found by stopping the algorithm earlier). Strictly, this happens because of the refresh of the tree structures at each iteration of the fitted Q-iteration algorithm, which is key to building an accurate approximation in the early iterations, but prevents the approximated Q-functions from stabilizing when the improvement in the policy performance is marginal. Further investigations …
… control approaches. These also include process-based modeling of rainfall-runoff processes to generate climate change scenarios and investigate adaptive management strategies. Finally, the combined use of fitted Q-iteration and model reduction techniques [Castelletti et al., 2009] is worth exploring for these purposes.
Acknowledgments. The work was completed while Andrea Castelletti, Stefano Galelli, and Rodolfo Soncini-Sessa were on leave at the Centre for Water Research, University of Western Australia. This paper forms CWR reference 2329 AC.
References
Bhattacharya, A., A. Lobbrecht, and D. Solomatine (2003), Neural networks and reinforcement learning in control of water systems, Journal of Water Resources Planning and Management - ASCE, 129(6), 458-465.
Bonarini, A., A. Lazaric, and M. Restelli (2007), Piecewise constant reinforcement learning for robotic applications, in Proceedings of the 4th International Conference on Informatics in Control, Automation and Robotics (ICINCO 2007).
Breiman, L. (1996), Bagging predictors, Machine Learning, 24(2), 123-140.
Breiman, L. (2001), Random forests, Machine Learning, 45(1), 5-32.
Breiman, L., J. Friedman, R. Olshen, and C. Stone (1984), Classification and Regression Trees, Wadsworth & Brooks, Pacific Grove, CA.
Castelletti, A., M. De Zaiacomo, S. Galelli, M. Restelli, P. Sanavia, R. Soncini-Sessa, and J. Antenucci (2009), An emulation modelling approach to reduce the complexity of a hydrodynamic-ecological model of a reservoir, in Proceedings of the International Symposium on Environmental Software Systems (ISESS 2009), October 2-9, Venice, I.
Cervellera, C., V. Chen, and A. Wen (2006), Optimization of a large-scale water reservoir system by stochastic dynamic programming with efficient state space discretization, European Journal of Operational Research, 171(3), 1139-1151.
Cohn, D., Z. Ghahramani, and M. Jordan (1996), Active learning with statistical models, Journal of Artificial Intelligence Research, 4, 129-145.
Ernst, D. (1999), Near optimal closed-loop control. Application to electric power systems, …
Galelli, S., and R. Soncini-Sessa (2010), Combining metamodelling and stochastic dynamic programming for the design of reservoir release policies, Environmental Modelling & Software, 25(2), 209-222.
Galelli, S., C. Gandolfi, R. Soncini-Sessa, and D. Agostani (2010), Building a metamodel of an irrigation district distributed-parameter model, Agricultural Water Management, 97(2), 187-200.
Gaskett, C. (2002), Q-learning for robot control, Ph.D. thesis, Australian National University, Canberra, AUS.
Geurts, P., D. Ernst, and L. Wehenkel (2006), Extremely randomized trees, Machine Learning, 63(1), 3-42.
Haimes, Y. (1977), Hierarchical Analyses of Water Resources Systems, McGraw-Hill, New York, NY.
Hall, W., and N. Buras (1961), The dynamic programming approach to water resources development, Journal of Geophysical Research, 66(2), 510-520.
Hall, W., W. Butcher, and A. Esogbue (1968), Optimization of the operation of a multipurpose reservoir by dynamic programming, Water Resources Research, 4(3), 471-477.
Heidari, M., V. Chow, P. Kokotovic, and D. Meredith (1971), Discrete differential dynamic programming approach to water resources systems optimisation, Water Resources Research, 7(2), 273-282.
Hejazi, M., X. Cai, and B. Ruddell (2008), The role of hydrologic information …
Kelman, J., J. Stedinger, L. Cooper, E. Hsu, and S. Yuan (1990), Sampling Stochastic Dynamic Programming applied to reservoir operation, Water Resources Research, 26(3), 447-454.
Labadie, J. (2004), Optimal operation of multireservoir systems: State-of-the-art review, Journal of Water Resources Planning and Management - ASCE, 130(2), 93-111.
Larson, R. (1968), State Incremental Dynamic Programming, American Elsevier, New York.
Lee, J.-H., and J. W. Labadie (2007), Stochastic optimization of multireservoir systems via reinforcement learning, Water Resources Research, 43(11), 1-16.
Luenberger, D. (1971), Cyclic dynamic programming: a procedure for problems with fixed delay, Operations Research, 19(4), 1101-1110.
Nardini, A., C. Piccardi, and R. Soncini-Sessa (1992), On the integration …
Piccardi, C., and R. Soncini-Sessa (1991), Stochastic dynamic programming for reservoir optimal control: dense discretization and inflow correlation assumption made possible by parallel computing, Water Resources Research, 27(5), 729-741.
Read, E. (1989), Dynamic Programming for Optimal Water Resources Systems Analysis, chap. A dual approach to stochastic dynamic programming for reservoir release scheduling, Prentice-Hall, Englewood Cliffs.
Saad, M., and A. Turgeon (1988), Application of principal component analysis to long-term reservoir management, Water Resources Research, 24(7), 907-912.
Saad, M., A. Turgeon, and J. Stedinger (1992), Censored-data correlation and principal component dynamic programming, Water Resources Research, 28(8), 2135-2140.