a parallel dynamic programming algorithm for multi-reservoir system

8/11/2019 A Parallel Dynamic Programming Algorithm for Multi-reservoir System

1/15

A parallel dynamic programming algorithm for multi-reservoir systemoptimization

Xiang Li a , Jiahua Wei a , Tiejian Li a , Guangqian Wang a , William W.-G. Yeh b ,

a State Key Laboratory of Hydroscience & Engineering, Tsinghua University, Beijing 100084, Chinab Department of Civil and Environmental Engineering, University of California, Los Angeles, CA 90095, USA

a r t i c l e i n f o

Article history:Received 8 May 2013Received in revised form 8 January 2014Accepted 12 January 2014Available online 30 January 2014

Keywords:Dynamic programmingMulti-reservoir system optimization Joint operationParallel computing

a b s t r a c t

This paper develops a parallel dynamic programming algorithm to optimize the joint operation of amulti-reservoir system. First, a multi-dimensional dynamic programming (DP) model is formulated fora multi-reservoir system. Second, the DP algorithm is parallelized using a peer-to-peer parallel paradigm.The parallelization is based on the distributed memory architecture and the message passing interface(MPI) protocol. We consider both the distributed computing and distributed computer memory in theparallelization. The parallel paradigm aims at reducing the computation time as well as alleviating thecomputer memory requirement associated with running a multi-dimensional DP model. Next, we testthe parallel DP algorithm on the classic, benchmark four-reservoir problem on a high-performance com-puting (HPC) system with up to 350 cores. Results indicate that the parallel DP algorithm exhibits goodperformance in parallel efciency; the parallel DP algorithm is scalable and will not be restricted by thenumber of cores. Finally, the parallel DP algorithm is applied to a real-world, ve-reservoir system inChina. The results demonstrate the parallel efciency and practical utility of the proposed methodology.

2014 Elsevier Ltd. All rights reserved.

1. Introduction

Dynamic programming (DP), an algorithm attributed largely toBellman [3] , is developed for optimizing a multi-stage (the termstage represents time step throughout the paper) decision pro-cess. If the return or cost at each stage is independent and satisesthe monotonicity and separability conditions [23] , the originalmulti-stage problemcan be decomposed into stages with decisionsrequired at each stage. The decomposed problem then can besolved recursively, two stages at a time, using the recursive equa-tion of DP. DP is particularly suited for optimizing reservoir man-agement and operation as the structure of the optimization

problem conforms to a multi-stage decision process. Over the pastfour decades, DP had been used extensively in the optimization of reservoir management and operation [4,6,8,13,22,35,3740] .

In the discrete form of DP, storage of each reservoir is discret-ized into a nite number of levels. By exhaustive enumeration overall possible combinations of discrete levels at each stage for all res-ervoirs in a system, global optimality can be assured in a discretesense. However, the well-known curse of dimensionality [2]

limits the application of DP to multi-state variable problems, asthe state space increases exponentially with an increase in thenumber of state variables. This drastic increase in state space andthe consequent random access memory (RAM) requirementquickly can exceed the hardware capacity of a modern computer[13] . A variety of DP variants, such as incremental dynamic pro-gramming (IDP) [15] , dynamic programming successive approxi-mations (DPSA) [14] , incremental dynamic programming andsuccessive approximations (IDPSA) [32] and discrete differentialdynamic programming (DDDP) [9] have been proposed to alleviatethe dimensionality problem. However these variants all require aninitial trajectory for each state variable. For a non-convex problem,

there is no assurance of convergence to the global optimum. Re-cently, Mousavi and Karamouz [22] reduced the computation timeof a DP model for a multi-reservoir systemby diagnosing infeasiblestorage combinations and removing them from further computa-tions. Zhao et al. [40] proposed an improved DP model for optimiz-ing reservoir operation by taking advantage of the monotonicrelationship between reservoir storage and the optimal releasedecision. However, the model only can be applied to reservoiroperation with a concave objective function.

Because of the hardware limitations of a single computer aswell as large-scale computing requirements, parallel computinghas been applied in many elds [25] . In the water resources eld,there are several successful examples. Bastian and Helmig [1]

http://dx.doi.org/10.1016/j.advwatres.2014.01.002

0309-1708/ 2014 Elsevier Ltd. All rights reserved.

Corresponding author. Tel.: +1 3108252300; fax: +1 3108257581.E-mail addresses: [email protected] , [email protected] (X. Li),

[email protected] (J. Wei), [email protected] (T. Li), [email protected] (G. Wang), [email protected] (W.W.-G. Yeh).

Advances in Water Resources 67 (2014) 115

Contents lists available at ScienceDirect

Advances in Water Resources

j o u rna l homepa ge : www.e l s ev i e r. com/ loca t e / advwa t r e s
http://dx.doi.org/10.1016/j.advwatres.2014.01.002mailto:[email protected]:[email protected]:[email protected]:[email protected]:[email protected]:[email protected]:[email protected]:[email protected]://dx.doi.org/10.1016/j.advwatres.2014.01.002http://www.sciencedirect.com/science/journal/03091708http://www.elsevier.com/locate/advwatreshttp://www.elsevier.com/locate/advwatreshttp://www.sciencedirect.com/science/journal/03091708http://dx.doi.org/10.1016/j.advwatres.2014.01.002mailto:[email protected]:[email protected]:[email protected]:[email protected]:[email protected]:[email protected]:[email protected]://dx.doi.org/10.1016/j.advwatres.2014.01.002http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://crossmark.crossref.org/dialog/?doi=10.1016/j.advwatres.2014.01.002&domain=pdfhttp://-/?-


2/15

employed a data parallel implementation of a Newton-multi-gridalgorithm for two-phase ow in porous media. Kollet and Maxwell[10] incorporated an efcient parallelism into an integrated hydro-logic model ParFlow. Tang et al. [31] used the masterslave andmulti-population parallelization schemes for the Epsilon-Nondom-inated Sorted Genetic Algorithm-II and applied them to severalwater resources problems. Wang et al. [33] designed a commonparallel framework, while Li et al. [16] introduced a dynamic par-allel algorithm for hydrological model simulations. Rouholahnejadet al. [27] proposed a parallelization framework for hydrologicalmodel calibration. Wu et al. [36] parallelized the Soil and WaterAssessment Tool. These previous studies demonstrated that byusing parallel computing associated with proper parallelizationstrategies for optimization or simulation, solutions can be im-proved and computation times can be reduced dramatically.Although parallel computing has been used extensively in hydro-logic models, the potential has not been explored fully in the eldof reservoir operation [29] .

Over the last several decades, with the rapid development of high-performance computing (HPC) environments, parallel dy-namic programming algorithm has been studied from theoriesgradually to applications. From the theoretical point of view,Casti et al. [5] presented various forms of parallelism and thecorresponding parallel algorithms for DP. Rytter [28] consideredsome DP problems and estimated the computation complexityand the number of computing processes based on a hypotheticalparallel random access machine, namely the shared memoryarchitecture. However, the shared memory architecture maynot be supportable for large RAM requirement problems, sinceall parallel tasks access the nite RAM in the architecture (be-cause of motherboard size restrictions). From the practical pointof view, El Baz and Elkihel [7] designed several load-balancingstrategies and applied a parallel DP algorithm with Open MP, aprotocol based on the shared memory architecture, to the 01knapsack problem. Martins et al. [19] and Tan et al. [30] parall-elized DP algorithms for a class of problems, such as biologicalsequence comparisons. Piccardi and Soncini-Sessa [24] studiedthe discretization and inow correlation on the solution reliabil-ity of a stochastic dynamic programming (SDP) and exploitedparallel computing to a one-reservoir optimal control problem;the parallelization was attributed largely to a vectorizing com-piler or parallelizing compiler. Li et al. [17] implemented aknowledge-based approach to accurately determine hydropowergeneration and developed a masterslave parallel DP algorithmon an HPC system to optimize the coordinated operation of the Three Gorges Project and Gezhouba Project cascade hydro-power plants in China. The masterslave is a frequently-usedparallel paradigm for parallelizing a DP algorithm, where themaster process has a full version of the DP algorithm, and callsslave processes to evaluate and return objective values and saves

all variables in the RAM of the master process. However, thisparallel paradigm merely shortens computation time but doesnot alleviate the RAM requirement.

In this paper, we develop a parallel DP algorithm for multi-reservoir system optimization. The parallel DP algorithm aimsat reducing the computation time and alleviating the RAMrequirement, taking advantage of parallel computing on the dis-tributed memory architecture. The paper is organized as follows:Section 2 formulates a DP model for a multi-reservoir systemoptimization problem; Section 3 analyzes the parallelism inher-ent to the DP algorithm and then proposes a peer-to-peer paral-lel paradigm for the DP algorithm; Section 4 considers the classicfour-reservoir example and tests the parallel DP algorithm onthe problem; Section 5 applies the parallel DP algorithm to a

real-world ve-reservoir system in China; nally, Section 6 pre-sents the conclusions.

2. Dynamic programming model

2.1. Objective function

A typical objective function for the optimal operation of a multi-reservoir system is either to maximize benet or minimize opera-tional cost. If more than one objective is involved, they can be com-

bined by the weighting method to form a composite objectivefunction. Without loss of generality, let us consider the maximiza-tion problem. According to Bellmans principle of optimality [3] ,the traditional forward recursive equation of an n-dimensionalDP model (the term dimension here refers to the number of res-ervoirs) can be represented as:

F t 1 S t 1 max f f t S t ; St 1 F t S t g 1

where t is the time index, t 2 1 ; T ; St is the storage vector at thebeginning of time step t ; St S 1 t ; . . .; S it ; . . .; S n t T ; St 1 isthe storage vector at the end of time step t ; i is the reservoir index,i 2 1 ; n ; F t is the maximum cumulative return from the rsttime step to the beginning of the t th time step resulting from the joint operation of n reservoirs; initially, F 1 0; F t 1 is themaximum cumulative return from the rst time step to the endof the t th time step resulting from the joint operation of n reser-voirs; and f t is the objective function to be maximized duringtime step t . Note that Eq. (1) is the invertedform of a DPmodel withreservoir storages as the decision variables. In the non-inverted DPmodel, the releases are the decision variables. For a deterministicDP model, the ending storage is related to the beginning storageby the continuity equation.

2.2. Constraints

In the operation of a multi-reservoir system, each individualreservoir is subject to its own set of constraints, while the reservoirsystem is subject to system constraints brought by the intercon-nection of reservoirs. Specically, we consider the followingconstraints:

Continuity equation:

S t 1 S t It M R t 8t 2

where It is the vector of inows to reservoirs ( i 1 ; . . .; n) duringtime step t ; R t is the vector of total releases from reservoirs(i 1 ; . . .; n) during time step t ; R t R1 t ; . . .; Rit . . .; Rn t T ;and M is the n n reservoir system connectivity matrix. Withoutloss of generality, we assume that evaporation loss is balanced byprecipitation.

Initial and nal reservoir storages:S 1 S initial 3

S T 1 P S final 4

where Sinitial and Sfinal are the vectors of initial storages and nal ex-pected storages of reservoirs ( i 1 ; . . .; n).

Lower and upper bounds on storages:S min t 1 6 St 1 6 S max t 1 8t 5

where Smin t 1 and Smax t 1 are the vectors of minimum andmaximumstorages of reservoirs ( i 1 ; . . .; n) attheendof time stept .

Lower and upper bounds on releases:For all reservoirs:

R min t 6 R t 6 R max t 8t 6

where R min

t is the vector of the minimum required releases fromreservoirs ( i 1 ; . . .; n) during time step t ; and R max t is the vector

2 X. Li et al./ Advances in Water Resources 67 (2014) 115


3/15

of maximum allowable releases from reservoirs ( i 1 ; . . .; n) duringtime step t .

For reservoirs in parallel:

PRminl t 6 Xi2 U l Rit 6 PRmax

l t 8t ; 8 j 7

where l is the index of river conuence points; l 2 1 ; L and L is the

number of river conuence points; U l is the set of reservoirs in par-allel at river conuence point l; and PRminl t and PRmaxl t are the

minimum and maximum releases at river conuence point l duringtime step t .

Lower and upper bounds on outputs:For all reservoirs:

N min t 6 N t 6 N max t 8t 8

where Nt is the vector of outputs produced from reservoirs(i 1; . . .; n) during time step t ; Nt N 1 t ; . . .; N it . . .; N n t

T ;Nmin t is the vector of minimum required outputs from reservoirs(i 1; . . .; n) during time step t ; and Nmax t is the vector of maxi-mum allowable outputs from reservoirs ( i 1 ; . . .; n) during timeperiod t .

For the entire reservoir system:

N min t 6 Xn

i1N it 6 N

max t 8t 9

where N min t is the minimum required output from the reservoirsystem during time step t ; and N max t is the maximum allowableoutput from the reservoir system during time step t .

Note that reservoir storages, releases and outputs are variablesto be solved; the others are the known input data in theoptimization.

3. Parallelization strategy

In this study, we use the Windows HPC Server 2012 R2 operat-ing system (OS). The HPC system consists of 20 IBM HS22 bladesand an Inniband 40 GBps network. Each blade has two IntelXeon E5645 2.40 GHz CPUs (each CPU has six physical cores)and 12 GB of RAM. The Intel Xeon CPU employs hyper-threadingtechnology that includes key features: (1) for each physical core,the OS addresses two logical cores and can schedule two processessimultaneously (i.e. a logical core conducts one process). Thusthere are a total of 480 logical cores in the HPC system; (2) eachlogical core within a physical core has its own execution resourcesand also shares execution resources with the other logical core; (3)when two processes on two logical cores belong to the same phys-ical core and need same shared execution resources, one processshould stall and give the execution resources to the other processuntil it nishes. In other words, the two logical cores within a

physical core cannot work as two independent physical cores.Fig. 1 presents the schematic representation of the HPC system,which is the distributed memory architecture.

Section 3.1 establishes the foundation for using the serial DPalgorithm to solve the multi-reservoir system optimization prob-lem. We summarize the procedures of DP algorithm, illustratethe DP algorithms solution space with the help of a matrix, convertthe vector form Eq. (1) into a scalar form, and estimate the compu-tation time and RAM requirement for a multi-reservoir DP prob-lem. Section 3.2 analyzes the concurrency and dependency of executing the DP algorithm and employs a peer-to-peer parallelparadigm to parallelize the DP algorithm. The aim of this paradigmis to shorten computation time as well as alleviate the RAMrequirement. We develop the parallel programs using C++ lan-

guage under the Microsoft Visual Studio 2010 platform and themessage passing interface (MPI) protocol [21] , allowing us to coor-

dinate a parallel program running on multiple computing pro-cesses in the distributed memory architecture.

3.1. Preparation for parallelization

In general, the solution of the DP algorithm includes two proce-dures [5,38] :

Procedure I : Solve the recursive equation (1) sequentially, twostages (corresponding to time steps) at a time, save the optimaltransitions or candidate paths (i.e. St ! St 1) in the memory,and save the maximum cumulative returns (i.e. F t 1 St 1) inthe memory for use in the next time step. The procedure continuesuntil the nal time step is reached.

Procedure II : Trace back the optimal path, based on the savedoptimal transitions, from the nal time step to the rst time stepto determine the consequent storage trajectories (i.e.ST 1 ! ! S2 ! S1) and release trajectories (i.e.R T ! ! R 2 ! R 1).

For this analysis, we perform a conversion for Eq. (1) . Assumingthe storage of each reservoir is discretized into m levels, the num-ber of storage combinations of all interconnected reservoirs is m n

at any time step (see Fig. 2 ). Note that storage is dened eitherat the beginning or the end of a time step. For the sake of clarity,we introduce C pt ; t to denote a storage combination of all inter-connected reservoirs in a system at the beginning of time step t ,where pt denotes the serial number of a storage combination atthe beginning of time step t , p t 2 1 ; mn . Thus all possible storagecombinations of all reservoirs over the entire planning horizoncan be expressed abstractly as:

C

C 1 ; 2 C 1 ; t C 1 ; T 1 ... . .

. ...

q ..

.

..

. C pt ; t

...

..

.q

... . .

. ...

C mn ; 2 C mn ; t C mn ; T 1

26666666664

37777777775m

n T

10

where C is the mn T matrix; C C2; . . .; Ct ; . . .; CT 1 ;C p1 ; 1 is the initial storage combination, often xed; C

is denotedas the m n T matrix of optimal transitions or candidate paths, usedto save the optimal previous storage combinations to the currentones so as to trace back the optimal path; and C s elementC pt 1 ; t 1 saves the optimal storage combination at the begin-ning of time step t to storage combination p t 1 at the end of timestep t . In addition, all maximum cumulative returns from the rsttime step to the beginning of the t th time step are denoted as Ft ,where Ft i s the mn 1 matrix. By substituting C pt ; t ( pt 1 ; . . .; m n ) for St , the vector formEq. (1) canbe rewritten intoan equivalent scalar form Eq. (11) :

F t 1 C pt 1 ; t 1 max f f t C pt ; t ; C pt 1 ; t 1 F t C pt ; t g 11

Fig. 3 schematically illustrates the computing procedures andcomputer memory used for the serial DP algorithm. Fig. 4 showsthe two computing procedures of the serial DP algorithm, basedon the scalar form Eq. (11) . First, the DP algorithm executes Proce-dure I : For a given pt 1 , to determine the maximum cumulative re-turn F t 1 C pt 1 ; t 1 and optimal transition C pt 1 ; t 1, theobjective function f t of all possible transitions must be examinedbetween C pt ; t ( pt 1 ; 2 ; . . .; m n ) and C pt 1 ; t 1 and F t C pt ; t ( pt 1 ; 2 ; . . .; mn ) has to be added as well. Then, the maximumcumulative return F t 1 C pt 1 ; t 1 and optimal transitionC pt 1 ; t 1 are saved in computer memory. After all maximum

cumulative return F t 1 C pt 1 ; t 1 ( pt 1 1 ; 2 ; . . .; mn

) and opti-mal transition C pt 1 ; t 1 ( pt 1 1 ; 2 ; . . .; mn ) are derived at the

X. Li et al. / Advances in Water Resources 67 (2014) 115 3
http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-


4/15

Fig. 1. Schematic representation of the HPC system.

Fig. 2. Conversion from reservoir storages to reservoir storage combinations.

Fig. 3. Schematic illustration of computing procedures and computer memory used for the serial DP algorithm.



5/15

end of timestep t , the algorithm proceeds to the next time step un-til the end of the planning horizon. Then the DP algorithm carries-out Procedure II : The optimal storage transition for each reservoiras well as the corresponding release policy can be traced by a back-ward sweep.

The computation time of the DP algorithm, mainly from Proce-dure I , can be approximately estimated as:

s 1 m2 n D s T 12

where s 1 is the wall clock time of using one computing process;there are a total number of m 2n time evaluations for Eq. (1) at eachtime step; all evaluations for Eq. (1) are assumed to require thesame average wall clock time D s . It should be noted that this isan upper bound estimation that includes the possible infeasibletransitions, which are discarded in the actual computation. Theinfeasible transitions are the transitions that cannot be reacheddue to insufcient inow to a reservoir [22] .

On the other hand, Procedure I requires large RAM capacity. Forillustration, we consider the smallest amount of RAM to occupywhen running the DP algorithm on a single computer, as shownin Fig. 3 (right). There are two main types of variables that needto be saved in the sequential decision-making process: one is inthe form of integer variables, used to save the optimal transitions

C in a two-dimensional array, with the horizontal direction repre-senting the time step and the vertical direction representing thestorage combinations. The other is in the form of oating variables,used to save the maximum cumulative returns Ft and Ft 1 for timesteps t and t 1 in two one-dimensional arrays, respectively. Itshould be noted that the maximum cumulative returns at timesteps t and t 1 are updated until the nal time step is reached;that is tosay, Ft 1 ,Ft 2 , . . .,F1 are no longer saved in the RAM duringtime step t (see Fig. 3 ). For the sake of simplicity, we assume thetwo main variables occupy the same amount of RAM, U bytes. Thusthe DP algorithms total RAM amount simply can be expressed as:

RAM 1 mn T 2 U 13

where RAM 1 is the RAM amount running the DP algorithm on onecomputing process (byte).

From Eqs. (12) and (13) , it can be seen that computation timeand computer memory are proportional to mn for a multi-reservoirDP problem, and for this reason the curse of dimensionality issuearises.

3.2. Peer-to-peer parallel paradigm

In order to apply the DP algorithm to multi-reservoir systemoptimization, it is necessary to develop an effective parallelizationstrategy for the DP algorithm in order to shorten computation timeas well as alleviate the RAM bottleneck. The RAM bottleneck wouldmake the DP algorithm un-implementable when a single comput-ing process is used or several computing processes are used on theshared memory architecture.

The purpose of parallelization is to decompose the original taskinto several subtasks. Moreover, the parallelization distributes thetotal RAM associated with the task into several subtasks, each of which requires less RAM. Fig. 5 illustrates the way we distributethe above-mentioned two types of variables among K computingprocesses. We believe that the distribution only can be made alongthe vertical direction since the DP model features accumulationtime step by time step (refer to Eq. (1) ). Parallelization is basedon the peer-to-peer parallel paradigm, consisting of two types of

computing processes, i.e. K peer processes and one transfer pro-cess. Each peer process is in charge of sub-allocation of the totalRAM. The total number of storage combinations mn are distributedamong K peer processes, with the number of storage combinationsmk allocated to peer process k, k 2 1 ; K , yielding m n P

K k1 mk . For

the sake of clarity, let C C1 ; . . .; Ck; . . .; CK T and Ft F1 ;t ; . . .;

Fk;t ; . . .; FK ;t T , as shown in Fig. 5 .

Parallelization should consider the concurrency and depen-dency among subtasks undertaken by the K peer processes. Here,the term concurrency refers several subtasks that can be exe-cuted simultaneously on multiple computing processes. The termdependency refers to a computing process that can perform asubtask only after the other computing processes have nishedcertain subtasks. As far as concurrency is concerned, during time

step t the computation of each optimal transition, sayC pt 1 ; t 1, is independent of the others in the DP algorithm;

Fig. 4. Flowcharts of Procedure I and Procedure II of the serial DP algorithm.

http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-


6/15

that is, the computations of all C pt 1 ; t 1 pt 1 1 ; 2 ; . . .; mn canbe executed simultaneously. As far as dependency is concerned, tocompute C pt 1 ; t 1 during time step t , all optimal transition

C pt ; t pt 1 ; 2 ;. . .

; mn

and maximum cumulative returnsF t C pt ; t pt 1 ; 2 ; . . .; m n , which are saved among K peer pro-cesses in this paradigm (see Fig. 5 ), should be known prior. In otherwords, a peer process cannot accomplish the computation forC pt 1 ; t 1 unless all peer processes have nished all computa-tions for C pt ; t pt 1 ; 2 ; . . .; mn and F t C pt ; t pt 1 ; 2 ; . . .; m n .Thus there is an underlying synchronization during each time stepin this paradigm.

The two processes have basic functions. Each peer process eval-uates a subtask of the current maximum cumulative returns andoptimal transitions based on the previous maximum cumulativereturns completed by the K peer processes and after the evalua-tions the subtask of current maximum cumulative returns andthe optimal transitions are saved in the RAM. The transfer processis in charge of communicating maximum cumulative returnsamong all peer processes and tracing back the optimal path.

Figs. 6 and 7 show the two procedures ( Procedure I and ProcedureII ) of the parallel DP algorithm. Unlike the procedures of serial DPalgorithm in Fig. 4 , some inter-process communicating statementsare added between the peer processes and the transfer process.

Fig. 6 presents the owchart of peer process k and the transferprocess of Procedure I of the parallel DP algorithm. Peer process j isany one ofthe K peer processes that hasthe same workowsas peerprocess k, andpeer process k or the transferprocessreceivethe mes-sageof peerprocess jbasedontherstcome,rstservedprinciple;and w is a countervariable.Themorespecicworkows of ProcedureI of the parallel DP algorithm include the following steps:

(1) Start to allocate a sub-allocation of the total RAM for allprocesses;

(2) When t 1, for each peer process, say peer process k, initial-ize Fk;2 and Ck2 under the given F 1 0 and C p1 ; 1;

(3) Each peer process, say peer process k, sends Fk;t 1 to the

transfer process at time step t ;(4) The transfer process receives F j;t 1 from a certain peer pro-cess j based on the principle of rst come, rst servedand sends F j;t 1 to all the peer processes;

(5) Repeat Step (4) until F j;t 1 from each of the peer processes(i.e. j 1 ; . . .; K ) is received and sent by the transferprocess;

(6) Each peer process, say peer process k, receives F j;t at timestep t 1 (i.e. F j;t 1 at time step t ) from the transfer processbased on the principle of rst come, rst served and car-ries-out the recursive equation to compute and updateFk;t 1 and Ckt 1 based on the sums of the objective func-tion resulting from the transitions from C jt to Ckt 1 andF j;t ;

(7) Repeat Step (6) until F j;t

is received K times from the transferprocess (i.e. j 1 ; . . .; K );

(8) Repeat Steps (3)(7) until the nal time step T is reached.

After Procedure I , each peer process, say peer process k, saves Ck ,Fk;T and Fk;T 1 in its RAM. Then, Procedure II of the parallel DP algo-rithm begins, as shown in Fig. 7 , and the workows include the fol-lowing steps:

(1) Each peer process, say peer process k, sends Fk;T 1 andCkT 1 to the transfer process;

(2) The transfer process receives F j;T 1 and C j T 1 from acertain peer process j based on the principle of rst come,rst served and compares the elements in F j;T 1 andupdates the maximum F T 1 C pT 1 ; T 1 and consequentC pT 1 ; T 1;

Fig. 5. Schematic illustration of computing procedures and computer memory used for the parallel DP algorithm.



7/15

Fig. 6. Flowchart of peer process k and transfer process of Procedure I of the parallel DP algorithm.

Fig. 7. Flowchart of peer process k and transfer process of Procedure II of the parallel DP algorithm.

http://-/?-http://-/?-


8/15

(3) Repeat Step (2) until F j;T 1 and C j T 1 from all peer pro-cesses (i.e. j 1 ; . . .; K ) are received and the maximumF T 1 C pT 1 ; T 1 and consequent C pT 1 ; T 1 arederived;

(4) If t > 1, go to Step (5); otherwise go to Step (10);(5) Identify the peer process that saves C pt ; t based on

C pt 1 ; t 1 and let X 1 (where X is used to judge whichchoice the peer process should make) for the peer process,X 2 for the other peer processes, and W 0 (where W isused to judge whether to end the transfer process);

(6) The transfer process sends X to all the peer processes;(7) Each peer process, say peer process k, receives X from the

transfer process. If X 1, the peer process should sendCkt to the transfer process and then repeat to receive Xfor the next time step. If X 2, the peer process directlyrepeats to receive X for the next time step;

(8) The transfer process receives the required C j t , which savesthe C pt ; t at time step t (i.e. C pt 1 ; t 1 at time stept 1) from the required peer process, say peer process j;

(9) Repeat Steps (4)(8);(10) Let X 3 for all peer processes and let W 1;(11) Each peer process ends its process, once receive X 3 from

the transfer process;(12) The transfer process ends its process when W 1.

The optimal path is derived by the transfer process with Proce-dure II. Furthermore, the consequent storages and releases of mul-tiple reservoirs also can be determined for all time steps with thederived optimal path.

We consider a straightforward workload balancing strategy todecrease the idle computing processes and further obtain goodperformance on parallel efciency. The strategy is to approxi-mately allocate the same amount of subtask to each peer process.Thus the numberof storage combinations saved in a peer process isdened as:

mk a 1

k6

bmk a b < k 6 K 14

where

a umn=K 15

b v mn ; K 16

and where u is the oor integer function (e.g. u5=3 1) andm is the remainder function (e.g. v 5 ; 3 2). However, we notethat the discarded infeasible solutions (see Section 3.1 ) still resultin imbalance tasks assigned to all peer processes.

The computation time of using the parallel DP algorithmis esti-mated as:

sK s0 s00 s000=K 17

where s K is the wall clock time of using K peer processes, consistingof the computation time fraction s 0, the communication time frac-tion s 00 and the workload imbalance time cost s 000. We employ theparallel efciency E K as a measurement to evaluate the parallel per-formance, calculated as:

E K s1 =K sK 18

The parallel efciency of a parallel programis dependent on the ra-tio between the communication time and the computation time. If the ratio is small, the parallel efciency is high; otherwise the par-allel efciency is low.

The RAM of the parallel DP algorithm allocated to each peerprocess, say peer process k, is estimated as:

RAM k mn T 2 U =K 19

where RAM k is the RAM amount for each peer process (byte). For asystem of distributed memory architecture, such as Fig. 1 , allcomputing processes in a blade share RAM. Suppose H computingprocesses share RAM whose size is RAM bytes. When using theparallel DP algorithm, we roughly can determine the upper boundof RAM by the following equation:

mn T 2 U =K H 6 RAM 20

From Eq. (20) , we see that the RAM requirement for a multi-reservoir DP problem can be alleviated by the number of comput-ing processes ( K ). This is a breakthrough in that multi-reservoirsystem optimization problems that previously could not be solvedby a serial DP algorithm on a single computer because of RAMrequirements now can be solved with parallel computing.However, we note that an increase in reservoir number will resultin an increase in computing process demand.

4. The four-reservoir example

4.1. Problem description

We rst apply the developed parallel DP algorithm to the clas-sic, hypothetical four-reservoir problem (see Fig. 8 ). This problemhas been studied in the literature by several researchers. Larson[15] solved the problem by IDP. Heidari et al. [9] solved the sameproblem by DDDP. More recently, Wardlaw and Sharif [34] useda genetic algorithm while Kumar and Reddy [12] used particleswarm optimization to evaluate the performance of their heuristictechniques.

From Fig. 8 , we see that the reservoir system consists of bothseries and parallel connections ( L 1, U 1 1 ; 3 ). The reservoirsystem connectivity matrix is:

M

1 0 0 0

0 1 0 0

0 1 1 01 0 1 1

26664

37775

21

The inows are assumed to be constant for the entire operatingperiod and are set as:

I

2

3

0

0

2666437775

22

Fig. 8. The four-reservoir problem.



9/15

The main objectives of the system operation are to maximizethe benets from hydropower generation from reservoirsi 1 ; 2 ; 3 ; 4 as well as the benets from irrigation from reservoiri 4 over 12 time steps ( n 4, T 12). We expand the objectivefunction f t to be maximized during time step t to:

f t X

4

i1

bit Rit b5 t R4 t t < 12

X4

i1bit Rit b5 t R4 t X

4

i1 g iS i13 ; di t 128>>>>>>>: 23

where Rit (i 1; 2 ; 3 ; 4) can be computed directly from Eq. (2)once S it and S it 1 (i 1 ; 2 ; 3 ; 4) are chosen; bit is the benetfunction (refer to [15] ); and g i is the penalty function for notmeeting the ending storage constraint ( d i):

g iS i13 ; di 40 S i13 di

2 S i13 6 d i0 S i13 > di( 24

Other constraints include:

S min

00

0

0

2666437775

; S max

1010

10

15

2666437775

25

S initial

5

5

5

5

2666437775

; S final

5

5

5

7

2666437775

26

R min

0

0

0

0

26664

37775

; R max

3

4

4

7

26664

37775

27

Note that Eqs. (7)(9) are inactive in this example.

4.2. Results and discussion

In this example, the storage in each reservoir is discretized withD S 1 unit, and thus the total number of storage combinations is11 11 11 16 = 21,296. We solved the four-dimensional DPmodel on the HPC systemdescribed above. First, we applied the se-rial DP algorithmwith a single computing process. The global opti-mum is 401.3. The wall clock time ( T 1 ) was 1818.6 s. Then weperformed several executions using the developed parallel DPalgorithm by varying the peer process numbers. As expected, all

executions produced the same global optimum, except with differ-ent wall clock times. The computation time fraction s 0, the sum of communication time fraction and workload imbalance time costs 00 s 000, the wall clock time s K and the parallel efciency E K arepresented in Table 1 . As we can see, the wall clock time T K is dras-tically reduced, from 1818.6 s to 9.7 s with 350 peer processes. Thisreduction is quite substantial.

Several probable reasons affect the parallel efciency:

(1) The hyper-threading technology. (a) If there are one peerprocess and one transfer process, we can conclude fromthe result that the OS schedules the two processes on onephysical core. The peer process does not compete with thetransfer process for execution resources, and thus the com-putation time fraction is almost the same with the serialDP algorithm. (b) If there are two peer processes and one

transfer process, we conclude that one peer process andthe transfer process are scheduled on one physical core,while the other peer process is scheduled on the other phys-ical core. Because the three processes do not compete forexecution resources, the parallel efciency is as high as0.99. (c) If there are three peer processes and one transferprocess, there would be two peer processes scheduled onthe same physical core.Because the two peer processes com-pete for execution resources, the parallel efciency is

reduced to 0.72. (d) By this reasoning, parallel efcienciesincrease when there are even numbers of peer processesand decrease when there are odd numbers of peer processes.However, as the number of peer processes becomes large,the uctuations are not obvious.

(2) The workload imbalance. From Fig. 9 , we see that as thenumber of peer processes increases, the sum of communica-tion time fraction and workload imbalance time costincreases, then decreases, and then increases again. This isbecause when the number of peer processes is small, theworkload imbalance (because various numbers of infeasiblesolutions assigned to the peer processes will be discarded,[see Section 3.1 ]) is signicant, and the fast peer processesmust wait until the slowest peer process nishes its task.

However, when the number of peer processes becomeslarge, the amounts of task assigned to peer processesdecreases and the inuence of workload imbalance dimin-ishes. Although there are still imbalance tasks among vari-ous peer processes, the fast peer processes do not have towait long. Because the sum of communication time fractionand workload imbalance time cost increases linearly as thenumber of peer processes increases, we can infer that thedeveloped parallel DP algorithm is scalable and notrestricted by the increase in the number of cores.

(3) The turbo boost technology (referred to as dynamic overc-locking). The clock rate depends on the CPUs thermal limit,the number of cores in use as well as the maximum fre-quency of the active cores. If the CPU is below its thermallimits, the operating frequency will increase; otherwise theoperating frequency will be xed on the standard frequency.

Table 1

The computation time fraction, the sum of communication time fraction andworkload imbalance time cost, the wall clock time and the parallel efciency of theparallel DP algorithm for the classic four-reservoir problem.

No. of peer processes s 0 (s) s 00 s 000(s) s K (s) E K

Serial 1818.6 1.001 1836.2 0.0 1836.2 0.992 1833.8 1.3 917.5 0.99

3 2427.5 100.6 842.7 0.724 2226.3 123.4 587.4 0.775 2595.5 115.2 542.1 0.676 2373.6 181.3 425.8 0.717 2688.3 140.5 404.1 0.648 2510.6 194.1 338.1 0.679 2732.9 132.6 318.4 0.63

10 2593.3 153.7 274.7 0.6625 2948.3 90.9 121.6 0.6050 2987.1 60.6 61.0 0.6075 3016.1 33.3 40.7 0.60

100 3036.3 93.7 31.3 0.58125 3073.2 75.3 25.2 0.58150 3058.1 106.0 21.1 0.57175 3066.6 140.8 18.3 0.57200 3059.8 134.0 16.0 0.57225 3115.4 147.1 14.5 0.56

250 3087.7 197.5 13.1 0.55275 3106.8 210.5 12.1 0.55300 3113.3 216.7 11.1 0.55325 3080.3 250.6 10.2 0.55350 3137.0 243.0 9.7 0.54

http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-


10/15

5. A real-world application

5.1. Problem description

We now apply the developed parallel DP algorithm to a real-world reservoir system located in the central Yangtze River Basin,China (see Fig. 10 ). The system consists of ve reservoirs ( n 5):the Three Gorges Project (TGP) and the Gezhouba (GZB) on theYangtze River; and the Shuibuya (SBY), the Geheyan (GHY) and

the Gaobazhou (GBZ) on the Qingjiang River (which joins the Yan-gtze River at the town of Zhicheng). As shown in Fig. 10 (right),these reservoirs are numbered from i 1 to i 5 and the riverconuencepoint is denotedas l 1 ( U 1 2 ; 5 ). The natural inowbetween the TGP and the GZB can be ignored because of their closeproximity. At present, all ve reservoirs areoperated by two corpo-rations. The TGP and GZB cascade hydropower plants (TGPGZB)are managed by the China Three Gorges Corporation, while theSBY, GHY and GZB cascade hydropower plants (SBYGHYGBZ)are under the jurisdiction of the Hubei Qingjiang HydroelectricDevelopment Co., Ltd. The joint operation of this ve-reservoir sys-tem is of major interest to the Ministry of Science and Technologyof China.

In this reservoir system, the SBY is a multi-year storage reser-

voir with 24.0 108

m3

of active storage capacity; the GHY is ayearly storage reservoir with 11.5 10 8 m 3 of active storage

capacity; the TGP is a seasonal storage reservoir with221.5 10 8 m 3 of active storage capacity; and the GZB and GBZare daily storage reservoirs with active storage capacities0.8 10 8 m 3 and 0.5 10 8 m 3 , respectively much smaller thanthe other three reservoirs. The main characteristics and the operat-ing rules of the ve reservoirs are listed in Table 2 , whereS max S min denotes the active storage capacity; H is theconversion function from reservoir storage to water level; N min

denotes the guaranteed output production of a hydropower plant;

and N max

denotes the installed capacity of a hydropower plant.

Fig. 9. Computation time fraction versus the sum of communication time fraction and workload imbalance time cost.

Fig. 10. The ve-reservoir system.

Table 2

Main characteristics and operating rules of the ve reservoirs.

Reservoir TGP GZB SBY GHY GBZ

S min (10 8 m 3) 171.5 6.3 19.0 18.7 3.5

S max (10 8 m 3) 393.0 7.1 43.0 30.2 4.0

S max S min (10 8 m 3) 221.5 0.8 24.0 11.5 0.5

H S min (m) 145.0 63.0 350.0 180.0 78.0

H S max (m) 175.0 66.0 400.0 200.0 80.0

Rmin (m 3 /s) 6000 6000 0 0 0

Rmax (m 3 /s) 101,700 113,400 13,200 18,000 18,400

N min (MW) 4990 1040 310 241.5 77.3

N max (MW) 22,400 2757 1840 1212 270



11/15

We use three scenarios for the case study, as shown in Table 3 ,where PRmin and PRmax , respectively, denote the mandatory releaseand the maximum allowable release at river conuence point l 1.Scenario 1 is built for the TGPGZB; scenario 2 is for the SBYGHYGBZ; and scenario 3 encompasses the entire ve-reservoir system.Scenario 3 is conducted under the assumption that the ve-reser-voir system is under joint operation. To test our proposed method-ology, we further assume that energy generation is transmitted tothe same grid. This assumption may differ slightly from what actu-ally occurs in practice. Furthermore, for conuence point l 1, the

Table 3

Three scenarios used in the real-world problem.

Scenario System PRmin (m 3 /s) PRmax (m 3 /s) N min (MW) N max (MW)

1 TGPGZB 6000 56,700 6030 20,9572 SBYGHYGBZ 0 18,400 628.8 33223 The ve-reservoir system 6000 56,700 6658.8 24,279

Table 4

The optimal energy generation from the three scenarios (10 8 KW h).

Scenario TGP GZB SBY GHY GBZ Total

1 8364.9 1528.8 9893.72 330.5 274.9 87.8 693.23 8411.9 1525.0 342.1 271.5 86.0 10636.5(3-1-2)

*

47.0 3.8 11.6 3.4 1.8 49.6*

Note : (3-1-2) denotes the scenario 3 minus scenario 1 minus scenario 2.

(a) Release at river confluence point 1l = from scenario 1.

(b) Release at river confluence point 1l = from scenario 3.

Fig. 11. Comparison of releases at river conuence point l 1, scenarios 1 versus 3.

http://-/?-http://-/?-http://-/?-


12/15

mandatory release is 6000 m 3 /s for both navigational and ecologi-

cal purposes, and the maximum allowable release is 56,700 m3

/sfor downstream ood protection.

The objectives of the three scenarios are to maximize energy

generation over a recent 10-year horizon (June 2000 throughMay 2010). The operating period is divided into 360 time steps

(a) Output production from scenario 1.

(b) Output production from scenario 2.

(c) Output production from scenario 3.

Fig. 12. Comparison of output productions, scenarios 1 and 2 versus scenario 3.



13/15

(each with a length of about 10 days and T 360). We expand theobjective function f t to be maximized during time step t to:

f t XI

i

N it D t XI

i

9 :81 gi R0it H it D t 8t 28

where

Rit R0it R00i t 8t 29

H it HF it HT it 8t 30

g i is the hydropower plant efciency at reservoir i; R0it and R

00i t

are the power release and non-power release from reservoir i dur-ing time step t ; H it is the average head at reservoir i during timestep t , which is the difference between the average reservoir fore-bay water level HF it (as a function of beginning and ending timeperiod reservoir storages) and the tail-race water level HT it (as afunction of total release) at reservoir i during time step t ; and D t is the time interval. In this case, each reservoir operates under itsown constraints (see Table 2 ), while the reservoir system operatesunder the system constraints (see Table 3 ).

5.2. Results and discussion

In the real-world case, the GZB and GBZ are treated as run-of-river hydropower plants because of their much smaller storagescompared with the TGP, SBY and GHY (see Table 2 ), which areoperated as storage reservoirs. The ratio of active storages of thethree reservoirs is approximately 20:2:1 (i.e. 221.5 10 8 m 3 :24.0 10 8 m 3 : 11.5 10 8 m 3 ), and thus we discretize them into200, 20 and 10 levels, respectively. The total numbers of storagecombinations are 200 1 = 200, 20 10 = 200 and200 20 10 = 40,000 for the three scenarios. Similarly, the com-putation was performed on the HPC system with Oracle 11g serv-ing as the database system for data access.

The optimal energy generations of the three scenarios areshown in Table 4 . The sum of optimal energy production of sce-nario 1 and scenario 2 is 9893.7 + 693.2 = 10586.9 10 8 KW h, lessthan the 10636.5 10 8 KW h of scenario 3. This indicates that thecoordinated operation of the ve reservoirs can result in an aver-age 4.96 10 8 KWh energy production increase (or about$1.24 10 8 CNY increase, assuming the energy price of the ve

reservoirs is $0.25 CNY/KW h [18] ) per year for the system. Thecomparisons of releases at river conuence point l 1 and the out-put productions between scenarios 1 and 2 versus scenario 3 areshown in Figs. 11 and 12 , respectively. Obviously, the SBYGHYGBZ system helps the TGPGZB system relieve the stress of watersupply demand at river conuence point l 1. The ve-reservoirsystem can provide the grid with more secure and reliable outputproduction.

For scenario 3, the serial DP algorithm took more than 10 days(T 1 ). Similar to the four-reservoir problem, we performed severalexecutions using the developed parallel DP algorithm by varyingthe number of peer processes. The computing procedures are com-pletely the same for various executions, and thus the objectivefunction value and the consequent storages and releases are the

same. The wall clock time T K and the parallel efciency E K for var-ious numbers of peer processes are shown in Table 5 . As we can

see, the wall clock time T K is reduced from 266.83 h to 1.54 h with350 peer processes. Again, this reduction is quite substantial. It isclear that an increase in the number of peer processes is accompa-nied by a decrease in wall clock time. Althougheach peer process isallocated the same approximate amount of subtask, workloadimbalance still exists, which may result in a little less efciencyrelative to the hypothetical problem.

6. Conclusions

In this paper, we establish a multi-dimensional DP model foroptimizing the joint operation of a multi-reservoir system, consid-ering several indispensable constraints for individual reservoirsand the reservoir system. We illustrate the DP algorithms solutionspace with the help of a matrix and estimate the smallest RAM re-quired by the DP algorithm, making full preparations for parallel-ization. We believe an effective parallelization strategy for the DPalgorithm shouldbe designedspecially to alleviate the RAM bottle-neck. For instance, the discretization should be sufciently ne foran accurate simulation in real-time operation. If we use a discret-ization level of 10 cm for each of the ve reservoirs in Section 5 and30 days as the inow forecast period, the RAM requirement will in-crease to approximately 2.09 TB when using the serial DP algo-rithm. In this situation, a single computer or the shared memoryarchitecture no longer can meet the large RAM requirement. Thedistributed memory architecture and the message passing inter-face (MPI) protocol make it possible for us to develop a parallelDP algorithm that considers both the distributed computing anddistributed computer memory, and further, to solve previouslyunsolvable multi-reservoir DP problems with parallel computing.In the above instance, the RAM requirement will be reduced toapproximate 2.09/ K TB to each peer process, where K is the num-ber of peer processes.

Using the developed parallel DP algorithm based on the peer-to-peer parallel paradigm, we solve the classic four-reservoir prob-lem and a real-world ve-reservoir system on a HPC system withup to 350 cores. The results indicate that the wall clock times arereduceddrastically when the number of computing processes is in-creased. In both cases, we observe good performance in parallelefciency. Furthermore, the real-world problem results indicatethat operational effectiveness can be improved and the benecialuses can be maximized by joint operation of the interconnected

ve reservoirs. Specically, (1) the energyproduction of the systemcan be increased by an average of 4.96 10 8 KWh per year; (2) thestress of meeting the minimum water supply demand at river con-uence point l 1 can be relieved greatly by the inclusion of theSBYGHYGBZ systemwith the TGPGZB system; and (3) more se-cure and reliable output productions can be guaranteed and trans-mitted to the grid.

The Casti et al. [5] paper sends a message of condence in con-structing computers with many processing elements to deal withsome future DP problems. Even at the time of that study (i.e.1970s), NASAs parallel computer had only 64 processing elements.Over last twenty years, however, we have witnessed the rapiddevelopment of supercomputing worldwide. According to TOP500 supercomputer sites historical lists ( http://www.top500.org/

), the number of cores in top supercomputers has increased fromseveral thousands to several millions, and computer memory has

Table 5

The wall clock time and the parallel efciency of the parallel DP algorithm for the ve-reservoir system.

No. of peer processes Serial 50 100 150 200 250 300 350

s K (h) 266.83 10.11 5.17 3.51 2.61 2.12 1.77 1.54E K 1.00 0.53 0.52 0.51 0.52 0.50 0.50 0.50

http://www.top500.org/http://-/?-http://www.top500.org/http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-


14/15

increased correspondingly. Given such advances, we predict thatthe barrier in computing resources may be inconsequential in thefuture. Moreover, we believe the benets resulting from hydro-power generation of a multi-reservoir system far exceed the ex-pense of computing resources. Thus, we hope this paper willstimulate future research on parallel computing in the eld of res-ervoir operation.

Future work should implement the parallel DP algorithm usingmassively parallel computing resources. Indeed, we have noticedthe trend to use massively parallel computing resources for somewater resources problems. For instance, Reed and Kollat [26] em-ployed a massively parallel multi-objective evolutionary algorithmfor a groundwater monitoring application with a maximum of 8192 processors; while Kollet et al. [11] and Maxwell [20] , respec-tively, performed a ParFlow of various coupled modes utilizing themaximum number of 16,384 processors.

Future work also could introduce the distributed database man-agement technique to further alleviate RAM requirements for mul-ti-reservoir DP problems. This paper distributes the optimaltransitions C and maximum cumulative returns Ft and Ft 1 andsaves them in RAMs of several computing processes. Future workcould distribute and save them in several databases in variouscomputers. This would take advantage of large hard disk capacitiesand access the required data from the database of identied com-puter when the DP algorithm solves the recursive equation andtraces back the optimal path.

Finally, the developed parallel DP algorithm easily can be ap-plied to other DP-based variants, such as SDP, IDP and DDDP.

Notation

t time index, t e [1, T ]i reservoir index, i e [1, n]l river conuence point index, l e [1, L]k peer process index, k e [1, K ]S(t ) reservoir storage vector at the beginning of time

step t , S(t ) = [ S 1 (t), ..., S i(t ), ... , S n(t )] T S(t + 1) reservoir storage vector at the end of time step t Sinitial initial reservoir storage vectorSnal nal expected reservoir storage vectorSmin (t + 1) minimum reservoir storage vector at the end of

time step t Smax (t + 1) maximum reservoir storage vector at the end of

time step t F t maximum cumulative return from the rst time

step to the beginning of the t th time stepresulting from the joint operation of n reservoirs

f t ( ) objective function to be maximized during timestep t

I(t ) inow vector during time step t

R (t ) total release vector during time step t ,R (t ) = [ R1 (t ), ..., Ri(t ),..., Rn(t )]T

R min (t ) minimum required release vector during timestep t

R max (t ) maximum allowable release vector during timestep t

g i hydropower plant efciency at reservoir iR0it power release from reservoir i during time step t R00i t non-power release from reservoir i during time

step t H it average head at reservoir i during time step t ,

which is equal to the average reservoir fore-baywater level HF i(t ) minus the tail-race water levelHT i(t )

D t time intervalM reservoir system connectivity matrixU l set of reservoirs in parallel at river conuence

point lPRminl t minimum required release at river conuence

point l during time step t PRmaxl t maximum allowable release at river conuence

point l during time step t N(t ) output vector during time step t ,N(t ) = [ N 1(t ), ..., N i(t ), ... , N n(t )]T

Nmin (t ) minimum required output vector during timestep t

Nmax (t ) maximum allowable output vector during timestep t

N min (t ) minimum required output from a reservoirsystem during time step t

N max (t ) maximum allowable output from a reservoirsystem during time step t

C possible storage combination mn T matrix,C = [C(2), ..., C(t ), ... , C(T + 1)]

C optimal transitions, C C1 ; . . .; Ck; . . .; CK T

Ft maximum cumulative returns from the rst timestep to the beginning of the t th time step,Ft F1 ;t ; . . .; Fk;t ; . . .; FK ;t

T

m number of discretization levels of each reservoirmk allocation number of storage combinations to

peer process ks 1 wall clock time of using 1 computing processs K wall clock time of using K peer processesD s average wall clock time for one-time objective

function evaluations 0 computation time fractions 00 communication time fractions 000 workload imbalance time costE K parallel efciency

RAM 1 RAM amount for a single computing process(byte)

RAM k RAM amount for each peer process (byte)

Acknowledgements

The research is supported by the National Key TechnologiesR&D Program #2013BAB05B03 and #2009BAC56B03 and the Na-tional Natural Science Foundation #51109114 in China. The rstauthor is supported by a fellowship from the Chinese governmentfor his visit to the University of California, Los Angeles. Partial sup-port is also provided from an AECOM endowment. The authors are

very grateful to the two anonymous reviewers for their in-depthreviews and constructive comments, which greatly helped improvethe paper.

References

[1] Bastian P, Helmig R. Efcient fully-coupled solution techniques for two-phaseowin porous media: parallel multigridsolution andlargescale computations.Adv Water Resour 1999;23(3):199216. http://dx.doi.org/10.1016/S0309-1708(99)00014-7 .

[2] Bellman R. Adaptive control processes: a guided tour. Princeton, NJ: PrincetonUniversity Press; 1961 .

[3] Bellman R. Dynamic programming. Princeton, NJ: Princeton University Press;1957 .

[4] Bhaskar NR, Whitlatch EE. Derivation of monthly reservoir release policies.Water Resour Res 1980;16(6):98793. http://dx.doi.org/10.1029/WR016i006p00987 .

http://dx.doi.org/10.1016/S0309-1708(99)00014-7http://dx.doi.org/10.1016/S0309-1708(99)00014-7http://refhub.elsevier.com/S0309-1708(14)00009-8/h0010http://refhub.elsevier.com/S0309-1708(14)00009-8/h0010http://refhub.elsevier.com/S0309-1708(14)00009-8/h0015http://refhub.elsevier.com/S0309-1708(14)00009-8/h0015http://dx.doi.org/10.1029/WR016i006p00987http://dx.doi.org/10.1029/WR016i006p00987http://dx.doi.org/10.1029/WR016i006p00987http://dx.doi.org/10.1029/WR016i006p00987http://dx.doi.org/10.1029/WR016i006p00987http://refhub.elsevier.com/S0309-1708(14)00009-8/h0015http://refhub.elsevier.com/S0309-1708(14)00009-8/h0015http://refhub.elsevier.com/S0309-1708(14)00009-8/h0010http://refhub.elsevier.com/S0309-1708(14)00009-8/h0010http://dx.doi.org/10.1016/S0309-1708(99)00014-7http://dx.doi.org/10.1016/S0309-1708(99)00014-7


15/15

[5] Casti J, Richardson M, Larson R. Dynamic programmingand parallel computers. J Optim Theory App 1973;12(4):42338. http://dx.doi.org/10.1007/BF00940421 .

[6] Chandramouli V, Raman H. Multireservoir modeling with dynamicprogramming and neural networks. J Water Resour Plann Manage2001;127(2):8998. http://dx.doi.org/10.1061/(ASCE)0733-9496(2001)127:2(89) .

[7] El Baz D, Elkihel M. Load balancing methods and parallel dynamicprogramming algorithm using dominance technique applied to the 01knapsack problem. J Parallel Distrib Comput 2005;65(1):7484. http://dx.doi.org/10.1016/j.jpdc.2004.10.004 .

[8] Hall WA, Butcher WS, Esogbue A. Optimization of the operation of a multiple-purpose reservoir by dynamic programming. Water Resour Res1968;4(3):4717. http://dx.doi.org/10.1029/WR004i003p00471 .

[9] Heidari M, Chow VT, Kokotovic PV, Meredith DD. Discrete differential dynamicprogramming approach to water resources systems optimization. WaterResour Res 1971;7(2):27382. http://dx.doi.org/10.1029/WR007i002p00273 .

[10] Kollet SJ, Maxwell RM. Integrated surface-groundwater ow modeling: a free-surface overland ow boundary condition in a parallel groundwater owmodel. Adv Water Resour 2006;29(7):94558. http://dx.doi.org/10.1016/ j.advwatres.2005.08.006 .

[11] Kollet SJ, Maxwell RM, Woodward CS, Smith S, Vanderborght J, Vereecken H,Simmer C. Proof of concept of regional scale hydrologic simulations athydrologic resolution utilizing massively parallel computer resources. WaterResour Res 2010;46(4). http://dx.doi.org/10.1029/2009WR008730 .

[12] Kumar DN, Reddy MJ. Multipurpose reservoir operation using particle swarmoptimization. J Water Resour Plann Manage 2007;133(3):192201. http://dx.doi.org/10.1061/(ASCE)0733-9496(2007)133:3(192) .

[13] Labadie JW. Optimal operation of multireservoir systems: state-of-the-artreview. J Water Resour Plann Manage 2004;130(2):93111. http://dx.doi.org/10.1061/(ASCE)0733-9496(2004)130:2(93) .

[14] Larson RE, Korsak AJ. A dynamic programming successive approximationstechnique with convergence proofs. Automatica 1970;6(2):24552. http://dx.doi.org/10.1016/0005-1098(70)90095-6 .

[15] Larson RE. State increment dynamic programming. New York: ElsevierScience; 1968 .

[16] Li T, Wang G, Chen J, Wang H. Dynamic parallelization of hydrological modelsimulations. Environ Modell Softw 2011;26(12):173646. http://dx.doi.org/10.1016/j.envsoft.2011.07.015 .

[17] Li X, Wei J, Fu X, Li T, Wang G. A knowledge-based approach for reservoirsystem optimization. J Water Resour Plann Manage, in press. doi: http://dx.doi.org/10.1061/(ASCE)WR.1943-5452.0000379 .

[18] LiX, Li T, Wei J, WangG, Yeh WWG. Hydro unitcommitment via mixed integerlinearprogramming: a case study of the three gorges project, China, IEEE TransPower Syst, in press. doi: http://dx.doi.org/10.1109/TPWRS.2013.2288933 .

[19] Martins WS, Del Cuvillo JB, Useche FJ, Theobald KB, Gao GR. A multithreadedparallel implementation of a dynamic programming algorithm for sequence

comparison. Pac Symp Biocomput 2001;6:31122 .[20] Maxwell RM. A terrain-following grid transform and preconditioner for

parallel, large-scale, integrated hydrologic modeling. Adv Water Resour2012;53:10917. http://dx.doi.org/10.1016/j.advwatres.2012.10.001 .

[21] Message passing interface forum. < http://www.mpi-forum.org/index.html >.[22] Mousavi SJ, Karamouz M. Computational improvement for dynamic

programming models by diagnosing infeasible storage combinations. AdvWater Resour 2003;26(8):8519. http://dx.doi.org/10.1016/S0309-1708(03)00061-7 .

[23] Nemhauser GL. Introduction to dynamic programming. New York: John Wiley;1966 .

[24] Piccardi C, Soncini-Sessa R. Stochastic dynamic programming for reservoiroptimal control: dense discretization and inow correlation assumption madepossible by parallel computing. Water Resour Res 1991;27(5):72941. http://dx.doi.org/10.1029/90WR02766 .

[25] Pool R. Massively parallel machines usher in next level of computing power.Science 1992;256:501. http://dx.doi.org/10.1126/science.256.5053.50 .

[26] Reed PM, Kollat JB. Visual analytics clarify the scalability and effectiveness of massively parallel many-objective optimization: a groundwater monitoringdesign example. Adv Water Resour 2013;56:113. http://dx.doi.org/10.1016/ j.advwatres.2013.01.011 .

[27] Rouholahnejad E, Abbaspour KC, Vejdani M, Srinivasan R, Schulin R, LehmannA. A parallelization framework for calibration of hydrological models. EnvironModell Softw 2012;31:2836. http://dx.doi.org/10.1016/ j.envsoft.2011.12.001 .

[28] Rytter W. On efcient parallel computations for some dynamic programmingproblems. Theor Comput Sci 1988;59(3):297307. http://dx.doi.org/10.1016/0304-3975(88)90147-8 .

[29] Sulis A. GRID computing approach for multireservoir operating rules withuncertainty. Environ Modell Softw 2009;24(7):85964. http://dx.doi.org/10.1016/j.envsoft.2008.11.003 .

[30] Tan G, Sun N, Gao GR. A parallel dynamic programming algorithm on a multi-core architecture. In: Proceedings of the nineteenth annual ACM symposiumon parallel algorithms and architectures (ACM 2007), 2007. p. 13544. http://dx.doi.org/10.1145/1248377.1248399 .

[31] Tang Y, Reed PM, Kollat JB. Parallelization strategies for rapid and robustevolutionary multiobjective optimization in water resources applications. AdvWater Resour 2007;30(3):33553. http://dx.doi.org/10.1016/ j.advwatres.2006.06.006 .

[32] Trott WJ, Yeh WWG. Optimization of multiple reservoir system. J Hydraul EngDiv 1973;99(10):186584 .

[33] Wang H, Fu X, Wang G, Li T, Gao J. A common parallel computing frameworkfor modeling hydrological processes of river basins. Parallel Comput2011;37(67):30215. http://dx.doi.org/10.1016/j.parco.2011.05.003 .

[34] Wardlaw R, Sharif M. Evaluation of genetic algorithms for optimal reservoirsystem operation. J Water Resour Plann Manage 1999;125(1):2533. http://dx.doi.org/10.1061/(ASCE)0733-9496(1999)125:1(25) .

[35] Wurbs RA. Reservoir-system simulation and optimization models. J WaterResour Plann Manage 1993;119(4):45572. http://dx.doi.org/10.1061/(ASCE)0733-9496(1993)119:4(455) .

[36] Wu Y, Li T, Sun L, Chen J. Parallelization of a hydrological model using themessage passing interface. Environ Modell Softw 2013;43:12432. http://dx.doi.org/10.1016/j.envsoft.2013.02.002 .

[37] Yakowitz S. Dynamic programming applications in water resources. WaterResour Res 1982;18(4):67396. http://dx.doi.org/10.1029/WR018i004p00673 .

[38] Yeh WWG. Reservoir management and operations models: a state-of-the-art

review. Water Resour Res 1985;21(12):1797818. http://dx.doi.org/10.1029/WR021i012p01797 .

[39] Young GK. Finding reservoir operating rules. J Hydraul Eng Div1967;93(HY6):297321 .

[40] Zhao T, Cai X, Lei X, Wang H. Improved dynamic programming for reservoiroperation optimization with a concave objective function. J Water ResourPlann Manage 2012;138(6):5906. http://dx.doi.org/10.1061/(ASCE)WR.1943-5452.0000205 .

http://dx.doi.org/10.1007/BF00940421http://dx.doi.org/10.1007/BF00940421http://dx.doi.org/10.1007/BF00940421http://dx.doi.org/10.1061/(ASCE)0733-9496(2001)127:2(89)http://dx.doi.org/10.1061/(ASCE)0733-9496(2001)127:2(89)http://dx.doi.org/10.1016/j.jpdc.2004.10.004http://dx.doi.org/10.1016/j.jpdc.2004.10.004http://dx.doi.org/10.1029/WR004i003p00471http://dx.doi.org/10.1029/WR007i002p00273http://dx.doi.org/10.1016/j.advwatres.2005.08.006http://dx.doi.org/10.1016/j.advwatres.2005.08.006http://dx.doi.org/10.1029/2009WR008730http://dx.doi.org/10.1061/(ASCE)0733-9496(2007)133:3(192)http://dx.doi.org/10.1061/(ASCE)0733-9496(2007)133:3(192)http://dx.doi.org/10.1061/(ASCE)0733-9496(2004)130:2(93)http://dx.doi.org/10.1061/(ASCE)0733-9496(2004)130:2(93)http://dx.doi.org/10.1061/(ASCE)0733-9496(2004)130:2(93)http://dx.doi.org/10.1016/0005-1098(70)90095-6http://dx.doi.org/10.1016/0005-1098(70)90095-6http://refhub.elsevier.com/S0309-1708(14)00009-8/h0075http://refhub.elsevier.com/S0309-1708(14)00009-8/h0075http://dx.doi.org/10.1016/j.envsoft.2011.07.015http://dx.doi.org/10.1016/j.envsoft.2011.07.015http://dx.doi.org/10.1061/(ASCE)WR.1943-5452.0000379http://dx.doi.org/10.1061/(ASCE)WR.1943-5452.0000379http://dx.doi.org/10.1109/TPWRS.2013.2288933http://refhub.elsevier.com/S0309-1708(14)00009-8/h0095http://refhub.elsevier.com/S0309-1708(14)00009-8/h0095http://refhub.elsevier.com/S0309-1708(14)00009-8/h0095http://refhub.elsevier.com/S0309-1708(14)00009-8/h0095http://dx.doi.org/10.1016/j.advwatres.2012.10.001http://www.mpi-forum.org/index.htmlhttp://dx.doi.org/10.1016/S0309-1708(03)00061-7http://dx.doi.org/10.1016/S0309-1708(03)00061-7http://dx.doi.org/10.1016/S0309-1708(03)00061-7http://refhub.elsevier.com/S0309-1708(14)00009-8/h0115http://refhub.elsevier.com/S0309-1708(14)00009-8/h0115http://dx.doi.org/10.1029/90WR02766http://dx.doi.org/10.1029/90WR02766http://dx.doi.org/10.1126/science.256.5053.50http://dx.doi.org/10.1016/j.advwatres.2013.01.011http://dx.doi.org/10.1016/j.advwatres.2013.01.011http://dx.doi.org/10.1016/j.envsoft.2011.12.001http://dx.doi.org/10.1016/j.envsoft.2011.12.001http://dx.doi.org/10.1016/0304-3975(88)90147-8http://dx.doi.org/10.1016/0304-3975(88)90147-8http://dx.doi.org/10.1016/0304-3975(88)90147-8http://dx.doi.org/10.1016/j.envsoft.2008.11.003http://dx.doi.org/10.1016/j.envsoft.2008.11.003http://dx.doi.org/10.1145/1248377.1248399http://dx.doi.org/10.1145/1248377.1248399http://dx.doi.org/10.1145/1248377.1248399http://dx.doi.org/10.1016/j.advwatres.2006.06.006http://dx.doi.org/10.1016/j.advwatres.2006.06.006http://refhub.elsevier.com/S0309-1708(14)00009-8/h0160http://refhub.elsevier.com/S0309-1708(14)00009-8/h0160http://refhub.elsevier.com/S0309-1708(14)00009-8/h0160http://dx.doi.org/10.1016/j.parco.2011.05.003http://dx.doi.org/10.1061/(ASCE)0733-9496(1999)125:1(25)http://dx.doi.org/10.1061/(ASCE)0733-9496(1999)125:1(25)http://dx.doi.org/10.1061/(ASCE)0733-9496(1993)119:4(455)http://dx.doi.org/10.1061/(ASCE)0733-9496(1993)119:4(455)http://dx.doi.org/10.1016/j.envsoft.2013.02.002http://dx.doi.org/10.1016/j.envsoft.2013.02.002http://dx.doi.org/10.1029/WR018i004p00673http://dx.doi.org/10.1029/WR021i012p01797http://dx.doi.org/10.1029/WR021i012p01797http://refhub.elsevier.com/S0309-1708(14)00009-8/h0195http://refhub.elsevier.com/S0309-1708(14)00009-8/h0195http://dx.doi.org/10.1061/(ASCE)WR.1943-5452.0000205http://dx.doi.org/10.1061/(ASCE)WR.1943-5452.0000205http://dx.doi.org/10.1061/(ASCE)WR.1943-5452.0000205http://dx.doi.org/10.1061/(ASCE)WR.1943-5452.0000205http://refhub.elsevier.com/S0309-1708(14)00009-8/h0195http://refhub.elsevier.com/S0309-1708(14)00009-8/h0195http://dx.doi.org/10.1029/WR021i012p01797http://dx.doi.org/10.1029/WR021i012p01797http://dx.doi.org/10.1029/WR018i004p00673http://dx.doi.org/10.1016/j.envsoft.2013.02.002http://dx.doi.org/10.1016/j.envsoft.2013.02.002http://dx.doi.org/10.1061/(ASCE)0733-9496(1993)119:4(455)http://dx.doi.org/10.1061/(ASCE)0733-9496(1993)119:4(455)http://dx.doi.org/10.1061/(ASCE)0733-9496(1999)125:1(25)http://dx.doi.org/10.1061/(ASCE)0733-9496(1999)125:1(25)http://dx.doi.org/10.1016/j.parco.2011.05.003http://refhub.elsevier.com/S0309-1708(14)00009-8/h0160http://refhub.elsevier.com/S0309-1708(14)00009-8/h0160http://dx.doi.org/10.1016/j.advwatres.2006.06.006http://dx.doi.org/10.1016/j.advwatres.2006.06.006http://dx.doi.org/10.1145/1248377.1248399http://dx.doi.org/10.1145/1248377.1248399http://dx.doi.org/10.1016/j.envsoft.2008.11.003http://dx.doi.org/10.1016/j.envsoft.2008.11.003http://dx.doi.org/10.1016/0304-3975(88)90147-8http://dx.doi.org/10.1016/0304-3975(88)90147-8http://dx.doi.org/10.1016/j.envsoft.2011.12.001http://dx.doi.org/10.1016/j.envsoft.2011.12.001http://dx.doi.org/10.1016/j.advwatres.2013.01.011http://dx.doi.org/10.1016/j.advwatres.2013.01.011http://dx.doi.org/10.1126/science.256.5053.50http://dx.doi.org/10.1029/90WR02766http://dx.doi.org/10.1029/90WR02766http://refhub.elsevier.com/S0309-1708(14)00009-8/h0115http://refhub.elsevier.com/S0309-1708(14)00009-8/h0115http://dx.doi.org/10.1016/S0309-1708(03)00061-7http://dx.doi.org/10.1016/S0309-1708(03)00061-7http://-/?-http://www.mpi-forum.org/index.htmlhttp://-/?-http://dx.doi.org/10.1016/j.advwatres.2012.10.001http://-/?-http://refhub.elsevier.com/S0309-1708(14)00009-8/h0095http://refhub.elsevier.com/S0309-1708(14)00009-8/h0095http://refhub.elsevier.com/S0309-1708(14)00009-8/h0095http://-/?-http://dx.doi.org/10.1109/TPWRS.2013.2288933http://-/?-http://dx.doi.org/10.1061/(ASCE)WR.1943-5452.0000379http://dx.doi.org/10.1061/(ASCE)WR.1943-5452.0000379http://-/?-http://dx.doi.org/10.1016/j.envsoft.2011.07.015http://dx.doi.org/10.1016/j.envsoft.2011.07.015http://-/?-http://refhub.elsevier.com/S0309-1708(14)00009-8/h0075http://refhub.elsevier.com/S0309-1708(14)00009-8/h0075http://-/?-http://dx.doi.org/10.1016/0005-1098(70)90095-6http://dx.doi.org/10.1016/0005-1098(70)90095-6http://-/?-http://dx.doi.org/10.1061/(ASCE)0733-9496(2004)130:2(93)http://dx.doi.org/10.1061/(ASCE)0733-9496(2004)130:2(93)http://-/?-http://dx.doi.org/10.1061/(ASCE)0733-9496(2007)133:3(192)http://dx.doi.org/10.1061/(ASCE)0733-9496(2007)133:3(192)http://-/?-http://dx.doi.org/10.1029/2009WR008730http://-/?-http://dx.doi.org/10.1016/j.advwatres.2005.08.006http://dx.doi.org/10.1016/j.advwatres.2005.08.006http://-/?-http://dx.doi.org/10.1029/WR007i002p00273http://dx.doi.org/10.1029/WR004i003p00471http://dx.doi.org/10.1016/j.jpdc.2004.10.004http://dx.doi.org/10.1016/j.jpdc.2004.10.004http://dx.doi.org/10.1061/(ASCE)0733-9496(2001)127:2(89)http://dx.doi.org/10.1061/(ASCE)0733-9496(2001)127:2(89)http://dx.doi.org/10.1007/BF00940421http://dx.doi.org/10.1007/BF00940421

a parallel dynamic programming algorithm for multi-reservoir system

Documents