iseborgrujic-machinelearninginproductionoptimizationundergeologicaluncertainty

Use of Machine Learning in Petroleum Production Optimization under Geological Uncertainty

Obiajulu J. Isebor, Ognjen Grujic

December 14, 2012

1 Abstract

Geological uncertainty is of significant concern in petroleum reservoir modeling with the goal of maximizing oil production. Stochastic simulation allows generating multiple reservoir models that can be used to characterize this uncertainty. However, the large computation time needed for flow simulation (e.g., for use in production forecasting) impedes the evaluation of flow on all reservoir models. In addition, performing a formal optimization of the well controls to maximize, say, net present value (NPV) leads to hundreds or thousands of function evaluations, each of which requires tens to hundreds of reservoir simulations depending on the number of reservoir models available.

In this work we apply machine learning techniques to provide computational savings on two fronts. We use kernel k-means clustering to select a small representative set of earth models that characterize the geological uncertainty, so as to reduce the number of simulations for each optimization function evaluation, and we use a kriging surrogate in the optimization to reduce the required number of function evaluations.

2 Introduction

Optimizing the production from an oil field is a difficult task, fraught with theoretical and computational challenges centered around the uncertainties that are an intrinsic part of the problem. First, because the petroleum reservoir from which we plan to produce lies below the surface of the earth, we only have approximate models of what we think the reservoir looks like. To capture this geological uncertainty in the reservoir description, tens to hundreds of probable earth models with different reservoir properties and property distributions are typically generated using stochastic simulation.

Generating these different realizations is not a goal in and of itself; what we are after is the use of the models to understand the uncertainty in some response function. The response function is typically a measure of the performance of the field, e.g., cumulative oil production or net present value (NPV), that is of interest to decision makers in charge of managing the field's production. In recent years, techniques from the field of numerical optimization have been applied to optimize this measure of field performance, so that we can plan to get the best out of the field moving forward in time, given the information we currently have about it.

Our project investigated the use of machine learning methods to classify the tens to hundreds of earth models and to select a small representative subset, one that still represents the uncertainty range, to use for forecasting or optimization. The aim is to avoid running numerical reservoir simulations (which can be quite expensive) on all the models and to run them only on a subset, while still spanning the range of uncertainty in the desired response (e.g., NPV).

In addition to earth model selection, we also investigate the use of machine learning methods as proxies for the objective function evaluations during the optimization process. For example, a supervised learning algorithm such as a kriging surrogate, trained to reproduce the response, might provide significant savings in an optimization process that requires running hundreds or thousands of simulations. It does so by replacing a good number of these simulations (which could take minutes to hours each, depending on the size of the reservoir model) with approximations from the kriging surrogate (which runs in a fraction of a second).

3 Problem statement

The problem we aim to solve is the optimization of well controls in a petroleum field produced under waterflooding, where water is pumped through injection wells in order to maintain reservoir pressure and displace the resident oil toward production wells. Flow through the petroleum reservoir is simulated with a reservoir simulator in order to ascertain the injection and production volumes needed for NPV calculations (in this work, we use the Stanford-developed reservoir simulator GPRS [4]).

The well controls being optimized (part of the inputs to the reservoir simulator) are the injection and production bottom-hole pressures (BHPs), and the objective is to maximize undiscounted NPV from the petroleum field. Due to the uncertainty in the reservoir description, the optimization is performed in a robust manner: the objective function is the expected value of the projected production given a set of well controls, characterized as the average of NPV over several geological realizations. Formally stated, the optimization problem we aim to solve is given as

\[
\max_{x \in X} \; J(x) = \frac{1}{N_r} \sum_{j=1}^{N_r} \mathrm{NPV}(x, m_j), \tag{1}
\]

where $x$ represents the vector of well control variables (well BHPs), $X = \{x \in \mathbb{R}^n : x_l \le x \le x_u\}$ represents the box constraints for the control variables, $N_r$ is the number of geological realizations, $m_j$ represents the geological model parameters for realization $j$, and the undiscounted NPV from each model is given by

\[
\mathrm{NPV}(x, m_j) = \underbrace{p_o Q_o(x, m_j)}_{\text{oil revenue}} - \underbrace{c_{wp} Q_{wp}(x, m_j)}_{\text{water production cost}} - \underbrace{c_{wi} Q_{wi}(x, m_j)}_{\text{water injection cost}}, \tag{2}
\]

where $p_o$, $c_{wp}$ and $c_{wi}$ are the price of oil and the costs of produced and injected water per barrel, respectively. $Q_o$, $Q_{wp}$ and $Q_{wi}$ are the cumulative oil production, water production and water injection in barrels, respectively. These are the outputs from the reservoir simulator that are required for the NPV computation. Figure 1 presents the workflow for the average NPV calculation for the running example used in the project. This example involves 45 geological realizations (15 each produced from stochastic simulations assuming 3 different depositional environments). The field has eight wells: four production wells (red circles in the figure) and four injection wells (blue circles in the figure). With these well locations, together with the initial specified well controls of constant injection and production, the 45 reservoir simulations are run, resulting in 45 injection and production profiles (only oil production profiles are shown in Figure 1), from which we can calculate 45 NPV values, shown as an empirical cumulative distribution function (CDF) in Figure 1. From these 45 NPV values, we can calculate the average NPV, <NPV>, which is the objective function value for the initial set of controls.

Figure 1: Illustration of computation of NPVs for full set of realizations.
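As a minimal sketch of how equations (1) and (2) combine, the Python snippet below computes the undiscounted NPV per realization and averages it over realizations. The function names, the prices, and the volumes are hypothetical placeholders standing in for simulator output; none of these numbers come from the project.

```python
import numpy as np

def npv(q_o, q_wp, q_wi, p_o, c_wp, c_wi):
    """Undiscounted NPV of equation (2); volumes in barrels, prices in $/bbl."""
    return p_o * q_o - c_wp * q_wp - c_wi * q_wi

def average_npv(volumes, p_o, c_wp, c_wi):
    """Objective J(x) of equation (1): mean NPV over the N_r realizations.

    volumes: array of shape (N_r, 3) holding (Q_o, Q_wp, Q_wi) per realization,
    i.e. the simulator outputs for one fixed set of well controls x.
    """
    v = np.asarray(volumes, dtype=float)
    return float(np.mean(npv(v[:, 0], v[:, 1], v[:, 2], p_o, c_wp, c_wi)))

# Hypothetical numbers for two realizations (not taken from the paper).
volumes = [[1000.0, 200.0, 300.0], [2000.0, 100.0, 400.0]]
j_value = average_npv(volumes, p_o=80.0, c_wp=10.0, c_wi=5.0)
print(j_value)  # 116750.0
```

In the actual workflow each row of `volumes` would come from one reservoir simulation, which is why a single objective evaluation costs $N_r$ simulator runs.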

4 Machine learning for computational savings

From equations (1) and (2) we see that each evaluation of the objective function in the optimization requires running $N_r$ reservoir simulations (45 simulations in our example). In addition, the optimization process can require hundreds to thousands of function evaluations, depending on the complexity of the problem and the optimization algorithm used. It is no surprise that this optimization process, which considers the different geological realizations, can be quite expensive. As a result, we investigate using machine learning techniques to provide computational savings on two major fronts: (i) reducing the number of realizations using clustering, while still effectively characterizing the geological uncertainty in the problem, in order to reduce the simulations needed for each objective function evaluation; and (ii) using an optimization approach with a kriging surrogate to reduce the number of function evaluations that require reservoir simulations.
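The cost argument above is simple multiplication, and a small sketch makes the two savings explicit. The evaluation counts below are hypothetical values chosen for illustration only; they are not results from this project.

```python
def total_simulations(n_realizations, n_true_evaluations):
    """Reservoir simulations needed: one per realization per true function evaluation."""
    return n_realizations * n_true_evaluations

# Hypothetical evaluation counts, chosen only to illustrate the two savings.
full = total_simulations(45, 1000)    # all models, every evaluation simulated
reduced = total_simulations(6, 200)   # selected models, fewer true evaluations
print(full, reduced, full / reduced)  # 45000 1200 37.5
```

The two reductions multiply: fewer realizations cut the cost of each evaluation, and the surrogate cuts the number of evaluations that reach the simulator.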

4.1 Clustering and earth model selection

The idea of using k-means clustering for earth model selection, to represent the uncertainty in model responses using a subset of the original models, was introduced by Scheidt and Caers in 2009 [6] (details of the approach can be found in that paper). In Figure 2 we present an illustration of the process for our running example.

Figure 2: Earth model selection process.

Figure 2 illustrates our clustering and model selection process, whose steps are as follows:

1. Given the initial well controls, generate the model responses (cumulative oil production profiles) for all 45 reservoir models using the process illustrated in Figure 1.

2. Find the pairwise distances between the model responses.

3. Map with multi-dimensional scaling (MDS) into a 2D MDS space, as is done in [6].

4. Transform using a Gaussian kernel into a kernel feature space where the points for each model are more separable.

5. Perform k-means clustering to cluster the data into six clusters (different colors in Figure 2) and identify the models that are closest to the cluster centroids.

6. Transform back to the MDS space and identify the selected earth models.

7. Identify the six selected model responses and show that they span the same uncertainty range as the initial 45 models.

8. Show that the selected models reproduce the same ensemble statistics (P10, P50 and P90) as the initial set of models, again for the initial well controls.
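Steps 2 to 6 above can be sketched in a few lines of NumPy. This is an illustrative reading of the procedure, not the project's code: the Euclidean distance between response profiles, the median-based kernel bandwidth, the random initialization, and all function names are assumptions made for the example.

```python
import numpy as np

def classical_mds(D, n_dims=2):
    """Classical MDS: coordinates whose Euclidean distances approximate D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                   # double-centered squared distances
    w, V = np.linalg.eigh(B)                      # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:n_dims]            # keep the largest n_dims
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

def gaussian_kernel(X, sigma=None):
    """Gaussian Gram matrix; default bandwidth is a median heuristic (assumption)."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    if sigma is None:
        sigma = np.sqrt(np.median(sq[sq > 0]))
    return np.exp(-sq / (2.0 * sigma ** 2))

def centroid_distances(K, labels, k):
    """Squared feature-space distance from every point to every cluster centroid."""
    dist = np.full((K.shape[0], k), np.inf)       # empty clusters stay at infinity
    for c in range(k):
        members = labels == c
        if members.any():
            # ||phi(x_i) - mu_c||^2 = K_ii - 2 mean_j K_ij + mean_jl K_jl
            dist[:, c] = (np.diag(K) - 2.0 * K[:, members].mean(axis=1)
                          + K[np.ix_(members, members)].mean())
    return dist

def kernel_kmeans(K, k, n_iter=100, seed=0):
    """Kernel k-means on a Gram matrix K, started from a balanced random labeling."""
    rng = np.random.default_rng(seed)
    labels = rng.permutation(K.shape[0]) % k
    for _ in range(n_iter):
        new_labels = centroid_distances(K, labels, k).argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

def select_models(responses, k=6, seed=0):
    """Return one representative model index per non-empty cluster, plus labels."""
    R = np.asarray(responses, dtype=float)
    D = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)  # step 2
    X = classical_mds(D, n_dims=2)                              # step 3
    K = gaussian_kernel(X)                                      # step 4
    labels = kernel_kmeans(K, k, seed=seed)                     # step 5
    dist = centroid_distances(K, labels, k)
    selected = [int(np.argmin(np.where(labels == c, dist[:, c], np.inf)))
                for c in np.unique(labels)]                     # closest to centroids
    return selected, labels
```

Calling `select_models(responses, k=6)` on an array with one response profile per row returns model indices, so the selected earth models can be read off directly in the MDS space (step 6). Kernel k-means can converge to a local optimum and may leave a cluster empty, in which case fewer than `k` models are returned; running several seeds is a common remedy.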

4.2 Surrogate-based optimization

After performing the model selection explained in the preceding subsection, we run the optimization to determine the well controls that maximize <NPV>. We applied the surrogate-based search-poll optimization procedure introduced by Booker et al. [3], with a modified implementation from Abramson [1], illustrated in Figure 3.

Figure 3: Surrogate-based optimization workflow.

The poll step uses the generalized pattern search (GPS) algorithm of Audet and Dennis Jr. [2], which uses a stencil-based approach to identify points in the optimization parameter space at which to evaluate the objective function. Before each poll step, there is a search step in which a fast-running surrogate is optimized and the resulting well controls are evaluated with the true objective function. We use a kriging surrogate (from the DACE kriging toolbox [5]), which is essentially a response surface approximating the true objective function surface. The initial surrogate is built from points from a Latin hypercube search (LHS) and is updated, every time the polling step is unsuccessful, using the new points that were evaluated in the polling process.
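The search-poll loop described above can be sketched as follows. This is a simplified stand-in, not the DACE toolbox or the NOMADm implementation: the kriging model uses a constant mean and a Gaussian correlation with a fixed parameter `theta`, the surrogate is maximized over random candidates rather than mesh points, and it is refitted at every iteration. All names and default values are assumptions for illustration.

```python
import numpy as np

def fit_kriging(X, y, theta=10.0, nugget=1e-6):
    """Constant-mean kriging predictor with Gaussian correlation (fixed theta)."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    mu = y.mean()
    R = np.exp(-theta * np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1))
    alpha = np.linalg.solve(R + nugget * np.eye(len(X)), y - mu)

    def predict(Z):
        Z = np.atleast_2d(np.asarray(Z, float))
        r = np.exp(-theta * np.sum((Z[:, None, :] - X[None, :, :]) ** 2, axis=-1))
        return mu + r @ alpha
    return predict

def surrogate_search_poll(f, lower, upper, n_init=8, max_evals=60, seed=0):
    """Maximize f over a box with a surrogate search step and a coordinate poll step."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    width, n = upper - lower, lower.size
    X, y = [], []                                  # all true evaluations so far

    def evaluate(x):
        x = np.clip(x, lower, upper)               # enforce the box constraints
        for xi, yi in zip(X, y):                   # reuse cached true evaluations
            if np.allclose(xi, x, rtol=0.0, atol=1e-12):
                return yi
        X.append(x)
        y.append(float(f(x)))
        return y[-1]

    # Initial design: Latin hypercube sample, one point per stratum per dimension.
    strata = np.array([rng.permutation(n_init) for _ in range(n)]).T
    for u in (strata + rng.random((n_init, n))) / n_init:
        evaluate(lower + u * width)

    step = 0.25                                    # poll size as a fraction of box width
    while len(y) < max_evals and step > 1e-3:      # budget is checked once per iteration
        i = int(np.argmax(y))
        x_best, f_best = X[i], y[i]
        # Search step: maximize the cheap surrogate, then check with the true function.
        predict = fit_kriging((np.array(X) - lower) / width, y)
        cand = rng.random((500, n))
        x_search = lower + cand[int(np.argmax(predict(cand)))] * width
        if evaluate(x_search) > f_best:
            continue                               # successful search, skip the poll
        # Poll step: evaluate the 2n coordinate neighbors of the incumbent.
        polled = [evaluate(x_best + s * step * width * e)
                  for e in np.eye(n) for s in (1.0, -1.0)]
        if max(polled) <= f_best:
            step *= 0.5                            # unsuccessful poll, refine the stencil
    i = int(np.argmax(y))
    return X[i], y[i], len(y)
```

For example, `surrogate_search_poll(lambda x: -np.sum((x - 0.3) ** 2), [0, 0], [1, 1])` returns the best point found, its objective value, and the number of true evaluations used. In the setting of this paper, `f` would wrap the reservoir simulations that produce <NPV>, so every call saved by the surrogate saves $N_r$ simulator runs.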

5 Results

We ran the optimization codes on our well control problem for several cases: with the original 45 models and with the selected 6 models, as well as with and without the use of the kriging surrogate in the optimization process. Our project results are summarized in Figure 4.

Figure 4: Performance result comparisons. Legend: 45 realizations, without surrogate; 6 realizations, with surrogate. (a) Improvement due to surrogate use: average NPV ($ MM) versus number of function evaluations. (b) Improvement due to model number reduction: average NPV ($ MM) versus number of simulations (thousands). (c) Initial NPV CDFs: F(NPV) versus NPV ($ MM). (d) Final NPV CDFs: F(NPV) versus NPV ($ MM).

In Figure 4a we show the improvement in computational efficiency of optimization with and without surrogates. We see that with the use of the kriging surrogate there is a rapid increase in <NPV> after only a few true function evaluations, because much of the optimization is done on the surrogate and only the resulting controls are run through the true function evaluation. To give an idea of the timing, one surrogate function evaluation takes a small fraction of a second, whereas each true function evaluation takes about 30 seconds for this simple case. The point of Figure 4a is that we get about the same optimal solution but with far fewer function evaluations.

In Figure 4b we show the improvement in computational efficiency due to reducing the number of reservoir models that characterize the uncertainty in the problem. For the case with $N_r$ realizations, each function evaluation corresponds to $N_r$ reservoir simulations, and thus reducing the number of models needed to characterize uncertainty directly leads to a reduction in the number of simulations needed for the optimization. Combining this with the improvement in computational efficiency from surrogate use in the optimization gives the comparison shown in Figure 4b.

Figures 4c and 4d present the initial and final CDFs, respectively, for the cases where we optimize <NPV> from all initial 45 models without using a surrogate and where we optimize <NPV> from only the six selected models with surrogate use in the optimization. The figures show that the six-model representation of uncertainty is just as good as the one with 45 models. The optimization is successful in improving the performance from the reservoir. Initially, <NPV> was negative, meaning that the project is unprofitable under the scenario where we implement the initial well controls. After optimization, <NPV> improves to about $15 million. Our optimized well controls are also more robust, because the range of NPVs in the final CDF is $20 million, compared with about $130 million for the initial well controls.
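The quantities read off the CDFs (mean, range, and percentiles) are straightforward to compute from a set of per-realization NPVs. The sketch below uses hypothetical NPV values, not the project's results. The percentile keys follow NumPy's ascending convention, which is an assumption, since petroleum practice sometimes labels P10 and P90 the other way around.

```python
import numpy as np

def summarize_npv(npvs):
    """Empirical CDF and summary statistics for a set of per-realization NPVs."""
    v = np.sort(np.asarray(npvs, dtype=float))
    cdf = np.arange(1, v.size + 1) / v.size       # F(NPV) at each sorted value
    p10, p50, p90 = np.percentile(v, [10, 50, 90])
    return {"sorted": v, "cdf": cdf, "mean": float(v.mean()),
            "range": float(v[-1] - v[0]),
            "p10": float(p10), "p50": float(p50), "p90": float(p90)}

# Hypothetical NPVs in $ MM for four realizations (not taken from the paper).
stats = summarize_npv([20.0, -10.0, 10.0, 0.0])
print(stats["mean"], stats["range"])  # 5.0 30.0
```

Comparing `range` before and after optimization is the robustness check described above, and comparing the percentiles of the selected subset against those of all models is the check in step 8 of the model selection process.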

6 Concluding remarks

In this work, we successfully implemented an approach for optimizing well controls in petroleum production under geological uncertainty. This process can be computationally expensive, but we present machine learning approaches that improve the computational efficiency of the optimization.

We showed that MDS combined with kernel k-means clustering is suitable for selecting a representative subset of earth models. For our waterflooding well control optimization example, we showed that surrogate-based optimization, using a kriging surrogate, significantly reduces the computational cost of the optimization. Our results show that the optimized well controls obtained with our surrogate-based approach run on the selected subset of reservoir models are comparable in solution quality to those from the full optimization with 45 models, while requiring significantly less computational expense.

References

[1] M. A. Abramson. NOMADm version 4.6 User's Guide. Department of Mathematics and Statistics, Air Force Institute of Technology, 2007.

[2] C. Audet and J. E. Dennis Jr. Analysis of generalized pattern searches. SIAM Journal on Optimization, 13(3):889–903, 2002.

[3] A. J. Booker, J. E. Dennis Jr., P. D. Frank, D. B. Serafini, V. Torczon, and M. W. Trosset. A rigorous framework for optimization of expensive functions by surrogates. Structural Optimization, 17:1–13, 1999.

[4] H. Cao. Development of Techniques for General Purpose Simulators. PhD thesis, Department of Petroleum Engineering, Stanford University, 2002.

[5] S. N. Lophaven, H. B. Nielsen, and J. Søndergaard. DACE: A Matlab kriging toolbox, version 2.0. Technical report, Technical University of Denmark, 2002.

[6] C. Scheidt and J. Caers. Uncertainty quantification in reservoir performance using distances and kernel methods: Application to a West Africa deepwater turbidite reservoir. SPE Journal, 14(4):680–692, 2009.
