

SPE 173298

A Surrogate-based Adaptive Sampling Approach for History Matching and Uncertainty Quantification

Weixuan Li, SPE, Pacific Northwest National Laboratory; Dongxiao Zhang, SPE, Peking University; Guang Lin, Purdue University

Copyright 2015, Society of Petroleum Engineers This paper was prepared for presentation at the SPE Reservoir Simulation Symposium held in Houston, Texas, USA, 23–25 February 2015. This paper was selected for presentation by an SPE program committee following review of information contained in an abstract submitted by the author(s). Contents of the paper have not been reviewed by the Society of Petroleum Engineers and are subject to correction by the author(s). The material does not necessarily reflect any position of the Society of Petroleum Engineers, its officers, or members. Electronic reproduction, distribution, or storage of any part of this paper without the written consent of the Society of Petroleum Engineers is prohibited. Permission to reproduce in print is restricted to an abstract of not more than 300 words; illustrations may not be copied. The abstract must contain conspicuous acknowledgment of SPE copyright.

Abstract
History matching is commonly performed in reservoir simulations to calibrate model parameters and to improve prediction accuracy. History matching problems often have non-unique solutions, i.e., there exist different combinations of parameter values that all yield simulation results matching the measurements. In such a situation, finding a single solution that matches the observations does not guarantee a correct prediction of future production. A more reliable prediction should instead be made with an uncertainty quantification based on all possible scenarios of the model parameters. Bayes' theorem provides a theoretical foundation for representing different solutions and for quantifying the uncertainty with the posterior probability density function (PDF). Lacking an analytical expression, the posterior PDF is often represented by a sample of realizations, each representing a possible scenario. This paper presents a novel sampling algorithm that addresses two commonly encountered difficulties in the sampling process. First, a typical sampling method requires intensive model evaluations and hence may cause an unaffordable computational burden. To alleviate this burden, our algorithm uses a Gaussian process (GP)-based surrogate as an approximation of the computationally expensive reservoir model to speed up the sampling process. The GP surrogate is adaptively refined locally such that the necessary approximation accuracy is achieved at a minimum level of computational cost. Second, when the relationship between the observed variables and the input parameters is nonlinear, the posterior PDF can take a complex form, such as a multimodal distribution, which is difficult to sample from. To tackle this difficulty, a Gaussian mixture model (GMM) is used as the proposal PDF to explore the parameter space. The GMM is flexible enough to approximate different distributions and is particularly efficient when the posterior is multimodal. The developed approach is tested on an illustrative history matching problem and shows its capability of handling the above-mentioned issues. The multimodal posterior of the test problem is captured and used to give a reliable production prediction with uncertainty quantification. The new algorithm shows a great improvement in computational efficiency compared with previously studied approaches for the same problem.

Introduction
History matching is commonly performed in reservoir simulations to calibrate model parameters and to improve prediction accuracy. A major issue in many history matching problems is the existence of non-unique solutions, i.e., different combinations of model parameters (model input) that all yield a simulation result (model output) matching the observed production history. In such a situation, finding a single solution that matches the observations does not guarantee a correct prediction of future production (Carter et al., 2006). A more reliable prediction should instead be made with an uncertainty quantification based on all possible scenarios of the model parameters. A history matching problem with non-unique solutions may be solved within a probabilistic framework (Kaipio and Somersalo, 2006; Oliver et al., 2008; Li, 2014). Following Bayes' rule, different possible solutions are described with a probability density function (PDF), known as the posterior PDF given the observed production history.

Usually, the analytical expression of the posterior PDF is not tractable. Instead, the posterior PDF may be represented with a sample of realizations, each showing a possible scenario, generated using a sampling approach.

Sampling from the posterior PDF can be challenging. A typical sampling approach requires a thorough exploration of the parameter space, which involves repetitive trial model simulations at different proposed parameter points. Usually, only a tiny portion of the proposed points hits the target posterior PDF. As a result, a sample that represents the posterior PDF well can only be achieved at the cost of an extremely large number of model simulations. Moreover, for history matching problems in which the input-output relationship is strongly nonlinear, the posterior PDF may have a complex shape, such as a multimodal one, which is particularly difficult to capture.

In this paper, we present a new sampling algorithm that aims to relieve the computational burden of model simulations in the sampling process. Two essential techniques are implemented in this algorithm. First, we use a Gaussian mixture model (GMM) as the proposal distribution to explore the parameter space. The GMM distribution is easy to sample from and is also flexible enough to approximate PDFs with complex shapes. If we can find a GMM proposal that closely covers the posterior PDF, the proposed points have a high probability of hitting the target, and hence the posterior PDF can be captured with a relatively small number of proposals. The second key technique in the algorithm is a Gaussian process (GP)-based surrogate model. Whenever a model evaluation is required in the sampling process, the GP surrogate may be used as a computationally inexpensive alternative to the more costly reservoir model. Note that building a surrogate that accurately approximates the original model may be a challenging task itself (Zubarev, 2009), and an inaccurate approximation introduces an extra error in the estimation of the posterior PDF. Fortunately, for a Bayesian inversion problem, the surrogate does not have to be globally accurate. In fact, we mainly care about the surrogate accuracy in the regions close to the posterior PDF, because this is where we would like to propose the sample points and run simulations. Two advantages of using a GP surrogate are that 1) it provides a convenient estimate of the approximation error, and 2) it can be refined locally where necessary. These features allow for an adaptive construction of the GP surrogate for the specific inverse problem being studied.

In practice, finding a GMM proposal and a GP surrogate that are suitable for the posterior PDF of a specific problem is not trivial, because the posterior PDF itself is unknown. In our algorithm this difficulty is addressed through a series of iterative surrogate refinement and re-sampling loops, shown in Fig. 1. Initially, having no information about where the target posterior is, we start with a GMM proposal and a GP surrogate that are built according to the prior PDF (i.e., the PDF representing all possible parameter values before history matching). The sample obtained with this proposal and surrogate may not accurately reveal the target posterior PDF, but it outlines the regions where the posterior PDF potentially distributes, and hence can be used to decide where to refine the surrogate and to build a GMM proposal closer to the target. With the refined surrogate and a new GMM proposal, the algorithm proceeds to generate a new sample. This iteration continues until the surrogate is accurate enough at all sample points.
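To make the workflow of Fig. 1 concrete before its individual pieces are described, the following Python sketch outlines one possible shape of the outer refinement and re-sampling loop. It is a minimal illustration under assumed interfaces, not the authors' implementation: `run_model`, `sample_prior`, `fit_surrogate`, `importance_sample`, and `fit_proposal` are hypothetical callables standing in for the steps detailed in the following sections.

```python
import numpy as np

def adaptive_posterior_sampling(run_model, sample_prior, fit_surrogate,
                                importance_sample, fit_proposal,
                                n_init=40, std_tol=0.05, max_loops=100):
    """Illustrative outline of the Fig. 1 workflow (a sketch, not the authors' code).

    run_model         : expensive simulator, parameter point m -> data vector d (1-D array)
    sample_prior      : n -> array of n parameter points drawn from the prior
    fit_surrogate     : (base_X, base_Y) -> surrogate with .predict(X) returning
                        (mean, std), where std is a per-point scalar error estimate
    importance_sample : (surrogate, proposal) -> equally weighted posterior sample
    fit_proposal      : sample -> new GMM proposal (None means "use the prior")
    """
    base_X = sample_prior(n_init)                        # initial base points
    base_Y = np.array([run_model(m) for m in base_X])    # expensive model runs
    proposal = None                                      # start from the prior as proposal

    sample = base_X
    for _ in range(max_loops):
        surrogate = fit_surrogate(base_X, base_Y)        # GP conditioning, Eqs. (5)-(6)
        sample = importance_sample(surrogate, proposal)  # weights from Eqs. (3)-(4)
        _, std = surrogate.predict(sample)               # surrogate error at sample points

        if std.max() < std_tol:                          # accurate at all sample points: stop
            return sample

        worst = sample[np.argmax(std)]                   # refine where the error is largest
        base_X = np.vstack([base_X, worst])
        base_Y = np.vstack([base_Y, run_model(worst)])
        proposal = fit_proposal(sample)                  # GMM proposal for the next loop
    return sample
```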
The rest of the paper is organized as follows. First, we state the problem of history matching and uncertainty quantification within a Bayesian inversion framework. Next, we present the main adaptive sampling algorithm, which is then illustrated with an example. Finally, the paper is summarized with concluding discussions.

Bayesian framework for history matching and uncertainty quantification
Bayesian inversion. Without loss of generality, a reservoir model can be expressed as $\mathbf{d} = g(\mathbf{m})$, where $\mathbf{m} \in \mathbb{R}^{n_m}$ is the input vector containing the $n_m$ uncertain model parameters that need to be estimated, $\mathbf{d}$ is the output vector containing the simulated production data that are to be checked against the observed production history, and $g(\cdot)$ is the mapping from input to output, which can be evaluated at a specific parameter point $\mathbf{m}$ by running a simulation of the reservoir model. In a history matching problem, we estimate the value of the model input $\mathbf{m}$ from the observation of the model output $\mathbf{d}$. This forms an inverse problem that can be formulated with Bayes' rule:

$p(\mathbf{m}|\mathbf{d}^*) \propto p(\mathbf{m})\, p(\mathbf{d}^*|\mathbf{m})$, .................................................................................................(1)

where $p(\mathbf{m}|\mathbf{d}^*)$ is the posterior PDF that describes all possible combinations of parameter values that are consistent with the observation of the output $\mathbf{d}^*$. Bayes' rule states that the posterior PDF is proportional to the product of two terms: the prior PDF $p(\mathbf{m})$, which represents our knowledge about the model parameters before observing any production data, and the likelihood $p(\mathbf{d}^*|\mathbf{m})$, which reveals the new information gained from the observation $\mathbf{d}^*$. In an inverse problem, the prior PDF $p(\mathbf{m})$ is given as known, whereas the likelihood needs to be calculated based on the relationship between $\mathbf{m}$ and $\mathbf{d}^*$. In this paper, we consider the following two situations when calculating the likelihood.

First, consider the situation when the model $g(\cdot)$ is accurate. The observed output value may differ from the model prediction (with the correct input value) due to an observation error: $\mathbf{d}^* = g(\mathbf{m}) + \mathbf{e}_o$. Assuming the observation error follows a PDF $\mathbf{e}_o \sim f_{e_o}(\cdot)$, the likelihood takes the form

$p(\mathbf{d}^*|\mathbf{m}) = f_{e_o}(\mathbf{d}^* - g(\mathbf{m}))$. ..........................................................................................(2)

More generally, consider the situation when the model does not perfectly represent the input-output relationship. For instance, suppose we use a surrogate model $\tilde{g}(\mathbf{m})$ to replace the original accurate model $g(\mathbf{m})$ in the sampling process to save computational cost. If one simply replaces $g(\mathbf{m})$ with $\tilde{g}(\mathbf{m})$ in Eq. (2), the estimation of the parameters may be biased due to the inaccurate surrogate approximation. For example, a proposed parameter point that matches $\mathbf{d}^*$ through $g(\mathbf{m})$ but not through $\tilde{g}(\mathbf{m})$ may be mistakenly excluded from the sample. This implies that Eq. (2) needs to be modified to reflect the effect of the surrogate error. Assume $\tilde{g}(\mathbf{m})$ differs from $g(\mathbf{m})$ by an approximation error $\mathbf{e}_s$, which also contributes to the discrepancy between the observed and simulated results: $\mathbf{d}^* = \tilde{g}(\mathbf{m}) + \mathbf{e}_o + \mathbf{e}_s$. Letting $f_e(\cdot)$ denote the PDF of the total error $\mathbf{e} = \mathbf{e}_o + \mathbf{e}_s$, the likelihood calculated with $\tilde{g}(\mathbf{m})$ is then expressed as

$p(\mathbf{d}^*|\mathbf{m}) = f_e(\mathbf{d}^* - \tilde{g}(\mathbf{m}))$. ..........................................................................................(3)

Note that, usually, $g(\mathbf{m})$ or $\tilde{g}(\mathbf{m})$ has no explicit analytical expression. As a result, the posterior PDF cannot be derived analytically. Instead, we may represent the posterior PDF with a sample of realizations generated with a sampling method such as importance sampling.

Importance sampling. Using the importance sampling approach, a sample of points from the posterior PDF $p(\mathbf{m}|\mathbf{d}^*)$ is generated in two steps: proposal and correction. In the first step, a sample of points $\{\mathbf{m}_q^{(1)}, \ldots, \mathbf{m}_q^{(n_s)}\}$ is drawn from a proposal PDF $q(\mathbf{m})$, which may be different from the target $p(\mathbf{m}|\mathbf{d}^*)$ but takes some simple form that is easy to sample from. In the second step, each proposed point is associated with a weight so that the sample correctly represents the target PDF $p(\mathbf{m}|\mathbf{d}^*)$:

$w_j \propto \dfrac{p(\mathbf{m}_q^{(j)}|\mathbf{d}^*)}{q(\mathbf{m}_q^{(j)})} \propto \dfrac{p(\mathbf{d}^*|\mathbf{m}_q^{(j)})\, p(\mathbf{m}_q^{(j)})}{q(\mathbf{m}_q^{(j)})}, \quad j = 1, \ldots, n_s$. ..................................................(4)

The weights are normalized such that they sum to 1. Furthermore, we may obtain an equally weighted sample $\{\mathbf{m}^{(1)}, \ldots, \mathbf{m}^{(n_s)}\}$ by re-sampling among the proposed points such that the probability of the ith proposed point $\mathbf{m}_q^{(i)}$ being chosen as $\mathbf{m}^{(j)}$ is equal to $w_i$, for $i, j = 1, \ldots, n_s$. With this sample, the posterior PDF may be approximated as $p(\mathbf{m}|\mathbf{d}^*) \approx \sum_{j=1}^{n_s} \delta(\mathbf{m} - \mathbf{m}^{(j)})/n_s$, where $\delta(\cdot)$ is the Dirac delta function.

From Eq. (4), we see that the computation of the weights involves repetitive evaluations of the likelihood function $p(\mathbf{d}^*|\mathbf{m})$, and hence of the model $g(\mathbf{m})$, at the proposed points. This procedure can easily exhaust the available computational resources. Another difficulty is the selection of the proposal distribution $q(\mathbf{m})$. Ideally, the proposal should be as close to the target posterior PDF as possible. However, for a history matching problem (and for inverse problems in general), the target itself is unknown. When the proposal PDF is significantly different from the posterior PDF, most of the proposed points will not hit the target and will end up with nearly zero weights, which results in a poor representation of the posterior PDF. To address these challenges, we present a new adaptive sampling approach in the following section.

A surrogate-based adaptive sampling approach
Basic idea. As illustrated in Fig. 1, the sampling algorithm consists of a series of iterative importance sampling loops. To speed up the sampling process in each loop, a GP surrogate is used to replace the computationally demanding original model. In other words, we sample from the posterior PDF defined via the modified likelihood (Eq. (3)) instead of the true likelihood (Eq. (2)). This modified posterior PDF may deviate from the true posterior due to the surrogate error. Therefore, after a sample is obtained, we examine the surrogate error at the sampled points and decide whether the surrogate model is accurate enough. If not, the algorithm implements a surrogate refinement and proceeds to the next sampling loop. As the surrogate error diminishes over these iterations, we finally achieve a sample of the true posterior PDF. The second technique to improve sampling efficiency is to select a proposal PDF that is close to the target posterior. Note that the target posterior PDFs in two consecutive loops are similar to each other (their difference is caused only by the surrogate refinement), so the sample points obtained in the previous loop provide a hint of where to propose new sample points in the current loop.
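As a concrete illustration of Eqs. (2) through (4), the sketch below implements a Gaussian likelihood with a total error that combines the observation error and a point-dependent surrogate error, followed by the weighting and equal-weight resampling steps. It is a simplified sketch under the assumption of independent Gaussian errors; the function names and interfaces are illustrative, not taken from the paper.

```python
import numpy as np

def log_likelihood(d_obs, d_sim, sigma_obs, sigma_surr=0.0):
    """Gaussian log-likelihood with total error e = e_o + e_s (Eqs. 2-3).

    d_sim may come from the full model (sigma_surr = 0) or from a surrogate,
    in which case sigma_surr is the surrogate's estimated standard deviation.
    """
    var = sigma_obs**2 + np.asarray(sigma_surr)**2          # independent errors add in variance
    resid = np.asarray(d_obs) - np.asarray(d_sim)
    return -0.5 * np.sum(resid**2 / var + np.log(2.0 * np.pi * var), axis=-1)

def importance_resample(m_prop, log_prior, log_like, log_q, rng=None):
    """Weight proposed points by p(d*|m) p(m) / q(m) (Eq. 4) and resample them equally."""
    rng = np.random.default_rng() if rng is None else rng
    log_w = np.asarray(log_like) + np.asarray(log_prior) - np.asarray(log_q)
    log_w -= log_w.max()                                    # stabilize before exponentiating
    w = np.exp(log_w)
    w /= w.sum()                                            # normalize so the weights sum to 1
    idx = rng.choice(len(m_prop), size=len(m_prop), p=w)    # equal-weight resampling
    return np.asarray(m_prop)[idx], w
```

In the adaptive algorithm, the simulated data `d_sim` and the error estimate `sigma_surr` would come from the GP surrogate described next, so no full-model runs are needed inside this step.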

Specifically, our algorithm fits a GMM to the sample points obtained in the previous loop and uses it as the proposal in the current loop. The two key techniques, the GP-based surrogate and the GMM proposal, are explained in the following subsections.

Gaussian process based surrogate model. Methods that use GPs to approximate deterministic computer models have been developed in many studies (Sacks et al., 1989; Currin et al., 1991; Kennedy and O'Hagan, 2001). A GP surrogate model may be understood via an analogy to Kriging interpolation used in geostatistics. Kriging interpolation estimates a geological property, e.g., permeability $k(\mathbf{x})$, which varies with the spatial coordinates in the 3-dimensional physical space. Similarly, a GP surrogate approximates the output of a computer model $g(\mathbf{m})$, which varies with the input parameter $\mathbf{m}$ defined in the $n_m$-dimensional parameter space. Here we briefly review the basic idea of the GP surrogate. Without knowing the exact model response $g(\mathbf{m})$, we assume it resembles a realization of a Gaussian stochastic process $G(\mathbf{m}) \sim \mathcal{N}(\mu(\cdot), C(\cdot,\cdot))$, which is characterized by its mean function $\mu(\cdot)$ and covariance function $C(\cdot,\cdot)$. The mean and covariance functions are estimated, e.g., using the maximum likelihood criterion (Jones et al., 1998), from evaluations of $g(\mathbf{m})$ at a set of parameter points called "base points": $B = \{\mathbf{m}_B^{(1)}, \ldots, \mathbf{m}_B^{(n_B)}\}$. To make $G(\mathbf{m})$ a better approximation of $g(\mathbf{m})$, we make it interpolate the true model response at the base points via statistical conditioning. The conditional process $G|_B(\mathbf{m})$ is still Gaussian, and its mean and variance at a specific parameter point $\mathbf{m}$ are given by

$\mu|_B(\mathbf{m}) = \mu(\mathbf{m}) + C_{\mathbf{m}B}\, C_{BB}^{-1} (g(B) - \mu(B))$, ..................................................................................(5)

and

$\sigma^2|_B(\mathbf{m}) = C(\mathbf{m},\mathbf{m}) - C_{\mathbf{m}B}\, C_{BB}^{-1} C_{B\mathbf{m}}$, ..................................................................................(6)

respectively. Here $C_{\mathbf{m}B}$ is a $1 \times n_B$ vector whose ith element is $C(\mathbf{m}, \mathbf{m}_B^{(i)})$, $C_{BB}$ is an $n_B \times n_B$ matrix whose element on the ith row and jth column is $C(\mathbf{m}_B^{(i)}, \mathbf{m}_B^{(j)})$, and $g(B)$ and $\mu(B)$ are two $n_B \times 1$ vectors containing the model responses of $g(\cdot)$ and the unconditional means $\mu(\cdot)$, evaluated at the $n_B$ base points, respectively.

The conditional process may be used as a surrogate model that helps to calculate an approximate likelihood (Eq. 3) for Bayesian inversion. Specifically, the conditional mean (5) serves as an approximation of the true model response, $\tilde{g}(\mathbf{m}) = \mu|_B(\mathbf{m})$, whereas the conditional variance (6) provides an estimate of the approximation error, $\mathbf{e}_s(\mathbf{m}) \sim \mathcal{N}(0, \sigma^2|_B(\mathbf{m}))$. The approximation error can be reduced simply by adding more base points, although a significant amount of computational cost may be needed for the model evaluations at the base points. Fortunately, for a Bayesian inversion problem, the surrogate does not have to be globally accurate. We mainly care about the surrogate accuracy in the regions close to the posterior PDF, because this is where we would like to generate the sample points. To achieve the necessary accuracy with a minimum level of computational cost, the algorithm adds the base points in an adaptive manner. The initial surrogate may be built with a relatively small number of base points. In each of the following loops, a new base point is selected from one of the sample points obtained in the previous loop at which the surrogate error exceeds a threshold.
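The conditioning formulas (5) and (6) amount to a few lines of linear algebra. The sketch below is a minimal illustration assuming a squared-exponential covariance with fixed hyperparameters and a constant unconditional mean, which is a common but simplified choice; maximum likelihood estimation of the hyperparameters (Jones et al., 1998) is omitted, and the function names are illustrative.

```python
import numpy as np

def sq_exp_cov(X1, X2, sigma2=1.0, length=1.0):
    """Squared-exponential covariance C(m, m') = sigma2 * exp(-|m - m'|^2 / (2 l^2))."""
    d2 = np.sum((X1[:, None, :] - X2[None, :, :])**2, axis=-1)
    return sigma2 * np.exp(-0.5 * d2 / length**2)

def gp_condition(m_new, base_X, base_y, mu0=0.0, sigma2=1.0, length=1.0, nugget=1e-10):
    """Conditional mean and variance of the GP at m_new, given base-point runs (Eqs. 5-6)."""
    m_new = np.atleast_2d(m_new)
    C_BB = sq_exp_cov(base_X, base_X, sigma2, length) + nugget * np.eye(len(base_X))
    C_mB = sq_exp_cov(m_new, base_X, sigma2, length)           # shape (n_new, n_B)
    alpha = np.linalg.solve(C_BB, base_y - mu0)                # C_BB^{-1} (g(B) - mu(B))
    mean = mu0 + C_mB @ alpha                                  # Eq. (5)
    v = np.linalg.solve(C_BB, C_mB.T)
    var = sigma2 - np.einsum('ij,ji->i', C_mB, v)              # Eq. (6), diagonal terms only
    return mean, np.maximum(var, 0.0)
```

For a vector-valued model response, one such GP can be built per output component (or per recorded time step), and the conditional variance feeds directly into the surrogate-error term of Eq. (3).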

Gaussian mixture proposal. Before each re-sampling step, the algorithm seeks a proposal distribution. To obtain a proposal distribution that is close to the target, we use a GMM distribution fitted to the sample points obtained from the previous sampling loop. A GMM distribution is defined as the superposition of $K$ different multivariate Gaussian PDFs:

$q(\mathbf{m}) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(\mathbf{m}\,|\,\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$, ..................................................................................(7)

where $\boldsymbol{\mu}_k$ and $\boldsymbol{\Sigma}_k$ are the mean and covariance of the kth Gaussian component, respectively, and $\pi_k$, which sum to 1, are the mixing probabilities of the components. A sample point from the GMM distribution (7) may be generated in two steps: first, randomly select a Gaussian component according to the probabilities $\pi_k$, and then sample a point from the selected Gaussian component $\mathcal{N}(\mathbf{m}\,|\,\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$.

To find a GMM distribution that closely fits the sample points obtained in the previous loop, we adopt the idea of hierarchical clustering (Duda et al., 2001; Hamerly and Elkan, 2003). The advantage of hierarchical clustering is that it automatically determines $K$, the number of Gaussian components that need to be included in the GMM distribution. Starting with $K = 1$, the sample points are first partitioned into two clusters with the k-means clustering method. We then implement a test to determine whether the two clusters are well separated (Duda et al., 2001). If so, we accept the partition and set $K = 2$. Next, the same "partition-test" procedure is repeated for each cluster until no more partitioning is needed. After dividing the sample into $K$ clusters, we fit a Gaussian distribution to each of the clusters: $\boldsymbol{\mu}_k$ and $\boldsymbol{\Sigma}_k$ are chosen to be the sample mean and sample covariance of the kth cluster, respectively. Finally, the mixing probabilities are determined by $\pi_k = n_k/n_s$, where $n_k$ is the number of points contained in the kth cluster.
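A minimal sketch of how such a GMM proposal could be fitted and sampled is given below, assuming NumPy and SciPy. The recursive k-means splitting mirrors the hierarchical idea described above, but the cluster-separation test is replaced here by a simple BIC comparison as a stand-in for the statistical test of Duda et al. (2001); `fit_gmm` and `sample_gmm` are illustrative names, not the paper's code.

```python
import numpy as np
from scipy.cluster.vq import kmeans2
from scipy.stats import multivariate_normal

def _split_improves(points):
    """Stand-in separation test: accept a 2-cluster split if it lowers the BIC."""
    n, d = points.shape
    if n < 2 * (d + 1):
        return False, None
    labels1 = np.zeros(n, dtype=int)
    _, labels2 = kmeans2(points, 2, minit='++')
    def bic(labels, k):
        ll = 0.0
        for c in range(k):
            sub = points[labels == c]
            if len(sub) <= d:
                return np.inf                      # degenerate split, reject
            cov = np.cov(sub.T) + 1e-8 * np.eye(d)
            ll += multivariate_normal.logpdf(sub, sub.mean(axis=0), cov).sum() \
                  + len(sub) * np.log(len(sub) / n)
        n_params = k * (d + d * (d + 1) / 2) + (k - 1)
        return n_params * np.log(n) - 2.0 * ll
    return bic(labels2, 2) < bic(labels1, 1), labels2

def fit_gmm(points):
    """Fit a GMM (Eq. 7) by recursively splitting clusters with k-means."""
    clusters, final = [np.asarray(points)], []
    while clusters:
        c = clusters.pop()
        ok, labels = _split_improves(c)
        if ok:
            clusters += [c[labels == 0], c[labels == 1]]
        else:
            final.append(c)
    pis = np.array([len(c) / len(points) for c in final])     # pi_k = n_k / n_s
    mus = [c.mean(axis=0) for c in final]                     # sample means
    covs = [np.cov(c.T) + 1e-8 * np.eye(np.shape(points)[1]) for c in final]
    return pis, mus, covs

def sample_gmm(pis, mus, covs, n, rng=None):
    """Two-step sampling: pick a component with probability pi_k, then draw from it."""
    rng = np.random.default_rng() if rng is None else rng
    ks = rng.choice(len(pis), size=n, p=pis)
    return np.array([rng.multivariate_normal(mus[k], covs[k]) for k in ks])
```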

Illustrative example
We test our algorithm on the history matching problem known as the I-C Fault model (Tavassoli et al., 2004). Because of a strongly nonlinear input-output relationship, this problem has a complicated multimodal posterior PDF that is difficult to capture (Carter et al., 2006). In previous studies (Christie et al., 2006; Mohamed et al., 2012), algorithms including genetic algorithms, artificial neural networks, and population MCMC were used to assist the exploration of the parameter space; however, thousands of model simulations were still needed to achieve a satisfactory sample from the posterior PDF.

As shown in Fig. 2, the I-C Fault model represents a 2-D vertical cross section of an oil reservoir that is 1000 ft wide and 60 ft thick and consists of six layers of alternating high- and low-permeability sand. A vertical fault in the middle divides the reservoir into two parts. Oil is recovered from the reservoir by water flooding. An injection well injects water into the reservoir at the left boundary, whereas a production well produces fluids at the right boundary. Both wells are completed on all layers and are operated at constant pressures. The upper and lower boundaries are no-flow boundaries. The model has three input parameters, which are associated with independent and uniform prior uncertainties: the fault throw $h \in [0, 60]$ ft, the permeability of the good-quality sand $k_g \in [100, 200]$ md, and the permeability of the poor-quality sand $k_p \in [0, 50]$ md. For each parameter point sampled from the 3-dimensional parameter space, the corresponding production history, i.e., the time series of oil production rate and water cut, can be simulated from the model. Fig. 3 shows the simulated production histories based on a sample of input parameter points drawn from the prior PDF. It is seen from the figure that the variations of the input parameters result in a large uncertainty in the simulation results. To reduce this uncertainty, we calibrate the input parameters to observations of the "synthetic true" production history, which is generated from a simulation based on a reference parameter point ($h^* = 10.4$ ft, $k_g^* = 131.6$ md, $k_p^* = 1.3$ md). Specifically, the observation data used are the oil production rate and water cut recorded every month during the first three years of production.

We solve this history matching problem within the Bayesian inversion framework using the adaptive importance sampling algorithm presented in this paper. The algorithm starts by building an initial GP surrogate with 40 initial base points selected from the prior distribution using Latin hypercube sampling.
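For reference, an initial design of this kind can be generated with a few lines of SciPy. The sketch below draws 40 Latin hypercube points over the three I-C Fault parameters and their uniform prior ranges; it is an illustration of the setup, not the authors' script.

```python
import numpy as np
from scipy.stats import qmc

# Uniform prior ranges: fault throw h (ft), good-sand permeability kg (md),
# poor-sand permeability kp (md)
lower = [0.0, 100.0, 0.0]
upper = [60.0, 200.0, 50.0]

sampler = qmc.LatinHypercube(d=3, seed=0)           # 3-D parameter space
unit_points = sampler.random(n=40)                  # 40 points in the unit cube [0, 1]^3
base_points = qmc.scale(unit_points, lower, upper)  # initial base points for the GP surrogate
```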
To illustrate the surrogate model, we evaluate the surrogate output (production history) at 3 randomly selected parameter points (Fig. 4). For each parameter point, the GP surrogate reports not only an approximate output but also an estimate of the approximation error. We see that the confidence intervals reported by the surrogate enclose the true model responses at all 3 points. The initial sample obtained with this surrogate is shown in Fig. 5. This sample is not an accurate representation of the true posterior PDF but serves as a starting point for the subsequent re-sampling loops. A final sample is obtained after 45 refinement and re-sampling loops (Fig. 6). We see that the reference parameter point is captured by the sample. This sample distributes within a much smaller region compared with the initial sample (Fig. 5); however, the parametric uncertainty is not completely eliminated. In particular, the possible value of the fault throw could be as low as 2 ft or as high as 45 ft. In the last loop, the algorithm fits the sample points to a GMM distribution with 3 Gaussian components, which reveals the multimodal shape of the posterior PDF. Fig. 7 shows the simulation results based on the parameter points from the final sample. It is seen that all the simulations and the observations are in good agreement.

The objective of history matching is to predict future production using the calibrated model. With the multiple solutions obtained with our algorithm, we are able to make predictions with uncertainty quantification. One of the most important quantities of interest in reservoir simulation is the cumulative oil production. We predict this quantity for the 7 years following the observed history. Two sets of simulations are run with the sample points obtained before and after the history matching, respectively (Fig. 8). From the figure, we see that matching the production history significantly improves the accuracy of the prediction. Furthermore, the calibrated simulations are reliable in the near future, with little prediction uncertainty. Nevertheless, as the simulation time moves forward, the uncertainty in the prediction gradually increases, showing that a single simulation matching the first three years of production history does not necessarily provide a reliable prediction over a long time period.
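Uncertainty bands of the kind shown in Fig. 8 can be summarized directly from the ensemble of simulated curves. A small sketch follows, assuming `curves` is an array of cumulative-oil-production time series simulated at the posterior (or prior) sample points; the function name is illustrative.

```python
import numpy as np

def prediction_band(curves, lo=5, hi=95):
    """Ensemble mean and percentile band of predicted curves.

    curves : array of shape (n_sample_points, n_time_steps), e.g. cumulative oil
             production simulated at each posterior sample point.
    """
    mean = curves.mean(axis=0)
    lower = np.percentile(curves, lo, axis=0)
    upper = np.percentile(curves, hi, axis=0)
    return mean, lower, upper

# Example usage: mean, p5, p95 = prediction_band(posterior_curves)
```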

Conclusions and discussions
In this paper, we present an efficient adaptive sampling algorithm for history matching and uncertainty quantification problems. The main contribution of this algorithm is its capability of sampling from complex, multimodal posterior distributions while keeping the number of expensive model evaluations as small as possible. This is achieved by the implementation of two key techniques: 1) a GP-based surrogate of the original model is constructed through adaptive local refinements to facilitate the sampling from the posterior, and 2) a GMM distribution that approximates the target posterior is used as the proposal distribution for better sampling efficiency. The algorithm is tested with a nonlinear history matching problem, the I-C Fault model. In this test example, the algorithm successfully captures the complex multimodal posterior PDF of the model parameters and is able to predict future production with uncertainty quantification. In this example, the computational cost of the sampling algorithm is mainly spent on the model simulations at the base points for GP surrogate construction. A total of 85 model simulations are required for the initial surrogate construction and the subsequent refinements. This number shows a significant efficiency improvement compared with previously studied approaches.

Finally, we would like to point out that no approach is universally optimal for all history matching and uncertainty quantification problems. The approach used for a specific problem should always be selected by considering its strengths and limitations. The main strength of the algorithm presented in this paper is in handling nonlinear models and complex non-Gaussian posteriors; however, the method has a weakness in handling models with a large number of uncertain parameters, mainly because the construction of the GP surrogate in a high-dimensional parameter space becomes increasingly difficult. Other approaches such as the ensemble Kalman filter could be better alternatives for high-dimensional problems, though they are usually limited to approximately Gaussian posteriors.

References
Carter, J., P. Ballester, Z. Tavassoli, and P. King (2006), Our calibrated model has poor predictive value: An example from the petroleum industry, Reliability Engineering & System Safety, 91(10–11), 1373–1381.
Christie, M., V. Demyanov, and D. Erbas (2006), Uncertainty quantification for porous media flows, Journal of Computational Physics, 217(1), 143–158.
Currin, C., T. Mitchell, M. Morris, and D. Ylvisaker (1991), Bayesian prediction of deterministic functions, with applications to the design and analysis of computer experiments, Journal of the American Statistical Association, 86(416), 953–963.
Duda, R. O., P. E. Hart, and D. G. Stork (2001), Pattern Classification, John Wiley & Sons.
Hamerly, G., and C. Elkan (2003), Learning the k in k-means, in: Advances in Neural Information Processing Systems (NIPS), Vol. 3, pp. 281–288.
Jones, D. R., M. Schonlau, and W. J. Welch (1998), Efficient global optimization of expensive black-box functions, Journal of Global Optimization, 13(4), 455–492.
Kaipio, J., and E. Somersalo (2006), Statistical and Computational Inverse Problems, Springer.
Kennedy, M. C., and A. O'Hagan (2001), Bayesian calibration of computer models, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63(3), 425–464.
Li, W. (2014), Inverse modeling and uncertainty quantification of nonlinear flow in porous media models, Ph.D. thesis, University of Southern California.
Mohamed, L., B. Calderhead, M. Filippone, M. Christie, and M. Girolami (2012), Population MCMC methods for history matching and uncertainty quantification, Computational Geosciences, 16(2), 423–436.
Oliver, D. S., A. C. Reynolds, and N. Liu (2008), Inverse Theory for Petroleum Reservoir Characterization and History Matching, Cambridge University Press.

Sacks, J., W. J. Welch, T. J. Mitchell, and H. P. Wynn (1989), Design and analysis of computer experiments, Statistical Science, 4(4), 409–423.
Tavassoli, Z., J. Carter, and P. King (2004), Errors in history matching, SPE Journal, 9(3).
Zubarev, D. (2009), Pros and cons of applying proxy-models as a substitute for full reservoir simulations, in SPE Annual Technical Conference and Exhibition.

Figure 1: Workflow of surrogate-based adaptive sampling algorithm.

Figure 2: I-C Fault model (Carter et al., 2006), distances are measured in ft.

[Figure 1 flowchart: initial GP surrogate and initial GM proposal → sample from the posterior with the GM proposal and GP surrogate → surrogate accurate at sample points? → yes: stop; no: refine the surrogate at inaccurate sample points, fit a new GM proposal from the sample points, and re-sample.]

Figure 3: Simulation results of the I-C Fault model based on the parameter points sampled from the prior distribution.

[Figure 3 axes: oil production rate (STB/day) and water cut versus time (days); legend: simulations, observations.]

Figure 4: Comparison of the GP surrogate response with the original model response at 3 randomly selected parameter points. The surrogate response is shown with the estimated approximation error.

[Figure 4 panels: model response at parameter points 1, 2, and 3; oil production rate (STB/day) and water cut versus time (days); legend: surrogate response (mean ± 2 standard deviations), true model response.]

Figure 5: Sample drawn from the posterior distribution of the I-C Fault model parameters via the GP-based surrogate built on 40 initial base points. The sample points are shown in 3-D and 2-D projection views.

[Figure 5 axes: h (ft), kg (md), kp (md); the reference point is marked.]

Figure 6: Sample drawn from the posterior distribution of the I-C Fault model parameters via the GP-based surrogate built on 40 initial and 45 refinement base points. The sample points are shown in 3-D and 2-D projection views.

Figure 7: Simulation results of the I-C Fault model based on the parameter points sampled from the posterior distribution obtained using the final GP-based surrogate.

[Figure 6 axes: h (ft), kg (md), kp (md); the reference point is marked. Figure 7 axes: oil production rate (STB/day) and water cut versus time (days); legend: simulations, observations.]

Figure 8: Predictions of future production based on samples from the prior and posterior PDFs, respectively.

[Figure 8 panels: prediction with prior sample points and prediction with posterior sample points; cumulative oil production (STB, ×10^5) versus time (months); legend: simulations based on sample points, simulation based on the reference point.]