SPE 163582

An Ensemble Based Nonlinear Orthogonal Matching Pursuit Algorithm for Sparse History Matching of Reservoir Models

Ahmed H. Elsheikh, SPE, University of Texas at Austin and King Abdullah University of Science and Technology (KAUST), Saudi Arabia; Mary F. Wheeler, SPE, University of Texas at Austin; and Ibrahim Hoteit, King Abdullah University of Science and Technology (KAUST), Saudi Arabia

Copyright 2013, Society of Petroleum Engineers. This paper was prepared for presentation at the SPE Reservoir Simulation Symposium held in The Woodlands, Texas, USA, 18-20 February 2013. This paper was selected for presentation by an SPE program committee following review of information contained in an abstract submitted by the author(s). Contents of the paper have not been reviewed by the Society of Petroleum Engineers and are subject to correction by the author(s). The material does not necessarily reflect any position of the Society of Petroleum Engineers, its officers, or members. Electronic reproduction, distribution, or storage of any part of this paper without the written consent of the Society of Petroleum Engineers is prohibited. Permission to reproduce in print is restricted to an abstract of not more than 300 words; illustrations may not be copied. The abstract must contain conspicuous acknowledgment of SPE copyright.


Abstract

A nonlinear orthogonal matching pursuit (NOMP) algorithm for sparse calibration of reservoir models is presented. Sparse calibration is a challenging problem as the unknowns are both the non-zero components of the solution and their associated weights. NOMP is a greedy algorithm that discovers at each iteration the components of the basis functions most correlated with the residual. The discovered basis (aka support) is augmented across the nonlinear iterations. Once the basis functions are selected from the dictionary, the solution is obtained by applying Tikhonov regularization. The proposed algorithm relies on approximate gradient estimation using an iterative stochastic ensemble method (ISEM). ISEM utilizes an ensemble of directional derivatives to efficiently approximate gradients. In the current study, the search space is parameterized using an overcomplete dictionary of basis functions built using the K-SVD algorithm.

Introduction

Subsurface flow models rely on many parameters that cannot be measured directly. Instead, a sparse set of measurements may exist at the locations of wells. The complete distributions of these unknown fields are commonly inferred by a model calibration process that takes into account historical records of the input-output behavior of the model. However, the amount of available data to constrain the models is usually limited in both quantity and quality. This results in an ill-posed inverse problem that might admit many different solutions. Different parameter estimation techniques can be applied to tackle this problem. These techniques can be classified into Bayesian methods based on Markov Chain Monte Carlo (MCMC) sampling (Oliver et al., 1997; Ma et al., 2008; Fu and Gomez-Hernandez, 2008, 2009; Elsheikh et al., 2012), gradient based optimization methods (McLaughlin and Townley, 1996; Carrera et al., 2005) and ensemble Kalman filter methods (Moradkhani et al., 2005; Nævdal et al., 2005; Chen and Zhang, 2006; Elsheikh et al., 2012).

An important step in the automatic calibration process is to define a proper parameterization of these unknown fields. Most parameterization methods depend on prior model assumptions that implicitly define the spatial correlations of the unknown fields using a parameter covariance matrix. The Karhunen-Loève expansion (KLE) (Kac and Siegert, 1947; Loève, 1948) can be used for parameterizing spatially distributed fields. KLE, also known as proper orthogonal decomposition (POD) or principal component analysis (PCA) in the finite dimensional case (Berkooz et al., 1993), is widely used for parameterizing the permeability field in subsurface flow models (Reynolds et al., 1996; Efendiev et al., 2005; Li and Cirpka, 2006). KLE is an effective method that is simple to implement; however, it only preserves the second order moments of the distribution. For complex continuous geological structures such as channelized domains, KLE fails to preserve higher order moments.

Sparse calibration and compressed sensing (CS) (Donoho, 2006; Candes and Wakin, 2008) are very active research areas in the signal processing community. Standard reconstruction methods rely on defining a set of orthogonal basis functions, as in KLE methods, and then finding the optimal set of weights to reconstruct the measurements. This reconstruction problem is ill-posed, and regularization techniques (e.g., Tikhonov regularization) that constrain the ℓ2-norm of the solution are commonly applied. The quality of the solution depends on the class of basis functions that are used to parameterize the search space. In sparse calibration methods, a large collection of basis functions is included in a dictionary, and the solution process consists of picking the best basis functions for accurate reconstruction of the unknown field as well as finding the associated weights.

In the current paper we follow Khaninezhad et al. (2012a,b) in utilizing a special parameterization based on sparse dictionary learning. Given a set of realizations of the unknown field (e.g., the permeability field), the dictionary learning problem is formulated as an optimization problem to find the best basis functions such that each realization can be represented as a linear combination of only a few basis functions. These dictionaries are over-complete and have a certain amount of redundancy. This redundancy is desirable as it gives robustness to the representation. Building the optimal dictionary that approximates a signal with minimum error is an NP-hard (non-deterministic polynomial-time hard) problem, and approximate algorithms can be used. Here, we utilize the K-SVD algorithm introduced by Aharon et al. (2006) and used by Khaninezhad et al. (2012a) for parameterizing the unknown subsurface fields. We refer the reader to the introduction to the topic presented by Khaninezhad et al. (2012a), as it is a straightforward application of well-developed image processing techniques (Elad, 2010).

Once the dictionary is defined, the sparse calibration can proceed in two different directions. The first direction is to solve an optimization problem that penalizes the solution in the ℓ1-norm and minimizes the reconstruction error. The second class of algorithms are greedy algorithms that iteratively find and remove elements from the dictionary that are maximally correlated with the residuals. Khaninezhad et al. (2012a,b) followed the first direction by utilizing an iteratively reweighted least-squares (IRLS) algorithm (Chartrand and Yin, 2008) to identify the important dictionary elements (solution support) and their associated weights by minimizing a sparsity-regularized objective function. Khaninezhad et al. (2012a,b) utilized an adjoint code to estimate the sensitivities for solving the nonlinear parameter estimation problem.

The current paper builds on the pioneering work of Khaninezhad et al. (2012a,b). However, we develop an ensemble based method for solving the sparse calibration problem given a dictionary built using the K-SVD algorithm. Ensemble based methods have proven to be an effective tool for subsurface model calibration (Evensen, 1994; Moradkhani et al., 2005; Nævdal et al., 2005; Chen and Zhang, 2006). The proposed algorithm enables the use of sparse calibration techniques for computer models when adjoint codes are not available. For the sparse calibration problem, we propose a new algorithm based on the orthogonal matching pursuit (OMP) algorithm (Tropp, 2004; Tropp and Gilbert, 2007). The proposed algorithm falls in the class of greedy algorithms for sparse recovery and extends the standard OMP algorithm, which is limited to linear reconstruction problems, to nonlinear parameter estimation problems.

The organization of this paper is as follows: Section 2 presents the derivation of the iterative stochastic ensemble method (ISEM) for parameter estimation. In Section 3, an introduction to sparse reconstruction of fields is presented. In Section 4, we present a novel sparse nonlinear parameter estimation algorithm based on ISEM and the OMP algorithm. This algorithm is a nonlinear extension of the orthogonal matching pursuit algorithm with ensemble based approximate sensitivities. Section 5 starts by presenting a brief formulation of the subsurface flow problem, followed by a numerical evaluation of the proposed algorithm. In Section 6, we present a discussion of the numerical results followed by the conclusions of the current work.

Parameter estimation algorithm

Calibration of subsurface flow models given a dictionary of basis functions for the unknown fields is a nonlinear parameter estimation problem. Here, we utilize an ensemble based method for parameter estimation. The proposed parameter estimation method relies on the Gauss-Newton method and stochastic estimation of the derivatives using an ensemble of directional derivatives. Treating the numerical simulator as a multi-input multi-output nonlinear function, the simulator output for a given set of input parameters x_i is defined as y_i = H(x_i). Given a set of observations y_obs, one is interested in finding a set of parameters x_est that minimizes the squared error function

O(x) = (1/2) (y_obs − H(x))^T R^{-1} (y_obs − H(x)),  (1)

where O is the objective function and R is the output error covariance matrix. The least squares nature of the objective function enables the definition of the mismatch function F(x) = R^{-1/2} (y_obs − H(x)). The function F can be thought of as a multiple output function with n_o outputs, where n_o is the number of observations. With this formalization, one is interested in solving the following optimization problem

argmin_x (1/2) ‖F(x)‖^2.  (2)
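The objective and mismatch functions above can be sketched in a few lines; the forward model H, the observations, and R below are illustrative stand-ins, not the reservoir simulator.

```python
import numpy as np

# Illustrative stand-in for the simulator H (not the reservoir model).
def H(x):
    return np.array([x[0]**2 + x[1], np.sin(x[0]) + x[1]**2])

x_true = np.array([1.0, 2.0])
y_obs = H(x_true)                        # synthetic, noise-free observations
R_inv_sqrt = np.eye(len(y_obs))          # R^{-1/2}; identity = unit output error

def F(x):
    """Mismatch function F(x) = R^{-1/2} (y_obs - H(x))."""
    return R_inv_sqrt @ (y_obs - H(x))

def O(x):
    """Objective O(x) = 1/2 ||F(x)||^2, Eqs. (1)-(2)."""
    Fx = F(x)
    return 0.5 * float(Fx @ Fx)

print(O(x_true))                         # 0.0 at the true parameters
```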

The Jacobian ∇F(x) of F(x) has components ∂_j F_i(x), and the gradient vector is G(x) = ∇(1/2)‖F(x)‖^2 = ∇F(x)^T F(x). The general strategy when solving nonlinear optimization problems is to solve a sequence of approximations to the original problem (Nocedal and Wright, 2006). At each iteration, a correction ∆x to the vector x is estimated. For nonlinear least squares, an approximation can be constructed by using the linearization F(x + ∆x) ≈ F(x) + ∇F(x)∆x, which leads to the following linear least squares problem

min_{∆x} (1/2) ‖∇F(x)∆x + F(x)‖^2.  (3)

Further, it is easy to see that solving Eq. (3) is equivalent to solving the normal equation

(∇F(x_k)^T ∇F(x_k)) (x_{k+1} − x_k) = −∇F(x_k)^T F(x_k),  (4)

where x_k is the current value at iteration k. A Newton-like iterative update equation easily follows as

x_{k+1} = x_k − (∇F(x_k)^T ∇F(x_k))^{-1} ∇F(x_k)^T F(x_k).  (5)

For high dimensional search spaces, the evaluation of the gradient ∇F(x_k) with simple differencing methods is not feasible. The gradient can be evaluated efficiently using an adjoint code, but such a code is not available for many numerical simulators. Here, we utilize


directional derivatives in random directions u, defined as

∇_u F(x) = (F(x + h u) − F(x)) / h,  (6)

where h is the step size. The directional derivative is related to the standard derivative by the following relation

∇_u F(x) = ∇F(x) · u.  (7)

In the previous equations, ∇_u F(x) is of size n_o × 1 and ∇F(x) is of size n_o × n_x, where n_x is the size of the search space and u is of size n_x × 1. In all subsequent formulations we assume a unit step size h.
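As a concrete illustration, the forward-difference directional derivative of Eq. (6) can be checked against a known Jacobian; the function F below is a made-up smooth example, not the simulator mismatch.

```python
import numpy as np

def F(x):
    # Made-up smooth vector-valued function with n_o = 2 outputs.
    return np.array([x[0] * x[1], x[0] + x[1]**2])

def directional_derivative(F, x, u, h=1e-6):
    """Forward-difference directional derivative of Eq. (6)."""
    return (F(x + h * u) - F(x)) / h

x = np.array([1.0, 2.0])
u = np.array([1.0, 0.0])                 # perturb only the first component
# Analytic Jacobian at x is [[x_1, x_0], [1, 2 x_1]] = [[2, 1], [1, 4]],
# so the directional derivative along e_0 is its first column, [2, 1].
print(directional_derivative(F, x, u))
```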

Iterative Stochastic Ensemble Method (ISEM)  Directional derivatives are utilized within a stochastic ensemble method for parameter estimation. We use an ensemble of perturbations to approximate the standard derivative (gradient) from an ensemble of directional derivatives as

∇_U F(x) = ∇F(x) U,  (8)

where ∇_U F(x) is an ensemble of directional derivatives of size n_o × n_e, with n_e the ensemble size, and U is the perturbation matrix of size n_x × n_e used in estimating the directional derivatives. Multiplying both sides from the right by U^T, one gets

(∇_U F(x)) U^T = ∇F(x) (U U^T),  (9)

from which the standard derivative can be evaluated as

∇F(x) = (∇_U F(x)) U^T (U U^T)^{-1}.  (10)
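Eq. (10) can be verified numerically: for a linear map the directional derivatives are exact, so the recovered gradient matches the true Jacobian whenever U U^T is invertible (which requires n_e ≥ n_x). The toy dimensions below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, n_o, n_e = 5, 3, 50                  # parameters, outputs, ensemble size
J = rng.standard_normal((n_o, n_x))       # true Jacobian of a linear F(x) = J x

def F(x):
    return J @ x

x = rng.standard_normal(n_x)
U = rng.standard_normal((n_x, n_e))       # perturbation matrix (unit step, h = 1)
# Column i of the directional-derivative ensemble: F(x + u_i) - F(x)
dUF = np.stack([F(x + U[:, i]) - F(x) for i in range(n_e)], axis=1)
# Eq. (10): recover the full gradient from the directional derivatives
J_est = dUF @ U.T @ np.linalg.inv(U @ U.T)
print(np.allclose(J_est, J))              # exact for a linear model
```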

For each ensemble member i, the directional derivative around xk has the form

(∇_U F(x_k))_i = −R^{-1/2} (H(x_k + u_i) − H(x_k)),  (11)

where u_i is a zero mean random perturbation of all components of x. For the ensemble of directional derivatives, we can re-write the directional derivative matrix as

∇_U F(x_k) = −R^{-1/2} Y,  (12)

where Y is of size n_o × n_e and each column i corresponds to (H(x_k + u_i) − H(x_k)). The matrix form of Eq. (10) is then

∇F(x) = −R^{-1/2} Y U^T (U U^T)^{-1}.  (13)

Using this ensemble based approximate derivative in Eq. (5), an iterative parameter estimation equation is obtained as

x_{k+1} = x_k + ((R^{-1/2} Y U^T (U U^T)^{-1})^T (R^{-1/2} Y U^T (U U^T)^{-1}))^{-1} (R^{-1/2} Y U^T (U U^T)^{-1})^T F(x_k).  (14)

Further simplification results in

x_{k+1} = x_k + (U U^T) ((R^{-1/2} Y U^T)^T (R^{-1/2} Y U^T))^{-1} (R^{-1/2} Y U^T)^T F(x_k).  (15)

This formula is the main update equation of the proposed iterative stochastic ensemble method (ISEM). This update equation will be utilized within the nonlinear orthogonal matching pursuit algorithm presented hereafter.
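To make the ISEM update concrete, the sketch below applies Eq. (15) to a small linear-Gaussian toy problem; the forward model and dimensions are invented for illustration only. For a linear H the ensemble gradient is exact, so the iteration converges essentially in one Gauss-Newton step.

```python
import numpy as np

rng = np.random.default_rng(1)
n_x, n_o, n_e = 4, 6, 40
A = rng.standard_normal((n_o, n_x))
x_true = rng.standard_normal(n_x)

def H(x):                                 # illustrative linear forward model
    return A @ x

y_obs = H(x_true)
R_inv_sqrt = np.eye(n_o)                  # unit observation-error covariance

def F(x):
    return R_inv_sqrt @ (y_obs - H(x))

x = np.zeros(n_x)
for k in range(5):
    U = rng.standard_normal((n_x, n_e))   # perturbation matrix (h = 1)
    Y = np.stack([H(x + U[:, i]) - H(x) for i in range(n_e)], axis=1)
    B = R_inv_sqrt @ Y @ U.T              # the R^{-1/2} Y U^T term
    # Eq. (15): x <- x + (U U^T) (B^T B)^{-1} B^T F(x)
    x = x + (U @ U.T) @ np.linalg.solve(B.T @ B, B.T @ F(x))

print(np.linalg.norm(x - x_true))         # near zero for this linear toy problem
```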

Linear sparse reconstruction

The calibration process is converted into a sequence of linear problems formulated by Eq. (15). However, distributed parameter fields (e.g., permeability or porosity) are commonly parameterized to obtain efficient calibration methods (Reynolds et al., 1996; Efendiev et al., 2005; Li and Cirpka, 2006). Here, we adopt a novel parameterization that builds a large dictionary of basis functions. These large dictionaries have the advantage of dealing with non-Gaussian models and a mix of models (Khaninezhad et al., 2012a). In this section, we introduce the sparse reconstruction problem, which is applicable for linear problems. It is also applicable for nonlinear parameter estimation problems at the linearized iteration level as formulated by Eq. (15).

Given a dictionary Ψ ∈ R^{m×n} of n basis functions, each of size m, the calibration problem is concerned with representing the unknown field as a linear combination of dictionary elements via a vector of unknown weights x = [x_1, x_2, ..., x_n]^T ∈ R^n. Let S = {i : x_i ≠ 0} be the support of x and let Ψ(S) be the set of basis functions (aka atoms) of Ψ corresponding to the support S. The vector x is said to be k-sparse if the cardinality of the set S is no more than k (i.e., |S| ≤ k). The recovery of a high-dimensional sparse signal from a small number of noisy linear measurements is a fundamental problem in the field of compressed sensing (CS) (Donoho, 2006; Candes and Wakin, 2008). For a linear sparse reconstruction problem, one is interested in finding a sparse weight vector x to reconstruct the signal y ∈ R^{m×1} such that y = Ψx, where Ψ = [ψ_1, ψ_2, ..., ψ_n] and ψ_i denotes the i-th column of Ψ. Throughout the paper, the matrix Ψ is called the dictionary and its i-th column ψ_i is called the i-th atom of Ψ.
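In this notation, a k-sparse field is built by mixing only |S| atoms of an overcomplete dictionary; the dimensions and support below are arbitrary examples.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 20, 50                             # signal size m, n atoms (overcomplete: n > m)
Psi = rng.standard_normal((m, n))
Psi /= np.linalg.norm(Psi, axis=0)        # normalize atoms so ||psi_i||_2 = 1

x = np.zeros(n)
S = [3, 17, 42]                           # support S = {i : x_i != 0}
x[S] = [1.5, -2.0, 0.7]                   # a 3-sparse weight vector
y = Psi @ x                               # field as a combination of |S| = 3 atoms

print(np.count_nonzero(x), np.allclose(y, Psi[:, S] @ x[S]))
```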


The OMP Algorithm  Orthogonal Matching Pursuit (OMP) (Tropp, 2004; Tropp and Gilbert, 2007) is an iterative greedy algorithm that tackles the sparse reconstruction problem. The algorithm is an extension of the basis pursuit algorithms (Mallat and Zhang, 1992, 1993; Pati et al., 1993). The OMP algorithm has been applied to sparse signal recovery in many studies (Donoho, 2006; Tropp and Gilbert, 2007). The algorithm tries to solve the problem

min_x ‖x‖_0  subject to: ‖y − Ψx‖_2 ≤ ε,  (16)

where ‖x‖_0 is the ℓ0 norm that counts the number of non-zero components of the vector x. The reconstructed field (signal) y is approximated iteratively by a linear combination of a few basis functions in the dictionary Ψ. These few atoms (basis functions) are included in the active set, which is built column by column in a greedy fashion. At each iteration, the column of the dictionary most correlated with the current residuals is added to the active set.

The OMP algorithm pseudo code is detailed in Algorithm 1. We assume that the atoms are normalized, i.e., ‖ψ_i‖_2 = 1 for i = 1, 2, ..., n. We denote the support of x by S(x) ⊆ {1, 2, ..., n}, which is defined as the set of indices corresponding to the nonzero components of x. Ψ(S(x)) denotes the matrix formed by picking the atoms of Ψ corresponding to indices in S(x). In the rest of this paper, the dependence of the support S on x will be implied and S will be used instead of S(x). The OMP algorithm starts with x = 0 and iteratively constructs a k-term approximation to y by maintaining a set of active atoms (initially empty) and expanding the set by one additional atom at each iteration. The atom chosen at each stage maximally reduces the residual ℓ2 error in approximating y from the currently active atoms. At each iteration, the ℓ2 norm of the residual is evaluated, and if it falls below a specified threshold, the algorithm terminates. Updating the provisional solution relies on the Moore-Penrose pseudoinverse of a matrix M (Hansen, 1998), denoted as M^+. OMP can be considered a stepwise forward selection algorithm (Guyon and Elisseeff, 2003).

Algorithm 1: Orthogonal Matching Pursuit (OMP) algorithm.

1  Input: measurement vector y; dictionary Ψ; error threshold ε; initial support S
2  Initialize: initial solution x_0 = 0, initial residual r_0 = y, i = 1
3  while ‖r_{i−1}‖_2 > ε do
4      t_j = (ψ_j^T r_{i−1})^2 / ‖ψ_j‖_2^2  ∀ j ∈ {1, ..., n}   (Sweep)
5      k = argmax_k { t_k : k ∉ S }   (Find new minimizer)
6      S = S ∪ {k}   (Update support)
7      x(S) = (Ψ(S)^T Ψ(S))^+ Ψ(S)^T y   (Update provisional solution)
8      r_i = y − Ψ(S) x(S)   (Update residual)
9      i = i + 1
10 end
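A compact sketch of Algorithm 1 (not the authors' code) can be written as follows; the demo dictionary and sparse vector at the bottom are synthetic.

```python
import numpy as np

def omp(y, Psi, eps=1e-6, S=None):
    """Orthogonal Matching Pursuit (a sketch of Algorithm 1)."""
    n = Psi.shape[1]
    S = set(S) if S is not None else set()        # active support
    x = np.zeros(n)
    r = y.copy()                                  # initial residual r_0 = y
    while np.linalg.norm(r) > eps and len(S) < n:
        # Sweep: squared correlation of every atom with the residual
        t = (Psi.T @ r) ** 2 / np.sum(Psi ** 2, axis=0)
        t[list(S)] = -np.inf                      # exclude already-selected atoms
        S.add(int(np.argmax(t)))                  # update support
        idx = sorted(S)
        # Provisional solution via the Moore-Penrose pseudoinverse
        x = np.zeros(n)
        x[idx] = np.linalg.pinv(Psi[:, idx]) @ y
        r = y - Psi[:, idx] @ x[idx]              # update residual
    return x, S

# Synthetic demo: recover a 3-sparse vector from an overcomplete dictionary.
rng = np.random.default_rng(3)
Psi = rng.standard_normal((30, 60))
Psi /= np.linalg.norm(Psi, axis=0)
x_true = np.zeros(60)
x_true[[5, 20, 41]] = [2.0, -1.0, 0.5]
x_hat, S = omp(Psi @ x_true, Psi)
print(np.linalg.norm(Psi @ x_true - Psi @ x_hat))   # near-zero reconstruction error
```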

Dictionary learning  Learning or building a dictionary aims to provide a pool of basis functions in which a few basis functions can be linearly combined to approximate a novel signal or field. Assume a set of signals Y = [y_1, ..., y_i, ..., y_N] is available, where y_i is the i-th signal. Dictionary learning methods try to solve the following optimization problem

X, Ψ = argmin_{X,Ψ}  Σ_{i=1}^{N} ‖y_i − Ψx_i‖_2^2 + λ‖x_i‖_1
        s.t. ‖ψ_i‖_2^2 ≤ 1, ∀ i = 1, ..., N,  (17)

where X = [x_1, ..., x_N] is the coefficient matrix and λ is a regularization parameter. Constraining the ℓ1 norm in the dictionary learning optimization problem is equivalent to obtaining a sparse solution (Donoho, 2006; Candes and Wakin, 2008; Elad, 2010). The optimization problem formulated in Eq. (17) is non-convex and NP-hard. Popular dictionary learning algorithms, namely the K-SVD (Aharon et al., 2006) and the Method of Optimal Directions (MOD) (Engan et al., 1999, 2007), attempt to approximate the solution using a relaxation technique that fixes all the parameters but one at each iteration and then optimizes the objective function. Both the K-SVD and MOD methods converge to a local minimum that is strongly dependent on the initial dictionary. In this study, we utilize the K-SVD algorithm for dictionary learning (Aharon et al., 2006) and its efficient implementation developed by Rubinstein et al. (2010). For a detailed description of the K-SVD algorithm, interested readers are referred to the original work by Aharon et al. (2006) or the reproduction of the K-SVD algorithm by Khaninezhad et al. (2012a) within the context of subsurface flow model calibration.
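Reimplementing K-SVD is beyond a short listing, but the closely related MOD iteration mentioned above (sparse coding alternated with the least-squares dictionary update Ψ = Y X^+) fits in a few lines; everything below, including the random toy data, is an illustrative sketch rather than the setup used in this paper.

```python
import numpy as np

def sparse_code(Y, Psi, sparsity):
    """Greedy (OMP-style) coding: approximate each column of Y with <= `sparsity` atoms."""
    n_atoms, N = Psi.shape[1], Y.shape[1]
    X = np.zeros((n_atoms, N))
    for j in range(N):
        r, S, coef = Y[:, j].copy(), [], np.zeros(0)
        for _ in range(sparsity):
            t = np.abs(Psi.T @ r)
            t[S] = -np.inf                        # skip already-selected atoms
            S.append(int(np.argmax(t)))
            coef = np.linalg.pinv(Psi[:, S]) @ Y[:, j]
            r = Y[:, j] - Psi[:, S] @ coef
        X[S, j] = coef
    return X

def mod_learn(Y, n_atoms, sparsity, n_iter=10, seed=0):
    """Method of Optimal Directions (MOD); K-SVD refines the dictionary update step."""
    rng = np.random.default_rng(seed)
    Psi = rng.standard_normal((Y.shape[0], n_atoms))
    Psi /= np.linalg.norm(Psi, axis=0)
    for _ in range(n_iter):
        X = sparse_code(Y, Psi, sparsity)
        Psi = Y @ np.linalg.pinv(X)               # least-squares dictionary update
        Psi /= np.linalg.norm(Psi, axis=0) + 1e-12
    return Psi, sparse_code(Y, Psi, sparsity)

Y = np.random.default_rng(4).standard_normal((16, 100))   # toy "realizations"
Psi, X = mod_learn(Y, n_atoms=32, sparsity=4)
print(np.linalg.norm(Y - Psi @ X) / np.linalg.norm(Y))    # relative approximation error
```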


Sparse nonlinear parameter estimation algorithm

In this section we present the nonlinear orthogonal matching pursuit (NOMP) algorithm for sparse calibration of nonlinear models. First, we want to simplify the update Eq. (15) by studying the properties of the (U U^T) term. When the perturbations are generated as u_i = ε_k w_i, where w_i is drawn from a Gaussian distribution with zero mean and unit variance (N(0,1)) and ε_k is a constant that depends on the iteration number k, the covariance of the perturbation matrix (U U^T) asymptotically equals ε_k^2 (n_e − 1) I, where I is the identity matrix and n_e is the ensemble size. Applying this result yields a simplified update equation

x_{k+1} = x_k + ((R^{-1/2} Y U^T)^T (R^{-1/2} Y U^T))^{-1} (R^{-1/2} Y U^T)^T (ε_k^2 (n_e − 1) F(x_k)).  (18)

Here, we want to highlight the difference between the nonlinear parameter estimation problem and the linear reconstruction problem. For the nonlinear case, we express the unknown fields in terms of the basis functions included in the dictionary Ψ and the corresponding weight vector x. The calibration process tries to minimize the mismatch between the observations and the simulator output in the ℓ2 norm as ‖y_obs − H(Ψx)‖_2. This minimization problem is solved iteratively by the update Eq. (18). Adding a sparsity constraint on the solution vector x can be implemented by viewing the update Eq. (18) as the normal equation of the following system

A_k ∆x_k = b_k,  (19)

where A_k = (R^{-1/2} Y U^T), b_k = (ε_k^2 (n_e − 1) F(x_k)) and ∆x_k = (x_{k+1} − x_k). A sparse solution of this normal equation can be found by the OMP algorithm. However, in contrast to the linear reconstruction problem, the problem is nonlinear and the matrix A_k is the sensitivity of the solution to the different dictionary atoms. At each nonlinear iteration of the parameter estimation algorithm, a new sensitivity matrix A_k is estimated. A direct application of the OMP algorithm at the linearized iteration level will produce sparse updates of the parameters. However, a number of sparse updates may not produce a sparse solution after a few nonlinear iterations. This is attributed to the lack of any link (in the solution support sense) between the different updates across the nonlinear iterations.

In order to solve this problem, we propose NOMP, a natural extension of the OMP greedy algorithm to nonlinear problems that stores the discovered solution support between subsequent nonlinear iterations. This is consistent with the logic of OMP as a greedy algorithm: once an atom of the dictionary is included in the support, it is carried over all subsequent update iterations. The pseudo-code of the nonlinear orthogonal matching pursuit (NOMP) for sparse calibration combined with ISEM is described in Algorithm 2. We note two major changes of NOMP from the standard OMP algorithm. First, the solution support is carried between the nonlinear iterations. Second, once the solution support is identified, we use ℓ2 regularization for calculating the residuals, but we limit the solution space to the identified support. The ℓ2 regularization is needed as the estimated sensitivity matrix A_k is rank deficient and may contain sampling errors. Tikhonov regularization (Hansen, 1998) is applied to the update Eq. (19) as

∆x_k = (A_k(S_k)^T A_k(S_k) + λI)^{-1} A_k(S_k)^T b_k,  (20)

where A_k(S_k) is the restriction of the matrix A_k to the selected support S_k and λ is the regularization parameter. We utilize the L-curve method (Hansen, 1998) for automatic selection of the regularization parameter. Tikhonov regularization replaces the Moore-Penrose pseudoinverse used in the OMP algorithm. The proposed NOMP combines ISEM for estimating the sensitivities based on an ensemble method with OMP at each iteration. It is relatively simple to implement and requires only a limited number of input constants that need to be adjusted. We also note that different forms of observation data can be included in the observation vector y_obs to account for any data that need to be assimilated.
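The regularized restricted update of Eq. (20) is a standard Tikhonov solve; the sketch below uses a random matrix as a stand-in for A_k(S_k) and fixed λ values instead of the L-curve selection.

```python
import numpy as np

def tikhonov_step(A_S, b, lam):
    """Eq. (20): dx = (A_S^T A_S + lambda I)^{-1} A_S^T b,
    with A_S the sensitivity matrix restricted to the support S_k."""
    n = A_S.shape[1]
    return np.linalg.solve(A_S.T @ A_S + lam * np.eye(n), A_S.T @ b)

rng = np.random.default_rng(5)
A_S = rng.standard_normal((8, 3))         # stand-in restricted sensitivity matrix
b = rng.standard_normal(8)
for lam in (1e-6, 1e-2, 1.0):
    dx = tikhonov_step(A_S, b, lam)
    print(lam, np.linalg.norm(dx))        # larger lambda -> smaller update norm
```

The solution norm shrinks monotonically as λ grows, which is what the L-curve method exploits when trading off residual norm against solution norm.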

In the current algorithm, at each iteration the ensemble members are generated by adding random perturbations. These perturbations mimic a random stencil for stochastic estimation of the gradient direction. The magnitude of the perturbations is decreased as we approach the solution via a decaying function. In all our numerical testing, the random perturbations are drawn from the Gaussian distribution N(0,1) and are scaled by a scalar ε_k defined by the logarithmic rule proposed by Kushner (1987) as c/log(k+1), where c is a user input and k is the iteration number. However, other forms of decaying sequences (Gelfand and Mitter, 1991; Fang et al., 1997) can be used. In order to ensure error reduction, the iterative update Eq. (18) is modified by introducing a step size α, which takes an initial value of 1 and is adjusted to ensure error reduction. The modified update equation is

x_{k+1} = x_k + α ∆x_k.  (21)

In the numerical testing, α is multiplied by one half if no error reduction is achieved. This is repeated up to 5 times, and if no error reduction is achieved, the current iteration of the stochastic algorithm is skipped and another ensemble is generated starting from the parameter values of the previous iteration. A more sophisticated step size selection using the Wolfe or Goldstein conditions could be applied within a line search strategy (Nocedal and Wright, 2006).
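The step-halving safeguard described above can be sketched as follows; the quadratic objective and the overshooting step are toy examples.

```python
import numpy as np

def damped_step(O, x, dx, max_tries=5):
    """Try alpha = 1 and halve it on failure, for up to `max_tries` attempts;
    return None when no error reduction is found (the iteration is skipped)."""
    alpha = 1.0
    for _ in range(max_tries):
        if O(x + alpha * dx) <= O(x):
            return x + alpha * dx
        alpha *= 0.5
    return None

O = lambda x: float(np.sum(x ** 2))       # toy objective
x = np.array([2.0, -1.0])
dx = np.array([-5.0, 2.5])                # descent direction that overshoots
x_new = damped_step(O, x, dx)
print(x_new, O(x_new) < O(x))             # accepted at alpha = 0.5
```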

Problem formulation and numerical evaluation

A two-phase immiscible flow in a heterogeneous porous subsurface region is considered. For clarity of exposition, gravity and capillary effects are neglected. However, the proposed model calibration algorithm is independent of the selected physical mechanisms.


Algorithm 2: Sparse calibration algorithm.

1  Input: total ensemble size n_e, initial perturbation constant c, maximum number of iterations k_max
2  Initialization: randomly initialize x_0, S_0 = ∅
3  for k = 1 to k_max do
4      - Generate and propagate the ensemble:
5          Update ε_k = c/log(k+1) based on the iteration number k
6          y_i = H(x_k + ε_k w_i), w_i ~ N(0,1), ∀ i = 1, ..., n_e
7      - Update the sparse support:
8          Using U_k, Y_k, ε_k and n_e, evaluate A_k, b_k
9          Call OMP with the initial support S_{k−1} and get S_k
10     - Update the ensemble:
11         U ← U(S_k)
12         ∆x_k(S_k) = ((R^{-1/2} Y U^T)^T (R^{-1/2} Y U^T))^{-1} (R^{-1/2} Y U^T)^T (ε_k^2 (n_e − 1) F(x_k))
13     - Step size adjustment:
14         Find α such that O(x_k + α ∆x_k) ≤ O(x_k)
15 end

The two phases will be referred to as water, with subscript w, for the aqueous phase and oil, with subscript o, for the non-aqueous phase. This subsurface flow problem is described by the mass conservation equation and Darcy's law

∇ · v_t = q,   v_t = −K λ_t(S_w) ∇p   over Ω,  (22)

where vt is the total Darcy velocity of the engaging fluids, q = Qo/ρo + Qw/ρw is the normalized source or sink term, K is the absolute permeability tensor, Sw is the water saturation, λt(Sw) = λw(Sw) + λo(Sw) is the total mobility and p = po = pw is the pressure. Here, ρw and ρo are the water and oil fluid densities, respectively. These equations can be combined to produce the pressure equation

−∇ · (Kλt(Sw)∇p) = q. (23)

The pore space is assumed to be filled with fluids and thus the fluid saturations should add up to one (i.e., So + Sw = 1). Then, only the water saturation equation is solved

φ ∂Sw/∂t + ∇ · ( f(Sw) vt) = Qw/ρw, (24)

where φ is the porosity and f(Sw) = λw/λt is the fractional flow function. The relative mobilities are modeled using polynomial equations of the form

λw(Sw) = (Snw)²/µw,   λo(Sw) = (1−Snw)²/µo,   Snw = (Sw−Swc)/(1−Sor−Swc), (25)
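The mobility model of Eq. (25) and the fractional flow function f(Sw) = λw/λt from Eq. (24) translate directly into code. A minimal sketch; the default fluid and end-point values match those stated in the numerical-testing section, and the function names are illustrative:

```python
import numpy as np

def mobilities(Sw, mu_w=0.3, mu_o=3.0, Swc=0.2, Sor=0.2):
    """Quadratic relative mobilities from Eq. (25).

    Defaults follow the numerical testing: mu_w = 0.3 cp, mu_o = 3 cp,
    Swc = Sor = 0.2.
    """
    Snw = (Sw - Swc) / (1.0 - Sor - Swc)   # normalized water saturation
    lam_w = Snw ** 2 / mu_w
    lam_o = (1.0 - Snw) ** 2 / mu_o
    return lam_w, lam_o

def fractional_flow(Sw, **kw):
    # f(Sw) = lambda_w / (lambda_w + lambda_o), the flux split in Eq. (24)
    lam_w, lam_o = mobilities(Sw, **kw)
    return lam_w / (lam_w + lam_o)
```

Note that f vanishes at the connate water saturation (Sw = Swc) and equals one at the residual oil saturation (Sw = 1 − Sor), as expected.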

where Swc is the connate or irreducible water saturation, Sor is the irreducible oil saturation and µw, µo are the water and oil fluid viscosities, respectively. The pressure Eq. (23) is discretized using the standard two-point flux approximation (TPFA) method and the saturation Eq. (24) is discretized using an implicit solver with standard Newton-Raphson iteration (Chen, 2007). For simplicity, we limit the parameter estimation to the subsurface permeability map K. We also model this permeability field as a lognormal random variable as it is usually heterogeneous and shows a high range of variability.
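To illustrate the TPFA discretization of Eq. (23), the sketch below solves a one-dimensional analogue: face transmissibilities are harmonic averages of neighbouring cell permeabilities, with Dirichlet pressures at the two boundary faces. This is a simplification of the paper's 2D scheme (the mobility factor and grid dimensions are omitted and illustrative):

```python
import numpy as np

def tpfa_1d(K, q, h=10.0, p_left=1.0, p_right=0.0):
    """One-dimensional two-point flux approximation pressure solve.

    K : (n,) cell permeabilities; q : (n,) source term; h : cell size.
    """
    n = K.size
    # Harmonic average of neighbouring permeabilities -> face transmissibility
    T = 2.0 * K[:-1] * K[1:] / (K[:-1] + K[1:]) / h ** 2
    Tb_l = 2.0 * K[0] / h ** 2      # half-cell transmissibilities at boundaries
    Tb_r = 2.0 * K[-1] / h ** 2
    A = np.zeros((n, n))
    b = q.astype(float).copy()
    for i in range(n - 1):          # assemble interior face contributions
        A[i, i] += T[i]; A[i + 1, i + 1] += T[i]
        A[i, i + 1] -= T[i]; A[i + 1, i] -= T[i]
    A[0, 0] += Tb_l;  b[0] += Tb_l * p_left
    A[-1, -1] += Tb_r; b[-1] += Tb_r * p_right
    return np.linalg.solve(A, b)
```

For a homogeneous permeability field with no sources, the computed pressure profile is linear between the two boundary values, which is a convenient sanity check for the assembly.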

K-SVD Parameterization

The reference permeability field for the test problem is shown in Fig. 1b and represents a channelized model. Different realizations of channelized models are generated using the Stanford Geostatistical Modeling Software, S-GeMS (Remy, 2005), based on the training image shown in Figure 1a. The training image is based on a similar example published in (Strebelle, 2002). A total of one thousand different realizations were generated and used as input to the K-SVD algorithm to produce a sparse parameterization of the search space. Fig. 2 shows 12 basis functions randomly selected from a dictionary of 500 basis functions built using the K-SVD algorithm with a target sparsity of 20 elements.
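In this parameterization a candidate log-permeability field is a sparse combination of dictionary atoms. The sketch below uses a random stand-in dictionary purely for shape bookkeeping (the real dictionary is learned by K-SVD from the 1000 S-GeMS realizations; the sizes match the paper's setup, the atom values do not):

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for a learned K-SVD dictionary: 2500 grid cells, 500 atoms.
n_cells, n_atoms, sparsity = 2500, 500, 20
D = rng.standard_normal((n_cells, n_atoms))
D /= np.linalg.norm(D, axis=0)              # unit-norm atoms, as in K-SVD

# A field in this search space is a sparse weighting of the atoms:
support = rng.choice(n_atoms, size=sparsity, replace=False)
w = np.zeros(n_atoms)
w[support] = rng.standard_normal(sparsity)
log_perm = (D @ w).reshape(50, 50)          # candidate log-permeability map
```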

In the numerical testing, the discretized model uses a 2D regular grid of 50×50 blocks in the x and y directions, respectively. The size of each grid block is 10 meters in each direction and a unit thickness in the z direction. The porosity is assumed to be constant in all grid blocks and equals 0.2. The water viscosity µw is set to 0.3 cp and the oil viscosity µo is set to 3 cp. The irreducible water saturation and irreducible oil saturation are set as Sor = Swc = 0.2 and the simulations are run until 1 pore volume is injected. For the test problem, two injection/production patterns are used. Fig. 3 shows the locations of the injection wells (black dots) and the production wells (white dots).

Figure 1: Details of the reference permeability fields for the test problems, (a) shows the log-permeability field (in Darcy) for the training image and (b) shows the reference log-permeability field for the test problem.

Figure 2: A few basis functions from an overcomplete dictionary generated using the K-SVD algorithm.

Figure 3: Injection/production patterns (injectors highlighted as black dots and producers as white dots), (a) shows pattern 1 and (b) shows pattern 2.

The first pattern has one injection well and four production wells arranged in the inverted five-spot pattern shown in Figure 3a. For pattern 2, shown in Figure 3b, 9 production wells are distributed around 4 injection wells. For the parameter estimation problem, the production curves at the production wells are used to define the misfit function and guide the inverse problem solution. Each water-cut curve was sampled at 50 points and these samples were used for calculating the errors and the update equation. The observation data (water-cut values) are perturbed with uncorrelated white noise with a small standard deviation of 10−6 to be able to perform a convergence study for different ensemble sizes.
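The data misfit over the sampled water-cut curves can be sketched as a root-mean-square error; a minimal illustration assuming 4 wells with 50 samples each (array shapes and the misfit form are assumptions consistent with the description above, not the paper's exact Eq. (18)):

```python
import numpy as np

rng = np.random.default_rng(2)

def watercut_rmse(simulated, observed):
    """RMSE misfit over all sampled water-cut points (a sketch).

    simulated, observed : (n_wells, 50) arrays, one 50-point curve per
    production well, matching the sampling used in the test cases.
    """
    return float(np.sqrt(np.mean((simulated - observed) ** 2)))

# Observations are reference curves plus tiny uncorrelated noise
# (standard deviation 1e-6), as in the convergence study setup.
reference = rng.random((4, 50))            # hypothetical water-cut curves
observed = reference + 1e-6 * rng.standard_normal(reference.shape)
```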



Figure 4: Calibrated log-permeability field for different ensemble sizes for the water flooding test case under injection/production pattern 1. (a) n = 5, (b) n = 10 and (c) n = 20.


Figure 5: Stem plot of the weights of the calibrated permeability fields for the water flooding test case under injection/production pattern 1. (a) n = 5, (b) n = 10 and (c) n = 20.

Ensemble size   Solution cardinality   Final RMSE
5               60                     0.0408
10              58                     0.0514
20              31                     0.0667

Table 1: Results of NOMP for the water flooding test case with injection/production pattern 1.

Water flooding test case

This test case utilizes the reference permeability field shown in Figure 1b. For comparison purposes, all runs were initialized using an uninformed prior of a uniform permeability field with log(K) = 0. For injection/production pattern 1, the optimized permeability fields resulting from ensemble sizes of 5, 10 and 20 members are shown in Fig. 4. The smallest ensemble of 5 members managed to reproduce the locations of the two channels running along the model. However, this is not expected from every run of the algorithm because of the approximate nature of the estimated derivatives. Also, the problem is ill-posed and may admit different solutions. Fig. 5 shows the corresponding weights of the basis functions selected from the dictionary for ensembles of 5, 10 and 20 members. For an ensemble of 5 members, the inferred support has 73 non-zero bases as shown in Fig. 5a. The initial root-mean-square error (RMSE) for the initial permeability field of log(K) = 0 is 0.0691. Table 1 shows the final RMSE values for different ensemble sizes. The number of nonlinear iterations is set to 40, 20 and 10 for the ensemble sizes of 5, 10 and 20, respectively, corresponding to the same total of 200 forward runs. It is observed that smaller ensembles produced solutions with larger support because of the increased number of nonlinear iterations. This is reflected in the final RMSE, which is smaller for smaller ensembles in comparison to larger ensembles after 200 forward runs. For injection pattern 2, the optimized permeability fields are shown in Fig. 6 for ensembles of 5, 10 and 20 members. The stem plot of the discovered weights is shown in Fig. 7. The initial RMSE for the initial permeability field of log(K) = 0 is 0.0729. Table 2 shows the final RMSE values for different ensemble sizes. The number of nonlinear iterations is set to 40, 20 and 10 for the ensemble sizes of 5, 10 and 20, respectively. Again, it is observed that smaller ensembles are more effective in matching the data, as the number of nonlinear iterations is larger for the same number of forward runs.

Figure 6: Calibrated log-permeability field for different ensemble sizes for the water flooding test case under injection/production pattern 2. (a) n = 5, (b) n = 10 and (c) n = 20.

Figure 7: Stem plot of the weights of the calibrated permeability fields for the water flooding test case under injection/production pattern 2. (a) n = 5, (b) n = 10 and (c) n = 20.

Ensemble size   Solution cardinality   Final RMSE
5               48                     0.0372
10              36                     0.0528
20              33                     0.0579

Table 2: Results of NOMP for the water flooding test case with injection/production pattern 2.

Convergence study

In this section we perform a complete convergence study for the test case presented earlier under the two injection/production patterns. The stochastic nature of the estimated gradients results in a different solution path for each run. The average of 50 different runs is presented to compare the effect of ensemble size on the error reduction rates. Fig. 8 shows the average RMSE in water cut versus the total number of forward runs under injection/production pattern 1 (left) and pattern 2 (right). It is evident that smaller ensembles are more effective for sparse calibration in terms of error reduction and support detection for the same number of forward runs. Smaller ensembles outperformed larger ensembles on average in all the numerical test cases because of the increased number of nonlinear iterations for the same number of forward runs. At each nonlinear iteration, the support is updated and the major search directions are detected. Thus, applying more nonlinear iterations has a positive effect on the error reduction.

Figure 8: Average RMSE in water cut curve versus the total number of forward runs for different ensemble sizes with different injection/production patterns, (a) shows results for injection/production pattern 1 and (b) shows results for injection/production pattern 2.

Discussion and Conclusions

The solution of the nonlinear sparse calibration problem is challenging. Not only does the algorithm have to find the optimal weights to reproduce the measured values, it has to select the basis functions that are included in the solution support as well. A complete combinatorial exploration by running standard parameter estimation algorithms on every possible subset of the basis functions leads to a combinatorial problem of huge size that is impossible to solve. In the linear setting, different algorithms for sparse reconstruction can be used. These algorithms can be broadly classified into forward stagewise selection algorithms, such as OMP, and optimization-based algorithms, such as the iteratively reweighted least squares (IRLS) algorithm.

In the current paper, we build on the work of Khaninezhad et al. (2012a,b) for sparse calibration of subsurface flow models. Khaninezhad et al. (2012b) utilized the IRLS algorithm for solving the sparse calibration problem. However, the main challenge with the IRLS is the specification of a reasonable value for the regularization parameter. To avoid that, Khaninezhad et al. (2012b) modified the IRLS to include the ℓ1 regularization term as a multiplicative term instead of an additive term. This increases the nonlinearity of the problem. Also, the minimization algorithm might be attracted to minimizing the data misfit term only, while a reduction of the total objective function will still be observed because of the multiplicative effect.

Here, we utilized the OMP algorithm for solving the sparse calibration problem. OMP has the advantages of low computational complexity and conceptual clarity in comparison to other sparse signal recovery methods. In contrast to the IRLS, the OMP algorithm depends on a tolerance parameter for the solution residual that is conceptually easy to specify. However, a direct application of OMP would result in a collection of sparse updates that does not guarantee a final sparse solution. We modified the OMP, in a way logically consistent with the greedy nature of the algorithm, by carrying over the discovered solution support across the nonlinear iterations. The transparency of the OMP algorithm enabled us to develop the NOMP algorithm within the same logical framework.
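The idea of carrying the support across nonlinear iterations can be sketched as an OMP routine seeded with a warm-start support. This is an illustrative implementation of standard OMP with that modification, not the paper's code; names and defaults are assumptions:

```python
import numpy as np

def omp_with_support(D, r, support=(), tol=1e-8, max_atoms=50):
    """Greedy OMP seeded with a previously discovered support.

    D : (m, n) dictionary with unit-norm columns; r : (m,) target signal.
    Returns the (possibly augmented) support and least-squares weights.
    """
    support = list(support)
    w = np.zeros(0)

    def fit(S):
        # Least-squares weights on the atoms in S, plus the new residual
        w, *_ = np.linalg.lstsq(D[:, S], r, rcond=None)
        return w, r - D[:, S] @ w

    residual = r.copy()
    if support:                      # warm start: refit the old support first
        w, residual = fit(support)
    while np.linalg.norm(residual) > tol and len(support) < max_atoms:
        # Greedy step: atom most correlated with the current residual
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break                    # no new atom reduces the residual
        support.append(k)
        w, residual = fit(support)
    return support, w
```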

In terms of results, the models calibrated using the NOMP algorithm did not show extreme values in the inferred permeability fields. This is quite different from the results presented by Khaninezhad et al. (2012a,b). We attribute this to applying ℓ2 regularization at each iteration once the solution support is discovered. The ℓ2 regularization has the advantage of penalizing the large weights that produce realizations with extreme permeability values. This is also evident from the stem plots showing the weights of the different dictionary atoms. This is a clear advantage of NOMP over sparse reconstruction algorithms that only penalize the ℓ1 norm of the solution. Another advantage of the proposed algorithm is the efficient use of ensemble-based approximate derivatives using ISEM. The proposed algorithm combining ISEM and NOMP facilitates sparse calibration as a black box for numerical simulation packages when adjoint code is not available.
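The ℓ2 (Tikhonov) solve on the discovered support can be sketched as a ridge-regularized least-squares problem. The regularization weight lam below is an illustrative assumption; the paper does not state its value here:

```python
import numpy as np

def tikhonov_on_support(D, r, support, lam=1e-2):
    """Tikhonov-regularized weights on the discovered support (a sketch).

    Penalizing ||w||_2 keeps individual atom weights moderate, which is
    what suppresses extreme values in the calibrated permeability fields.
    """
    A = D[:, list(support)]
    n = A.shape[1]
    # Solve the Tikhonov normal equations (A^T A + lam I) w = A^T r
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ r)
```

Compared with the unregularized least-squares fit inside plain OMP, the shrinkage factor 1/(1 + lam) on each weight (for orthonormal atoms) is what damps extreme coefficients.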

References

M. Aharon, M. Elad, and A. Bruckstein. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, 54(11):4311–4322, 2006. doi: 10.1109/TSP.2006.881199.

G. Berkooz, P. Holmes, and J. L. Lumley. The proper orthogonal decomposition in the analysis of turbulent flows. Annual Review of Fluid Mechanics, 25(1):539–575, 1993. doi: 10.1146/annurev.fl.25.010193.002543. URL http://www.annualreviews.org/doi/abs/10.1146/annurev.fl.25.010193.002543.

E. J. Candes and M. B. Wakin. An introduction to compressive sampling. IEEE Signal Processing Magazine, 25(2):21–30, 2008. ISSN 1053-5888. doi: 10.1109/MSP.2007.914731.


Jesus Carrera, Andres Alcolea, Agustin Medina, Juan Hidalgo, and Luit J. Slooten. Inverse problem in hydrogeology. Hydrogeology Journal, 13(1):206–222, 2005.

R. Chartrand and Wotao Yin. Iteratively reweighted algorithms for compressive sensing. In Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on, pages 3869–3872, 2008. doi: 10.1109/ICASSP.2008.4518498.

Yan Chen and Dongxiao Zhang. Data assimilation for transient flow in geologic formations via ensemble Kalman filter. Advances inWater Resources, 29(8):1107–1122, 2006. doi: 10.1016/j.advwatres.2005.09.007.

Zhangxin Chen. Reservoir Simulation: Mathematical Techniques in Oil Recovery. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2007.

D. L. Donoho. Compressed sensing. IEEE Transactions on Information Theory, 52(4):1289–1306, 2006. ISSN 0018-9448. doi: 10.1109/TIT.2006.871582.

Y. Efendiev, A. Datta-Gupta, V. Ginting, X. Ma, and B. Mallick. An efficient two-stage Markov chain Monte Carlo method for dynamic data integration. Water Resources Research, 41(12):W12423, 2005. doi: 10.1029/2004WR003764. URL http://dx.doi.org/10.1029/2004WR003764.

Michael Elad. Sparse and Redundant Representations - From Theory to Applications in Signal and Image Processing. Springer, 2010. ISBN 978-1-4419-7010-7.

A. H. Elsheikh, C. C. Pain, F. Fang, J. L. M. A. Gomes, and I. M. Navon. Parameter estimation of subsurface flow models using iterative regularized ensemble Kalman filter. Stochastic Environmental Research and Risk Assessment, pages 1–21, 2012. ISSN 1436-3240. doi: 10.1007/s00477-012-0613-x. URL http://dx.doi.org/10.1007/s00477-012-0613-x.

Ahmed H. Elsheikh, Matt D. Jackson, and Tara C. LaForce. Bayesian reservoir history matching considering model and parameter uncertainties. Mathematical Geosciences, 44(5):515–543, 2012.

K. Engan, S. O. Aase, and J. Hakon Husoy. Method of optimal directions for frame design. In Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 5, pages 2443–2446, 1999. doi: 10.1109/ICASSP.1999.760624.

Kjersti Engan, Karl Skretting, and John Hakon Husøy. Family of iterative LS-based dictionary learning algorithms, ILS-DLA, for sparse signal representation. Digital Signal Processing, 17(1):32–49, 2007. ISSN 1051-2004. doi: 10.1016/j.dsp.2006.02.002. URL http://www.sciencedirect.com/science/article/pii/S105120040600025X.

Geir Evensen. Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. Journal of Geophysical Research, 99(C5):10143–10162, 1994. doi: 10.1029/94JC00572. URL http://dx.doi.org/10.1029/94JC00572.

Haitao Fang, Guanglu Gong, and Minping Qian. Annealing of iterative stochastic schemes. SIAM Journal on Control and Optimization,35(6):1886–1907, 1997. doi: 10.1137/S0363012995293670. URL http://dx.doi.org/10.1137/S0363012995293670.

Jianlin Fu and J. Gomez-Hernandez. A blocking Markov chain Monte Carlo method for inverse stochastic hydrogeological modeling.Mathematical Geosciences, 41(2):105–128, 2009.

Jianlin Fu and J. Jaime Gomez-Hernandez. Preserving spatial structure for inverse stochastic simulation using blocking Markov chain Monte Carlo method. Inverse Problems in Science and Engineering, 16(7):865–884, 2008. doi: 10.1080/17415970802015781. URL http://www.tandfonline.com/doi/abs/10.1080/17415970802015781.

Saul Gelfand and Sanjoy Mitter. Simulated annealing type algorithms for multivariate optimization. Algorithmica, 6(1):419–436, 1991. doi: 10.1007/BF01759052. URL http://dx.doi.org/10.1007/BF01759052.

Isabelle Guyon and Andre Elisseeff. An introduction to variable and feature selection. Journal of Machine Learning Research, 3:1157–1182, 2003.

C. Hansen. Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion. SIAM, Philadelphia, 1998.

M. Kac and A. J. F. Siegert. An explicit representation of a stationary Gaussian process. Annals of Mathematical Statistics, 18:438–442,1947. URL http://www.jstor.org/stable/2235740.


Mohammadreza Mohammad Khaninezhad, Behnam Jafarpour, and Lianlin Li. Sparse geologic dictionaries for subsurface flow model calibration: Part I. Inversion formulation. Advances in Water Resources, 39:106–121, 2012a. ISSN 0309-1708. doi: 10.1016/j.advwatres.2011.09.002. URL http://www.sciencedirect.com/science/article/pii/S0309170811001692.

Mohammadreza Mohammad Khaninezhad, Behnam Jafarpour, and Lianlin Li. Sparse geologic dictionaries for subsurface flow model calibration: Part II. Robustness to uncertainty. Advances in Water Resources, 39:122–136, 2012b. ISSN 0309-1708. doi: 10.1016/j.advwatres.2011.10.005. URL http://www.sciencedirect.com/science/article/pii/S0309170811001977.

H. J. Kushner. Asymptotic global behavior for stochastic approximation and diffusions with slowly decreasing noise effects: Global minimization via Monte Carlo. SIAM Journal on Applied Mathematics, 47(1):169–185, 1987. doi: 10.1137/0147010. URL http://epubs.siam.org/doi/abs/10.1137/0147010.

Wei Li and Olaf A. Cirpka. Efficient geostatistical inverse methods for structured and unstructured grids. Water Resources Research, 42(6):W06402, 2006. doi: 10.1029/2005WR004668. URL http://dx.doi.org/10.1029/2005WR004668.

M. Loeve. Fonctions Aleatoires de second order. Supplement to P. Levy, Processus Stochastiques et Mouvement Brownien. Gauthier-Villars, Paris, 1948.

Xianlin Ma, Mishal Al-Harbi, Akhil Datta-Gupta, and Yalchin Efendiev. An efficient two-stage sampling method for uncertainty quantification in history matching geological models. SPE Journal, 13(1):77–87, 2008.

S. Mallat and Z. Zhang. Adaptive time-frequency decomposition with matching pursuits. In Time-Frequency and Time-Scale Analysis,1992., Proceedings of the IEEE-SP International Symposium, pages 7 –10, oct 1992. doi: 10.1109/TFTSA.1992.274245.

S.G. Mallat and Zhifeng Zhang. Matching pursuits with time-frequency dictionaries. IEEE Transactions on Signal Processing, 41(12):3397 –3415, dec 1993. ISSN 1053-587X. doi: 10.1109/78.258082.

Dennis McLaughlin and Lloyd R. Townley. A reassessment of the groundwater inverse problem. Water Resources Research, 32(5):1131–1161, 1996. doi: 10.1029/96WR00160. URL http://dx.doi.org/10.1029/96WR00160.

Hamid Moradkhani, Soroosh Sorooshian, Hoshin V. Gupta, and Paul R. Houser. Dual state–parameter estimation of hydrological models using ensemble Kalman filter. Advances in Water Resources, 28(2):135–147, 2005. ISSN 0309-1708. doi: 10.1016/j.advwatres.2004.09.002. URL http://www.sciencedirect.com/science/article/pii/S0309170804001605.

Geir Nævdal, Liv Merete Johnsen, Sigurd I. Aanonsen, and Erlend H. Vefring. Reservoir monitoring and continuous model updating using ensemble Kalman filter. SPE Journal, 10(1):66–74, 2005. doi: 10.2118/84372-PA. URL http://dx.doi.org/10.2118/84372-PA.

Jorge Nocedal and Stephen J. Wright. Numerical optimization. Springer Verlag, 2nd edition, 2006.

Dean S. Oliver, Luciane Cunha, and Albert C. Reynolds. Markov chain Monte Carlo methods for conditioning a permeability field to pressure data. Mathematical Geology, 29(1):61–91, 1997.

Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad. Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition. In 1993 Conference Record of The Twenty-Seventh Asilomar Conference on Signals, Systems and Computers, pages 40–44, 1993. doi: 10.1109/ACSSC.1993.342465.

Nicolas Remy. S-GeMS: The Stanford geostatistical modeling software: A tool for new algorithms development. In Oy Leuangthong and Clayton V. Deutsch, editors, Geostatistics Banff 2004, volume 14 of Quantitative Geology and Geostatistics, pages 865–871. Springer Netherlands, 2005.

A. C. Reynolds, Nanqun He, Lifu Chu, and D. S. Oliver. Reparameterization techniques for generating reservoir descriptions conditioned to variograms and well-test pressure data. SPE Journal, 1(4):413–426, 1996. doi: 10.2118/30588-PA.

R. Rubinstein, M. Zibulevsky, and M. Elad. Double sparsity: Learning sparse dictionaries for sparse signal approximation. IEEE Transactions on Signal Processing, 58(3):1553–1564, 2010. ISSN 1053-587X. doi: 10.1109/TSP.2009.2036477.

Sebastien Strebelle. Conditional simulation of complex geological structures using multiple-point statistics. Mathematical Geology, 34(1):1–21, 2002.

J.A. Tropp. Greed is good: algorithmic results for sparse approximation. IEEE Transactions on Information Theory, 50(10):2231 –2242, 2004. ISSN 0018-9448. doi: 10.1109/TIT.2004.834793.

J. A. Tropp and A. C. Gilbert. Signal recovery from random measurements via Orthogonal Matching Pursuit. IEEE Transactions on Information Theory, 53(12):4655–4666, 2007. ISSN 0018-9448. doi: 10.1109/TIT.2007.909108.
