
Efficient Nonlinear Inverse Uncertainty Computation Using Parameter Reduction, Constraint Mapping, and Very Sparse Posterior Sampling

Michael J. Tompkins1, Juan L. Fernandez-Martinez2,3,4, Tapan Mukerji2, and David L. Alumbaugh1

1 Schlumberger-EMI Technology Center
2 Department of Energy Resources Engineering, Stanford University
3 Department of Civil and Environmental Engineering, University of California Berkeley
4 Department of Mathematics, University of Oviedo

Abstract

Among the most important aspects of geophysical data interpretation is the estimation and computation of inverse solution uncertainties. We present a general uncertainty estimation method that allows for the comprehensive search of model posterior space while maintaining computational efficiencies similar to deterministic inverse solutions. Integral to this method is the combination of an efficient parameter reduction technique, such as Principal Component Analysis, a parameter bounds mapping routine, a sparse geometric sampling scheme, and a forward solver. Parameter reduction, based on prior model covariance estimates, is required to produce both a reduced and orthogonal model space. Parameter constraints are then mapped to this reduced space, using a linear programming scheme, and define a bounded posterior polytope. Sparse deterministic grids are employed to sample this feasible model region, while forward evaluations determine which model samples are equi-probable. The resulting ensemble represents the equivalent model space, consistent with the Principal Components, which can be used to infer inverse solution uncertainty. Importantly, the number of forward evaluations is determined adaptively and minimized by finding the sparsest sampling set required to produce convergent uncertainty measures. We demonstrate, with a simple surface electromagnetic example, that this method has the potential to reduce the nonlinear inverse uncertainty problem to a deterministic sampling problem in only a few dimensions, requiring limited forward solves, and resulting in an optimally sparse representation of the posterior model space. Depending on the choice of parameter constraints, the method can be exploitative, searching around a given solution, or explorative, when a global search is desired.


1. Introduction

One of the most formidable problems in geophysics is how to accurately and efficiently estimate inverse model uncertainties. There are many reasons for uncertainty in inversion results, the most important of which are measurement error, solution non-uniqueness, data coverage and bandwidth limitations, and physical assumptions (e.g., isotropy) or approximations (numerical error). Because uncertainty is omnipresent in all inverse solutions, any interpretation should include its estimate. Unfortunately, this is not commonly the case. In the context of nonlinear inversion, the uncertainty problem is that of quantifying the range or variability in the model space supported by prior information, the data, and any errors. As pointed out by Meju [2009], there are many approaches to the solution of this uncertainty problem. Perhaps the most apparent distinction is between deterministic and stochastic methods. Deterministic methods seek to quantify inverse uncertainty based on least-squares inverse solutions and the computation of model resolution and covariance [Menke, 1994; Tarantola, 2005] or by extremal solutions [e.g., Oldenburg, 1983; Meju, 2009], while stochastic methods seek to quantify uncertainty by casting a problem in terms of random variables and processes and computing statistical moments of the resulting ensemble of solutions [e.g., Tarantola and Valette, 1982; Sambridge, 1999; Malinverno, 2002]. Commonly, deterministic methods rely on linearized estimates of inverse model uncertainty, for example, about the last iteration of a nonlinear inversion, and thus have limited relevance to actual nonlinear uncertainty [e.g., Meju, 1994; Alumbaugh, 2002]. Stochastic uncertainty methods, which typically employ random sampling schemes in parameter space, avoid burdensome inversions and account for nonlinearity but often come at the high cost of innumerable forward solves [Haario et al., 2001]. Practically, full stochastic methods apply only to very small parameter spaces (tens of parameters).

Some studies have attempted to overcome these difficulties by either extending deterministic techniques or marrying them with stochastic methods. For example, Meju and Hutton [1992] presented an extension to linearized uncertainty estimation for magnetotelluric (MT) problems by employing an iterative most-squares solution; however, due to its iterative extremizing of individual parameters, this method also is practical only for small parameter spaces. Other studies have utilized the computational efficiency of the deterministic inverse solution while incorporating nonlinearity via probabilistic sampling. Several studies demonstrate this for both seismic [Materese, 1995] and electromagnetic (EM) problems [Alumbaugh and Newman, 2000; Alumbaugh, 2002]. In essence, this hybrid method involves solving either a portion of or the entire nonlinear inverse problem many times, while either the observations or prior model information are treated as random variables. This quasi-stochastic uncertainty method is able to account for at least a portion of the nonlinear uncertainty of geophysical inverse problems, but random sampling is computationally inefficient and typically involves hundreds to thousands of inverse solutions [Alumbaugh, 2002] for only modest-sized problems.

The problem of uncertainty has a natural interpretation in a Bayesian framework [See Scales and Tenorio, 2001] and is closely connected to the use of sampling and to a class of global optimization methods where the random search is directed using some fitness criteria for the estimates. Methods such as simulated annealing, genetic algorithms, particle swarm optimization, and the neighborhood algorithm belong to this category, and these can be useful for nonlinear problems [e.g., Mosegaard and Tarantola, 1995; Sen and Stoffa, 1995; Sambridge, 1999; Fernández Alvarez et al., 2008; Fernández Martínez et al., 2009]. Probabilistic estimates including uncertainty can be obtained; however, since sampling depends on the dimension of the parameter space, these methods are also limited to small parameterizations and fast forward solvers.

In the current work, we present a scheme that estimates nonlinear uncertainty based on deterministic sampling rules in bounded reduced-dimensional parameter spaces. We demonstrate that this method, first suggested for heat flow problems in random media by Ganapathysubramanian and Zabaras [2007], can be applied to the geophysical inverse uncertainty problem to dramatically improve its tractability. We also show that, unlike Monte Carlo methods, our deterministic posterior sampling can be decoupled from forward evaluations (i.e., model likelihoods). For the 1D marine controlled-source electromagnetic (CSEM) problem, we demonstrate that comprehensive posterior sampling can be accomplished in as few as 100 forward solves, while moments of this posterior (i.e., mean and variance) can be well estimated with even fewer forward evaluations. We are able to accomplish this efficiency by combining four well-known techniques: optimal model parameter reduction using Principal Component Analysis (PCA) on estimates of the model covariance matrix, efficient parameter constraint mapping by vertex enumeration, sparse deterministic sampling using Smolyak schemes, and efficient forward simulation.

Our first step is to perform PCA on the a priori model covariance in order to reduce the dimensions of the correlated model space and create an orthogonal model base consistent with the structure of the covariances. Principal component analysis is a well-established method that has been shown to be very effective at compression and parameter reduction [e.g., Jolliffe, 2002; Miranda et al., 2008; Echeverría et al., 2009; Fernandez Martinez et al., 2009].

For estimating uncertainty, however, we must also include parameter constraints to facilitate efficient sampling of only feasible models from the posterior. This is necessary because we do not use PCA to restrict the sampling of model parameters (only to restrict their structure); our aim is to use parameter constraints to restrict models that are compatible with our prior (i.e., the PCA base). To accomplish this, we use the method of vertex enumeration to map a priori parameter bounds in the original correlated model space to extrema (vertices) in the reduced-dimensional PCA space. Established in the field of computational geometry [See Avis and Fukuda, 1992], the vertex enumeration solution defines a bounded polytope that can be used to guide posterior sampling (i.e., points interior to the polytope). Although this method, combined with PCA, casts the uncertainty problem in a more tractable light, there is still the issue of how to sample models from this posterior polytope.

Even in reduced-dimensional spaces, with only a few unknowns, uniform sampling schemes grow exponentially to impossibly large numbers. An alternative approach, presented by Xiu and Hesthaven [2005], is to use a Smolyak sparse grid collocation method based on approximations to the tensor-product rules of multivariate interpolation [Smolyak, 1963]. In essence, this method provides a high-order accuracy integration scheme, with predetermined nodal points, that depends only weakly on the dimension of the parameter space. Sampling the bounded polytope using these Smolyak grids can yield orders-of-magnitude efficiency gains compared to uniform or random sampling methods and is tunable depending on the desired completeness of posterior sampling. In contrast to Metropolis-Hastings algorithms, the predefined nature of this sampling, which we call geometric sampling, also allows for its decoupling from the determination of model likelihoods (i.e., no sampling chains). Thus, some sample clustering or rejection can take place before expensive forward evaluations are performed, for example, by model classification.

At some point, however, samples from the bounded posterior must be evaluated for their likelihood (i.e., data misfit). For this, any forward simulation algorithm can be used which accurately represents the physics of the method. Once the likelihoods are computed, models are accepted or rejected based on a threshold value for data misfit. The resulting ensemble of models is an optimally sparse representation of the posterior model space given the a priori information and model covariance structure. The uncertainty of the inverse problem then can be explored through either the model ensemble itself or statistical measures (e.g., mean, covariance, percentile, interquartile range, etc.) computed from this distribution.

As this method is general and agnostic to the physics used, we expect it to be applicable to higher-dimensional model spaces within several classes of geophysical inverse problems including: 3-D gravity, magnetic, land and marine EM and MT, and seismic tomography. In this paper, we review the various aspects of this method, demonstrate its efficacy for a simple marine CSEM example, and provide some insights into its use for large-scale inverse problems.

2. Uncertainty Estimation

Let us refer to the model parameter vector by $\mathbf{m} \in M \subset \mathbb{R}^{N}$, where $M$ is the set of admissible models formulated in terms of geological consistency. In geophysical inversion, the model $\mathbf{m}$ is a physical quantity we want to reconstruct. For the uncertainty problem, we are interested in finding the family of models belonging to $M$ that fit the observed data $\mathbf{d} \in \mathbb{R}^{m}$ within the same tolerance (tol):


$\| \mathbf{F}(\mathbf{m}) - \mathbf{d} \|_{p} < \mathrm{tol}.$ (1)

Here $\mathbf{F}(\mathbf{m})$ represents the predictions made by the forward model with respect to the observables, $\mathbf{d}$, and $\| \cdot \|_{p}$ stands for the norm needed to compute the distance between the observed data and the predictions.
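In code, this equivalence test is a one-line acceptance rule. The following is a minimal NumPy sketch; `forward`, `m`, `d_obs`, and `tol` are illustrative names standing in for any forward solver and its inputs, not identifiers from the paper:

```python
import numpy as np

def is_equivalent(forward, m, d_obs, tol, p=2):
    """Equation 1: accept m if the p-norm data misfit is below tol."""
    residual = forward(m) - d_obs
    return np.linalg.norm(residual, ord=p) < tol
```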

In this paper, we will use two representations of uncertainty: the model realizations themselves and the posterior model covariances. The ensemble of equivalent models is useful for understanding the various classes of data-supported inverse solutions and will be used in our discussion for a qualitative comparison of uncertainty. The posterior model covariance matrix, in contrast, is very useful, as it contains quantitative information regarding both individual parameter variances as well as parameter dependencies. While, in general, we recognize that the model covariance matrix is only a vague measure of the posterior distribution, we use it here to facilitate visual and quantitative comparison (in 1D) between various uncertainty estimates.

In our method, we use two types of model covariance matrix. The first type, computed a priori, is used to determine the principal components of our reduced-dimensional model space and is calculated using a linearization about the last iteration of the nonlinear deterministic inverse solution. Based on linear inverse theory, this covariance can be computed using:

$\mathbf{C}_{m}^{lin} = \sigma^{2} \left( [\mathbf{J}_{m_f}]^{T} [\mathbf{J}_{m_f}] \right)^{-1},$ (2)

assuming an unbiased solution (i.e., no penalty function) and independent data errors of equal variance, $\sigma^{2}$. In equation 2, $\mathbf{J}_{m_f}$ is the model Jacobian matrix computed about the last iteration of the nonlinear inversion. This linear covariance matrix constitutes the prior information for our nonlinear uncertainty estimation method and must be computed as the first step (i.e., we must perform one deterministic inversion). The Jacobian matrix is rank deficient, and the dimension of its null space accounts, locally, for the uncertainty around the inverse model, $\mathbf{m}_f$.
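As a sketch, equation 2 can be evaluated directly from the final Jacobian. The Moore-Penrose pseudo-inverse below is our assumption for handling the rank deficiency just noted; the paper does not state how the inverse is stabilized:

```python
import numpy as np

def linearized_covariance(J, sigma2):
    """Equation 2: C_lin = sigma^2 (J^T J)^(-1), with J the Jacobian at
    the final model m_f, shape (n_data, n_params). Because J is
    typically rank deficient, we substitute the pseudo-inverse (an
    assumption) for the exact inverse."""
    return sigma2 * np.linalg.pinv(J.T @ J)
```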

The second type of covariance we employ is the actual solution we seek to our nonlinear uncertainty estimation problem and is computed as an experimental covariance from the ensemble of sampled posterior models. To estimate the nonlinear model covariance elements, we need to evaluate:

$\left[ \mathbf{C}_{nl} \right]_{ij} = \int \left( m_{i} - \langle m_{i} \rangle \right) \left( m_{j} - \langle m_{j} \rangle \right) P(\mathbf{m}) \, d\mathbf{m},$ (3)

where $m_{i}$ and $m_{j}$ denote the $i$th and $j$th model parameters with expected values $\langle m_{i} \rangle$ and $\langle m_{j} \rangle$, and $P(\mathbf{m})$ is the posterior joint probability density of the $N$ model parameters. In order to compute, exactly, the elements of the covariance matrix, we must integrate over the complete probability distribution of the model parameters. Of course, in practice we are interested in finite-volume domains and only estimates of these integrals, which amount to solving a numerical integration problem:

$\mathbf{C}_{nl} = \frac{1}{K} \sum_{i=1}^{K} \left[ \mathbf{m}_{i} - \langle \mathbf{m} \rangle \right] \left[ \mathbf{m}_{i} - \langle \mathbf{m} \rangle \right]^{T},$ (4)

where $\mathbf{m}_{i}$ is now the $i$th sampled model vector, $\langle \mathbf{m} \rangle$ is the expected (average) solution model vector given all posterior samples, and $K$ is the total number of samples for which the integration is performed. Equation 4 amounts to reducing the infinite-dimensional probability space of equation 3 to a finite-dimensional space, assuming all $K$ samples are drawn from a uniform posterior with individual probabilities $1/K$. Equation 4 approximates the covariance by directly sampling a large portion of the model posterior; however, this may not be desirable, as we may be interested in only a portion of the posterior, excluding, for example, certain classes of models based on prior information. In this case, the $K$ samples in equation 4 are those that meet some threshold criteria, for example, consistency with known parameter bounds. The resulting covariance matrix, for a one-dimensional model, can be represented as a 2D color-coded matrix where the diagonal elements are the individual parameter variances and the off-diagonal elements are the parameter covariances (Figure 3).
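Equation 4 is straightforward to evaluate from an ensemble array. A minimal sketch, with all samples weighted equally at $1/K$ as in the text:

```python
import numpy as np

def ensemble_covariance(models):
    """Equation 4: experimental covariance from a (K, N) sample array."""
    K = models.shape[0]
    residuals = models - models.mean(axis=0)   # m_i - <m>
    return residuals.T @ residuals / K         # (1/K) sum of outer products
```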

3. Parameter Reduction

3.1. Problem Definition

Consider a simple 1D model from the marine controlled-source electromagnetic (CSEM) problem (Figure 1). In this synthetic example, the data space is defined by 34 complex-valued radial electric field observations at two frequencies (0.25 and 1.25 Hz) and offsets ranging from 500-7000 m (Figure 2). The true electrical model is represented by an isotropic earth containing an air layer (not shown), sea layer, and low-conductivity (0.03 S/m) zone at 2200 m depth (Figure 1). We solved the 1D nonlinear inverse problem using the quadratic programming algorithm of Tompkins and Alumbaugh [2002] modified for the marine CSEM problem, incorporating first-derivative smoothness constraints, and containing constant lower and upper conductivity box constraints (0.01 S/m and 2.0 S/m, respectively). The inverse model space included 33 fixed-depth layers (Figure 2) of unknown isotropic resistivity (all with 1.0 Ωm starting values) and 2 layers of known conductivity for water (3 S/m) and air (1.0e-10 S/m). One solution to this over-parameterized inverse problem is shown in Figure 2 along with the true model. For the purposes of this work, we take for granted all of the intricacies of solving this deterministic inverse solution, as the focus here is on the uncertainty problem and not any one specific solution.


Figure 1: Schematic showing the synthetic marine CSEM earth model used.

Figure 2: 1D synthetic marine CSEM earth model (dotted green line) presented with the deterministic inverse solution (right). Both predicted and observed amplitude (left) and phase (middle) data are also shown. There were 34 data and 33 layers in the inverse problem.


3.2. Principal Component Analysis

Principal component analysis (PCA) [Pearson, 1901] is a well-known mathematical procedure that transforms, in an optimal way, a number of possibly correlated variables (i.e., model parameters in this case) into an equal number of uncorrelated variables, called principal components, while maintaining their full variance and ordering the uncorrelated variables by their contributions. The resulting transformation is such that the first principal component represents the largest amount of variability, while each successive component accounts for as much of the remaining variability as possible. This procedure is also called the discrete Karhunen–Loève transform and is one of the most useful techniques in exploratory analysis. Typically, it is performed either on the data matrix (by singular value decomposition) or on the covariance matrix (by eigenvalue decomposition). In both cases, the ordered principal components can be used to reduce the number of variables by selecting only the first few components that capture most of the variance. In our case, we perform the PCA on the a priori model covariance centered about an inverse model. In this way, PCA consists of finding a reduced-dimensional orthogonal base, consistent with the correlation of our inverse model that can be used as a new model parameter base.

We use the previously described deterministic CSEM inverse solution to estimate both the mean (inverse) model and the linearized model covariance matrix using equation 2. The resulting 33x33 covariance matrix for our CSEM example is shown in Figure 3 as a color-coded 2D matrix. This covariance matrix is symmetric and positive semi-definite; hence, computation of the principal components reduces to the eigenvalue problem [Jolliffe, 2002] for $N$ model parameters,

$\mathbf{C}_{lin} \mathbf{v}_{i} = \lambda_{i} \mathbf{v}_{i},$ (5)

where the $N$ eigenvectors (i.e., principal components), $\mathbf{v}_{i}$, are orthogonal and of length $N$, and the eigenvalues, $\lambda_{i}$, are positive or zero. Eigenvalues can be ranked in decreasing order, and we can select a certain number of them to match most of the variability of the models. That is, the first $d$ eigen-models ($d \ll N$) representing most of the energy spectrum of the decomposition are chosen. Then any model belonging to our original space of plausible models, $\mathbf{m}_{k} \in M$, can be represented as a unique linear combination of the eigen-models:

$\mathbf{m}_{k} = \boldsymbol{\mu} + \sum_{i=1}^{d} a_{i} \mathbf{v}_{i},$ (6)

where $\boldsymbol{\mu}$ is the mean model, and $a_{i}$ are real-valued coefficients. The residuals, $\mathbf{m}_{k} - \boldsymbol{\mu}$, then span the reduced-order linear space $V_{d} = \langle \mathbf{v}_{1}, \mathbf{v}_{2}, \ldots, \mathbf{v}_{d} \rangle$ generated by the first $d$ eigenvectors of the covariance matrix. Now the model space has been effectively reduced from $N$ correlated pixels (layers) to $d$ independent coefficients.
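A sketch of the reduction using NumPy's symmetric eigensolver follows; the truncation level d would be chosen from the eigenvalue spectrum (4-8 components in the example discussed below):

```python
import numpy as np

def pca_basis(C_lin, d):
    """Equations 5-6: leading-d orthogonal basis of the prior covariance.

    Returns eigenvalues and eigenvectors ranked in decreasing order; any
    model is then reconstructed as m_k = mu + V_d @ a_k (equation 6).
    """
    lam, V = np.linalg.eigh(C_lin)       # ascending for symmetric matrices
    order = np.argsort(lam)[::-1][:d]    # keep the d largest components
    return lam[order], V[:, order]
```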

Figure 3: Color-coded linearized model covariance matrix (S²/m²) computed at the last iteration of the nonlinear inversion in Figure 2. The diagonal elements represent the individual layer conductivity variances, while the off-diagonals represent covariances between layers.

We show results of this PCA in Figure 4 for our CSEM example. Figure 4a displays the 33 eigenvalues generated using the model covariance from Figure 3, and Figure 4b shows the inverse model solution along with model reconstructions using a range of eigenvectors. Evidently, for this simple model, we capture the vast majority of variance in the first 4-8 eigenvectors (Figure 4a); however, based on the model reconstructions, there are practically no changes if we use more than 4 eigenvectors (Figure 4b).

In general, when we increase the number of eigenvectors in the reconstruction, we are able to match finer scales of heterogeneity in our model space, if they exist. However, the level of detail we have to consider is an important question, since all the finer scales might not be informed by the observations. That is, they belong to the null space of the forward operator. By truncating the number of PCA terms we use in the expansion in equation 6, we are setting these finer scales to zero, also avoiding the risk of overfitting the data. In other words, the use of a truncated PCA base provides a kind of natural solution (i.e., smallest norm).


Figure 4: Results of the PCA of the linearized covariance matrix in Figure 3. a) Relative eigenvalue amplitudes. b) 1D conductivity model reconstructions.

The reader should note that while we use the linearized covariance matrix for our PCA here, we could have also computed an experimental covariance, as in equation 3, using, for example, sampling about some test model via geostatistical simulation algorithms to incorporate spatial correlations and finer scales of heterogeneity. In this case, the mean model would be represented by the average of these model samples instead of the inverse model as shown here.


4. Parameter Constraint Mapping

Once we compute the PCA base for our reduced-dimension model space, we can solve the inverse uncertainty problem by sampling models in the PCA base. In this context, equation 6 is used to generate a comprehensive set of feasible earth model vectors, $\mathbf{m}_{k}$, by sampling coefficient vectors, $\mathbf{a}_{k} = (a_{1}, \ldots, a_{d})$, with their corresponding eigen-models. Although this sampling in the reduced space is sufficient to solve the stochastic uncertainty problem, it is not very efficient. In particular, every model, $\mathbf{m}_{k}$, has a unique mapping to the PCA base via the coefficients in equation 6; however, the inverse is not true (i.e., the mapping is not bijective). That is, any coefficient vector, $\mathbf{a}_{k}$, sampled does not necessarily produce a model in the feasible space, $\mathbf{m}_{k} \in M$. As a result, we must construct a space of acceptable coefficient vectors whose resulting models do belong to the feasible space. As outlined by Ganapathysubramanian and Zabaras [2007], this construction amounts to finding the largest subset, $S \subset M$, such that for any vector, $\mathbf{a}_{k} = (a_{1}, \ldots, a_{d}) \in S$, the model, $\mathbf{m}_{k} = \boldsymbol{\mu} + \sum_{i=1}^{d} a_{i} \mathbf{v}_{i}$, is feasible. From their formulation, we define individual parameter bounds, $l(n)$ and $u(n)$, in our original model space and "map" these to the reduced-order space by solving a minimization problem with linear inequality constraints:

find the region $S \subset M$, subject to the constraints

$l(n) \leq \mu(n) + \sum_{i=1}^{d} a_{i} v_{i}(n) \leq u(n), \quad n = 1, \ldots, N.$ (7)

Solutions to this problem, known as the vertex enumeration problem in computational geometry, have been established since the 1950s [See Avis and Fukuda, 1992]. In essence, each constraint in equation 7 is bounded by a hyperplane, and together these hyperplanes define a convex polyhedron in N-dimensional space. The solution to the problem in equation 7, then, is the computation of the vertices (intersections) of this polyhedron with the reduced-dimensional hyperspace. In this way, the vertices represent extrema of the region $S$ of allowable coefficient vectors. To implement constraint mapping in our algorithm, we solve equation 7 with the double description method described by Fukuda and Prodon [1996].

As an illustrative example in two dimensions ($x_{1}$, $x_{2}$), consider the following system of linear inequalities:


$0 x_{1} + 2 x_{2} \leq 4$
$2 x_{1} + 0 x_{2} \leq 4$
$-.5 x_{1} + .5 x_{2} \geq -.5$
$-.5 x_{1} - .5 x_{2} \leq -.5$
$-1 x_{1} - 1 x_{2} \leq -1$
$-1 x_{1} + 0 x_{2} \leq 0$. (8)

After solving equation 7 for these inequalities, the polytope defined by the vertices of this constraint polyhedron is shown in Figure 5. Several important properties are demonstrated by this simple example. First, the polytope is bounded (i.e., we have upper/lower bounds for each parameter) and convex, due to the linear nature of the bounding constraints. Second, there may be redundant vertices. In this case, there are 6 constraints but only 5 vertices, since two inequalities result in the same vertex ($-1x_{1} - 1x_{2} \leq -1$, $-.5x_{1} - .5x_{2} \leq -.5$). Third, although it is convex, in general, the polytope does not define a hypercube (i.e., square in 2D). This means that exact interior point sampling of a polytope will depend on the number of non-redundant vertices (i.e., dim=5) rather than the dimension of the original hyperspace (i.e., dim=2). Finally, the vertices, themselves, are members of the extremal set for [$x_{1}$, $x_{2}$]. Regarding sampling, two things are evident: sampling the polytope is equivalent to sampling the region of feasible points defined in the original hyperspace, and approximate interior point sampling will be required unless the number of vertices is small.
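To make the example concrete, the vertex set can be recovered numerically. The paper solves equation 7 with the double description method of Fukuda and Prodon [1996]; the sketch below instead uses SciPy's Qhull-based halfspace intersection as a stand-in, which requires a strictly interior starting point (here (1.2, 1.2), an assumed choice):

```python
import numpy as np
from scipy.spatial import HalfspaceIntersection

# Inequalities of equation 8 in SciPy's convention A @ x + b <= 0;
# each row is [a1, a2, b] for a1*x1 + a2*x2 + b <= 0.
halfspaces = np.array([
    [ 0.0,  2.0, -4.0],   #  0x1 + 2x2 <= 4
    [ 2.0,  0.0, -4.0],   #  2x1 + 0x2 <= 4
    [ 0.5, -0.5, -0.5],   # -.5x1 + .5x2 >= -.5
    [-0.5, -0.5,  0.5],   # -.5x1 - .5x2 <= -.5
    [-1.0, -1.0,  1.0],   # -1x1 - 1x2 <= -1 (same halfplane as the row above)
    [-1.0,  0.0,  0.0],   # -1x1 + 0x2 <= 0
])
interior = np.array([1.2, 1.2])               # strictly feasible point
hs = HalfspaceIntersection(halfspaces, interior)
vertices = np.unique(np.round(hs.intersections, 9), axis=0)
print(vertices)   # should list the 5 distinct vertices; the duplicated
                  # constraint contributes no new extreme point
```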

For solving the uncertainty problem, it is natural to define parameter bounds in two ways. First, we could define the bounds based on some prior knowledge of the smallest and largest allowable or expected property values. In our CSEM example, this might include constant bounds:

0.03 S/m < conductivity < 2.0 S/m. (9)

In this way, the bounds define a kind of equi-feasible region in "global" model space. A second way to define the bounds is based on the individual parameter variances, $\sigma_{lin}^{2}$, from the a priori linearized covariance matrix of equation 2. If we choose, say,

$l(n), u(n) = \mu(n) \pm \sigma_{lin}^{2}(n),$ (10)


Figure 5: Example 2D vertex enumeration solution. The polytope (solid black line) is defined by the vertices (circles) from solutions to eqn. 7. For more efficient sampling on Cartesian grids, a hypercube (dotted line) can be circumscribed to the polytope.

then the bounds represent a “local” region of probability within one standard deviation of the inverse model, µ, used to compute the covariances. In the context of sampling posterior model space, the global bounds can be considered more explorative, while the local bounds are more exploitative.

Although it is not practical to plot vertices in 4D, our CSEM example, using the first four eigenvectors of equation 5, produced 120 and 295 non-redundant vertices for these two bound sets, respectively. Now it is also apparent that the number of vertices can be significantly larger than the dimension of the reduced-order model space. Since we do not want to sample as a function of the hundreds of vertices, we need to determine a sparse sampling alternative.

5. Sparse Geometric Sampling

To this point, we have presented a way to decorrelate and reduce a parameter space. In addition, we demonstrated that parameter constraints can be mapped to a bounded region of feasible model space. These are important steps which allow for efficient sampling of models from the posterior. However, perhaps the most critical step is the way in which we perform the sampling itself. To reiterate, we wish to sample model vectors, $\mathbf{m}_{k}$, in our original space by sampling coefficient vectors, $\mathbf{a}_{k} = (a_{1}, \ldots, a_{d})$, in our reduced-order space. Since these coefficients are sampled on the polytope and prior to any forward evaluations, we refer to them as geometric samples.


As mentioned, the polytope vertices themselves represent samples from the extremal set of feasible coefficient vectors. Thus, an obvious way to sample the posterior polytope might be to sample either the vertices or combinations of them. It follows directly from the convexity property of the polytope that a linear interpolation of any two or more vertices produces a point on the edge or interior of the polytope. In this way, we could sample the posterior simply by taking the vertices and midpoints of every combination of two vertices. The question is: how many samples would be required? In the case of vertex midpoint sampling, the number of samples required grows with the total number of vertices, $T$, as $\frac{1}{2} \, T!/(T-2)! = T(T-1)/2$, if we take two vertices at a time. For our CSEM example, this type of sampling produces either 7,140 or 43,365 samples, depending on whether local or global bounds are employed.

A more scalable way to sample the polytope is to circumscribe a hypercube to the polytope, so that sampling can be performed on a Cartesian grid in the dimension of the reduced model space (Figure 5). In this way, we approximate the polytope, which introduces some coefficients that are not strictly feasible (i.e., regions inside the hypercube but outside the polytope). However, our model samples are determined prior to any forward evaluations, so we can test/reject samples for feasibility in the original model space before any evaluation. Though this introduces some inefficiency in our sampling, it is still far more efficient than not computing the polytope at all. This is because the circumscribed hypercube is only partially empty with respect to feasible coefficients (due to the fullness of the bounded polytope). However, a completely unbounded reduced-dimensional space is still exceedingly empty [See Tarantola, 2005], and sample rejection could be painfully inefficient.

Before we can sample along the axes of the reduced-dimensional hyperspace, we need to determine the sampling scheme. Uniform sampling is not practical, since the number of samples grows exponentially with the dimension of the hyperspace: the so-called "curse of dimensionality". As outlined by Xiu and Hesthaven [2005] and highlighted by Ganapathysubramanian and Zabaras [2007], we can borrow a scheme used in multivariate interpolation: Smolyak's method. With this method [Smolyak, 1963], well-established univariate interpolation formulas (e.g., Gauss quadrature, Clenshaw-Curtis, Chebyshev polynomials, etc.) are extended to the multivariate case by using tensor products in a distinct way [Xiu and Hesthaven, 2005]. As a result, we can perform accurate interpolation that requires orders of magnitude fewer nodes (samples) than conventional interpolation on full uniform grids. If chosen smartly, the predefined points comprising the sparse grid can also be nested, allowing for adaptive refinement of sampling for increased accuracy.

Following the formulation of Xiu and Hesthaven [2005], we can form a one-dimensional interpolation of a smooth function $f$ as:

$U^{i}(f) = \sum_{k=1}^{m_{i}} f(Y_{k}^{i}) \, a_{k}^{i},$ (11)


where the weights, $a_{k}^{i}$, are given by the chosen interpolation formula (e.g., $e^{-x^{2}}$ for Gauss-Hermite) and evaluated at nodes, $Y_{k}^{i}$, in the nodal set:

$\Theta^{i} = \left( Y_{1}^{i}, \ldots, Y_{m_{i}}^{i} \right),$ (12)

which are determined by the roots of the interpolation polynomial chosen. Now, the Smolyak method gives the multivariate extension of this 1D interpolation in $d$ dimensions as:

$A(q,d) = \sum_{q-d+1 \leq |\mathbf{i}| \leq q} (-1)^{q-|\mathbf{i}|} \binom{d-1}{q-|\mathbf{i}|} \cdot \left( U^{i_{1}} \otimes \cdots \otimes U^{i_{d}} \right),$ (13)

where $\mathbf{i} = (i_{1}, \ldots, i_{d})$ represents the individual dimensions of the multivariate space, $\otimes$ is the Kronecker product, and $q = d + k$ is defined as the level of the interpolation. The sparse nodal set that results from equation 13 is:

$H(q,d) = \bigcup_{q-d+1 \leq |\mathbf{i}| \leq q} \left( \Theta^{i_{1}} \times \cdots \times \Theta^{i_{d}} \right).$ (14)

If we choose Chebyshev polynomial roots for our nodes, the number of nodes generated for $H(q,d)$ is given by:

$\dim(A(q,d)) \sim \frac{2^{k}}{k!} \, d^{k}, \quad d \gg 1,$ (15)

which is much less than that given by full tensor products, $k^{d}$, when $d > k$. This property is important, because it allows for the application of these sparse grids to problems of large dimensions. Perhaps equally important is the fact that the nodes provided by this formula are nested. Thus, if we wish to improve the accuracy of our interpolation from level $q$ to level $q+1$, we simply need to sample at the differential nodes between the two levels, which provides a means for adaptive sampling. That is, we simply evaluate samples at very sparse grid levels first, then incrementally increase the level until some desired convergence. We provide some example grids in Figure 6 for dimension $d=2$.
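A compact sketch of the nodal-set construction in equation 14 follows. The level convention here is our inference from the node counts quoted in Section 7 (grid "level" k with q = d + k, and nested Chebyshev-extrema rules with m_1 = 1, m_i = 2^(i-1) + 1); this is a reading of the method, not code from the paper:

```python
import numpy as np
from itertools import product

def cheb_nodes_1d(i):
    """Nested 1D Chebyshev-extrema nodes on [-1, 1]: m_1 = 1, m_i = 2**(i-1) + 1."""
    if i == 1:
        return np.array([0.0])
    m = 2 ** (i - 1) + 1
    return -np.cos(np.pi * np.arange(m) / (m - 1))

def smolyak_nodes(k, d):
    """Sparse nodal set H(q, d) of equation 14, with q = d + k.

    Because the 1D rules are nested, the union over all multi-indices
    with |i| <= q equals the union over the Smolyak band
    q - d + 1 <= |i| <= q appearing in equations 13-14.
    """
    q = d + k
    nodes = set()
    for idx in product(range(1, k + 2), repeat=d):
        if sum(idx) > q:                      # outside the Smolyak simplex
            continue
        for pt in product(*(cheb_nodes_1d(i) for i in idx)):
            nodes.add(tuple(np.round(pt, 12)))
    return np.array(sorted(nodes))

# Node counts reproduce the d = 4 grids quoted in Section 7:
print([len(smolyak_nodes(k, 4)) for k in (1, 2, 3)])   # expected [9, 41, 137]
```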

Now, if we consider our model sampling problem as a piecewise linear interpolation problem over our $d$-dimensional hypercube approximation to the polytope, we can use these sparse nodal sets as our coefficient samples, $\mathbf{a}_{k} = (a_{1}, \ldots, a_{d})$. To determine the nodes, $H(q,d)$, we use sparse grids based on Chebyshev polynomial extrema. In our CSEM example, a level $q=4$ grid in dimensions $d=4$ produced 264 samples.

6. Sample Evaluation

Since we approximate the posterior polytope with a hypercube, we must take a step to evaluate our coefficients for feasibility. To do this, we simply compute the model vectors, mk, from our sampled coefficients using equation 6, and determine whether they lie within the bounding constraints of equation 7. Any model samples that lie outside these bounds are rejected. The remaining accepted models then constitute our equi-feasible model ensemble.
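This screen reduces to a few array operations once the grid nodes are in hand. A sketch with illustrative names (`lo`/`hi` are the per-axis bounds of the circumscribed hypercube, `l`/`u` the original-space bounds of equation 7; none of these identifiers come from the paper):

```python
import numpy as np

def feasible_models(nodes, mu, V, l, u, lo, hi):
    """Map sparse-grid nodes on [-1, 1]^d to models (equation 6) and
    reject any falling outside the original parameter bounds."""
    a = lo + 0.5 * (nodes + 1.0) * (hi - lo)    # nodes -> hypercube coords
    m = mu + a @ V.T                            # equation 6, one row per sample
    keep = np.all((m >= l) & (m <= u), axis=1)  # feasibility screen
    return m[keep], a[keep]
```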

Figure 6: Example Smolyak sparse grids in 2D, on the interval [0,1], based on Chebyshev polynomials. a) Level-2 sparse grid. b) Level-3 sparse grid. c) Level-4 sparse grid.

Thus, we have completed the geometric sampling of our approximation to the posterior polytope. Our ensemble forms a comprehensive set of feasible models, $\mathbf{m}_{k}$, and we have not performed any forward evaluations (beyond those required for the initial deterministic inversion). However, we still need to evaluate these models in terms of the data misfits and reject any that are outside the tolerance defined in equation 1. To do this, we compute the predicted data using the individual model samples and calculate the data misfit using a root mean squared error measure. Of course, an accurate forward simulation algorithm is required for this. For our CSEM example, we use the algorithm presented by Tompkins [2003], which computes 1D EM fields in transversely isotropic media, using field formulations modified from Tang [1979]. Once we simulate the fields, determine the data misfits, and reject the less fit models, we are left with the final ensemble of equivalent models.
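The final rejection step might then be sketched as follows. The relative RMS definition is an assumption on our part (the text specifies only a root mean squared error measure and, in Section 7, a 10% threshold), and `forward` again stands in for the 1D CSEM simulator:

```python
import numpy as np

def equiprobable_models(models, forward, d_obs, rms_tol=0.10):
    """Keep only models whose predicted data fit d_obs within rms_tol."""
    keep = []
    for m in models:
        residual = forward(m) - d_obs
        rms = np.sqrt(np.mean(np.abs(residual / d_obs) ** 2))
        if rms < rms_tol:
            keep.append(m)
    return np.array(keep)
```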

7. Examples

7.1. Single-Resistor Model

To demonstrate this method in full, we return to the marine CSEM example presented earlier. To review, our goal is to sample the posterior model space supported by the data and structure of the model covariance matrix. As shown, we have two choices for how broadly we sample the posterior. We can choose local bounds based on statistics from the model covariance (equation 10) or we can set bounds based on global conductivity constraints (equation 9). We show results for both here.

If we take the deterministic inverse solution from Figure 2, the prior linearized covariance from Figure 3, and the first 4 eigenvectors shown in Figure 4, we can construct two vertex enumeration problems, as in equation 7, based on both local and global bounds, as presented before. In these two cases, 120 and 295 non-redundant vertices were found. Based on sparse hypercube sampling of these polytopes, we generated the equi-feasible model sets shown in Figures 7a and 7b for local and global conductivity bounds, respectively. While there were 137 grid points (q = 3) generated for the local bounds case, there are fewer than 137 samples in Figure 7a. This is due to rejection of samples, prior to misfit evaluation, based on their lack of feasibility (due to our approximate hypercube sampling of the polytope). We rejected 33 samples from the original ensemble, leaving 104 equi-feasible models. For the global bounds case, a higher-level grid (q = 6) was chosen for the sparse grid (2,929 nodes) given the number of feasibility rejections (2,667). In this case, 262 equi-feasible models were generated (Figure 7b). Using these two sets of equi-feasible models, we computed the data misfits and performed a final model rejection based on an RMS threshold of 10%. The resulting equi-probable model ensembles are shown in Figure 8 with their corresponding experimental covariance matrices in Figure 9. In the local bounds case (Figures 8a and 9a), there were 101 final equi-probable models. In the global bounds case, there were 148 equi-probable models (Figures 8b and 9b).

It is evident from Figure 7 that while the locally-bounded polytope results in models densely sampled about the inverse model (bold black line in Figure 8a), the globally-bounded samples span a much wider class of models, even including several equivalent models with two resistive layers located between 2200 and 2300 m depth (e.g., bold black line in Figure 8b). This is somewhat expected given our choice of bounding constraints and the limited resolution of our low-frequency EM method, but the result is still illuminating and demonstrates the importance of comprehensively sampling the posterior polytope; these two-resistor models would most certainly not be evident from inspection of the linearized covariance matrix. However, there is a trade-off with this explorative sampling; we expect fewer of the feasible models to fit the data to within our misfit tolerance, which results in less efficient sampling and uncertainty estimation. This is illustrated in Figures 7b and 8b, where only 148 of the 262 evaluated samples are equi-probable. This is compared to the local bounds case, where 101 of 104 total samples are equi-probable. The global ensemble, however, manages to better sample the global minimum (dotted black line in Figures 8a and 8b) of the problem and estimates larger covariances (Figure 9). Both of these nonlinear covariance matrices also vary significantly from the prior linearized covariance matrix in Figure 3. In particular, the linearized parameter covariances tend to be smaller than the nonlinear estimates, a phenomenon observed by others as well [Alumbaugh, 2002]. Given these results, two practical and opposite questions arise: how complete is the posterior sampling, and how sparsely can we sample?

Figure 7: CSEM model posterior samples generated with sparse hyperspace sampling of the posterior polytope. a) Local bounds polytope (104 equi-feasible models). b) Global bounds polytope (262 equi-feasible samples).


7.2. Sampling Completeness

The first question is essentially one of accuracy. We want our samples, however sparse, to be representative of the "true" model posterior. To address this, we take the global bounds case above and generate a very comprehensive grid by sampling the polytope directly; we sample all vertices and midpoint combinations of vertices as suggested in Section 5. This results in an ensemble of 7,140 equi-feasible models, of which 1,325 are equi-probable (Figure 10a). We consider this sampling to be a "complete" sampling set. We can then compare the results from this set with the sparsely sampled (q=6) results shown in Figure 8b (Figures 10 and 11) to determine how complete the sparse sampling is. Here, we also include ensemble and covariance results for the level-7 sparse grid, which included 366 equi-probable samples. From inspection of Figure 10, we see that both the vertex-sampled and sparsely-sampled ensembles compare well, displaying similar model classes. This is not surprising, since the sampling, however sparse, is designed to span the entire polytope, generating similar extremal samples at all grid levels. Although the number of samples evaluated for the three grids differs by more than a factor of 27, the covariances in Figure 11 are also remarkably similar in structure. This similarity is particularly apparent between the level-6 and level-7 sparse covariances.

Figure 8: CSEM equivalent models after rejection (RMS > 10%) of samples from Figure 7. a) Locally-bounded polytope (101 models). b) Globally-bounded polytope (148 models). The true (dotted), inverse (bold in (a)), and one multi-resistor model (bold in (b)) are highlighted.


The covariance matrix from direct polytope sampling does show structure that is not captured in the covariances from sparse sampling. In particular, note the high (absolute) values on some of the off-diagonal terms that are not seen in the covariance matrices from sparse sampling (Figure 11). To determine quantitative differences between these covariances, we define a relative root mean squared error between the model covariance matrices generated with the vertex sampling and the sparse samplings as:

$Err_{RMS} = \sqrt{ \frac{1}{N^{2}} \sum_{ij} \frac{ \left( C_{ij}^{Vertex} - C_{ij}^{Sparse} \right)^{2} }{ \min \left( C^{Vertex} \right)^{2} } },$ (16)

where the sum is performed over all covariance indices, and $N^{2}$ represents the total number of elements. We compute this RMS error for the level-6 and level-7 grids and present the results in Table 1. If we also compute an RMS error between the two sparse sample covariances, we find that they differ by less than one percent. Considering that the number of equi-feasible models evaluated (i.e., forward solves) at level 7 is more than twice that at level 6 (i.e., 656 vs. 262), there seems to be little benefit in increasing from level-6 to level-7 sampling (which is why level-6 sampling was included in Figure 8b). This is equally apparent when comparing the RMS errors between these two sparse covariances and the vertex-sampled covariances, where the errors are nearly identical (0.41 and 0.39). Although these RMS errors between the vertex- and sparse-sampled covariances are measurably larger than those between the sparse grids, we consider this result exceptional considering that the number of model evaluations required for the vertex sampling is more than 27 times that required for level-6 sampling. This is an important point and demonstrates that it is much more important how and where we sample the posterior polytope rather than how much we sample it, which brings us to the question of optimal sample sparsity.

Figure 9: CSEM experimental posterior model covariance matrices computed from the model ensembles in Figure 8. Covariances were computed from the base-10 log of the models to enhance comparison of small and large value parameters. a) Local bounds covariance. b) Global bounds covariance.

7.3. Sampling Sparsity

The second question we posed seeks to identify the sparsest (i.e., optimal) grid required to produce a stable uncertainty estimate. The answer will depend on the problem being solved; however, samples determined at increasingly higher grid levels are always nested, so we need only start at the sparsest grid and iteratively increase the grid level until the uncertainty estimates (e.g., covariances) converge. Thus, the problem of determining the optimally sparse grid becomes one of convergence in the covariance estimates. To address this in our CSEM example, we start with the local bounds case presented before, which used a sparse grid level of q=3, and sample at additional grid levels 1, 2, 4, 5, and 6 (i.e., 9, 41, 401, 1,105, and 2,929 grid nodes). We then compute the RMS error, using equation 16, between the covariance matrices at successive grid levels to determine convergence. Results for the covariances generated at grid levels 2-5 are presented in Figure 12 along with relative errors between all grid levels (Table 1, Figure 13). The covariance structures are very similar, even at the sparsest grid levels, and the fact that the general magnitude and structure of the covariances are captured so well with only a few sample evaluations (32 samples at level 2) is surprising. Again, this is due to the fact that samples generated at every grid level span the entire posterior polytope (actually, the hypercube approximation to the polytope). Thus, integration of these samples should generate similar covariances as long as we are sampling at a minimum level.

Table 1: RMS error convergence for uncertainty estimation. Errors are for each grid level relative to the next level (local bounds case) or the "best" vertex-sampled grid (global bounds case). "Grid Samples" denote the complete sample set, while "Feasible Samples" are the number of forward solves. "Probable Samples" represent the number of equivalent models.


By contrast, the actual errors in Table 1 suggest that the level-1 and level-2 grids, with only 8 and 31 equi-probable samples, are perhaps too sparse, with errors of 25% and 9% when compared to parameter covariances from adjacent grid levels. The level-3 grid covariances (101 equi-probable samples), however, differ by less than 3% relative to those of level 4 (294 samples), while level 4 differs by only 2% from level 5 (809 samples). From the complete error curve in Figure 13, it appears as though convergence is reached between levels 3 and 4, but this ultimately depends on our threshold for relative error. From these results, it seems as though level-3 sparsity would suffice, if we chose the acceptable covariance error to be 5%. If the acceptable error were lower, say 2%, we would have to evaluate the covariances at level 5 to determine convergence between levels 4 and 5 (i.e., $RMS_{4-5} = 2.0\%$). Of course, we could also look for convergence in different statistical moments, like the mean. The important point here is that we only have to sample as much as absolutely required by our defined level of accuracy. Thus, we define an optimally sparse grid representation of our model posterior polytope.
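This convergence test lends itself to a simple adaptive loop: compute covariances at successive nested grid levels and stop when the equation 16 error falls below the chosen threshold. A sketch, under our reading of equation 16:

```python
import numpy as np

def rms_covariance_error(C_ref, C_test):
    """Our reading of equation 16: RMS covariance difference normalized
    by the smallest reference covariance element."""
    return np.sqrt(np.sum((C_ref - C_test) ** 2 / np.min(C_ref) ** 2)
                   / C_ref.size)

def sparsest_converged_level(covariances_by_level, tol=0.05):
    """Return the sparsest grid level whose covariance agrees with the
    next level within tol (e.g., level 3 at the 5% threshold of 7.3)."""
    for k in range(len(covariances_by_level) - 1):
        if rms_covariance_error(covariances_by_level[k + 1],
                                covariances_by_level[k]) < tol:
            return k
    return len(covariances_by_level) - 1
```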

8. Discussion

We have demonstrated the basic principles of our uncertainty estimation method for a simple marine CSEM inverse example; however, solving the 1D uncertainty problem is hardly the end goal for most geophysical applications. Ideally, we are interested in large-scale inverse problems in 2 and 3 dimensions and with thousands to millions of parameters. To this end, there are several important aspects of this method that allow for its application to the large-scale uncertainty problem.

In step one of our method, we estimate the linearized model covariance by solving the deterministic inverse problem and inverting the corresponding model Hessian. This is probably not practical for many large-scale problems, since many null-space inverse algorithms do not even compute (or store) the model Jacobian. However, as suggested in Section 3.2, we need not estimate the prior covariance in this way. We can just as easily compute an experimental covariance using equation 4. In this case, we could compute an approximate covariance around some prior model using either random or geostatistical model sampling, as proposed by Echeverria et al. [2009] and Fernandez Martinez et al. [2009]. We feel this will be the best approach for large-scale problems. This will also help to address the issue that the covariance estimated from an inverse solution will always be too smooth compared to the covariance of the true earth model due to resolution limits and regularizations used in deterministic inversions.


Figure 10: Final equi-probable CSEM model posterior ensembles using global parameter bounds and generated by (a) direct polytope sampling (1,325 equi-probable samples), (b) level-6 sparse sampling (148 equi-probable samples), and (c) level-7 sparse sampling (366 equi-probable samples). The true model (dotted) and inverse model (bold line) are shown.


Figure 11: Final CSEM posterior covariance matrices using global parameter bounds and the model ensembles in Figure 10. Covariances were computed from the base-10 log of the models to enhance comparison of small and large parameters. a) Direct polytope sampling covariances. b) Sparse (q=6) sampling covariances. c) Sparse (q=7) sampling covariances.


Another consideration for the application of our method to large-scale uncertainty problems is timing. There are two potential computational bottlenecks in our method: solving the vertex enumeration problem and the eigenvalue problem. The Fukuda and Prodon [1996] algorithm we implement for the vertex enumeration requires O(dv2n) time for d dimensions, v vertices, and 2n inequalities. Since the timing and number of inequalities grow only linearly with the dimension of the original parameter space, this algorithm has the potential to scale to much larger parameterizations. The same is not true for the eigenvalue problem, and we recognize that solving equation 5 will be problematic for very large parameterizations. There is the potential for implementing computationally-efficient parameter reduction using covariance-free methods, but we leave the elucidation of these to future work.

Finally, storage is always an issue when solving large-scale problems. This is particularly true for highly under-determined inverse problems, where the covariance matrix can require orders of magnitude more space than the Jacobian itself. Because we determine nonlinear uncertainty based on posterior model ensembles, we do not need to store or compute the posterior covariance matrix. In fact, there are several measures of model uncertainty that can reduce to nominal model dimensions (e.g., quartiles or parameter correlations). In addition, the algorithm we use for solving the vertex enumeration has very efficient storage properties, requiring only marginally more storage than the eigenvector matrix itself.

9. Conclusions

Interpretation in geophysical inverse problems requires uncertainty estimation. In this manuscript, we have presented a scheme to efficiently estimate nonlinear inverse model uncertainty in any kind of inverse problem. The methodology combines three concepts known in the scientific literature: model dimension reduction, parameter constraint mapping, and very sparse geometric sampling. We have shown that combining these methods with forward simulation can reduce the nonlinear uncertainty problem to a deterministic sampling problem in only a few dimensions, requiring only limited forward solves and resulting in an optimally sparse representation of the posterior model space. Estimated model covariances and parameter bounds constitute the prior information for this method. In this context, we presented two manifestations of posterior sampling: exploitative and explorative, where the former searches for equivalence around a given model and the latter performs a kind of global search within parameter bounds. While forward solves are still required to evaluate the resulting model ensembles, we presented a scheme by which the computational burden is minimized by iteratively increasing grid-level complexity until uncertainty measures converge. We also introduced the idea of geometric sampling, which decouples the problem of sampling from the forward evaluation of the samples. Although the technique has been illustrated for a simple 1D EM inverse problem, it is general and has the potential to scale to large 3D parameterizations.
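In outline, the adaptive refinement loop just described can be summarized by the following Python sketch. The helpers sample_sparse_grid and is_equiprobable are hypothetical placeholders for our polytope sampler and forward-based acceptance test, and the relative RMS test is one plausible reading of the convergence criterion built on equation 16:

    import numpy as np

    def converge_uncertainty(sample_sparse_grid, is_equiprobable,
                             tol=1e-2, max_level=8):
        """Refine grid level until successive covariances agree to tolerance.
        sample_sparse_grid and is_equiprobable are hypothetical helpers."""
        prev_cov = None
        for level in range(2, max_level + 1):
            candidates = sample_sparse_grid(level)      # points in the polytope
            ensemble = np.array([m for m in candidates if is_equiprobable(m)])
            cov = np.cov(ensemble, rowvar=False)        # ensemble covariance
            if prev_cov is not None:
                rms = np.sqrt(np.mean((cov - prev_cov) ** 2))
                if rms < tol * np.sqrt(np.mean(prev_cov ** 2)):  # relative test
                    return ensemble, cov, level
            prev_cov = cov
        return ensemble, cov, max_level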


Figure 12: Example CSEM posterior model covariances using local parameter bounds, generated for various levels of sparsity. (a) Level-2 sparsity (31 equi-probable samples). (b) Level-3 sparsity (101 equi-probable samples). (c) Level-4 sparsity (294 equi-probable samples). (d) Level-5 sparsity (809 equi-probable samples).

Figure 13: Convergence curve of covariance RMS error as defined by equation 16, for posterior model covariance matrices using local parameter bounds and generated at various levels of sample-grid sparsity. Grid level refers to the smaller of the two sparse grids in each error calculation; e.g., the error at grid level 3 is the error of level 3 relative to level 4.


Acknowledgements

Authors Fernandez Martinez and Mukerji also acknowledge the sponsors of the Stanford Center for Reservoir Forecasting.

References

Alumbaugh, D.L., and G.A. Newman (2000), Image appraisal for 2-D and 3-D electromagnetic inversion, Geophysics, 65, 1455-1467.

Alumbaugh, D.L. (2002), Linearized and nonlinear parameter variance estimation for two-dimensional electromagnetic induction inversion, Inverse Problems, 16, 1323-1341.

Avis, D., and K. Fukuda (1992), A pivoting algorithm for convex hulls and vertex enumeration of arrangements and polyhedra, Discrete Comput. Geom., 8, 295-313.

Echeverria, D., T. Mukerji, and E.T.F. Santos (2009), Robust scheme for inversion of seismic and production data for reservoir facies modeling, in 79th Annual International Meeting Expanded Abstracts, SEG, Tulsa, 28, 2432.

Fernández-Álvarez, J.P., J.L. Fernández-Martínez, and C.O. Menéndez-Pérez (2008), Feasibility analysis of the use of binary genetic algorithms as importance samplers: application to a geoelectrical VES inverse problem, Mathematical Geosciences, 40, 375-408.

Fernández Martínez, J.L., T. Mukerji, Z. Fernández Muñiz, E. García Gonzalo, M.J. Tompkins, and D.L. Alumbaugh (2009), Complexity reduction in inverse problems: wavelet transforms, DCT, PCA and geological bases, Eos Trans. AGU, 90, Fall Meet. Suppl., Poster.

Fukuda, K., and A. Prodon (1996), Double description method revisited, in Combinatorics and Computer Science, Lecture Notes in Computer Science, edited by M. Deza, R. Euler, and I. Manoussakis, 1120, 91-111, Springer-Verlag.

Ganapathysubramanian, B., and N. Zabaras (2007), Modeling diffusion in random heterogeneous media: Data-driven models, stochastic collocation and the variational multiscale method, J. Comput. Phys., 226, 326-353.

Haario, H., E. Saksman, and J. Tamminen (2001), An adaptive Metropolis algorithm, Bernoulli, 7, 223-242.

Jolliffe, I.T. (2002), Principal Component Analysis, 2nd ed., Springer, New York.

Malinverno, A. (2002), Parsimonious Bayesian Markov chain Monte Carlo inversion in a nonlinear geophysical problem, Geophys. J. Int., 151, 675-688.

Matarese, J.R. (1995), Nonlinear traveltime tomography, Ph.D. thesis, MIT, Cambridge.

Meju, M.A., and V.R.S. Hutton (1992), Iterative most-squares inversion: application to magnetotelluric data, Geophys. J. Int., 108, 758-766.

Meju, M.A. (1994), Geophysical data analysis: understanding inverse problem theory and practice, Course Notes, Society of Exploration Geophysicists, Tulsa, OK.

Meju, M.A. (2009), Regularized extremal bounds analysis (REBA): an approach to quantifying uncertainty in nonlinear geophysical inverse problems, Geophys. Res. Lett., 36, L03304.

Menke, W. (1994), Geophysical Data Analysis: Discrete Inverse Theory, Academic Press, San Diego.

Miranda, A.A., Y.-A. Le Borgne, and G. Bontempi (2008), New routes from minimal approximation error to principal components, Neural Process. Lett., 27, 197-207.

Mosegaard, K., and A. Tarantola (1995), Monte Carlo sampling of solutions to inverse problems, J. Geophys. Res., 100, 12431-12447.

Oldenburg, D.W. (1983), Funnel functions in linear and nonlinear appraisal, J. Geophys. Res., 88, 7387-7398.

Pearson, K. (1901), On lines and planes of closest fit to systems of points in space, Phil. Mag., 2(6), 559-572.

Sambridge, M. (1999), Geophysical inversion with a neighborhood algorithm-I. Searching a parameter space, Geophys. J. Int., 138, 479-494.

Scales, J.A., and L. Tenorio (2001), Prior information and uncertainty in inverse problems, Geophysics, 66, 389-397.

Sen, M., and P.L. Stoffa (1995), Global Optimization Methods in Geophysical Inversion, Elsevier Press, New York.

Smolyak, S. (1963), Quadrature and interpolation formulas for tensor products of certain classes of functions, Doklady Mathematics, 4, 240-243.

Tang, C.M. (1979), Electromagnetic fields due to dipole antennas embedded in stratified anisotropic media, IEEE Trans. Antennas Propag., AP-27, 665-670.

Tarantola, A., and B. Valette (1982), Inverse problems = quest for information, Journal of Geophysics, 50, 159-170.

Tarantola, A. (2005), Inverse Problem Theory, SIAM Press, Philadelphia.

Tompkins, M.J., and D.L. Alumbaugh (2002), A transversely isotropic 1D electromagnetic inversion scheme requiring minimal a priori information, in 72nd Annual International Meeting Expanded Abstracts, SEG, Tulsa, 676-679.

Tompkins, M.J. (2003), Quantitative analysis of borehole electromagnetic induction logging responses using anisotropic forward modeling and inversion, Ph.D. thesis, University of Wisconsin-Madison, Madison.

Xiu, D., and J.S. Hesthaven (2005), High-order collocation methods for differential equations with random inputs, SIAM J. Sci. Comput., 27, 1118-1139.