4d full –waveform metropolis hastings inversion using a...

4D Full –Waveform Metropolis Hastings Inversion Using a Local Acoustic SolverMaria Kotsi*, Memorial University of Newfoundland, Alison Malcolm, Memorial University of Newfoundland, andGregory Ely, Massachusetts Institute of Technology

SUMMARY

Time lapse seismic monitoring usually involves looking forsmall changes in localized regions. Quantifying the uncer-tainty related to these changes is important because it affectsexploration and production decisions. Traditional methods thatuse pixel by pixel quantification with large models are com-putationally infeasible. We use a local acoustic solver in thearea of interest, which allows for fast computation of the wave-field solves. This allows us to use a Metropolis Hastings algo-rithm in a Bayesian inversion to address the uncertainty that ispresent in the estimation of 4D velocity changes.

INTRODUCTION

Time-lapse (4D) seismic monitoring involves looking for smallchanges in localized regions. Accurate and high resolutionimaging of these changes is crucial for production and drillingdecisions. Full–Waveform Inversion (FWI) provides high res-olution seismic imaging based on prestack data. In FWI, weiteratively try to match modelled seismic waveforms with rawseismic waveforms collected in the field (Tarantola, 1984; Virieuxand Operto, 2009). FWI is extended to the 4D case via sev-eral schemes (Watanabe et al., 2005; Zheng et al., 2011; Ma-harramov and Biondi, 2014; Yang et al., 2014; Asnaashariet al., 2015). In this study we use only Double-Difference FWI(DDFWI). In DDFWI, to estimate a monitor velocity modelwe start from an inverted baseline model, and instead of themonitor data we use a set of composite data. The compositedata are the data difference (monitor - baseline) added to thesynthetic baseline data computed using the model recoveredfrom a baseline FWI. In this way, the only signal that is notexplained by the starting model in the composite data, is the4D change. However, all 4D-FWI methods inherently havethe same issue, — the non-convexity of the inverse problem—where multiple models can fit the data. The ambiguity ofwhether or not and to what extent we can trust our result canbe addressed with uncertainty quantification. To do so, weneed to understand what we know and what we don’t know;”there are known unknowns and unknown unknowns” (Ma,2010). In seismic imaging, there are several sources of un-certainty, varying from acquisition uncertainty to uncertain-ties in physical properties. Osypov et al. (2013) provide astrategy and overview for uncertainty quantification in seis-mic tomography that is related to oil and gas exploration andproduction. Poliannikov and Malcolm (2015) use a Bayesianframework to quantify uncertainty in migrated images consid-ering the effects of velocity model and picking errors. Elyet al. (2018) proposes a bayesian framework with a fast for-ward solver based on the field expansion method to addressuncertainties in seismic images. In (Kotsi and Malcolm, 2017)we studied model uncertainties in different 4D FWI schemes.

That work highlights the need for a more robust statistical in-version technique that will allow us more sophisticated sam-pling to estimate a monitor velocity model. In this study, wesetup a Metropolis Hastings algorithm that is appropriate forthe 4D case, and is similar in a way to the Double-Differencescheme. We use the Marmousi model (Versteeg, 1994) to per-form our numerical tests and we study the uncertainty that ispresent in the estimates of the 4D velocity change.

THEORY

In seismic imaging we are interested in finding the Earth modelthat best fits the data. In FWI terminology we use a leastsquares approach

m̂ = argminm{∑

xs

||F(m;xs)−d(xs)||22}, (1)

where m is the model, d are the observed data, F is the forwardwave solver, and xs is the source location. For the DDFWIextension, equation 1 becomes

m̂1 =argminm1

{∑

xs

||(F(m1;xs)−F(m0;xs)) (2)

+(d1(xs)−d0(xs))||22}.

where m0 is the inverted baseline model that is used as thestarting model for the monitor inversion, m1 is the monitormodel, d0 are the observed baseline data, and d1 are the ob-served monitor data.

Solving the forward problem in the entire domain while weare interested in a small subdomain (i.e. reservoir) is ineffi-cient. Willemsen et al. (2016) develop a frequency based localdomain solver that calculates wavefields identical to the onesproduced by a full domain solver, and apply it to a salt bound-ary inversion example. Malcolm and Willemsen (2017) suc-cessfully apply the local domain solver to Double DifferenceFWI on a Marmousi test case. There, they observe that thelocal domain solver improves the computational cost and alsouses fewer iterations in the inversion. In this study, we usethat local solver in order to efficiently compute our statisticaldistributions.

Bayes’ theorem describes the relationship between a hypoth-esis (H) and given evidence (E). In seismic imaging, this isexpressed in terms of velocity models (m) and observed data(d),

p(m|d) = p(d|m)p(m)

p(d), (3)

where p(m|d) is the quantity of interest for any probabilisticinversion, the probability that you obtain the model (m) giventhe data (d). p(m|d) is also called the posterior. p(d|m) is the

4D MCMC in local domain

0 2 4 6 8 10 12Horizontal coordinate (km)

0.00.51.01.52.02.53.03.5

Dep

th (k

m)

True marmousi model

1500

2275

3050

3825

4600

V (m

/s)

Figure 1: True baseline marmousi model, with the black boxindicating the area of the local domain.

0 2 4 6 8 10 12Horizontal coordinate (km)

0.00.51.01.52.02.53.03.5

Dep

th (k

m)

True time-lapse change

−100

−50

0

50

100

V (m

/s)

Figure 2: True time-lapse velocity change with a magnitude of75 m/s

likelihood function and it is calculated by (Tarantola, 2005)

L(m)≡ p(d|m) (4)

∝ exp[−1

2( f (m)−d)T

Σ−1( f (m)−d)

],

which involves the forward wave solver f (m). p(m) is themodel distribution that is used as an input information andp(d) is considered to be a normalization constant. For an ac-curate posterior calculation, a large number of samples need tobe generated, and then the first half are discarded in order toreduce the impact of the starting model (Brooks et al., 2011).Here, we use a Metropolis Hastings algorithm that doesn’t re-quire the computation of p(d), rather than uses the historyof the process to tune the proposal distribution (Haario et al.,2001). See (Ely et al., 2018) for a recent application of theAdaptive Metropolis Hastings algorithm in seismic imaging.Here we use the Metropolis Hastings algorithm to invert for 4Dmodel changes from 4D seismic data. The pseudocode in Al-gorithm 1 provides a brief explanation of the algorithm’s steps.The acceptance of the new proposal is assessed by the ratio ofthe likelihood functions of the proposal and current samples. Ifit is accepted, the proposal becomes the new current, and if itis rejected the current proposed model is reused. Since a largenumber of samples need to be evaluated, this means that a lotof forward wave solves needs to be evaluated for the compu-tation of the likelihood functions (equation 4). To make thiscomputationally feasible, we use the local domain solver.

Figure 3: Comparison of the the data residuals created fromone shot, at the frequency of 8.0 Hz in the full and local domain(top panel). The bottom panel is the difference between the thetwo data residuals from the top panel.

4D PROBLEM SETUP AND POSTERIOR CALCULA-TION

We now explain how we are going to setup our problem for theMetropolis Hastings algorithm. Let m be the velocity modeland n be zero mean Gaussian noise with a covariance matrixΣ, then

d = G(m)+n, (5)

where G is the forward modeling solver (i.e. the local domainsolver here). In the 4D case, we have d1 = G(m1)+n1 for thebaseline model, and d2 = G(m2)+ n2 for the monitor. If welet δm = m2−m1 and δd = d2−d1, then our goal is to findthe probability of the model difference given the two datasets,

p(δm|d1,d2). (6)

In this expression there is a hidden variable, m1. In order tocalculate the distribution over δm, we will first need to calcu-late the joint distribution p(m1,δm|d1,d2) and then integrateover m1 to get the distribution on only δm. This, however,would be computationally expensive since we have to samplefor both δm and m1, where m1 is the baseline velocity model.We cannot recover m1 from the local solver because it com-putes the velocity only within the local domain whereas thebaseline velocity model obviously exists throughout the fulldomain. Therefore, we will need an expression that is onlyin terms of δm. To do so, we rewrite the forward modelingexpression in equation 1 as,

δd = G(m2)+n2−G(m1)−n1

= G(m2)−G(m1)+(n2−n1)

= G(m1 +δm)−G(m1)+(n2−n1). (7)

The sum or difference of two zero mean Gaussians (n1,n2) isequal to a single Gaussian (n3) with a covariance Σ3 =Σ1+Σ2.From this, we can rewrite equation 7 as

δd = G(m1 +δm)−G(m1)+n3. (8)


If we let F(m1,δm) = G(m1 +δm)−G(m1), equation 8 be-comes

δd = F(m1,δm)+n3. (9)

This expression now is almost entirely in terms of δm. Ifwe now assume that F(m1,δm) is independent of the initialmodel m1, then at any model update

F(m1a,δm) = F(m1b,δm), (10)

where m1a ≈m1b. Then, the forward model is only dependenton the difference of models, difference of observed data, andthe sum of some known covariance matrices,

δd = F(δm)+n3. (11)

Bayes’ theorem from equation 3 now becomes

p(δm|δd) =p(δd|δm)p(δm)

p(δd), (12)

and the likelihood function from equation 4 is

L(δm)≡ p(δd|δm) ∝ (13)

exp[−1

2(F(δm)−δd)T

Σ3−1(F(δm)−δd)

]. (14)

At each iteration i, we get a new proposal δm∗ by adding azero mean perturbation to the current (δmi−1). We evaluatethe likelihood L(m∗) and if L(m∗) > L(mi−1) we accept thenew proposal and δmi = δm∗. If L(m∗) < L(mi−1) then wereject the proposal and δmi = δmi−1. In this study, we arechanging only the perturbation δm, therefore we have onlyone degree of freedom. We are also using a fixed step-size σ

throughout the process, and hence the covariance used to getthe new proposal is C = σ Id where d are the degrees of free-dom; here d = 1 hence C = σ . The pseudocode in Algorithm1 provides a summary of our methodology.

Algorithm 1 4D Metropolis Hastings algorithm

Require: δm0 . initial perturbationRequire: N . maximum number of iterationsRequire: C . covariance matrix

1: L(δm0)2: for i = 1, ...,N do3: n← Normal(0,C) . proposed jump4: δm∗← δmi−1 +n . proposed model perturbation5: L(m∗) . get the likelihood of the proposal6: αi =

L(m∗)L(mi−1

. acceptance probablility7: u←U [0,1]8: if u < αi then9: δmi← δm∗ . accept proposal

10: else11: δmi← δmi−1 . reject proposal12: end if13: end for

NUMERICAL EXAMPLE

To perform our numerical example we use the standard mar-mousi model (Versteeg, 1994) as the baseline true model (Fig-

ure 1). We create the monitor true model by adding a pertur-bation of 75 m/s to one of the layers (Figure 2). We generatesynthetic data at 8 Hz with a single shot and 651 equispacedreceivers. We validate the accuracy of the local solver by com-puting the residual δd in the full and local domain (Figure 3).To keep the problem simple we add noise to the true residualδd. We generate uncorrelated Gaussian noise in the frequencydomain, and we define the noise lever r as the energy of thenoise signal δdn relative to the noiseless signal δd,

r =∑

δdn2∑

δd2 . (15)

To assess the effect of noise in the convergence of the 4DMetropolis Hastings algorithm we compare two different noiselevels. In the first case we consider a relatively high noiselevel where r = 1.28 (Figure 4), and in the second case a verylow noise level of r = 0.01. In order to be able to use the

Figure 4: Comparison of the noisy signal with r = 1.28 withthe noiseless signal

local solver, we first require the Green’s functions computedon the background model in the full domain. Here, we areinterested in exploring the uncertainty related to the estima-tion of the 4D velocity change. Therefore, we keep everythingelse exact, and we compute the background Green’s functionson the true baseline model. Once these functions are avail-able, we can compute the wavefield inside the local domainfor any perturbations δm as many times as we want for verylittle computational cost. In our case here, for one shot at asingle frequency we can have a wave solve at approximately0.8 seconds. We start three different Markov chains with start-ing models δm0 = 0m/s, δm0 = 75m/s, and δm0 = 400m/s.In the first case we start from the baseline model, in the sec-ond case we start from the monitor model, and in the thirdcase we start with a perturbation far from the true. For eachchain, we run the Metropolis Hastings algorithm for 50,000 it-erations and we discard the first half. We do this for both noiselevels. Figure 5 and Figure 6 show the resulting histograms.In the case of r = 1.28 we see that all chains have convergedto a Gaussian distribution, however we notice that the chainshave not converged to the true values. To understand why thismight happen, we plot the likelihood values in terms of a rangeof perturbations (Figure 7). We see that the likelihood doesn’tchange significantly between perturbations of 40 and 90 m/s.This means that a range of values fit the data equally well, andhence the algorithm converges to these values. We see bothchains starting δm0 = 75m/s and δm0 = 75m/s converge tothe same range 80-100 m/s.


Figure 5: Histograms of the three Markov chains starting from a different initial guess. Here we plot only the second half of thechain, 25000, to reduce the dependency on the starting model. The red curve is the distribution that better fits the data (δm (m/s)values) and the green line indicates the true model. The noise level to all of them is r=1.28.

Figure 6: Histograms of the three Markov chains from Figure 5 with the noise level of r = 0.01. The red curve is the distributionthat better fits our data and the green line shows where the true model is.

However we need to study further why this is not the casewhen δm0 = 0m/s. When we repeat the same procedure forhalf a million iterations (Figure 8) we notice that one of thechains converges to values closer to the true, while the otherone seems to converge to similar ranges as before. Our futureefforts will be focused on understanding the source of this dis-crepancy and investigating the effects of noise into the startingmodel δm0. In the case when r = 0.01 there is a slightly differ-

Figure 7: Likelihood evolution in terms of different perturba-tions when noise is present r = 1.28.

ent pattern. All chains have converged to similar values, witha deviation of ±2m/s from the true value.

CONCLUSIONS

We propose a local acoustic solver for a fast 4D MetropolisHastings inversion. In our model, one wave solve takes ap-proximately 0.8 seconds in the local domain compared to ap-proximately 1 hour in the full domain. Calculating the fullposterior pixel by pixel would be impossible. We have createda framework to calculate the uncertainty in a targeted way that

is computationally feasible. We assessed the effect of noise inthe convergence of the algorithm, and we saw that in our case,noise results in a biased result. Our future efforts will be into amore comprehensive understanding of the convergence of themethod, and eventually be able to assess the impact of moresources of uncertainty and additional model parameters.

Figure 8: Resulted histograms from half a million runs of theMetropolis Hastings algorithm when r = 1.28, after discardingthe first half.

ACKNOWLEDGMENTS

This work is supported by Chevron and with grants from theNatural Sciences and Engineering Research Council of CanadaIndustrial Research Chair Program and the InnovateNL and bythe Hibernia Management and Development Corporation.


REFERENCES

Asnaashari, R., R. Brossier, S. Garambois, F. Audebert, P.Thore, and J. Virieux, 2015, Time-lapse seismic imag-ing using regularized full-waveform inversion with a priormodel: which strategy?: Geophysical Prospecting, 63, 78–98.

Brooks, S., A. Gelman, G. Jones, and X. L. Meng, 2011, Hand-book of markov chain monte carlo: CRC Press.

Ely, G., A. Malcolm, and O. V. Poloannikov, 2018, Assess-ing uncertainties in velocity models and images with a fastnonlinear uncertainty quantification method: Geophysics,83, R63–R75.

Haario, H., E. Saksman, and J. Tamminen, 2001, An adaptivemetropolis algorithm: Bernouli, 7, 223–242.

Kotsi, M., and A. Malcolm, 2017, A statistical comparison ofthree 4d full-waveform inversion schemes: SEG Interna-tional Exposition and 87th Annual Meeting, Expanded Ab-stracts, 1434–1438.

Ma, Y. Z., 2010, Uncertainty analysis in reservoir characteri-zation and management: How much should we know aboutwhat we don’t know?: In: Uncertainty analysis and reser-voir modeling. AAPGMemoir, 96, 1–15.

Maharramov, M., and B. Biondi, 2014, Joint full waveforminversion of time-lapse seismic data sets: SEG Denver An-nual Meeting, 1–5.

Malcolm, A., and B. Willemsen, 2017, Rapid 4d fwi using alocal solver: The Leading Edge, 35, 1053–1059.

Osypov, K., Y. Yang, A. Fournier, N. Ivanova, R. Bachrach,C. E. Yarman, Y. You, D. Nichols, and M. Woodward,2013, Model-uncertainty quantification in seismic tomog-raphy: method and applications: Geophysical Prospecting,61, 1114–1134.

Poliannikov, O. V., and A. E. Malcolm, 2015, The effect ofvelocity uncertainty on migrated reflectors: Improvementsfrom relative-depth imaging: Geophysics, 81, S21–S29.

Tarantola, A., 1984, Inversion of seismic reflection data in theacoustic approximation: Geophysics, 49, 1259–1266.

——–, 2005, Inverse problem theory and methods for modelparameter estimation: SIAM.

Versteeg, R., 1994, The marmousi experience: Velocity modeldetermination on a synthetic complex data set: The LeadingEdge, 13, 927–936.

Virieux, J., and S. Operto, 2009, An overview of full-waveform inversion in exploration geophysics: Geo-physics, 74, WCC1–WCC26.

Watanabe, T., S. Shimizu, E. Asakawa, and T. Matsuoka, 2005,Differential waveform tomography for time-lapse crosswellseismic data with application to gas hydrate productionmonitoring: SEG Technical Program Expanded Abstracts.

Willemsen, B., A. Malcolm, and W. Lewis, 2016, A numeri-cally exact local solver applied to salt boundary inversionin seismic full waveform inversion: Geophysical JournalInternational, 204, 1703–1720.

Yang, D., A. Malcolm, and M. Fehler, 2014, Time-lapse fullwaveform inversion and uncertainty analysis with differentsurvey geometries: 76th EAGE Conference and Exhibition.

Zheng, Y., P. Barton, and S. Singh, 2011, Strategies for elasticfull waveforms inversion of time-lapse ocean bottom cable

(obc) seismic data: SEG San Antonion Annual Meeting,4195–4200.

4d full –waveform metropolis hastings inversion using a...

Documents