calibration of oil reservoir simulation codes - schoolsmaurizio/talks/csml_ucl11.pdf · calibration...
TRANSCRIPT
Calibration of Oil Reservoir Simulation Codes
Maurizio Filippone
Department of Statistical ScienceUniversity College London
February 8th, 2011
What is the meaning of calibration?
Given a model of a physical system, calibration means:• frequentist: estimation of model parameters• Bayesian: inference of model parameters
Motivating application
What is the problem?• maximizing economic recovery of hydrocarbons from
subsurface reservoirsWhat’s available?• models of the reservoirs• geophysical data gathered on site
Example - Umiat oil field
Oil reservoir simulatorEclipse oil reservoir simulation software by Schlumberger
*
Oil reservoir simulator
Reservoir simulator principles:• conservation of energy and mass• isothermal fluid phase behavior• Darcy approximation (fluid flow through porous media):
L
A P2 P1
Q Q = −kAµ
(P2 − P1)
L
• k permeability, µ viscosity• Q discharge rate (m3/s)
The problem - Oil reservoir model• 2D 100× 12 grid blocks (Tavassoli et al. 2004)• Three parameters:
• poor quality sand permeability (klow)• good quality sand permeability (khigh)• discontinuity (throw)
INJECTOR PRODUCER
-8320
-8360
-8380
-8340
OIL SATURATION
0 500 1000
Dep
th, f
t
0.72508 0.900130.19993 0.37498 0.55003
The problem - Oil reservoir model
●
●
●● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ●
●
●
●
●●
●●
0 5 10 15 20 25 30 35
010
020
030
040
050
0
Observed data − oil/water rates
months
bbl/d
ay
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
●
●
●
●
●
●●
●
●
OilWater
• running time of the 2D oil reservoir model in Tavassoli et al.(2004): couple of seconds
• more complex and realistic 3D models - hours/days to run
Calibration of simulators
Physical system
Simulator
x z
y
θ
Ω
• We aim at inferring model parameters to quantifyuncertainty in predictions
Relevance of the problem
• Calibration of simulators has application in very manyapplication fields, e.g.:
• high energy physics (Higdon et al. 2005)• geophysics (Cui et al. 2009)• astrophysics (Kaufman et al. 2010)• industrial processes (Forrester 2010)• ecology (Schneider et al. 2006)• climatology (Guillas et al. 2004, Wilkinson 2001)• systems biology (Wilkinson 2010)
• quantifying uncertainty is of paramount importance forbalancing risks/costs of decisions
Importance of quantifying uncertainty
• this is what we might want• inferring model parameters• obtaining predictive distributions (balance cost of decisions)• Other desirables:
• including prior information• approaching sequential estimation• doing model selection
• Bayesian framework seems to be appropriate
Importance of quantifying uncertainty
• this is what we might want• inferring model parameters• obtaining predictive distributions (balance cost of decisions)• Other desirables:
• including prior information• approaching sequential estimation• doing model selection
• Bayesian framework seems to be appropriate
Bayesian calibration of simulators
• Kennedy at al. (2001):
Physical system
Simulator
x z = ρ η(x, θ) + δ(x) + ε
y = η(x, θ)
θ
• η(x,θ) and δ(x) independent and modeled using GPs• ε ∼ N (0, σ2
ε ) and i.i.d.
Interpretability of calibration results
• Can we really interpret the inferred simulator parametersas the “real” ones?
• if the model differs from the physical system, then theinterpretation of θ is questionable
• Model misspecification and consequences in Bayesianinference - see e.g., White (1982) and Muller (2009)
• Loeppky et al. (2006) studied the problem of modelmisspecification in calibration problems
Simplified calibration model - Likelihood
Physical system
Simulator
x z = η(x, θ) + ε
y = η(x, θ)
θ
• we have direct access to the log-likelihood:
log[p(Data|θ)] =∑
i
log[N (zi |η(xi ,θi), σ
2ε )]
• if we place priors π(θ) we can infer θ - simulation usingMarkov chain Monte Carlo (MCMC)
p(θ|Data) ∝ p(Data|θ)π(θ)
Posterior inference - Multimodalities
• Given that the simulator is usually modeling a complexphysical system, the likelihood is often multimodal
Oil reservoir data• As a working example let us consider the 2D oil reservoir
model by Tavassoli et al. (2004)• “Real” data are generated from the simulator:
zi = η(xi , θ) + εi
where θ represents a set of “true” parameters
●
●
●● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ●
●
●
●
●●
●●
0 5 10 15 20 25 30 35
010
020
030
040
050
0
Observed data − oil/water rates
months
bbl/d
ay
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
●
●
●
●
●
●●
●
●
OilWater
Inference using Metropolis-Hastings
2500 3000 3500 4000 4500 5000
010
2030
4050
60
Metropolis−Hastings
iteration
thro
w
2500 3000 3500 4000 4500 5000
100
120
140
160
180
200
Metropolis−Hastings
iteration
khig
h
2500 3000 3500 4000 4500 5000
02
46
810
12
Metropolis−Hastings
iteration
klow
• Poor exploration of the parameter space• Proper exploration of the parameter space via population
based Markov chain Monte Carlo (Pop-MCMC)
Inference using Pop-MCMC
• bridge from the prior to the posterior viatempering
tempered posterior ∝ p(Data|θ)tπ(θ)
with t ∈ [0,1]
• one sampler for each tempered posterior• samplers sampling independently and
exchanging samples so that invariance ofp(θ|Data) is preserved
Inference using Pop-MCMC
2500 3000 3500 4000 4500 5000
010
2030
4050
60
Metropolis−Hastings
iteration
thro
w
2500 3000 3500 4000 4500 500010
012
014
016
018
020
0
Metropolis−Hastings
iteration
khig
h2500 3000 3500 4000 4500 5000
02
46
810
12
Metropolis−Hastings
iteration
klow
2500 3000 3500 4000 4500 5000
010
2030
4050
60
Population MCMC
iteration
thro
w
2500 3000 3500 4000 4500 5000
100
120
140
160
180
200
Population MCMC
iteration
khig
h
2500 3000 3500 4000 4500 50000
24
68
1012
Population MCMC
iteration
klow
Inference using Pop-MCMC - Results
Interpretation of the multimodal posterior - throw parameter
throw
coun
ts
0 10 20 30 40 50 60
010
2030
Inference using Pop-MCMC - Results
throw
coun
ts
0 10 20 30 40 50 60
010
2030
khigh
coun
ts100 120 140 160 180 200
05
1015
2025
3035
klow
coun
ts
0 10 20 30 40 50
020
4060
80
●
●
●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
●
●
●
●●
● ●
0 20 40 60 80 100 120
010
020
030
040
050
0
Oil production rate
months
bbl/d
ay
● RealMean prediction5th − 95th percentiles
Inference using Pop-MCMC - Results
throw
coun
ts
0 10 20 30 40 50 60
010
2030
khigh
coun
ts100 120 140 160 180 200
05
1015
2025
3035
klow
coun
ts
0 10 20 30 40 50
020
4060
80
●
●
●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
●
●
●
●●
● ●
●
0 20 40 60 80 100 120
010
020
030
040
050
0
Oil production rate
months
bbl/d
ay
● RealMean prediction5th − 95th percentiles
Inference using Pop-MCMC - Results
throw
coun
ts
0 10 20 30 40 50 60
010
2030
khigh
coun
ts100 120 140 160 180 200
05
1015
2025
3035
klow
coun
ts
0 10 20 30 40 50
020
4060
80
●
●
●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
●
●
●
●●
● ●
●
●
●
●●
● ●
0 20 40 60 80 100 120
010
020
030
040
050
0
Oil production rate
months
bbl/d
ay
● RealMean prediction5th − 95th percentiles
What if the running time of the simulator isprohibitive?
• Simulators could be so complex to require hours or evendays to run
• Fully Bayesian treatment is not viable• We aim at using the available computational resources to
find parameters and an estimate of the uncertainty (notBayesian)
Need for emulators• Emulators as a proxy for the expensive likelihood• Start from a set of design points (latin hypercubes)• Emulator using Gaussian Process
0 2 4 6 8 10
−5
05
10
θ
nega
tive
log−
likel
ihoo
d
●
●
●
●●●
●
●
●
●
TrueFitted5th−95th perc
Need for emulators
• Marginals of the emulator are Gaussian:
y(θ) ' N (µ(θ), s2(θ))
• Incremental design for minimization optimizing a utility(improvement) function (Jones et al. 1998)
I(θ) = max(fmin − y(θ),0)
expected value:
E[I(θ)] = (fmin−µ(θ))Φ
(fmin − µ(θ)
s(θ)
)+s(θ)N
(fmin − µ(θ)
s(θ)
)
Example of incremental design
0 2 4 6 8 10
−5
05
10
θ
nega
tive
log−
likel
ihoo
d
●●
●
●●●
TrueFitted5th−95th perc
0 2 4 6 8 10
0.00
0.05
0.10
0.15
0.20
θ
EI
Example of incremental design
0 2 4 6 8 10
−5
05
10
θ
nega
tive
log−
likel
ihoo
d
●●
●
●●●
●
TrueFitted5th−95th perc
0 2 4 6 8 10
0.0
0.2
0.4
0.6
0.8
1.0
θ
EI
Example of incremental design
0 2 4 6 8 10
−5
05
10
θ
nega
tive
log−
likel
ihoo
d
●●
●
●●●
●
●
TrueFitted5th−95th perc
0 2 4 6 8 10
0.00
0.02
0.04
0.06
θ
EI
Example of incremental design
0 2 4 6 8 10
−5
05
10
θ
nega
tive
log−
likel
ihoo
d
●●
●
●●●
●
●
●
TrueFitted5th−95th perc
0 2 4 6 8 10
0.00
0.04
0.08
θ
EI
Example of incremental design
0 2 4 6 8 10
−5
05
10
θ
nega
tive
log−
likel
ihoo
d
●●
●
●●●
●
●
●
●
TrueFitted5th−95th perc
0 2 4 6 8 10
0.00
00.
001
0.00
20.
003
0.00
4
θ
EI
Example of incremental design
0 2 4 6 8 10
−5
05
10
θ
nega
tive
log−
likel
ihoo
d
●●
●
●●●
●
●
●
●
●
TrueFitted5th−95th perc
0 2 4 6 8 10
0.00
00.
002
0.00
4
θ
EI
Example of incremental design
0 2 4 6 8 10
−5
05
10
θ
nega
tive
log−
likel
ihoo
d
●●
●
●●●
●
●
●
●
●
●
TrueFitted5th−95th perc
0 2 4 6 8 10
0.00
000.
0004
0.00
080.
0012
θ
EI
Example of incremental design
0 2 4 6 8 10
−5
05
10
θ
nega
tive
log−
likel
ihoo
d
●●
●
●●●
●
●
●
●
●
●
●
TrueFitted5th−95th perc
0 2 4 6 8 10
0e+
002e
−04
4e−
04
θ
EI
Example of incremental design
0 2 4 6 8 10
−5
05
10
θ
nega
tive
log−
likel
ihoo
d
●●
●
●●●
●
●
●
●
●
●
●●
TrueFitted5th−95th perc
0 2 4 6 8 10
0.00
000
0.00
010
0.00
020
θ
EI
Incremental design - Oil reservoir - Results
• incremental design on the oil data• we assumed that computational resources allowed only
100 runs of the simulator• 50 simulations using latin hypercubes• 50 simulations using incremental design
Incremental design - Oil reservoir - Results
true values: 10.4, 131.6, 1.3
●
●●●●●●●●●●●●●●●●●●●●●●●●●●●●
●
●●
●●●●
●
●
●●
● ● ●
0 20 40 60 80 100 120
020
040
060
080
0
Samples from the prior
months
bbl/d
ay ● RealMean prediction5th − 95th percentiles
Incremental design - Oil reservoir - Results
true values: 10.4, 131.6, 1.3
●
●●●●●●●●●●●●●●●●●●●●●●●●●●●●
●
●●
●●●●
●
●
●●
● ● ●
0 20 40 60 80 100 120
020
040
060
080
0
10.2( 11.9) 123.3( 11.4) 21.9( 39.4)
months
bbl/d
ay ● RealMean prediction5th − 95th percentiles
Incremental design - Oil reservoir - Results
true values: 10.4, 131.6, 1.3
●
●●●●●●●●●●●●●●●●●●●●●●●●●●●●
●
●●
●●●●
●
●
●●
● ● ●
0 20 40 60 80 100 120
020
040
060
080
0
46.76( 37.47) 122.07( 3.37) 19.85( 8.65)
months
bbl/d
ay ● RealMean prediction5th − 95th percentiles
Conclusions and ongoing work
• This work is part of the ongoing research program of theComputational Statistics group at UCL on inference incomplex systems
• Calibration of simulators is an important and promisingresearch area
• It is a challenging problem even with simplifyingassumptions:
• inference over parameters of complex simulators• expensive to evaluate the likelihood
Conclusions and ongoing work
Ongoing research:• Composite likelihoods• Emulators for moderate/large sized data sets• Consistency of estimators in incremental experiment
design?• Hybrid/Manifold Monte Carlo with emulator as potential
field• Design of covariance functions for emulators of differential
equations based simulators• Model selection
Acknowledgements[1] L. Mohamed, B. Calderhead, M.Filippone, M. Christie, M. Girolami.
Population MCMC methods for history matching and uncertainty quantification.
Computational Geosciences. to appear
Collaborators:• Mike Christie, Heriot Watt University, UK• Derek Bingham, Simon Fraser University, Canada• John Skilling, Maximum Entropy Data Consultants Ltd, UK
This research is funded by the Engineering and Physical Sciences Research Council
(EPSRC), grant number EP/E052029/1.