-
Inference for complex models
Richard [email protected]
School of Maths and StatisticsUniversity of Sheffield
23 April 2015
-
Computer experiments
Rohrlich (1991): Computer simulation is
'a key milestone somewhat comparable to the milestone that started the empirical approach (Galileo) and the deterministic mathematical approach to dynamics (Newton and Laplace)'
Challenges for statistics: How do we make inferences about the world from a simulation of it?
- How do we relate simulators to reality?
- How do we estimate tunable parameters?
- How do we deal with computational constraints?
- How do we make uncertainty statements about the world that combine models, data and their corresponding errors?
There is an inherent lack of quantitative information on the uncertainty surrounding a simulation, unlike in physical experiments.
-
Bayesian statistics
Represent all uncertainties as probability distributions:
π(θ|D) = π(D|θ)π(θ) / π(D)
π(θ|D) is the posterior distribution
- Always hard to compute: SMC2, PGAS, tempered NUTS-HMC
π(D|θ) is the likelihood function.
- For complex models it can be slow to compute: GP emulators
- It can also be impossible to compute in some cases: ABC
- π(D|θ) = ∫ π(D|X) π(X|θ) dX
Relating the simulator to reality can make specifying π(D|θ) particularly difficult: simulator discrepancy modelling
π(D) is the model evidence or normalising constant.
- Requires us to integrate, and is thus harder to compute than π(θ|D): SMC2, nested sampling
-
Uncertainty Quantification (UQ) for computer experiments
Calibration
- Estimate unknown parameters θ
- Usually via the posterior distribution π(θ|D)
- Or history matching
Uncertainty analysis
- f(x) a complex simulator. If we are uncertain about x, e.g., X ∼ π(x), what is π(f(X))?
Sensitivity analysis
- X = (X1, ..., Xd)ᵀ. Can we decompose Var(f(X)) into contributions from each Var(Xi)?
- If we can improve our knowledge of any Xi, which should we choose to minimise Var(f(X))?
Simulator discrepancy
- f(x) is imperfect. How can we quantify or correct simulator discrepancy?
Data assimilation
- Find π(x1:t | y1:t)
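As a concrete toy illustration of the uncertainty and sensitivity analyses above, plain Monte Carlo with a cheap stand-in simulator f (the simulator, its inputs, and all settings below are made up for illustration; they are not any of the models in this talk):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # toy 'simulator': a cheap stand-in for an expensive model
    return x[:, 0] ** 2 + 2.0 * x[:, 1] + 0.5 * x[:, 2]

n, d = 100_000, 3
X = rng.normal(size=(n, d))   # X ~ pi(x), here standard normal inputs
Y = f(X)                      # push-forward sample from pi(f(X))

# Uncertainty analysis: summarise pi(f(X))
print("E[f(X)] ~", Y.mean(), " Var(f(X)) ~", Y.var())

# First-order sensitivity indices S_i = Var(E[f(X)|X_i]) / Var(f(X)),
# estimated crudely by binning the sample on each input in turn
def first_order_index(X, Y, i, bins=50):
    edges = np.quantile(X[:, i], np.linspace(0, 1, bins + 1))
    idx = np.clip(np.searchsorted(edges, X[:, i]) - 1, 0, bins - 1)
    cond_means = np.array([Y[idx == b].mean() for b in range(bins)])
    counts = np.array([(idx == b).sum() for b in range(bins)])
    return np.sum(counts * (cond_means - Y.mean()) ** 2) / (len(Y) * Y.var())

for i in range(d):
    print(f"S_{i + 1} ~ {first_order_index(X, Y, i):.2f}")
```

For this f the indices can be computed analytically (Var(f) = 2 + 4 + 0.25), so X2 dominates; the binned estimates recover that ordering.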
-
Meta-modelling
Surrogate modelling
Emulation
-
Code uncertainty
For complex simulators, run times might be long, ruling out brute-force approaches such as Monte Carlo methods.
Consequently, we will only know the simulator output at a finite number of points.
We call this code uncertainty.
All inference must be done using a finite ensemble of model runs
Dsim = {(θi, f(θi))}, i = 1, ..., N
If θ is not in the ensemble, then we are uncertain about the value of f(θ).
-
Meta-modelling
Idea: If the simulator is expensive, build a cheap model of it and use this in any analysis.
'a model of the model'
We call this meta-model an emulator of our simulator.
Gaussian process emulators are the most popular choice of emulator.
Built using an ensemble of model runs Dsim = {(θi, f(θi))}, i = 1, ..., N
They give an assessment of their prediction accuracy, π(f(θ)|Dsim)
-
Meta-modelling
Gaussian Process Emulators
Gaussian processes provide a flexible nonparametric distribution for our prior beliefs about the functional form of the simulator:
f(·) ∼ GP(m(·), σ²c(·, ·))
where m(·) is the prior mean function, and c(·, ·) is the prior covariance function (positive semi-definite).
Gaussian processes are closed under Bayesian updating.
Definition: If f(·) ∼ GP(m(·), σ²c(·, ·)) then for any collection of inputs x1, ..., xn the vector
(f(x1), ..., f(xn))ᵀ ∼ MVN(m(x), σ²Σ)
where Σij = c(xi, xj).
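A minimal numerical sketch of this definition in action: conditioning a zero-mean GP with a squared-exponential covariance (both illustrative choices, not tied to any simulator in this talk) on a small ensemble, and checking that the emulator interpolates the runs with near-zero variance:

```python
import numpy as np

def sq_exp(a, b, length=1.0, sigma2=1.0):
    # squared-exponential covariance sigma^2 * c(x, x')
    d = a[:, None] - b[None, :]
    return sigma2 * np.exp(-0.5 * (d / length) ** 2)

# design points and simulator output (toy 1d 'simulator')
theta = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
f_theta = np.sin(theta)

# condition the GP on the ensemble D_sim = {(theta_i, f(theta_i))}
xs = np.linspace(-3, 3, 61)
K = sq_exp(theta, theta) + 1e-10 * np.eye(len(theta))  # jitter for stability
k = sq_exp(xs, theta)
L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, f_theta))
post_mean = k @ alpha                                  # posterior mean
v = np.linalg.solve(L, k.T)
post_var = np.diag(sq_exp(xs, xs)) - np.sum(v * v, axis=0)

# at a design point the emulator interpolates with ~zero code uncertainty
i = np.argmin(np.abs(xs - 1.0))
print(post_mean[i], post_var[i])
```

Away from the design points post_var grows, which is exactly the code-uncertainty assessment π(f(θ)|Dsim) described above.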
-
Gaussian Process Illustration
Zero mean
-
Gaussian Process Illustration
-
Gaussian Process Illustration
-
Challenges
Design: if we can afford n simulator runs, which parameters should we run it at?
High dimensional inputs
- If θ is multidimensional, then even short run times can rule out brute-force approaches
High dimensional outputs
- Spatio-temporal.
Incorporating physical knowledge
Difficult behaviour, e.g., switches, step-functions, non-stationarity, ...
-
Uncertainty quantification for Carbon Capture and Storage
EPSRC: transport
[Figure: CO2 isotherms, pressure (MPa) against volume (×10⁻⁴ m³/mol), at 290K, 296K, 304.2K (Tc), 307K and 310K, showing the critical point and the coexisting vapour and liquid branches]
Technical challenges:
- How do we find non-parametric Gaussian process models that (i) obey the fugacity constraints and (ii) have the correct asymptotic behaviour?
- How do we fit parametric equations of state (Peng-Robinson and variants)? Tempered NUTS-HMC.
-
Storage
Knowledge of the physical problem is encoded in a simulator f
Inputs: permeability field K (2d field)
f(K)
Outputs: stream function (2d field), concentration (2d field), surface flux (1d scalar), ...
[Figures: a permeability field K, and the resulting true truncated stream and concentration fields f(K)]
Surface flux = 6.43, ...
-
CCS examples
Left = true, right = emulated; 118 training runs, held-out test set.
[Figures: true vs emulated stream fields and concentration fields]
-
ABC: inference for complex stochastic models
-
Estimating Divergence Times
-
Forward simulation
Model evolution and fossil finds
Let τ be the temporal gap between the divergence time and the oldest fossil.
The posterior for τ is then used as a prior for a genetic analysis.
The likelihood function π(D|θ) is intractable, but it is cheap to simulate.
-
Approximate Bayesian Computation (ABC)
Wilkinson 2008/2013, Wilkinson and Tavaré 2009
If the likelihood function is intractable, then ABC is one of the few approaches we can use to do inference.
Uniform Rejection Algorithm
- Draw θ from π(θ)
- Simulate X ∼ f(θ)
- Accept θ if ρ(D, X) ≤ ε
ε reflects the tension between computability and accuracy.
As ε → ∞, we get observations from the prior, π(θ).
If ε = 0, we generate observations from π(θ|D).
ABC does not require explicit knowledge of the likelihood function.
-
ε = 10
[Figure: left, scatter of θ vs D showing the acceptance band D ± ε; right, ABC posterior density against the true posterior]
θ ∼ U[−10, 10], X ∼ N(2(θ + 2)θ(θ − 2), 0.1 + θ²)
ρ(D, X) = |D − X|, D = 2
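The toy model on this slide can be run through the uniform rejection algorithm directly; a sketch, with the sample size and tolerance chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
D, eps, n = 2.0, 1.0, 200_000

theta = rng.uniform(-10, 10, n)            # draw theta from the prior U[-10, 10]
mean = 2 * (theta + 2) * theta * (theta - 2)
sd = np.sqrt(0.1 + theta ** 2)
X = rng.normal(mean, sd)                   # simulate X ~ N(mean, 0.1 + theta^2)

accepted = theta[np.abs(D - X) <= eps]     # keep theta with rho(D, X) <= eps
print(len(accepted), "accepted draws from the ABC posterior")
```

A histogram of `accepted` reproduces the multimodal ABC density shown on these slides; shrinking eps sharpens it towards the true posterior at the cost of a lower acceptance rate.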
-
ε = 7.5
[Figure: θ vs D scatter with acceptance band D ± ε, and ABC posterior density against the true posterior]
-
ε = 5
[Figure: θ vs D scatter with acceptance band D ± ε, and ABC posterior density against the true posterior]
-
ε = 2.5
[Figure: θ vs D scatter with acceptance band D ± ε, and ABC posterior density against the true posterior]
-
ε = 1
[Figure: θ vs D scatter with acceptance band D ± ε, and ABC posterior density against the true posterior]
-
Rejection ABC
If the data are too high dimensional we never observe simulations that are 'close' to the field data (the curse of dimensionality).
Reduce the dimension using summary statistics, S(D).
Approximate Rejection Algorithm With Summaries
- Draw θ from π(θ)
- Simulate X ∼ f(θ)
- Accept θ if ρ(S(D), S(X)) < ε
If S is sufficient, this is equivalent to the previous algorithm.
Simple → popular with non-statisticians.
There exist many extensions and improvements:
- How to choose S(D)
- How to efficiently sample θ
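A sketch of the algorithm with summaries, on a toy Gaussian problem where the sample mean is a sufficient statistic (the model, prior, and tolerance below are all made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# field data: 50 draws from N(theta_true = 1, 1); summary S = sample mean
data = rng.normal(1.0, 1.0, size=50)
S_obs = data.mean()

n, eps = 100_000, 0.05
theta = rng.normal(0.0, 5.0, size=n)               # draw theta from the prior
X = rng.normal(theta[:, None], 1.0, size=(n, 50))  # simulate full datasets
S_sim = X.mean(axis=1)                             # summarise each dataset
accepted = theta[np.abs(S_obs - S_sim) < eps]      # compare summaries only

# the mean is sufficient here, so this targets pi(theta | data)
print(accepted.mean(), accepted.size)
```

Comparing 50-dimensional datasets directly would accept almost nothing; comparing the 1-dimensional summaries keeps the acceptance rate workable, which is the point of the slide.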
-
An integrated molecular and palaeontological analysis
The fossil record does not constrain the primate divergence time as closely as previously believed.
Genetic and palaeontological estimates unified.
Human-chimp divergence time pushed further back.
Wilkinson et al. 2011, Bracken-Grissom et al. 2014.
-
Accelerating ABC: GP-ABC
Monte Carlo methods (such as ABC) are costly and can require more simulation than is possible. However,
- most methods sample naively: they don't learn from previous simulations
- they don't exploit known properties of the likelihood function, such as continuity
- they sample randomly, rather than using careful design.
Emulators are the usual approach to dealing with complex models. But emulating stochastic simulators is problematic.
Instead of modelling the simulator output, we can instead model L(θ) = π(D|θ)
- D remains fixed: we only need to learn L as a function of θ
- 1d response surface
- But it can be hard to model.
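The response-surface idea can be sketched as follows: evaluate (an estimate of) log L(θ) at a small design and interpolate with a GP, so the surrogate can stand in for the likelihood elsewhere. The log-likelihood below is synthetic and the kernel choice illustrative; this is only the response-surface idea, not the full GP-ABC algorithm:

```python
import numpy as np

def log_like(theta):
    # synthetic stand-in for an expensive log-likelihood estimate
    return -0.5 * (theta - 1.0) ** 2

def sq_exp(a, b, length=1.0):
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)

design = np.linspace(-4, 4, 9)       # small, careful design in theta
y = log_like(design)                 # only 9 'simulator' evaluations

grid = np.linspace(-4, 4, 161)
K = sq_exp(design, design) + 1e-8 * np.eye(len(design))
surrogate = sq_exp(grid, design) @ np.linalg.solve(K, y)  # GP posterior mean

# the surrogate can now stand in for log L(theta) inside MCMC/ABC
print("argmax of surrogate:", grid[np.argmax(surrogate)])
```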
-
Iteration 24
Left = estimate, right = truth
http://youtu.be/FF3KhKh6NHg
-
Climate science
What drives the glacial-interglacial cycle?
Eccentricity: orbital departure from a circle, controls duration of the seasons
Obliquity: axial tilt, controls amplitude of the seasonal cycle
Precession: variation in Earth's axis of rotation, affects difference between seasons
-
Model selection
What drives the glacial-interglacial cycle?
- Which aspect of the astronomical forcing is of primary importance?
- Which models best represent the cycle?
'Most simple models of the [...] glacial cycles have at least four degrees of freedom [parameters], and some have as many as twelve. Unsurprisingly [...this is] insufficient to distinguish between the skill of the various models' (Roe and Allen 1999)
Bayesian model selection revolves around the use of Bayes factors, which are notoriously difficult to compute.
Model selection for stochastic differential equations:
- 1000 observations, 3000 unknown state variables, 1000 unknown times, 17 unknown parameters, choice of 5 different simulators.
Simulation studies show we can accurately choose between competing models, and identify the correct forcing.
-
Age model
Can we also quantify chronological uncertainty?
dXt = g(Xt, θ) dt + F(t, γ) dt + Σ dW
Yt = d + s X1,t + εt
Plus an age model
dH = −µs dT + σ dW
Target
π(θ, T1:N, X1:N, Mk | y1:N)
where T1:N are the unknown times of the observations Y1:N, X1:N are the climate state variables through time, Mk is the simulation model used, and θ is the corresponding parameter.
I.e., can we simultaneously date the stack, do climate reconstruction, fit the model, and choose between models?
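A minimal Euler-Maruyama sketch of simulating a forced SDE of this shape, with a made-up drift g, forcing F, and parameter values (none of these are the talk's glacial-cycle models):

```python
import numpy as np

rng = np.random.default_rng(3)

def g(x, theta):
    return -theta * x                             # illustrative relaxation drift

def F(t, gamma):
    return gamma * np.sin(2 * np.pi * t / 41.0)   # illustrative periodic forcing

theta, gamma, sigma = 0.5, 1.0, 0.3
dt, n = 0.01, 10_000
x = np.empty(n)
x[0] = 0.0
for k in range(n - 1):
    t = k * dt
    dW = rng.normal(0.0, np.sqrt(dt))             # Brownian increment
    x[k + 1] = x[k] + (g(x[k], theta) + F(t, gamma)) * dt + sigma * dW

# noisy observation process Y_t = d + s * X_t + eps_t
d_off, s, obs_sd = 0.1, 2.0, 0.05
y = d_off + s * x + rng.normal(0.0, obs_sd, size=n)
print(y[:3])
```

Inference then works backwards from y to the state, the parameters, and (with an age model) the observation times.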
-
Simulation study results - age vs depth (trend removed)
Dots = truth, black line = estimate, grey = 95% CI
[Figure: time (drift removed, kyr) against depth (m)]
-
Simulation study results - climate reconstruction
Dots = truth, black line = estimate, grey = 95% CI
[Figures: reconstructed climate states X1 and X2 against depth (m)]
-
Simulation study results - parameter estimation
[Figure: posterior densities for the parameters β0, β1, β2, δ, γp, γc, γe, σ1, σ2, σobs, D, S, µs, σs, α, φ0, c]
Simultaneous inference of the choice between 5 models, 17 parameters, 800 ages, 2400 climate variables, using just 800 observations.
-
Results for ODP846 - age vs depth (trend removed)
Black = posterior mean, grey = 95% CI, red = Huybers 2007, blue = Lisiecki and Raymo 2004
[Figure: time (drift removed, kyr) against depth (m), 0–30 m]
Advantages: full UQ, model selection, simultaneous parameter estimation and climate reconstruction.
Ignoring uncertainty leads to incorrect conclusions.
-
Model discrepancy
Consider the state space model:
x_{t+1} = f_θ(x_t) + e_t,    y_t = g(x_t) + ε_t
e_t ∼ p(·),    ε_t ∼ q(·)
How do we correct errors in f or g?
Use a GP discrepancy model - e.g., x_{t+1} = f_θ(x_t) + δ(x_t) + e_t
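As a minimal sketch of the idea (not the thesis code; the true and simulated transition functions, kernel hyperparameters, and noise levels below are all illustrative), δ(·) can be learnt by GP regression on the one-step residuals x_{t+1} − f_θ(x_t), which are noisy observations of the discrepancy:

```python
import numpy as np

def rbf_kernel(a, b, ell=2.0, sf=5.0):
    """Squared-exponential covariance between input vectors a and b."""
    d = a[:, None] - b[None, :]
    return sf**2 * np.exp(-0.5 * (d / ell) ** 2)

rng = np.random.default_rng(0)

# Illustrative true system vs. misspecified simulator f_theta:
# the discrepancy delta(x) is exactly what f_theta misses.
f_true = lambda x: 25 * x / (1 + x**2)
f_sim = lambda x: 0.5 * x

# Simulate a trajectory from the true system with e_t ~ N(0, 1).
T = 200
x = np.zeros(T)
for t in range(T - 1):
    x[t + 1] = f_true(x[t]) + rng.normal(0.0, 1.0)

# One-step residuals r_t = x_{t+1} - f_theta(x_t) are noisy draws of delta(x_t).
xin, r = x[:-1], x[1:] - f_sim(x[:-1])

# GP regression: posterior mean of delta at a grid of test inputs.
sn2 = 1.0                                   # variance of e_t (assumed known here)
K = rbf_kernel(xin, xin) + sn2 * np.eye(T - 1)
xs = np.linspace(-10, 10, 50)
delta_hat = rbf_kernel(xs, xin) @ np.linalg.solve(K, r)
```

In regions the trajectory visits, `delta_hat` tracks the true discrepancy f_true(x) − f_sim(x); away from the data the GP mean reverts to zero, which is one reason the prior on δ matters.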
Chapter 6: Gaussian Process Models of Simulator Discrepancy
[Figure: nine panels plotting x(t+1) − f(x_t, u_t) against x(t) for the simulators f(x_t, u_t) = 0.5x_t + 8u_t, 25x_t/(1 + x_t^2) + 8u_t, 0.5 + 8u_t and 8u_t, each with a GP(0, K) discrepancy]
Figure 6.6: The learnt discrepancy (solid black line) and the true discrepancy function (red line) from using different incorrect simulators with Gaussian process discrepancy. Note that data are generated from Equations 6.4.1 and 6.4.2 with true parameters (q2, r2) as (0.1, 1) (3 plots in the top row), (1, 0.1) (3 plots in the middle row), and (1, 100) (3 plots in the bottom row), respectively.
Technical challenge: inference using PGAS works but is expensive. A variational approach looks more promising.
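PGAS itself is too involved to sketch here, but the expense is easy to see from its simplest ingredient: a bootstrap particle filter, which must be run repeatedly and gives an unbiased (but noisy) estimate of the likelihood p(y_{1:T} | θ) for the state-space model above. The model, parameters, and particle count below are illustrative:

```python
import numpy as np

def bootstrap_pf_loglik(y, f, g, q_sd, r_sd, N=500, seed=1):
    """Bootstrap particle filter estimate of log p(y_{1:T} | theta) for
    x_{t+1} = f(x_t) + e_t, y_t = g(x_t) + eps_t with Gaussian noises."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, N)              # particles from an assumed N(0,1) prior
    ll = 0.0
    for t in range(len(y)):
        # Weight particles by the Gaussian observation density q(y_t | x_t).
        logw = -0.5 * ((y[t] - g(x)) / r_sd) ** 2 - np.log(r_sd * np.sqrt(2 * np.pi))
        m = logw.max()
        w = np.exp(logw - m)
        ll += m + np.log(w.mean())           # running log-likelihood estimate
        # Multinomial resampling, then propagate through the transition.
        idx = rng.choice(N, N, p=w / w.sum())
        x = f(x[idx]) + rng.normal(0.0, q_sd, N)
    return ll

# Toy data from a linear-Gaussian model (illustrative parameters).
rng = np.random.default_rng(2)
T, states = 50, [0.0]
for _ in range(T - 1):
    states.append(0.5 * states[-1] + rng.normal())
y = np.array(states) + rng.normal(0.0, 0.5, T)
ll = bootstrap_pf_loglik(y, lambda x: 0.5 * x, lambda x: x, 1.0, 0.5)
```

Each likelihood evaluation costs O(NT), and PGAS embeds such sweeps inside a Gibbs loop over θ, δ and x_{1:T}, which is where the expense comes from; a variational approach replaces this sampling with optimisation.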
-
Conclusions
UQ can be vital: ignoring uncertainty can lead to incorrect conclusions, often in subtle ways.
Computational tractability is one of the key bottlenecks: big simulation and big data.
Methods from machine learning have the potential to help us make large advances in statistical methodology.
Thank you for listening!