Approximate Bayesian Computation: a simulation
based approach to inference
Richard Wilkinson1 Simon Tavare2
1 Department of Probability and Statistics, University of Sheffield
2 Department of Applied Mathematics and Theoretical Physics, University of Cambridge
Workshop on Approximate Inference in Stochastic Processes and Dynamical Systems
R.D. Wilkinson (University of Sheffield) Approximate Bayesian Computation PASCAL 2008 1 / 19
Stochastic Computation
Implicit Statistical Models
Two types of statistical model:
Prescribed models - likelihood function is specified.
Implicit models - mechanism to simulate observations.
Implicit models give scientists more freedom to accurately model the phenomenon under consideration. The increase in computer power has made their use more practicable. Popular in many disciplines.
[Figure: sample output from an implicit model, simulated through time t]
Fitting to data
Most models are forward models, i.e., we specify parameters θ and initial conditions and the model generates output D. Usually we are interested in the inverse problem: we observe data and want to estimate parameter values. Different terminology:
Calibration
Data assimilation
Parameter estimation
Inverse problem
Bayesian inference
Monte Carlo Inference
Aim to sample from the posterior distribution:
π(θ|D) ∝ prior × likelihood = π(θ)P(D|θ).
Monte Carlo methods enable Bayesian inference to be done in more complex models.
MCMC can be difficult or impossible in many stochastic models, e.g., if
◮ P(D|θ) is unknown - true for many stochastic models,
◮ or there are convergence or mixing problems, often caused by highly dependent data arising from an underlying tree or graphical structure:
⋆ Population Genetics
⋆ Epidemiology
⋆ Evolutionary Biology
Likelihood-Free Inference
Rejection Algorithm
Draw θ from the prior π(·)
Accept θ with probability P(D | θ)

Accepted θ are independent draws from the posterior distribution, π(θ | D).
If the likelihood, P(D|θ), is unknown:
‘Mechanical’ Rejection Algorithm
Draw θ from π(·)
Simulate D′ ∼ P(· | θ)
Accept θ if D = D′
The acceptance rate is P(D): the number of runs needed to obtain n accepted observations is negative binomial, with mean n/P(D).
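The mechanical rejection algorithm can be sketched in a few lines. The Poisson model, the U(0, 10) prior, and the observed count D = 3 below are my own toy choices for illustration, not from the slides; exact matching only makes sense because the data here are discrete.

```python
import random

def mechanical_rejection(D, prior_sample, simulate, n_accept):
    """Exact rejection sampler: keep theta only when D' == D."""
    accepted = []
    while len(accepted) < n_accept:
        theta = prior_sample()      # draw theta from the prior pi(.)
        D_prime = simulate(theta)   # simulate D' ~ P(. | theta)
        if D_prime == D:            # exact match required
            accepted.append(theta)
    return accepted

# Toy discrete model (my own choice, not from the slides):
# theta ~ U(0, 10) and the data are a single Poisson(theta) count.
random.seed(1)

def simulate(theta):
    # Count rate-1 exponential arrivals in [0, theta]: the count is Poisson(theta).
    t, k = 0.0, 0
    while True:
        t += random.expovariate(1.0)
        if t > theta:
            return k
        k += 1

samples = mechanical_rejection(D=3,
                               prior_sample=lambda: random.uniform(0, 10),
                               simulate=simulate,
                               n_accept=200)
# The accepted thetas are exact posterior draws; with a flat prior and a
# Poisson likelihood this posterior is a truncated Gamma(4, 1), mean near 4.
```

Note the cost: with acceptance rate P(D), getting 200 draws requires on the order of 200/P(D) simulator runs, which is the negative-binomial count from the slide.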
Approximate Bayesian Computation I
If P(D) is small, we will rarely accept any θ. Instead, there is an approximate version:
Approximate Rejection Algorithm
Draw θ from π(θ)
Simulate D′ ∼ P(· | θ)
Accept θ if ρ(D, D′) ≤ ε
This generates observations from π(θ | ρ(D, D′) ≤ ε):
As ε → ∞, we get observations from the prior, π(θ).
If ε = 0, we generate observations from π(θ | D).
ε reflects the tension between computability and accuracy.
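The tolerance trade-off can be seen directly by running the approximate rejection algorithm at several values of ε. The uniform prior, Gaussian toy model, and absolute-difference metric below are illustrative assumptions of mine, not the slides' example.

```python
import random

def approximate_rejection(D, prior_sample, simulate, rho, eps, n_accept):
    """ABC rejection: accept theta whenever rho(D, D') <= eps."""
    accepted = []
    while len(accepted) < n_accept:
        theta = prior_sample()
        D_prime = simulate(theta)
        if rho(D, D_prime) <= eps:
            accepted.append(theta)
    return accepted

# Toy continuous model (my own choice, not from the slides):
# theta ~ U(-10, 10), D is the mean of 5 draws from N(theta, 1),
# and rho is the absolute difference of the two means.
random.seed(2)
simulate = lambda th: sum(random.gauss(th, 1) for _ in range(5)) / 5
rho = lambda a, b: abs(a - b)

spreads = {}
for eps in (5.0, 1.0, 0.1):
    s = approximate_rejection(D=0.0,
                              prior_sample=lambda: random.uniform(-10, 10),
                              simulate=simulate, rho=rho, eps=eps,
                              n_accept=500)
    # root-mean-square of the accepted thetas (posterior is centred at 0)
    spreads[eps] = (sum(x * x for x in s) / len(s)) ** 0.5
# Large eps: accepted thetas look like the prior. Small eps: they
# tighten towards pi(theta | D), at the cost of many more simulator runs.
```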
Approximate Bayesian Computation II
If the data are too high dimensional, we never observe simulations that are 'close' to the field data. Reduce the dimension using summary statistics, S(D).
Approximate Rejection Algorithm With Summaries
Draw θ from π(θ)
Simulate D′ ∼ P(· | θ)
Accept θ if ρ(S(D), S(D′)) < ε
If S is sufficient, this is equivalent to the previous algorithm.
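A sketch of the summaries variant, under my own illustrative assumptions: 20 iid N(θ, 1) observations, summarised by the sample mean, which is a sufficient statistic for θ in this model, so the equivalence mentioned above applies.

```python
import random

def abc_with_summaries(D, prior_sample, simulate, S, rho, eps, n_accept):
    """ABC rejection on summaries: accept theta when rho(S(D), S(D')) < eps."""
    sD = S(D)                        # summarise the observed data once
    accepted = []
    while len(accepted) < n_accept:
        theta = prior_sample()
        D_prime = simulate(theta)
        if rho(sD, S(D_prime)) < eps:
            accepted.append(theta)
    return accepted

# Toy model (my own choice): 20 iid N(theta, 1) observations, summarised
# by the sample mean, which is sufficient for theta here.
random.seed(3)
n = 20
D_obs = [random.gauss(1.0, 1.0) for _ in range(n)]   # data from theta = 1
mean = lambda xs: sum(xs) / len(xs)

samples = abc_with_summaries(
    D=D_obs,
    prior_sample=lambda: random.uniform(-5, 5),
    simulate=lambda th: [random.gauss(th, 1.0) for _ in range(n)],
    S=mean,
    rho=lambda a, b: abs(a - b),
    eps=0.1,
    n_accept=300)
# Because the mean is sufficient, these samples approximate the same
# posterior that matching the full 20-dimensional datasets would give.
```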
Error Structure
Example (Gaussian Distribution)
Suppose Xi ∼ N(µ, σ²), with σ² known, and give µ an improper flat prior distribution, π(µ) = 1 for µ ∈ R. Suppose we observe data with x̄ = 0.
Pick µ ∼ U(−∞, ∞)
Simulate Xi ∼ N(µ, σ²)
Accept µ if |x̄| ≤ ε.
Then

π(µ | |x̄| ≤ ε) = [Φ((ε − µ)/√(σ²/n)) − Φ((−ε − µ)/√(σ²/n))] / (2ε)

and

Var(µ | |x̄| ≤ ε) = Var(µ | x̄ = 0) + ε²/3.
[Figure: histograms of 1000 accepted samples, density against µ, for tolerances ε = 0.1, 0.5, 1 and 5]
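The variance identity above can be checked numerically. The sketch below is my own addition: it approximates the improper flat prior on µ by U(−50, 50) and compares the Monte Carlo variance of the accepted µ against σ²/n + ε²/3.

```python
import random

# Numerical check of the slide's identity
# Var(mu | |xbar| <= eps) = Var(mu | xbar = 0) + eps^2 / 3,
# with Var(mu | xbar = 0) = sigma^2 / n under the flat prior.
random.seed(4)
n, sigma, eps = 10, 1.0, 1.0
accepted = []
while len(accepted) < 2000:
    mu = random.uniform(-50, 50)                   # (truncated) flat prior
    xbar = random.gauss(mu, sigma / n ** 0.5)      # xbar ~ N(mu, sigma^2/n)
    if abs(xbar) <= eps:
        accepted.append(mu)

# By symmetry the accepted-mu mean is ~0, so E[mu^2] estimates the variance.
var = sum(m * m for m in accepted) / len(accepted)
theory = sigma ** 2 / n + eps ** 2 / 3             # = 0.1 + 1/3
```

With these settings both numbers should agree to within Monte Carlo error, illustrating the ε²/3 over-dispersion the tolerance introduces.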
Approximate MCMC
Rejection sampling is inefficient, as θ is repeatedly sampled from its prior distribution. The idea behind MCMC is that by correlating observations, more time is spent in regions of high likelihood.
Approximate Metropolis-Hastings Algorithm
Suppose we are currently at θ. Propose θ′ from density q(θ, θ′). Simulate D′ from P(· | θ′). If ρ(D, D′) ≤ ε, calculate
h(θ, θ′) = min( 1, [π(θ′) q(θ′, θ)] / [π(θ) q(θ, θ′)] ).
Accept the move to θ′ with probability h(θ, θ′), else stay at θ.
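A minimal sketch of this approximate Metropolis-Hastings step. The Gaussian prior, the symmetric random-walk proposal (so the q-ratio cancels), and the toy simulator are illustrative assumptions of mine, not the slides' application.

```python
import math
import random

def abc_mcmc(D, log_prior, propose, simulate, rho, eps, n_steps, theta0):
    """Approximate Metropolis-Hastings: the intractable likelihood ratio
    is replaced by an accept/reject test on freshly simulated data."""
    chain, theta = [], theta0
    for _ in range(n_steps):
        theta_prop = propose(theta)            # symmetric q, so q-ratio cancels
        D_prime = simulate(theta_prop)
        if rho(D, D_prime) <= eps:             # only then consider the move
            log_h = min(0.0, log_prior(theta_prop) - log_prior(theta))
            if math.log(random.random()) < log_h:
                theta = theta_prop
        chain.append(theta)                    # on rejection, stay at theta
    return chain

# Toy setup (my own choice): theta ~ N(0, 5^2) prior, the data are the
# mean of 5 draws from N(theta, 1), with observed value 2.0.
random.seed(5)
chain = abc_mcmc(
    D=2.0,
    log_prior=lambda th: -th * th / (2 * 5 ** 2),   # N(0, 25), up to a constant
    propose=lambda th: th + random.gauss(0, 1.0),   # random-walk proposal
    simulate=lambda th: sum(random.gauss(th, 1) for _ in range(5)) / 5,
    rho=lambda a, b: abs(a - b), eps=0.5,
    n_steps=20000, theta0=0.0)
# After burn-in the chain should spend most of its time near theta = 2,
# rather than repeatedly proposing from the whole prior.
```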
Adaptive tolerance choices: Sisson et al. and Robert et al. proposed approximate sequential importance sampling algorithms.
ABC-within-MCMC
Problem: a low acceptance rate leads to slow convergence.
Suppose θ = (θ1, θ2) with
π(θ1 | D,θ2) known,
π(θ2 | D,θ1) unknown.
We can combine Gibbs update steps (or any M-H update) with ABC.
ABC-within-Gibbs Algorithm
Suppose we are at θ^t = (θ1^t, θ2^t).
1. Draw θ1^{t+1} ∼ π(θ1 | D, θ2^t)
2. Draw θ2* ∼ π_{θ2}(·)
◮ Simulate D′ ∼ P(· | θ1^{t+1}, θ2*)
◮ If ρ(D, D′) < ε, set θ2^{t+1} = θ2*. Else return to step 2.
This is often the case for models with a hidden tree structure generatinghighly dependent data.
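A runnable sketch of ABC-within-Gibbs under an assumed toy model of my own: Gaussian data where the conditional for the mean θ1 is conjugate and hence known exactly, while the variance θ2 is treated as the awkward block and updated by an ABC rejection step on the sample variance. All model choices here are illustrative, not from the slides.

```python
import random

# Toy model (my own construction): X_i ~ N(theta1, theta2), i = 1..n,
# with priors theta1 ~ N(0, 100) and theta2 ~ U(0.1, 5).
# pi(theta1 | D, theta2) is conjugate-Gaussian (known);
# pi(theta2 | D, theta1) is treated as unknown and updated by ABC.
random.seed(6)
n = 30
data = [random.gauss(1.0, 1.0) for _ in range(n)]   # true theta1 = theta2 = 1
xbar = sum(data) / n
s2 = sum((x - xbar) ** 2 for x in data) / (n - 1)   # observed sample variance

def gibbs_theta1(theta2):
    # Step 1: exact draw from pi(theta1 | D, theta2) (conjugate update).
    prec = n / theta2 + 1 / 100          # likelihood precision + prior precision
    mean = (n * xbar / theta2) / prec
    return random.gauss(mean, prec ** -0.5)

def abc_theta2(theta1, eps=0.3):
    # Step 2: draw theta2* from its prior, simulate a dataset, and accept
    # when the simulated sample variance is within eps of the observed one.
    while True:
        t2 = random.uniform(0.1, 5)
        sim = [random.gauss(theta1, t2 ** 0.5) for _ in range(n)]
        m = sum(sim) / n
        v = sum((x - m) ** 2 for x in sim) / (n - 1)
        if abs(v - s2) <= eps:
            return t2

theta1, theta2 = 0.0, 1.0
chain = []
for _ in range(500):
    theta1 = gibbs_theta1(theta2)        # exact Gibbs step
    theta2 = abc_theta2(theta1)          # ABC step for the awkward block
    chain.append((theta1, theta2))
```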
Example From Population Biology
Inferring ancestral divergence times
[Figure: genealogical tree of species divergences through time t]
Choosing summary statistics and metrics
We need
summaries S(D), which are sensitive to changes in θ but robust to random variations in D
a definition of approximate sufficiency (Le Cam 1963): the distance between π(θ | D) and π(θ | S(D))?

[Figure: scatter plot of two candidate summaries, D1 against D2]

a systematic, implementable approach for finding good summary statistics.
Complex dependence structures can be accounted for.
ABC Approach
Data can be thought of in two parts:
the observed number of fossils Di found in the ith interval
the total number of fossils found, D+.
D′ denotes simulated data. A suitable metric might be
ρ(D, D′) = Σ_{i=1}^{k} | Di/D+ − D′i/D′+ | + | D′+/D+ − 1 |

Note: no data summaries here
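The metric can be coded directly; the fossil counts below are hypothetical numbers of my own, just to exercise the formula.

```python
def rho(D, D_prime):
    """The slide's metric on fossil-count vectors over k intervals:
    per-interval proportion mismatch plus relative total-count mismatch."""
    D_plus = sum(D)
    Dp_plus = sum(D_prime)
    prop = sum(abs(d / D_plus - dp / Dp_plus) for d, dp in zip(D, D_prime))
    return prop + abs(Dp_plus / D_plus - 1)

# Hypothetical counts over k = 4 intervals (illustrative numbers only):
assert rho([5, 10, 3, 2], [5, 10, 3, 2]) == 0.0   # identical data -> 0
# Doubling every count preserves the proportions but doubles the total,
# so only the |D'_+/D_+ - 1| term contributes, giving 1.0.
assert rho([5, 10, 3, 2], [10, 20, 6, 4]) == 1.0
```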
Not going so well
[Figure: trace of the simulated extant population size against iteration number]
Tweak the metric
The simulated N0 values are too small (there are 376 modern species).
It is easy to combine different types of information with ABC.
Change the metric:
ρ(D, D′) = Σ_{i=1}^{k} | Di/D+ − D′i/D′+ | + | D′+/D+ − 1 | + | N′0/N0 − 1 |
This gives approximate samples from
π(θ | D,N0 = 376) ∝ P(D,N0 = 376 | θ)π(θ)
Results
[Figure: posterior density of the divergence time (My)]
Extensions
Model selection:
Ratio of acceptance rates: π_{M1}(S′ ≈ S) / π_{M2}(S′ ≈ S) ≈ Bayes factor. Relative acceptance rates give posterior model probabilities.
◮ Hopeless in practice, as it is too sensitive to the tolerance ε.
Raftery and Lewis (1992) and Chib (1995) give computational schemes to calculate Bayes factors. Neither works.
Expensive Simulators:
Emulate the stochastic model with a Gaussian process emulator (Richard Boys, Darren Wilkinson et al.).
Pros and cons of ABC
Pros
◮ Likelihood is not needed
◮ Easy to code
◮ Easy to adapt
◮ Generates independent observations (parallel computation)
Cons
◮ Hard to anticipate the effect of summary statistics (needs intuition)
◮ Over-dispersion of the posterior due to ρ(D, D′) < ε
◮ For complex problems, sampling from the prior does not make good use of the observations
Issues
◮ One run or many?
◮ How to choose good summary statistics?
◮ How good an approximation do we get?
References
M. A. Beaumont, W. Zhang and D. J. Balding, Approximate Bayesian Computation in Population Genetics, Genetics, 2002.
P. Marjoram, J. Molitor, V. Plagnol and S. Tavare, Markov chain Monte Carlo without likelihoods, PNAS, 2003.
S. A. Sisson, Y. Fan and M. M. Tanaka, Sequential Monte Carlo without likelihoods, PNAS, 2007.
C. P. Robert, M. A. Beaumont, J. Marin and J. Cornuet, Adaptivity for ABC algorithms: the ABC-PMC scheme, arXiv, 2008.