Approximate Bayesian Computation: a simulation
based approach to inference
Richard Wilkinson1 Simon Tavare2
1 Department of Probability and Statistics, University of Sheffield
2 Department of Applied Mathematics and Theoretical Physics, University of Cambridge
Workshop on Approximate Inference in Stochastic Processes and Dynamical Systems
R.D. Wilkinson (University of Sheffield) Approximate Bayesian Computation PASCAL 2008 1 / 19
Stochastic Computation
Implicit Statistical Models
Two types of statistical model:
Prescribed models - likelihood function is specified.
Implicit models - mechanism to simulate observations.
Implicit models give scientists more freedom to accurately model the phenomenon under consideration. The increase in computer power has made their use more practicable. Popular in many disciplines.
[Figure: sample output from an implicit model, simulated through time t]
Fitting to data
Most models are forward models, i.e., we specify parameters θ and initial conditions and the model generates output D. Usually we are interested in the inverse problem: we observe data and want to estimate parameter values. Different terminology:
Calibration
Data assimilation
Parameter estimation
Inverse problem
Bayesian inference
Monte Carlo Inference
Aim to sample from the posterior distribution:
π(θ|D) ∝ prior × likelihood = π(θ)P(D|θ).
Monte Carlo methods enable Bayesian inference to be done in more complex models.
MCMC can be difficult or impossible in many stochastic models, e.g., if
◮ P(D|θ) is unknown - true for many stochastic models,
◮ or there are convergence or mixing problems, often caused by highly dependent data arising from an underlying tree or graphical structure:
⋆ Population Genetics
⋆ Epidemiology
⋆ Evolutionary Biology
Likelihood-Free Inference
Rejection Algorithm
Draw θ from the prior π(·)
Accept θ with probability P(D | θ)

Accepted θ are independent draws from the posterior distribution, π(θ | D).
If the likelihood, P(D|θ), is unknown:
‘Mechanical’ Rejection Algorithm
Draw θ from π(·)
Simulate D′ ∼ P(· | θ)
Accept θ if D = D′
The acceptance rate is P(D): the number of runs needed to obtain n accepted observations is negative binomial, with mean n/P(D).
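The mechanical rejection algorithm can be sketched in a few lines. The Poisson model, the U(0, 10) prior, and the observed count D = 3 below are my own toy choices for illustration, not from the slides; exact matching only makes sense because the data here are discrete.

```python
import random

def mechanical_rejection(D, prior_sample, simulate, n_accept):
    """Exact rejection sampler: keep theta only when D' == D."""
    accepted = []
    while len(accepted) < n_accept:
        theta = prior_sample()      # draw theta from the prior pi(.)
        D_prime = simulate(theta)   # simulate D' ~ P(. | theta)
        if D_prime == D:            # exact match required
            accepted.append(theta)
    return accepted

# Toy discrete model (my own choice, not from the slides):
# theta ~ U(0, 10) and the data are a single Poisson(theta) count.
random.seed(1)

def simulate(theta):
    # Count rate-1 exponential arrivals in [0, theta]: the count is Poisson(theta).
    t, k = 0.0, 0
    while True:
        t += random.expovariate(1.0)
        if t > theta:
            return k
        k += 1

samples = mechanical_rejection(D=3,
                               prior_sample=lambda: random.uniform(0, 10),
                               simulate=simulate,
                               n_accept=200)
# The accepted thetas are exact posterior draws; with a flat prior and a
# Poisson likelihood this posterior is a truncated Gamma(4, 1), mean near 4.
```

Note the cost: with acceptance rate P(D), getting 200 draws requires on the order of 200/P(D) simulator runs, which is the negative-binomial count from the slide.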
Approximate Bayesian Computation I
If P(D) is small, we will rarely accept any θ. Instead, there is an approximate version:
Approximate Rejection Algorithm
Draw θ from π(θ)
Simulate D′ ∼ P(· | θ)
Accept θ if ρ(D, D′) ≤ ε
This generates observations from π(θ | ρ(D, D′) ≤ ε):
As ε → ∞, we get observations from the prior, π(θ).
If ε = 0, we generate observations from π(θ | D).
ε reflects the tension between computability and accuracy.
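The tolerance trade-off can be seen directly by running the approximate rejection algorithm at several values of ε. The uniform prior, Gaussian toy model, and absolute-difference metric below are illustrative assumptions of mine, not the slides' example.

```python
import random

def approximate_rejection(D, prior_sample, simulate, rho, eps, n_accept):
    """ABC rejection: accept theta whenever rho(D, D') <= eps."""
    accepted = []
    while len(accepted) < n_accept:
        theta = prior_sample()
        D_prime = simulate(theta)
        if rho(D, D_prime) <= eps:
            accepted.append(theta)
    return accepted

# Toy continuous model (my own choice, not from the slides):
# theta ~ U(-10, 10), D is the mean of 5 draws from N(theta, 1),
# and rho is the absolute difference of the two means.
random.seed(2)
simulate = lambda th: sum(random.gauss(th, 1) for _ in range(5)) / 5
rho = lambda a, b: abs(a - b)

spreads = {}
for eps in (5.0, 1.0, 0.1):
    s = approximate_rejection(D=0.0,
                              prior_sample=lambda: random.uniform(-10, 10),
                              simulate=simulate, rho=rho, eps=eps,
                              n_accept=500)
    # root-mean-square of the accepted thetas (posterior is centred at 0)
    spreads[eps] = (sum(x * x for x in s) / len(s)) ** 0.5
# Large eps: accepted thetas look like the prior. Small eps: they
# tighten towards pi(theta | D), at the cost of many more simulator runs.
```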
Approximate Bayesian Computation II
If the data are too high dimensional, we never observe simulations that are 'close' to the field data. Reduce the dimension using summary statistics, S(D).
Approximate Rejection Algorithm With Summaries
Draw θ from π(θ)
Simulate D′ ∼ P(· | θ)
Accept θ if ρ(S(D), S(D′)) < ε
If S is sufficient, this is equivalent to the previous algorithm.
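A sketch of the summaries variant, under my own illustrative assumptions: 20 iid N(θ, 1) observations, summarised by the sample mean, which is a sufficient statistic for θ in this model, so the equivalence mentioned above applies.

```python
import random

def abc_with_summaries(D, prior_sample, simulate, S, rho, eps, n_accept):
    """ABC rejection on summaries: accept theta when rho(S(D), S(D')) < eps."""
    sD = S(D)                        # summarise the observed data once
    accepted = []
    while len(accepted) < n_accept:
        theta = prior_sample()
        D_prime = simulate(theta)
        if rho(sD, S(D_prime)) < eps:
            accepted.append(theta)
    return accepted

# Toy model (my own choice): 20 iid N(theta, 1) observations, summarised
# by the sample mean, which is sufficient for theta here.
random.seed(3)
n = 20
D_obs = [random.gauss(1.0, 1.0) for _ in range(n)]   # data from theta = 1
mean = lambda xs: sum(xs) / len(xs)

samples = abc_with_summaries(
    D=D_obs,
    prior_sample=lambda: random.uniform(-5, 5),
    simulate=lambda th: [random.gauss(th, 1.0) for _ in range(n)],
    S=mean,
    rho=lambda a, b: abs(a - b),
    eps=0.1,
    n_accept=300)
# Because the mean is sufficient, these samples approximate the same
# posterior that matching the full 20-dimensional datasets would give.
```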
Error Structure
Example (Gaussian Distribution)
Suppose Xi ∼ N(µ, σ²), with σ² known, and give µ an improper flat prior distribution, π(µ) = 1 for µ ∈ R. Suppose we observe data with x̄ = 0.
Pick µ ∼ U(−∞, ∞)
Simulate Xi ∼ N(µ, σ²)
Accept µ if |x̄| ≤ ε.
Then

π(µ | |x̄| ≤ ε) = [Φ((ε − µ)/√(σ²/n)) − Φ((−ε − µ)/√(σ²/n))] / (2ε)

and

Var(µ | |x̄| ≤ ε) = Var(µ | x̄ = 0) + ε²/3.
[Figure: histograms of 1000 accepted samples, density against µ, for tolerances ε = 0.1, 0.5, 1 and 5]
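The variance identity above can be checked numerically. The sketch below is my own addition: it approximates the improper flat prior on µ by U(−50, 50) and compares the Monte Carlo variance of the accepted µ against σ²/n + ε²/3.

```python
import random

# Numerical check of the slide's identity
# Var(mu | |xbar| <= eps) = Var(mu | xbar = 0) + eps^2 / 3,
# with Var(mu | xbar = 0) = sigma^2 / n under the flat prior.
random.seed(4)
n, sigma, eps = 10, 1.0, 1.0
accepted = []
while len(accepted) < 2000:
    mu = random.uniform(-50, 50)                   # (truncated) flat prior
    xbar = random.gauss(mu, sigma / n ** 0.5)      # xbar ~ N(mu, sigma^2/n)
    if abs(xbar) <= eps:
        accepted.append(mu)

# By symmetry the accepted-mu mean is ~0, so E[mu^2] estimates the variance.
var = sum(m * m for m in accepted) / len(accepted)
theory = sigma ** 2 / n + eps ** 2 / 3             # = 0.1 + 1/3
```

With these settings both numbers should agree to within Monte Carlo error, illustrating the ε²/3 over-dispersion the tolerance introduces.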
Approximate MCMC
Rejection sampling is inefficient, as θ is repeatedly sampled from its prior distribution. The idea behind MCMC is that by correlating observations, more time is spent in regions of high likelihood.
Approximate Metropolis-Hastings Algorithm
Suppose we are currently at θ. Propose θ′ from density q(θ, θ′). Simulate D′ from P(· | θ′). If ρ(D, D′) ≤ ε, calculate
h(θ, θ′) = min( 1, [π(θ′) q(θ′, θ)] / [π(θ) q(θ, θ′)] ).
Accept the move to θ′ with probability h(θ, θ′), else stay at θ.
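A minimal sketch of this approximate Metropolis-Hastings step. The Gaussian prior, the symmetric random-walk proposal (so the q-ratio cancels), and the toy simulator are illustrative assumptions of mine, not the slides' application.

```python
import math
import random

def abc_mcmc(D, log_prior, propose, simulate, rho, eps, n_steps, theta0):
    """Approximate Metropolis-Hastings: the intractable likelihood ratio
    is replaced by an accept/reject test on freshly simulated data."""
    chain, theta = [], theta0
    for _ in range(n_steps):
        theta_prop = propose(theta)            # symmetric q, so q-ratio cancels
        D_prime = simulate(theta_prop)
        if rho(D, D_prime) <= eps:             # only then consider the move
            log_h = min(0.0, log_prior(theta_prop) - log_prior(theta))
            if math.log(random.random()) < log_h:
                theta = theta_prop
        chain.append(theta)                    # on rejection, stay at theta
    return chain

# Toy setup (my own choice): theta ~ N(0, 5^2) prior, the data are the
# mean of 5 draws from N(theta, 1), with observed value 2.0.
random.seed(5)
chain = abc_mcmc(
    D=2.0,
    log_prior=lambda th: -th * th / (2 * 5 ** 2),   # N(0, 25), up to a constant
    propose=lambda th: th + random.gauss(0, 1.0),   # random-walk proposal
    simulate=lambda th: sum(random.gauss(th, 1) for _ in range(5)) / 5,
    rho=lambda a, b: abs(a - b), eps=0.5,
    n_steps=20000, theta0=0.0)
# After burn-in the chain should spend most of its time near theta = 2,
# rather than repeatedly proposing from the whole prior.
```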
Adaptive tolerance choices: Sisson et al. and Robert et al. proposed approximate sequential importance sampling algorithms.
ABC-within-MCMC
Problem: a low acceptance rate leads to slow convergence.
Suppose θ = (θ1, θ2) with
π(θ1 | D,θ2) known,
π(θ2 | D,θ1) unknown.
We can combine Gibbs update steps (or any M-H update) with ABC.
ABC-within-Gibbs Algorithm
Suppose we are at θ^t = (θ1^t, θ2^t).
1. Draw θ1^{t+1} ∼ π(θ1 | D, θ2^t)
2. Draw θ2* ∼ π_{θ2}(·)
◮ Simulate D′ ∼ P(· | θ1^{t+1}, θ2*)
◮ If ρ(D, D′) < ε, set θ2^{t+1} = θ2*. Else return to step 2.
This is often the case for models with a hidden tree structure generatinghighly dependent data.
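A runnable sketch of ABC-within-Gibbs under an assumed toy model of my own: Gaussian data where the conditional for the mean θ1 is conjugate and hence known exactly, while the variance θ2 is treated as the awkward block and updated by an ABC rejection step on the sample variance. All model choices here are illustrative, not from the slides.

```python
import random

# Toy model (my own construction): X_i ~ N(theta1, theta2), i = 1..n,
# with priors theta1 ~ N(0, 100) and theta2 ~ U(0.1, 5).
# pi(theta1 | D, theta2) is conjugate-Gaussian (known);
# pi(theta2 | D, theta1) is treated as unknown and updated by ABC.
random.seed(6)
n = 30
data = [random.gauss(1.0, 1.0) for _ in range(n)]   # true theta1 = theta2 = 1
xbar = sum(data) / n
s2 = sum((x - xbar) ** 2 for x in data) / (n - 1)   # observed sample variance

def gibbs_theta1(theta2):
    # Step 1: exact draw from pi(theta1 | D, theta2) (conjugate update).
    prec = n / theta2 + 1 / 100          # likelihood precision + prior precision
    mean = (n * xbar / theta2) / prec
    return random.gauss(mean, prec ** -0.5)

def abc_theta2(theta1, eps=0.3):
    # Step 2: draw theta2* from its prior, simulate a dataset, and accept
    # when the simulated sample variance is within eps of the observed one.
    while True:
        t2 = random.uniform(0.1, 5)
        sim = [random.gauss(theta1, t2 ** 0.5) for _ in range(n)]
        m = sum(sim) / n
        v = sum((x - m) ** 2 for x in sim) / (n - 1)
        if abs(v - s2) <= eps:
            return t2

theta1, theta2 = 0.0, 1.0
chain = []
for _ in range(500):
    theta1 = gibbs_theta1(theta2)        # exact Gibbs step
    theta2 = abc_theta2(theta1)          # ABC step for the awkward block
    chain.append((theta1, theta2))
```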
Example From Population Biology
Inferring ancestral divergence times
[Figure: genealogical tree of species divergences through time t]
Choosing summary statistics and metrics
We need
summaries S(D), which are sensitive to changes in θ but robust to random variations in D
a definition of approximate sufficiency (Le Cam 1963): the distance between π(θ | D) and π(θ | S(D))?

[Figure: scatter plot of two candidate summaries, D1 against D2]

a systematic, implementable approach for finding good summary statistics.
Complex dependence structures can be accounted for.
ABC Approach
Data can be thought of in two parts:
the observed number of fossils Di found in the ith interval
the total number of fossils found, D+.
D′ denotes simulated data. A suitable metric might be
ρ(D, D′) = Σ_{i=1}^{k} | Di/D+ − D′i/D′+ | + | D′+/D+ − 1 |

Note: no data summaries here
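The metric can be coded directly; the fossil counts below are hypothetical numbers of my own, just to exercise the formula.

```python
def rho(D, D_prime):
    """The slide's metric on fossil-count vectors over k intervals:
    per-interval proportion mismatch plus relative total-count mismatch."""
    D_plus = sum(D)
    Dp_plus = sum(D_prime)
    prop = sum(abs(d / D_plus - dp / Dp_plus) for d, dp in zip(D, D_prime))
    return prop + abs(Dp_plus / D_plus - 1)

# Hypothetical counts over k = 4 intervals (illustrative numbers only):
assert rho([5, 10, 3, 2], [5, 10, 3, 2]) == 0.0   # identical data -> 0
# Doubling every count preserves the proportions but doubles the total,
# so only the |D'_+/D_+ - 1| term contributes, giving 1.0.
assert rho([5, 10, 3, 2], [10, 20, 6, 4]) == 1.0
```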
Not going so well
[Figure: trace of the simulated extant population size against iteration number]
Tweak the metric
The simulated N0 values are too small (there are 376 modern species).
It is easy to combine different types of information with ABC.
Change the metric:
ρ(D, D′) = Σ_{i=1}^{k} | Di/D+ − D′i/D′+ | + | D′+/D+ − 1 | + | N′0/N0 − 1 |
This gives approximate samples from
π(θ | D,N0 = 376) ∝ P(D,N0 = 376 | θ)π(θ)
Results
[Figure: posterior density of the divergence time (My)]
Extensions
Model selection:
Ratio of acceptance rates: π_{M1}(S′ ≈ S) / π_{M2}(S′ ≈ S) ≈ Bayes factor. Relative acceptance rates give posterior model probabilities.
◮ Hopeless in practice, as it is too sensitive to the tolerance ε.
Raftery and Lewis (1992) and Chib (1995) give computational schemes to calculate Bayes factors. Neither works.
Expensive Simulators:
Emulate the stochastic model with a Gaussian process emulator (Richard Boys, Darren Wilkinson et al.).
Pros and cons of ABC
Pros
◮ Likelihood is not needed
◮ Easy to code
◮ Easy to adapt
◮ Generates independent observations (parallel computation)
Cons
◮ Hard to anticipate the effect of summary statistics (needs intuition)
◮ Over-dispersion of the posterior due to ρ(D, D′) < ε
◮ For complex problems, sampling from the prior does not make good use of the observations
Issues
◮ One run or many?
◮ How to choose good summary statistics?
◮ How good an approximation do we get?
References
M. A. Beaumont, W. Zhang and D. J. Balding, Approximate Bayesian Computation in Population Genetics, Genetics, 2002.
P. Marjoram, J. Molitor, V. Plagnol and S. Tavare, Markov chain Monte Carlo without likelihoods, PNAS, 2003.
S. A. Sisson, Y. Fan and M. M. Tanaka, Sequential Monte Carlo without likelihoods, PNAS, 2007.
C. P. Robert, M. A. Beaumont, J. Marin and J. Cornuet, Adaptivity for ABC algorithms: the ABC-PMC scheme, arXiv, 2008.