TRANSCRIPT
Tutorial on Approximate Bayesian Computation
Michael Gutmann
https://sites.google.com/site/michaelgutmann
University of Helsinki and Aalto University
Helsinki Institute for Information Technology
16 May 2016
Content
Two parts:
1. The basics of approximate Bayesian computation (ABC)
2. Computational and statistical efficiency
What is ABC?
A set of methods for approximate Bayesian inference which
can be used whenever sampling from the model is possible.
Michael Gutmann ABC Tutorial 2 / 65
Part I
Basic ABC
Program
Preliminaries
  Statistical inference
  Simulator-based models
  Likelihood function
Inference for simulator-based models
  Exact inference
  Approximate inference
  Rejection ABC algorithm
Big picture of statistical inference
- Given data yo, draw conclusions about properties of its source
- If available, take prior information into account

[Diagram: a data source with unknown properties produces the observation yo in the data space; inference runs from yo back to those properties, aided by prior information]
General approach
- Set up a model with potential properties θ (parameters)
- See which θ are reasonable given the observed data

[Diagram: as above, with a model M(θ) standing in for the unknown data source]
Likelihood function
- Measures agreement between θ and the observed data yo
- Probability to see data y like yo if property θ holds

[Diagram: the model M(θ) generates data y|θ in the data space; the likelihood measures how probable it is that y comes out like yo]
Likelihood function
- For discrete random variables:

  L(θ) = Pr(y = yo | θ)    (1)

- For continuous random variables:

  L(θ) = lim_{ε→0} Pr(y ∈ Bε(yo) | θ) / Vol(Bε(yo))    (2)
Performing statistical inference
- If L(θ) is known, inference boils down to solving an optimization/sampling problem
- Maximum likelihood estimation:

  θ̂ = argmax_θ L(θ)

- Bayesian inference:

  p(θ|yo) ∝ p(θ) × L(θ)

  posterior ∝ prior × likelihood
Textbook case
- Model ≡ family of probability density/mass functions p(y|θ)
- Likelihood function: L(θ) = p(yo | θ)
- Closed-form solutions are possible.
[Figure: prior pdf, likelihood, and posterior pdf for the mean, shown side by side]
Simulator-based models
- Not all models are specified as a family of pdfs p(y|θ).
- Here: simulator-based models, i.e. models which are specified via a mechanism (rule) for generating data.
Toy example
- Let y|θ ∼ N(θ, 1)
- Family of pdfs as model:

  p(y|θ) = 1/√(2π) · exp(−(y − θ)²/2)    (3)

- Simulator-based model:

  y = z + θ,  z ∼ N(0, 1)    (4)

  or

  y = z + θ,  z = √(−2 log ω) · cos(2πν)    (5)

  where ω and ν are independent random variables uniformly distributed on (0, 1)
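The Box–Muller generator of Eq. (5) is easy to run; here is a minimal Python sketch (the function and variable names are ours, not from the slides):

```python
import math
import random

def simulate(theta):
    """Simulator for y | theta ~ N(theta, 1): Box-Muller transform of two
    independent Uniform(0, 1) variates omega and nu, shifted by theta."""
    omega = 1.0 - random.random()   # in (0, 1], avoids log(0)
    nu = random.random()
    z = math.sqrt(-2.0 * math.log(omega)) * math.cos(2.0 * math.pi * nu)
    return z + theta

random.seed(0)
draws = [simulate(2.0) for _ in range(100_000)]
mean = sum(draws) / len(draws)  # should be close to theta = 2
```

Note that only forward simulation is used: nowhere does the code evaluate the density in Eq. (3).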
Formal definition of a simulator-based model
- Let (Ω, F, P) be a probability space.
- A simulator-based model is a collection of (measurable) functions g(·, θ) parametrized by θ:

  ω ∈ Ω ↦ y = g(ω, θ) ∈ Y    (6)

- The functions g(·, θ) are typically not available in closed form.
- Running the simulator for a fixed θ amounts to drawing ω ∼ P and evaluating g(ω, θ) (simulation/sampling).
Other names for simulator-based models
- Models specified via a data generating mechanism occur in multiple and diverse scientific fields.
- Different communities use different names for simulator-based models:
  - Generative models
  - Implicit models
  - Stochastic simulation models
  - Probabilistic programs
Examples
- Astrophysics: simulating the formation of galaxies, stars, or planets
- Evolutionary biology: simulating evolution
- Neuroscience: simulating neural circuits
- Ecology: simulating species migration
- Health science: simulating the spread of an infectious disease
- ...

[Figure: simulated neural activity in rat somatosensory cortex (from https://bbp.epfl.ch/nmc-portal)]
Example (health science)
- Simulating bacterial transmissions in child day care centers (Numminen et al., 2013)

[Figure: snapshots over time of the colonization states: which bacterial strains (y-axis) are carried by which individuals (x-axis)]
Advantages of simulator-based models
- Direct implementation of hypotheses about how the observed data were generated.
- Neat interface with physical or biological models of data.
- Modeling by replicating the mechanisms of nature which produced the observed/measured data ("analysis by synthesis").
- Possibility to perform experiments in silico.
Disadvantages of simulator-based models
- Generally elude analytical treatment.
- Can easily be made more complicated than necessary.
- Statistical inference is difficult ... but possible!
Family of pdfs induced by the simulator
- For any fixed θ, the output of the simulator yθ = g(·, θ) is a random variable.
- No closed-form formula is available for p(y|θ).
- The simulator defines the model pdfs p(y|θ) implicitly.
Implicit definition of the model pdfs

[Figure: the distribution of the simulator outputs g(ω, θ) as ω varies defines p(y|θ)]
Implicit definition of the likelihood function
For discrete random variables: L(θ) = Pr(y = yo | θ), the probability that the simulator output equals yo.
Implicit definition of the likelihood function
For continuous random variables: L(θ) = lim_{ε→0} Lε(θ), where Lε(θ) = Pr(y ∈ Bε(yo) | θ) / Vol(Bε(yo)).
Implicit definition of the likelihood function
- To compute the likelihood function, we need the probability that the simulator generates data close to yo:

  Pr(y = yo | θ)  or  Pr(y ∈ Bε(yo) | θ)

- No analytical expression is available.
- But we can empirically test whether simulated data equal yo or fall in Bε(yo).
- This property will be exploited to perform inference for simulator-based models.
Exact inference for discrete random variables
- For discrete random variables, we can perform exact Bayesian inference without knowing the likelihood function.
- By definition, the posterior is obtained by conditioning p(θ, y) on the event y = yo:

  p(θ|yo) = p(θ, yo) / p(yo) = p(θ, y = yo) / p(y = yo)    (7)
Exact inference for discrete random variables
- Generate tuples (θi, yi):
  1. θi ∼ pθ (iid from the prior)
  2. ωi ∼ P (by running the simulator)
  3. yi = g(ωi, θi) (by running the simulator)
- Conditioning on y = yo ⇔ retaining only the tuples with yi = yo
- The θi from the retained tuples are samples from the posterior p(θ|yo).
Example
- Posterior inference of the success probability θ in a Bernoulli trial.
- Data: yo = 1
- Prior: pθ = 1 on (0, 1)
- Generate tuples (θi, yi):
  1. θi ∼ pθ
  2. ωi ∼ U(0, 1)
  3. yi = 1 if ωi < θi, and yi = 0 otherwise
- Retain those θi for which yi = yo.
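The three generation steps and the rejection rule fit in a few lines of Python (the helper name is ours); for yo = 1 and a uniform prior, the retained θi follow the Beta(2, 1) posterior:

```python
import random

def exact_rejection_bernoulli(y_obs, n_sims):
    """Exact posterior sampling for a Bernoulli success probability:
    draw theta_i from the Uniform(0,1) prior, simulate y_i = 1{omega_i < theta_i},
    and retain theta_i only when y_i equals the observed outcome."""
    accepted = []
    for _ in range(n_sims):
        theta = random.random()            # 1. theta_i ~ p_theta
        omega = random.random()            # 2. omega_i ~ U(0, 1)
        y = 1 if omega < theta else 0      # 3. y_i = g(omega_i, theta_i)
        if y == y_obs:                     # condition on y = y_o
            accepted.append(theta)
    return accepted

random.seed(1)
samples = exact_rejection_bernoulli(y_obs=1, n_sims=100_000)
# For y_o = 1 the true posterior is Beta(2, 1), whose mean is 2/3.
posterior_mean = sum(samples) / len(samples)
```

About half the tuples are retained here, so the Monte Carlo error is small; the next slides show how this degrades in higher dimensions.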
Example
- The method produces samples from the posterior.
- Monte Carlo error arises when summarizing the samples as an empirical distribution or computing expectations via sample averages.
- Histograms for N simulated tuples (θi, yi):

[Figure: estimated posterior pdf, true posterior pdf, and prior pdf over the success probability, for N = 1000, N = 10,000, and N = 100,000]
Limitations
- Only applicable to discrete random variables.
- Even for discrete random variables: computationally not feasible in higher dimensions.
- Reason: the probability of the event yθ = yo becomes smaller and smaller as the dimension of the data increases.
- Out of N simulated tuples, only a small fraction will be accepted.
- The small number of accepted samples does not represent the posterior well: large Monte Carlo errors.
Approximations to make inference feasible
- Settle for approximate yet computationally feasible inference.
- Introduce two types of approximations:
  1. Instead of working with the whole data, work with lower-dimensional summary statistics tθ and to:

     tθ = T(yθ),  to = T(yo)    (8)

  2. Instead of checking tθ = to, check whether ∆θ = d(to, tθ) is less than ε (d may or may not be a metric).
Approximation of the likelihood function
L(θ) = lim_{ε→0} Lε(θ),  Lε(θ) = Pr(y ∈ Bε(yo) | θ) / Vol(Bε(yo))

- The approximations are equivalent to:
  1. Replacing Pr(y ∈ Bε(yo) | θ) with Pr(∆θ ≤ ε | θ)
  2. Not taking the limit ε → 0
- This defines an approximate likelihood function Lε(θ):

  Lε(θ) ∝ Pr(∆θ ≤ ε | θ)    (9)

- The discrepancy ∆θ is a (non-negative) random variable:

  ∆θ = d(to, tθ) = d(T(yo), T(yθ))
Rejection ABC algorithm
- The two approximations yield the rejection algorithm for approximate Bayesian computation (ABC):
  1. Sample θi ∼ pθ
  2. Simulate a data set yi by running the simulator with θi (yi = g(ωi, θi))
  3. Compute the discrepancy ∆i = d(T(yo), T(yi))
  4. Retain θi if ∆i ≤ ε
- This is the basic ABC algorithm.
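The four steps can be sketched as a generic Python function; the Gaussian-mean toy model, uniform prior, and tolerance below are our illustrative choices, not from the slides:

```python
import random
import statistics

def rejection_abc(y_obs, simulate, prior_sample, summary, n_sims, eps):
    """Basic rejection ABC for a scalar summary statistic."""
    t_obs = summary(y_obs)
    accepted = []
    for _ in range(n_sims):
        theta = prior_sample()                 # 1. theta_i ~ p_theta
        y_sim = simulate(theta)                # 2. run the simulator
        delta = abs(summary(y_sim) - t_obs)    # 3. Delta_i = d(T(y_o), T(y_i))
        if delta <= eps:                       # 4. retain if Delta_i <= eps
            accepted.append(theta)
    return accepted

# Toy model: y | theta ~ N(theta, 1) with n = 50 observations,
# summary T(y) = sample mean, prior theta ~ Uniform(-5, 5).
random.seed(2)
n = 50
theta_true = 1.5
y_obs = [random.gauss(theta_true, 1.0) for _ in range(n)]

samples = rejection_abc(
    y_obs,
    simulate=lambda th: [random.gauss(th, 1.0) for _ in range(n)],
    prior_sample=lambda: random.uniform(-5.0, 5.0),
    summary=statistics.mean,
    n_sims=20_000,
    eps=0.1,
)
estimate = statistics.mean(samples)  # posterior mean estimate
```

Only a few percent of the 20,000 proposals are accepted here, which already hints at the computational-efficiency issues discussed in Part II.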
Properties
- The rejection ABC algorithm produces samples θ ∼ pε(θ|yo):

  pε(θ|yo) ∝ pθ(θ) Lε(θ)    (10)

  Lε(θ) ∝ Pr(d(T(yo), T(y)) ≤ ε | θ),  with ∆θ = d(T(yo), T(y))    (11)

- Inference is approximate due to:
  - the summary statistics T and the distance d
  - ε > 0
  - the finite number of samples (Monte Carlo error)
Part II
Computational and statistical efficiency
Brief recap
- Simulator-based models: models which are specified by a data generating mechanism.
- By construction, we can sample from simulator-based models; the likelihood function can generally not be written down.
- Rejection ABC: a trial-and-error scheme to find parameter values which produce simulated data resembling the observed data.
- Simulated data resemble the observed data if some discrepancy measure is small.
Efficiency of ABC
1. Computational efficiency: how to efficiently find the parameter values which yield a small discrepancy?
2. Statistical efficiency: how to measure the discrepancy between the simulated and observed data?
Program
Computational efficiency
  Difficulties
  Solutions
  Recent work
Statistical efficiency
  Difficulties
  Solutions
  Recent work
Example
- Inference of the mean θ of a Gaussian of variance one.
- Pr(y = yo | θ) = 0.
- Discrepancy ∆θ:

  ∆θ = (μo − μθ)²,  μo = (1/n) Σᵢ yoᵢ,  μθ = (1/n) Σᵢ yᵢ,  yᵢ ∼ N(θ, 1)

[Figure: realizations of the discrepancy as a function of θ, with the mean and the 0.1/0.9 quantiles; a single realization of the stochastic process and repeated realizations at a fixed θ are highlighted]
Discrepancy ∆θ is a random variable.
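A small Monte Carlo experiment (our construction, mirroring the slide's Gaussian setup) shows both points: repeated evaluations of ∆θ at the same θ differ, and Pr(∆θ ≤ ε) is larger near the parameter that generated the data:

```python
import random
import statistics

random.seed(3)
n = 100
y_obs = [random.gauss(1.0, 1.0) for _ in range(n)]  # "observed" data
mu_obs = statistics.mean(y_obs)

def discrepancy(theta):
    """One realization of Delta_theta = (mu_obs - mu_theta)^2,
    with mu_theta the mean of a freshly simulated data set."""
    mu_theta = statistics.mean(random.gauss(theta, 1.0) for _ in range(n))
    return (mu_obs - mu_theta) ** 2

# Delta_theta is a random variable: repeated calls at the same theta differ.
draws = [discrepancy(1.0) for _ in range(2_000)]
spread = max(draws) - min(draws)

# Pr(Delta_theta <= eps) is larger near the data-generating parameter.
def accept_prob(theta, eps=0.01, reps=2_000):
    return sum(discrepancy(theta) <= eps for _ in range(reps)) / reps

p_near = accept_prob(1.0)
p_far = accept_prob(3.0)
```

The curve θ ↦ Pr(∆θ ≤ ε) traced out by `accept_prob` is exactly the approximate likelihood of Eq. (9), up to a constant.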
Example
The probability that ∆θ is below some threshold ε approximates the likelihood function.

[Figure: mean and 0.1/0.9 quantiles of ∆θ as a function of θ, with threshold ε = 0.1; the resulting approximate likelihood (rescaled) closely follows the true likelihood (rescaled)]
Example
- Here, T(y) = (1/n) Σᵢ yᵢ is a sufficient statistic for inference of the mean θ.
- The only approximation is ε > 0.
- In general, the summary statistics will not be sufficient.
Example
- In the Gaussian example, with ∆θ = (μo − μθ)², the probability Pr(∆θ ≤ ε) can be computed in closed form:

  Pr(∆θ ≤ ε) = Φ(√n (μo − θ) + √(nε)) − Φ(√n (μo − θ) − √(nε))

  Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp(−u²/2) du

- For nε small: Lε(θ) ∝ Pr(∆θ ≤ ε) ∝ √ε · L(θ)
- For small ε: good approximation of the likelihood function.
- But for small ε, Pr(∆θ ≤ ε) ≈ 0: very few samples will be accepted.
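The closed-form expression can be checked against a Monte Carlo estimate; the sketch below (ours) evaluates Φ via `math.erf` and uses μθ ∼ N(θ, 1/n), which follows from yᵢ ∼ N(θ, 1):

```python
import math
import random
import statistics

def norm_cdf(x):
    """Standard normal CDF Phi via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def closed_form(theta, mu_obs, n, eps):
    """Pr(Delta_theta <= eps) with Delta_theta = (mu_obs - mu_theta)^2
    and mu_theta ~ N(theta, 1/n)."""
    a = math.sqrt(n) * (mu_obs - theta)
    b = math.sqrt(n * eps)
    return norm_cdf(a + b) - norm_cdf(a - b)

def monte_carlo(theta, mu_obs, n, eps, reps=20_000):
    """Empirical frequency of the event Delta_theta <= eps."""
    hits = 0
    for _ in range(reps):
        mu_theta = statistics.mean(random.gauss(theta, 1.0) for _ in range(n))
        if (mu_obs - mu_theta) ** 2 <= eps:
            hits += 1
    return hits / reps

random.seed(4)
p_exact = closed_form(theta=1.2, mu_obs=1.0, n=20, eps=0.05)
p_mc = monte_carlo(theta=1.2, mu_obs=1.0, n=20, eps=0.05)
```

The derivation is one line: ∆θ ≤ ε ⇔ |μo − μθ| ≤ √ε, and √n(μθ − θ) is standard normal.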
Two widely used algorithms
- Two widely used algorithms improve computationally upon rejection ABC:
  1. Regression ABC (Beaumont et al., 2002)
  2. Sequential Monte Carlo ABC (Sisson et al., 2007)
- Both use rejection ABC as a building block.
- Sequential Monte Carlo (SMC) ABC is also known as Population Monte Carlo (PMC) ABC.
Two widely used algorithms
- Regression ABC consists in running rejection ABC with a relatively large ε and then adjusting the obtained samples so that they are closer to samples from the true posterior.
- Sequential Monte Carlo ABC consists in sampling θ from an adaptively constructed proposal distribution φ(θ) rather than from the prior, in order to avoid simulating many data sets which are not accepted.
Basic idea of regression ABC
- The summary statistics tθ = T(yθ) and θ have a joint distribution.
- Let ti be the summary statistics for the simulated data yi = g(ωi, θi).
- We can learn a regression model between the summary statistics (covariates) and the parameters (response variables):

  θi = f(ti) + ξi    (12)

  where ξi is the error term (a zero-mean random variable).
- The training data for the regression are typically tuples (θi, ti) produced by rejection ABC with some sufficiently large ε.
Basic idea of regression ABC
- Fitting the regression model to the training data (θi, ti) yields an estimated regression function f̂ and the residuals ξ̂i:

  ξ̂i = θi − f̂(ti)    (13)

- Regression ABC consists in replacing θi with θi*:

  θi* = f̂(to) + ξ̂i = f̂(to) + θi − f̂(ti)    (14)

- This corresponds to an adjustment of θi.
- If the relation between t and θ is learned correctly, the θi* correspond to samples from an approximation with ε = 0.
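With a linear f̂ and a scalar summary statistic, Eqs. (12)-(14) can be sketched as follows (the Gaussian-mean toy model and the large tolerance ε = 1 are our illustrative choices); the adjusted samples come out visibly more concentrated than the raw accepted ones:

```python
import random
import statistics

random.seed(5)
n = 50
theta_true = 2.0
y_obs = [random.gauss(theta_true, 1.0) for _ in range(n)]
t_obs = statistics.mean(y_obs)  # scalar summary statistic t_o = T(y_o)

# Rejection ABC with a deliberately large eps gives training pairs (theta_i, t_i).
pairs = []
while len(pairs) < 2_000:
    theta = random.uniform(-5.0, 5.0)
    t = statistics.mean(random.gauss(theta, 1.0) for _ in range(n))
    if abs(t - t_obs) <= 1.0:
        pairs.append((theta, t))

thetas = [th for th, _ in pairs]
ts = [t for _, t in pairs]

# Fit the linear regression theta_i = a + b*t_i + xi_i (Eq. 12).
t_bar = statistics.mean(ts)
th_bar = statistics.mean(thetas)
b = (sum((t - t_bar) * (th - th_bar) for th, t in pairs)
     / sum((t - t_bar) ** 2 for t in ts))
a = th_bar - b * t_bar

# Adjust each sample: theta_i* = f(t_obs) + theta_i - f(t_i) (Eq. 14).
adjusted = [a + b * t_obs + (th - (a + b * t)) for th, t in pairs]

std_crude = statistics.pstdev(thetas)       # raw accepted samples
std_adjusted = statistics.pstdev(adjusted)  # adjusted samples: tighter
```

Beaumont et al. (2002) additionally weight the training pairs by a kernel in the distance |t − to|; the unweighted fit above keeps the sketch minimal.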
Basic idea of sequential Monte Carlo ABC
- We may modify the rejection ABC algorithm and use φ(θ) instead of the prior pθ:
  1. Sample θi ∼ φ(θ)
  2. Simulate a data set yi by running the simulator with θi (yi = g(ωi, θi))
  3. Compute the discrepancy ∆i = d(T(yo), T(yi))
  4. Retain θi if ∆i ≤ ε
- The retained samples follow a distribution proportional to φ(θ) Lε(θ).
Basic idea of sequential Monte Carlo ABC
- Parameters θi weighted with

  wi = pθ(θi) / φ(θi)    (15)

  follow a distribution proportional to pθ(θ) Lε(θ).
- This can be used to iteratively morph the prior into the posterior:
  - Use a sequence of shrinking thresholds εt.
  - Run rejection ABC with ε0.
  - Define φt at iteration t based on the weighted samples from the previous iteration (e.g. a Gaussian mixture with means equal to the θi from the previous iteration).
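A stripped-down two-iteration version for the Gaussian-mean toy model (our simplification; a full SMC ABC implementation would carry the weights through every iteration and adapt the kernel width more carefully):

```python
import math
import random
import statistics

random.seed(6)
n = 50
y_obs = [random.gauss(1.0, 1.0) for _ in range(n)]
t_obs = statistics.mean(y_obs)

def prior_pdf(theta):
    """Uniform(-5, 5) prior density."""
    return 0.1 if -5.0 <= theta <= 5.0 else 0.0

def rejection_step(sample_theta, eps, n_sims):
    """One rejection-ABC pass drawing theta from sample_theta()."""
    accepted = []
    for _ in range(n_sims):
        theta = sample_theta()
        t = statistics.mean(random.gauss(theta, 1.0) for _ in range(n))
        if abs(t - t_obs) <= eps:
            accepted.append(theta)
    return accepted

# Iteration 1: sample from the prior with a loose threshold eps_0.
pop1 = rejection_step(lambda: random.uniform(-5.0, 5.0), eps=1.0, n_sims=5_000)

# Iteration 2: proposal phi = equal-weight Gaussian mixture with means at
# the previous population; Eq. (15) corrects for using phi instead of the prior.
tau = 2.0 * statistics.pstdev(pop1)

def phi_sample():
    return random.gauss(random.choice(pop1), tau)

def phi_pdf(theta):
    norm = tau * math.sqrt(2.0 * math.pi)
    return statistics.mean(
        math.exp(-0.5 * ((theta - m) / tau) ** 2) / norm for m in pop1)

pop2 = rejection_step(phi_sample, eps=0.2, n_sims=5_000)
weights = [prior_pdf(th) / phi_pdf(th) for th in pop2]
posterior_mean = sum(w * th for w, th in zip(weights, pop2)) / sum(weights)
```

Because φ already concentrates near plausible parameter values, the second pass accepts far more samples at the tight threshold than sampling from the prior would.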
Basic idea of sequential Monte Carlo ABC
[Figure: the simulator M(θ) is run with shrinking thresholds ε1 > ε2 > ε3; the accepted samples at each iteration t define the next proposal distribution, morphing the prior (t = 1) through the proposals (t = 2, 3) toward the true posterior]
Learning a model of the discrepancy
Lε(θ) ∝ Pr(∆θ ≤ ε | θ)

- The approximate likelihood function Lε(θ) is determined by the distribution of the discrepancy ∆θ.
- If we knew the distribution of ∆θ, we could compute Lε(θ).
- We proposed to learn a model of ∆θ and to approximate Lε(θ) by L̂ε(θ):

  L̂ε(θ) ∝ P̂r(∆θ ≤ ε | θ)    (16)

- The model is learned more accurately in regions where ∆θ tends to be small, to make further computational savings.

(Gutmann and Corander, Journal of Machine Learning Research, in press)
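The cited work models the discrepancy with more sophisticated machinery (Gaussian-process regression combined with optimization); the toy sketch below (entirely ours) only conveys the core idea of fitting a regression surrogate to (θ, ∆θ) pairs and reading off where the discrepancy tends to be small:

```python
import random
import statistics

random.seed(9)
n = 50
y_obs = [random.gauss(1.0, 1.0) for _ in range(n)]
mu_obs = statistics.mean(y_obs)

def discrepancy(theta):
    """One realization of Delta_theta = (mu_obs - mu_theta)^2."""
    mu = statistics.mean(random.gauss(theta, 1.0) for _ in range(n))
    return (mu_obs - mu) ** 2

# Collect (theta_i, Delta_i) pairs from scattered simulations.
data = [(th, discrepancy(th))
        for th in (random.uniform(-3.0, 5.0) for _ in range(400))]

def solve3(A, b):
    """Gauss-Jordan elimination for a 3x3 linear system."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(3):
        p = M[i][i]
        M[i] = [v / p for v in M[i]]
        for j in range(3):
            if j != i:
                f = M[j][i]
                M[j] = [v - f * w for v, w in zip(M[j], M[i])]
    return [M[k][3] for k in range(3)]

# Least-squares fit of the surrogate Delta ≈ c0 + c1*theta + c2*theta^2
# via the normal equations with design X = [1, theta, theta^2].
xs = [[1.0, th, th * th] for th, _ in data]
ys = [d for _, d in data]
XtX = [[sum(x[i] * x[j] for x in xs) for j in range(3)] for i in range(3)]
Xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(3)]
c0, c1, c2 = solve3(XtX, Xty)

# The surrogate's minimizer marks where Delta_theta tends to be small,
# i.e. where the approximate likelihood peaks.
theta_star = -c1 / (2.0 * c2)
```

Once such a surrogate is available, new simulations can be spent where it predicts small discrepancies, which is the source of the computational savings.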
Example: Bacterial infections in child care centers
- Likelihood intractable for cross-sectional data
- But generating data from the model is possible
[Figure: snapshots over time of the colonization states: which bacterial strains (y-axis) are carried by which individuals (x-axis)]

Parameters of interest:
- rate of infections within a center
- rate of infections from outside
- competition between the strains
(Numminen et al, 2013)
Example: Bacterial infections in child care centers
- Comparison of the proposed approach with a standard population Monte Carlo ABC approach.
- Roughly equal results using 1000 times fewer simulations: 4.5 days with 200 cores → 90 minutes with seven cores.

[Figure: competition parameter vs. computational cost (log10) for the developed fast method and the standard method; posterior means as solid lines, credibility intervals as shaded areas or dashed lines]
(Gutmann and Corander, 2015)
Example: Bacterial infections in child care centers
- Comparison of the proposed approach with a standard population Monte Carlo ABC approach.
- Roughly equal results using 1000 times fewer simulations.

[Figure: internal and external infection parameters vs. computational cost (log10) for the developed fast method and the standard method; posterior means as solid lines, credibility intervals as shaded areas or dashed lines]
- The discrepancy measure affects the accuracy of the estimates.
- Bad discrepancy: estimated posterior = prior.
- Bad discrepancy: vanishingly small acceptance probability.
- Good discrepancy: a good trade-off between loss of information and increase in acceptance probability.
How to choose the discrepancy measure?
- Manually:
  - Use expert knowledge about yo to define summary statistics T.
  - Use the Euclidean distance for d:

    ∆θ = ||T(yo) − T(yθ)||

- Semi-automatically:
  - Simulate pairs (θi, yi).
  - Define a large number of candidate summary statistics T̃.
  - Define T as a smaller number of (linear) combinations of them, automatically learned from the simulated pairs.
  - Use the Euclidean distance for d.
  - The combinations are typically determined via regression with the T̃(yθ) as covariates and the parameters θ as response variables (e.g. Nunes and Balding, 2010; Fearnhead and Prangle, 2012; Aeschbacher et al., 2012; Blum et al., 2013).
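The semi-automatic idea can be sketched for the Gaussian-mean toy model with two candidate statistics (our choices: sample mean and sample median); the learned linear combination puts essentially all weight on the sufficient statistic, the mean:

```python
import random
import statistics

random.seed(7)
n = 40

def candidates(y):
    """Two candidate summary statistics: sample mean and sample median."""
    return [statistics.mean(y), statistics.median(y)]

# Simulate pairs (theta_i, y_i) from the prior predictive and
# record the candidate statistics for each simulated data set.
train = []
for _ in range(3_000):
    theta = random.uniform(-5.0, 5.0)
    y = [random.gauss(theta, 1.0) for _ in range(n)]
    train.append((theta, candidates(y)))

# Learn a linear combination T(y) = b1*s1 + b2*s2 by regressing the
# parameters on the candidate statistics (no intercept, for simplicity);
# the 2x2 normal equations are solved by Cramer's rule.
a11 = sum(s[0] * s[0] for _, s in train)
a12 = sum(s[0] * s[1] for _, s in train)
a22 = sum(s[1] * s[1] for _, s in train)
r1 = sum(th * s[0] for th, s in train)
r2 = sum(th * s[1] for th, s in train)
det = a11 * a22 - a12 * a12
b1 = (r1 * a22 - r2 * a12) / det
b2 = (a11 * r2 - a12 * r1) / det

def learned_summary(y):
    """The automatically learned one-dimensional summary statistic."""
    s = candidates(y)
    return b1 * s[0] + b2 * s[1]
```

For this model b1 comes out near 1 and b2 near 0: the regression discovers on its own that the sample mean carries the information about θ.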
Discrepancy measurement via classification
(Gutmann et al., 2014)

- Classification accuracy (discriminability) as the discrepancy measure ∆θ.
- A discriminability of 100% indicates maximally different data sets; 50% indicates similar data sets.
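A minimal version of the idea (ours: a trivial nearest-class-mean classifier stands in for the classifiers used in the cited paper):

```python
import random
import statistics

def classification_discrepancy(y_obs, y_sim):
    """Accuracy of a nearest-class-mean classifier trained to tell
    observed from simulated data: about 0.5 when the two data sets
    look alike, near 1.0 when they are easy to tell apart."""
    m_obs = statistics.mean(y_obs)
    m_sim = statistics.mean(y_sim)
    correct = sum(abs(x - m_obs) <= abs(x - m_sim) for x in y_obs)
    correct += sum(abs(x - m_sim) < abs(x - m_obs) for x in y_sim)
    return correct / (len(y_obs) + len(y_sim))

random.seed(8)
y_obs = [random.gauss(0.0, 1.0) for _ in range(500)]
y_far = [random.gauss(6.0, 1.0) for _ in range(500)]   # simulated, mean far off
y_near = [random.gauss(0.1, 1.0) for _ in range(500)]  # simulated, mean close

acc_far = classification_discrepancy(y_obs, y_far)    # close to 1.0
acc_near = classification_discrepancy(y_obs, y_near)  # close to 0.5
```

Plugged into rejection ABC, `acc` plays the role of ∆θ: parameter values whose simulated data cannot be discriminated from the observed data are accepted.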
[Figure: observed data and simulated data scattered in the (x, y) plane; with simulated mean (6, 0) the two data sets are easy to discriminate, with mean (1/2, 0) they largely overlap]
Example: Bacterial infections in child care centers
(Gutmann et al., 2014)

- Our classification-based distance measure does not use domain/expert knowledge.
- It performs as well as a distance measure based on domain knowledge (Numminen et al., 2013).
[Figure: posterior pdfs for (a) β, (b) Λ, and (c) θ, comparing the expert-based and the classifier-based distance measures]
Example: Bacterial infections in child care centers
(Gutmann et al., 2014)

- Robustness is a concern when relying on expert knowledge.
- The classification-based distance can automatically compensate for errors in the expert input.
[Figure: posterior probability density of the external infection parameter for the developed robust method, the standard method, and a reference; a deliberate deterioration of the expert input is compensated by the robust method]
Summary
- The topic was Bayesian inference for models specified via a simulator (implicit/generative models).
- Introduced approximate Bayesian computation (ABC).
- Principle of ABC: find parameter values which yield simulated data resembling the observed data.
- Covered three classical algorithms:
  1. Rejection ABC
  2. Regression ABC
  3. Sequential Monte Carlo ABC
- Choice of the discrepancy measure between simulated and observed data.
- Recent work of mine:
  - Combining modeling of the discrepancy and optimization to increase computational efficiency
  - Using classification to measure the discrepancy
References
My papers: https://sites.google.com/site/michaelgutmann/publications
- Numminen et al. Estimating the Transmission Dynamics of Streptococcus pneumoniae from Strain Prevalence Data. Biometrics, 2013.
- Beaumont et al. Approximate Bayesian Computation in Population Genetics. Genetics, 2002.
- Sisson et al. Sequential Monte Carlo without likelihoods. PNAS, 2007.
- Nunes and Balding. On optimal selection of summary statistics for approximate Bayesian computation. Statistical Applications in Genetics and Molecular Biology, 2010.
- Fearnhead and Prangle. Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2012.
- Aeschbacher et al. A novel approach for choosing summary statistics in approximate Bayesian computation. Genetics, 2012.
- Blum et al. A Comparative Review of Dimension Reduction Methods in Approximate Bayesian Computation. Statistical Science, 2013.
Review papers
- Beaumont. Approximate Bayesian Computation in Evolution and Ecology. Annual Review of Ecology, Evolution, and Systematics, 2010.
- Hartig et al. Statistical inference for stochastic simulation models – theory and application. Ecology Letters, 2011.
- Marin et al. Approximate Bayesian computational methods. Statistics and Computing, 2012.
- Sunnåker et al. Approximate Bayesian Computation. PLoS Computational Biology, 2013.