An adaptive SMC scheme for Approximate Bayesian Computation (ABC)
Fernando Bonassi
(joint work with Prof. Mike West)
Department of Statistical Science - Duke University
April/2011
Fernando Bonassi An adaptive SMC scheme for ABC
Approximate Bayesian Computation (ABC)
Problems in which the likelihood is intractable but we can simulate the underlying stochastic model
So-called implicit statistical models
Allow great flexibility to model complex systems
Applications in evolutionary biology, epidemiology, systems biology.
ABC Algorithm
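1 Draw $\theta$ from the prior $\pi(\theta)$
2 Simulate $x \sim f(x \mid \theta)$
3 Accept $\theta$ if $\rho(x, x_{\mathrm{obs}}) < \epsilon$
The resulting distribution is $\pi(\theta \mid \rho(x, x_{\mathrm{obs}}) < \epsilon)$
Exact posterior when $\epsilon = 0$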
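The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the model (a Unif(-10, 10) prior with x | theta ~ N(theta, 1)), the absolute-difference distance, and the function name `abc_rejection` are all assumptions chosen to keep the example self-contained.

```python
import numpy as np

def abc_rejection(x_obs, eps, n_accept, rng):
    """Rejection ABC for an assumed model: theta ~ Unif(-10, 10), x | theta ~ N(theta, 1)."""
    accepted = []
    n_sim = 0
    while len(accepted) < n_accept:
        theta = rng.uniform(-10.0, 10.0)   # step 1: draw theta from the prior
        x = rng.normal(theta, 1.0)         # step 2: simulate data given theta
        n_sim += 1
        if abs(x - x_obs) < eps:           # step 3: accept if the distance is below eps
            accepted.append(theta)
    return np.array(accepted), n_sim

rng = np.random.default_rng(0)
samples, n_sim = abc_rejection(x_obs=0.0, eps=0.5, n_accept=500, rng=rng)
print(f"acceptance rate: {len(samples) / n_sim:.3f}")
print(f"approximate posterior mean: {samples.mean():.2f}")
```

Shrinking `eps` in this sketch tightens the approximation but increases `n_sim`, which is the trade-off the following slides address.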
ABC Illustration
Approximate Bayesian Computation (ABC)
Accuracy of the approximation is controlled by the tolerance level $\epsilon$
Ideally, $\epsilon$ should be very small, but that implies a low acceptance rate
Two kinds of methods have been proposed to improve the efficiency: automatic and post-sampling
Automatic
e.g., ABC-MCMC, ABC-SMC
Inputs before simulation steps
Post-sampling
e.g., ABC-REG, ABC-GLM
Analysis after simulation steps
Automatic methods rely on more efficient schemes to sample from $\pi(\theta \mid \rho(x, x_{\mathrm{obs}}) < \epsilon)$
Post-sampling methods apply a regression-type adjustment to correct the sampled values and approximate $\pi(\theta \mid x_{\mathrm{obs}})$
A post-sampling approach: ABC-MIX
Marginal data in bionetwork models: toggle switch model (Bonassi, You and West, 2011)
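Model
$\dfrac{du}{dt} = \dfrac{\alpha_u}{1 + v_t^{\beta_u}} - (\kappa_u + \delta_u u_t) + \sigma_u \xi_{u,t}$
$\dfrac{dv}{dt} = \dfrac{\alpha_v}{1 + u_t^{\beta_v}} - (\kappa_v + \delta_v v_t) + \sigma_v \xi_{v,t}$
Independent noise processes $\xi_{\cdot}$, $\eta_{\cdot}$
Observation
$y_u = u_T + \mu + \mu \sigma \eta_u / u_T^{\gamma}$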
ABC-MIX for the toggle switch model
Massive prior: model simulation ⇒ large sample of $(\theta, y)$
Data characterization and dimension reduction by means of signatures $S(y)$ over a set of reference distributions
Constrain the sample $\{\theta, S\}$, keeping the 5% of synthetic datasets closest to $S_{\mathrm{obs}}$
ABC-MIX for the toggle switch model
Fit a mixture model to the constrained sample $\{\theta, S\}$ (Suchard et al., 2010; Cron and West, 2011)
The conditional mixture $g(\theta \mid S_{\mathrm{obs}})$ yields the approximate posterior distribution
ABC-SMC
Automatic ABC approach based on Sequential Monte Carlo
Main goal: improve the acceptance rate of ABC by dividing the problem into subproblems (Sisson et al., 2007; Beaumont et al., 2009)
In each step $t$, obtain $\pi(\theta \mid \rho(x, x_{\mathrm{obs}}) < \epsilon_t)$ for a decreasing tolerance schedule $\{\epsilon_1, \cdots, \epsilon_T\}$
ABC-SMC Algorithm
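S1 Initialize $\epsilon_1 > \cdots > \epsilon_T$
S2 $t = 1$
Simulate $\theta_i^{(1)} \sim \pi(\theta)$ and $x \sim f(x \mid \theta_i^{(1)})$ until $\rho(x, x_{\mathrm{obs}}) < \epsilon_1$
Set $w_i = 1/N$
S3 $t = 2, \ldots, T$
Pick $\theta_i^{*}$ from the $\theta_j^{(t-1)}$'s with probabilities $w_j^{(t-1)}$
Generate $\theta_i^{(t)} \sim K_t(\theta_i^{(t)} \mid \theta_i^{*})$ and $x \sim f(x \mid \theta_i^{(t)})$ until $\rho(x, x_{\mathrm{obs}}) < \epsilon_t$
Set $w_i^{(t)} \propto \dfrac{\pi(\theta_i^{(t)})}{\sum_j w_j^{(t-1)} K_t(\theta_i^{(t)} \mid \theta_j^{(t-1)})}$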
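Steps S1 to S3 can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the implementation used in the talk: the simulator (x | theta ~ N(theta, 1)), the Unif(-10, 10) prior, the Gaussian kernel $K_t$ with variance equal to twice the weighted particle variance, and all function names are hypothetical choices.

```python
import numpy as np

def simulate(theta, rng):
    # Hypothetical simulator (an assumption for this sketch): x | theta ~ N(theta, 1)
    return rng.normal(theta, 1.0)

def prior_pdf(theta):
    # Density of the assumed Unif(-10, 10) prior
    return np.where(np.abs(theta) <= 10.0, 0.05, 0.0)

def normal_pdf(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def abc_smc(x_obs, eps_schedule, n_particles, rng):
    thetas, weights, n_sims = None, None, []
    for t, eps in enumerate(eps_schedule):
        if t > 0:
            # Assumed kernel scale: twice the weighted variance of the previous particles
            mean = np.sum(weights * thetas)
            kernel_sd = np.sqrt(2.0 * np.sum(weights * (thetas - mean) ** 2))
        new_thetas = np.empty(n_particles)
        count = 0
        for i in range(n_particles):
            while True:
                if t == 0:
                    cand = rng.uniform(-10.0, 10.0)        # S2: draw from the prior
                else:
                    star = rng.choice(thetas, p=weights)   # S3: pick theta* by weight
                    cand = rng.normal(star, kernel_sd)     # S3: perturb with K_t
                    if abs(cand) > 10.0:
                        continue                           # zero prior density, skip
                count += 1
                if abs(simulate(cand, rng) - x_obs) < eps: # distance rho = |x - x_obs|
                    new_thetas[i] = cand
                    break
        if t == 0:
            new_weights = np.full(n_particles, 1.0 / n_particles)
        else:
            # w_i proportional to prior / mixture of kernels centred at the old particles
            kern = normal_pdf(new_thetas[:, None], thetas[None, :], kernel_sd)
            new_weights = prior_pdf(new_thetas) / (kern @ weights)
            new_weights /= new_weights.sum()
        thetas, weights = new_thetas, new_weights
        n_sims.append(count)
    return thetas, weights, n_sims

rng = np.random.default_rng(1)
thetas, weights, n_sims = abc_smc(0.0, [5.0, 1.0, 0.5], 300, rng)
print("simulations per step:", n_sims)
print("weighted posterior mean:", np.sum(weights * thetas))
```

The list `n_sims` records the number of model simulations per step, the quantity used to compare methods in the tables of this talk.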
Toy Example
$\theta \sim \mathrm{Unif}(-10, 10)$
Likelihood: $f(x \mid \theta) = 0.5\, N(\theta, 1) + 0.5\, N(\theta, 1/100)$
Goal: approximate the posterior of $\theta$ for $x_{\mathrm{obs}} = 0$
ABC-SMC for the tolerance schedule $\epsilon_1 = 5$, $\epsilon_2 = 1$, $\epsilon_3 = 0.01$
$\pi(\theta \mid \rho(x, x_{\mathrm{obs}}) < \epsilon_t)$, where $\rho$ is the Euclidean distance:
For N = 5,000 particles, number of data-generation simulations (in $10^3$) in each step:

Step t   $\epsilon_t$   ABC-SMC   ABC
1        5              10        -
2        1              26        -
3        0.01           734       4,424
Total                   770       4,424
The most expensive computational step is generally the model simulation
Beaumont et al. (2009) report 95% of time spent in model simulation for their application of ABC-SMC
Idea behind ABC-SMC
$\sum_j w_j^{(t-1)} K_t(\theta_i^{(t)} \mid \theta_j^{(t-1)})$ can be seen as a mixture approximation for $\pi(\theta \mid \rho(x, x_{\mathrm{obs}}) < \epsilon_{t-1})$
This approximation is then used as a proposal for $\pi(\theta \mid \rho(x, x_{\mathrm{obs}}) < \epsilon_t)$ in order to achieve a better approximation
In this sense, it follows the same idea as the adaptive importance sampling of West (1993)
An adaptive SMC scheme for ABC
Extending the mixture approximation idea, we can approximate $\pi(x, \theta \mid \rho(x, x_{\mathrm{obs}}) < \epsilon_{t-1})$ by:
$g(x, \theta) \sim \sum_j w_j^{(t-1)} K_{t,x}(x_i^{(t)} \mid x_j^{(t-1)})\, K_{t,\theta}(\theta_i^{(t)} \mid \theta_j^{(t-1)})$
This is a more complete representation of the joint distribution of $(x, \theta)$, which should induce better proposals and better efficiency
An adaptive SMC scheme for ABC
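The new induced approximation will be:
$g(\theta \mid x_{\mathrm{obs}}) \propto \sum_j K_{t,x}(x_{\mathrm{obs}} \mid x_j^{(t-1)})\, w_j^{(t-1)} K_{t,\theta}(\theta_i^{(t)} \mid \theta_j^{(t-1)})$
Whereas in ABC-SMC it was:
$g(\theta \mid x_{\mathrm{obs}}) \propto \sum_j w_j^{(t-1)} K_{t,\theta}(\theta_i^{(t)} \mid \theta_j^{(t-1)})$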
Mixture approximation at step one ($\epsilon_1 = 5$) using ABC-SMC (blue) and ABC-SMC with adaptive weights (red)
ABC-SMC with Adaptive Weights
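S2 $t = 1$
Simulate $\theta_i^{(1)} \sim \pi(\theta)$ and $x \sim f(x \mid \theta_i^{(1)})$ until $\rho(x, x_{\mathrm{obs}}) < \epsilon_1$
Set $w_i = 1/N$
S3 $t = 2, \ldots, T$
Set weights $v_i^{(t-1)} \propto w_i^{(t-1)} K_{t,x}(x_{\mathrm{obs}} \mid x_i^{(t-1)})$
Normalize the new weights $v_i^{(t-1)}$
Pick $\theta_i^{*}$ from the $\theta_j^{(t-1)}$'s with probabilities $v_j^{(t-1)}$
Generate $\theta_i^{(t)} \sim K_{t,\theta}(\theta_i^{(t)} \mid \theta_i^{*})$ and $x \sim f(x \mid \theta_i^{(t)})$ until $\rho(x, x_{\mathrm{obs}}) < \epsilon_t$
Set $w_i^{(t)} \propto \dfrac{\pi(\theta_i^{(t)})}{\sum_j v_j^{(t-1)} K_{t,\theta}(\theta_i^{(t)} \mid \theta_j^{(t-1)})}$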
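The adaptive-weights variant can be sketched on the toy example from the earlier slide. As before, this is a hedged illustration and not the author's code: the Gaussian forms of $K_{t,x}$ and $K_{t,\theta}$, their bandwidths, the shortened tolerance schedule and small particle count, and the function names are all assumptions made so that the sketch runs quickly.

```python
import numpy as np

def simulate_toy(theta, rng):
    # Toy model from the slides: x | theta ~ 0.5 N(theta, 1) + 0.5 N(theta, 1/100)
    sd = 1.0 if rng.random() < 0.5 else 0.1
    return rng.normal(theta, sd)

def normal_pdf(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def weighted_sd(values, w):
    mean = np.sum(w * values)
    return np.sqrt(np.sum(w * (values - mean) ** 2))

def abc_smc_aw(x_obs, eps_schedule, n_particles, rng):
    thetas = xs = weights = None
    n_sims = []
    for t, eps in enumerate(eps_schedule):
        if t > 0:
            # Adaptive weights: v_j proportional to w_j * K_{t,x}(x_obs | x_j)
            x_sd = weighted_sd(xs, weights)                   # assumed bandwidth for K_{t,x}
            v = weights * normal_pdf(x_obs, xs, x_sd)
            v /= v.sum()
            theta_sd = np.sqrt(2.0) * weighted_sd(thetas, v)  # assumed scale for K_{t,theta}
        new_thetas = np.empty(n_particles)
        new_xs = np.empty(n_particles)
        count = 0
        for i in range(n_particles):
            while True:
                if t == 0:
                    cand = rng.uniform(-10.0, 10.0)           # draw from the Unif(-10, 10) prior
                else:
                    cand = rng.normal(rng.choice(thetas, p=v), theta_sd)
                    if abs(cand) > 10.0:
                        continue                              # outside the prior support
                x = simulate_toy(cand, rng)
                count += 1
                if abs(x - x_obs) < eps:
                    new_thetas[i], new_xs[i] = cand, x        # keep theta and its simulated x
                    break
        if t == 0:
            new_w = np.full(n_particles, 1.0 / n_particles)
        else:
            # w_i proportional to prior / sum_j v_j K_{t,theta}(theta_i | theta_j)
            kern = normal_pdf(new_thetas[:, None], thetas[None, :], theta_sd)
            new_w = 0.05 / (kern @ v)
            new_w /= new_w.sum()
        thetas, xs, weights = new_thetas, new_xs, new_w
        n_sims.append(count)
    return thetas, weights, n_sims

rng = np.random.default_rng(2)
thetas_aw, weights_aw, n_sims_aw = abc_smc_aw(0.0, [5.0, 1.0, 0.1], 200, rng)
print("simulations per step:", n_sims_aw)
```

The only differences from plain ABC-SMC in this sketch are that the simulated data `xs` are stored alongside the particles and that the reweighted `v` replaces `w` both when resampling and in the denominator of the new weights.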
Comparison for the Normal toy example
For N = 5,000 particles, number of data-generation simulations (in $10^3$) in each step:

Step t   $\epsilon_t$   ABC-SMC   ABC-SMC with AW
1        5              10        10
2        1              26        13
3        0.01           734       463
Total                   770       486
ABC-SMC with AW for the Toggle Switch Problem
Observation
ABC-SMC with AW for the Toggle Switch Problem
Each step: 10K model simulations and selection of the 10% closest datasets. Resulting tolerance schedule:
$\epsilon_{1:5} = (4.4, 3.5, 0.8, 0.4, 0.2)$
This way, the total number of data-generation steps was 50K.
For the previous analysis, ABC-MIX, some distance quantiles (in $10^{-5}$) for a sample of 200K:
q(10%) = 3.45   q(5%) = 2.95   q(1%) = 0.77
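Selecting the closest 10% of simulated datasets in each step amounts to setting the tolerance to an empirical quantile of the distances. A minimal sketch follows; the function name `select_closest` and the exponential stand-in distances are hypothetical and only serve to make the example runnable.

```python
import numpy as np

def select_closest(distances, fraction=0.10):
    """Return the tolerance (empirical quantile) and a mask of the closest datasets."""
    distances = np.asarray(distances, dtype=float)
    eps = np.quantile(distances, fraction)   # tolerance implied by the kept fraction
    keep = distances <= eps                  # datasets at or below the tolerance
    return eps, keep

rng = np.random.default_rng(3)
distances = rng.exponential(size=10_000)     # stand-in for 10K simulated distances
eps, keep = select_closest(distances, fraction=0.10)
print("tolerance:", eps, "kept:", int(keep.sum()))
```

With 10K simulations in each of 5 steps, this scheme fixes the total cost in advance at 50K simulations, and the tolerance schedule is a by-product rather than an input.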
ABC-MIX:
ABC-SMC with AW:
Data-simulation steps in ABC-MIX: 200K.
Data-simulation steps in ABC-SMC with AW: 50K.
For the regular ABC-SMC, with the same tolerance schedule, the number of generation steps was:
(10K, 10K, 23K, 23K, 27K, 35K)
Total: 128K
ABC-SMC with AW resulted in a final effective sample size of 456
Generation steps in ABC-SMC with AW depend on the particular real dataset, so the algorithm must be run separately for each of the 10 real datasets
In ABC-MIX, the same 200K generations are reused for every real dataset
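The effective sample size quoted above can be computed from the final particle weights. The slides do not state which definition was used; the sketch below assumes the usual importance-sampling definition, $\mathrm{ESS} = 1 / \sum_i w_i^2$ for normalized weights, and the function name is hypothetical.

```python
import numpy as np

def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2) for weights normalized to sum to one (assumed definition)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2)

# Equal weights give ESS equal to the number of particles;
# uneven weights give a smaller value.
print(effective_sample_size(np.ones(1000)))
print(effective_sample_size([0.7, 0.1, 0.1, 0.1]))
```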
Another application (from Toni et al. (2009))
Common-cold outbreak on the island of Tristan da Cunha (1967)
day    1   2   3   4   ⋅ ⋅ ⋅   20   21
I(t)   1   2   3   7   ⋅ ⋅ ⋅   1    0
R(t)   0   0   0   0   ⋅ ⋅ ⋅   36   37
SIR Model: Susceptible (S), Infected (I) and Recovered (R)
For this case S is unobserved.
SIR Model
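Differential equations:
$\dfrac{dS}{dt} = -\gamma S I$
$\dfrac{dI}{dt} = \gamma S I - \nu I$
$\dfrac{dR}{dt} = \nu I$
Prior specification:
$\gamma \sim U(0, 3)$,
$\nu \sim U(0, 3)$,
$S(0) \sim \mathrm{Unif}\{37, \cdots, 100\}$
Runge-Kutta method to approximate the solution of the ODE. The distance $\rho$ used was the Euclidean distance based on the observed time points.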
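The simulation step for this model can be sketched with a classical fourth-order Runge-Kutta integrator. The slides do not say which Runge-Kutta variant, step size, or time origin was used, so those choices, the parameter values in the usage lines, and all function names are assumptions. The distance is checked against a trajectory simulated from the same model, since the full observed series is not reproduced in the slides.

```python
import numpy as np

def sir_rhs(state, infection_rate, recovery_rate):
    # Right-hand side of the SIR equations for state = (S, I, R)
    s, i, r = state
    new_infections = infection_rate * s * i
    recoveries = recovery_rate * i
    return np.array([-new_infections, new_infections - recoveries, recoveries])

def simulate_sir(s0, i0, r0, infection_rate, recovery_rate, n_days, steps_per_day=100):
    """Integrate with classical RK4 and return the state at days 0, 1, ..., n_days."""
    dt = 1.0 / steps_per_day
    state = np.array([s0, i0, r0], dtype=float)
    daily = [state.copy()]
    for _ in range(n_days):
        for _ in range(steps_per_day):
            k1 = sir_rhs(state, infection_rate, recovery_rate)
            k2 = sir_rhs(state + 0.5 * dt * k1, infection_rate, recovery_rate)
            k3 = sir_rhs(state + 0.5 * dt * k2, infection_rate, recovery_rate)
            k4 = sir_rhs(state + dt * k3, infection_rate, recovery_rate)
            state = state + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        daily.append(state.copy())
    return np.array(daily)

def distance(sim, obs_i, obs_r):
    # Euclidean distance over the observed components I(t) and R(t); S is unobserved
    return np.sqrt(np.sum((sim[:, 1] - obs_i) ** 2 + (sim[:, 2] - obs_r) ** 2))

# Hypothetical parameter values, chosen only to produce an outbreak-shaped curve
traj = simulate_sir(s0=40, i0=1, r0=0, infection_rate=0.02, recovery_rate=0.3, n_days=20)
other = simulate_sir(s0=40, i0=1, r0=0, infection_rate=0.03, recovery_rate=0.3, n_days=20)
print("distance between the two simulated outbreaks:", distance(other, traj[:, 1], traj[:, 2]))
```

Inside an ABC-SMC loop, a function like `simulate_sir` would play the role of $f(x \mid \theta)$ and `distance` the role of $\rho$.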
Using ABC-SMC with the tolerance schedule $\epsilon_{1:4} = (100, 70, 40, 20)$
For N = 1,000 particles, number of data generations in each step (in $10^3$):
ABC-SMC: Step 1: 29 Step 2: 49 Step 3: 706 Step 4: 63
ABC-SMC with AW: Step 1: 29 Step 2: 40 Step 3: 116 Step 4: 11
Summary
ABC methods: a useful tool for problems described by complex systems
Efficiency can be improved by automatic and post-sampling methods
As an application and illustration of the post-sampling approach, the Toggle Switch model was studied using ABC-MIX
A new extension of ABC-SMC based on adaptive weights was proposed. It required fewer model simulations than regular ABC-SMC in the examples shown
The choice of the most advantageous ABC approach is still a problem-specific question
M.A. Beaumont, J.M. Cornuet, J.M. Marin, and C.P. Robert, Adaptive approximate Bayesian computation, Biometrika 96 (2009), no. 4, 983.
M.A. Beaumont, W. Zhang, and D.J. Balding, Approximate Bayesian computation in population genetics, Genetics 162 (2002), no. 4, 2025.
F.V. Bonassi, L. You, and M. West, Bayesian learning from marginal data in bionetwork models, Department of Statistical Science, Duke University: Discussion Paper 11-07 (2011).
S.A. Sisson, Y. Fan, and M.M. Tanaka, Sequential Monte Carlo without likelihoods, Proceedings of the National Academy of Sciences USA 104 (2007), 1760–1765.
T. Toni and M.P.H. Stumpf, Simulation-based model selection for dynamical systems in systems and population biology, Bioinformatics 26 (2010), 104–110.
T. Toni, D. Welch, N. Strelkowa, A. Ipsen, and M.P.H. Stumpf, Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems, Journal of the Royal Society Interface 6 (2009), no. 31, 187.
M. West, Approximating posterior distributions by mixtures, Journal of the Royal Statistical Society (Ser. B) 54 (1993), 553–568.