TRANSCRIPT
Asymptotic Genealogies of Sequential Monte Carlo Algorithms
Jere Koskela, Paul Jenkins¹, Adam M. Johansen², and Dario Spanò, all at Department of Statistics, University of Warwick.
arXiv: 1804.01811v2
CIRM — 25.10.2018. Compiled: Wednesday 24th October, 2018, 22:05
¹Also at Department of Computer Science, University of Warwick. ²Also at The Alan Turing Institute.
CoSInES Launch Day, 2/11/18 at Warwick
CoSInES: a collaboration of researchers from Warwick, Bristol, Lancaster, Oxford and the Alan Turing Institute to tackle fundamental challenges in Computational and Bayesian Statistics.
Dates: The project will run from 1st October 2018 to 30th September 2023, and is primarily funded by EPSRC.
Launch: We'd like to invite anyone who would like to attend to register their interest by emailing Shital Desai ([email protected]).
Jobs! Soon: 5 four-year PDRA positions associated with the project, based at any of the 5 institutions involved in the grant. Informal enquiries to Gareth Roberts ([email protected]) are very welcome.
http://www.cosines.org
Outline
Sequential Monte Carlo
Path degeneracy
The genealogical process and convergence
A numerical example
Conclusions and outlook
Importance sampling
- Intractable target:
  \[ \mathbb{E}_\pi[f(X)] := \int f(x)\,\pi(\mathrm{d}x). \]
- Monte Carlo: let $X^{(1:N)} \sim \pi^{\otimes N}$. Then
  \[ \mathbb{E}_\pi[f(X)] \approx \frac{1}{N}\sum_{i=1}^N f(X^{(i)}). \]
- Change of measure:
  \[ \mathbb{E}_\pi[f(X)] = \mathbb{E}_\mu\left[\frac{\mathrm{d}\pi}{\mathrm{d}\mu}(X)\,f(X)\right] = \int \frac{\mathrm{d}\pi}{\mathrm{d}\mu}(x)\,f(x)\,\mu(\mathrm{d}x). \]
- Importance sampling: let $X^{(1:N)} \sim \mu^{\otimes N}$. Then
  \[ \mathbb{E}_\pi[f(X)] \approx \frac{1}{N}\sum_{i=1}^N \frac{\mathrm{d}\pi}{\mathrm{d}\mu}(X^{(i)})\,f(X^{(i)}). \]
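As a concrete sketch (not from the slides): importance sampling in Python, with hypothetical choices $\pi = N(0,1)$, $\mu = N(0, 2^2)$ and $f(x) = x^2$, so that $\mathbb{E}_\pi[f(X)] = 1$.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100_000

# Target pi = N(0,1), proposal mu = N(0, 2^2); both densities are known here.
def log_pi(x): return -0.5 * x**2 - 0.5 * np.log(2 * np.pi)
def log_mu(x): return -0.5 * (x / 2)**2 - np.log(2) - 0.5 * np.log(2 * np.pi)

x = rng.normal(0.0, 2.0, size=N)       # X^(1:N) ~ mu
w = np.exp(log_pi(x) - log_mu(x))      # Radon-Nikodym weights (d pi / d mu)(X^(i))
estimate = np.mean(w * x**2)           # (1/N) sum_i w_i f(X^(i))
print(estimate)                        # close to E_pi[X^2] = 1
```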
Sequential Monte Carlo / Interacting Particle System
1: for $i \in \{1,\dots,N\}$: sample $X_0^{(i)} \sim \mu_0$.
2: for $i \in \{1,\dots,N\}$: set
   \[ w_0^{(i)} \leftarrow \frac{\pi_0(X_0^{(i)})/\mu_0(X_0^{(i)})}{\sum_{j=1}^N \pi_0(X_0^{(j)})/\mu_0(X_0^{(j)})}. \]
3: for $t \in \{1,\dots,T\}$ do
4:   sample $(a_t^{(1)},\dots,a_t^{(N)}) \sim \mathrm{Categorical}(w_{t-1}^{(1)},\dots,w_{t-1}^{(N)})$.
5:   for $i \in \{1,\dots,N\}$: sample $X_t^{(i)} \sim \mu_t(\,\cdot\,|\,X_{t-1}^{(a_t^{(i)})})$.
6:   for $i \in \{1,\dots,N\}$: set
   \[ w_t^{(i)} \leftarrow \frac{\pi_t(X_t^{(i)}|X_{t-1}^{(a_t^{(i)})})/\mu_t(X_t^{(i)}|X_{t-1}^{(a_t^{(i)})})}{\sum_{j=1}^N \pi_t(X_t^{(j)}|X_{t-1}^{(a_t^{(j)})})/\mu_t(X_t^{(j)}|X_{t-1}^{(a_t^{(j)})})}. \]
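A minimal Python sketch of this algorithm. The densities are hypothetical choices purely for illustration ($\mu_0 = N(0,2^2)$, $\pi_0 \propto N(0,1)$, $\mu_t(\cdot|x') = N(x',1)$, and $\pi_t(x|x') \propto N(x; x', 1)\,e^{-x^2/2}$); any pair with $\pi_t \ll \mu_t$ would do. The ancestor indices $a_t^{(1:N)}$ are recorded, since they encode the genealogy studied later.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 1000, 20

# Step 1: X_0 ~ mu_0 = N(0, 2^2).
x = rng.normal(0.0, 2.0, size=N)
# Step 2: w_0 proportional to pi_0/mu_0 = exp(-x^2/2) / exp(-x^2/8), normalized.
w = np.exp(-0.5 * x**2) / np.exp(-0.125 * x**2)
w /= w.sum()

ancestry = []                              # records a_t^(1:N) for the genealogy
for t in range(1, T + 1):
    a = rng.choice(N, size=N, p=w)         # step 4: Categorical(w_{t-1})
    ancestry.append(a)
    x = rng.normal(x[a], 1.0)              # step 5: X_t ~ mu_t(.|parent)
    w = np.exp(-0.5 * x**2)                # step 6: pi_t/mu_t reduces to a potential
    w /= w.sum()
```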
Example: Hidden Markov Model
- Let $\{X_t\}_{t\ge 0}$ be a Markov process with transition density $p(x,x')$ and initial density $\pi(x)$.
- Suppose a noisy observation $Y_t$ with density $g(y|x)$ is made of each state $X_t$.
- SMC algorithms with
  \[ \pi_0(x_0) \propto \pi(x_0)\,g(y_0|x_0), \qquad \pi_t(x_t|x_{1:(t-1)}) \propto p(x_{t-1},x_t)\,g(y_t|x_t) \]
  target $\mathbb{P}(X_{0:T} \in \mathrm{d}x_{0:T} \,|\, Y_{0:T} = y_{0:T})$.
- E.g. the bootstrap particle filter: $\mu_0 = \pi$, $\mu_t(x_t|x_{1:(t-1)}) = p(x_{t-1},x_t)$, and $w_t \propto g(y_t|x_t)$.
Example: Bootstrap Particle Filter (Gordon et al., 1993)
1: for $i \in \{1,\dots,N\}$: sample $X_0^{(i)} \sim \pi$.
2: for $i \in \{1,\dots,N\}$: set
   \[ w_0^{(i)} \leftarrow \frac{g(y_0|X_0^{(i)})}{\sum_{j=1}^N g(y_0|X_0^{(j)})}. \]
3: for $t \in \{1,\dots,T\}$ do
4:   sample $(a_t^{(1)},\dots,a_t^{(N)}) \sim \mathrm{Categorical}(w_{t-1}^{(1)},\dots,w_{t-1}^{(N)})$.
5:   for $i \in \{1,\dots,N\}$: sample $X_t^{(i)} \sim p(X_{t-1}^{(a_t^{(i)})}, \cdot\,)$.
6:   for $i \in \{1,\dots,N\}$: set
   \[ w_t^{(i)} \leftarrow \frac{g(y_t|X_t^{(i)})}{\sum_{j=1}^N g(y_t|X_t^{(j)})}. \]
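As a sketch, the bootstrap filter run on a hypothetical AR(1)-plus-noise model (autoregression coefficient 0.9 and $\sigma = 0.5$ are illustrative choices, not from the slides), tracking the filtering means $\mathbb{E}[X_t \mid y_{0:t}]$:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, sigma = 500, 50, 0.5

# Simulate data: X_{t+1} = 0.9 X_t + sqrt(1-0.81) noise, Y_t = X_t + sigma noise.
x_true, y = np.zeros(T + 1), np.zeros(T + 1)
x_true[0] = rng.normal()
y[0] = x_true[0] + sigma * rng.normal()
for t in range(1, T + 1):
    x_true[t] = 0.9 * x_true[t - 1] + np.sqrt(1 - 0.9**2) * rng.normal()
    y[t] = x_true[t] + sigma * rng.normal()

def g(y_t, x):                           # observation density N(y; x, sigma^2), unnormalized
    return np.exp(-0.5 * ((y_t - x) / sigma) ** 2)

x = rng.normal(size=N)                   # step 1: X_0 ~ pi = N(0, 1)
w = g(y[0], x); w /= w.sum()             # step 2
means = [np.sum(w * x)]                  # filtering estimate of E[X_t | y_{0:t}]
for t in range(1, T + 1):
    a = rng.choice(N, size=N, p=w)       # step 4: multinomial resampling
    x = 0.9 * x[a] + np.sqrt(1 - 0.9**2) * rng.normal(size=N)   # step 5
    w = g(y[t], x); w /= w.sum()         # step 6: reweight by likelihood
    means.append(np.sum(w * x))
```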
Path degeneracy
- Suppose $T \gg 1$, and $f(X_{0:T})$ depends on every time point.
- Mergers due to resampling mean that times $t \ll T$ are estimated from $m \ll N$ paths.
- $\Rightarrow$ High-variance estimators.
- Loss of paths also means that fewer than $N \times T$ particles need to be stored, reducing memory cost.
- Aim: a priori estimates of
  \[ \mathbb{E}[T_{\mathrm{MRCA}}], \quad \mathrm{Var}(T_{\mathrm{MRCA}}), \quad \mathbb{P}(T_{\mathrm{MRCA}} > t), \]
  etc., where $T_{\mathrm{MRCA}}$ is the time to the most recent common ancestor of the surviving paths.
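A quick illustration of path degeneracy (an assumed toy setup, not from the slides): track each particle's time-0 ancestor under repeated multinomial resampling with, for simplicity, uniform weights; the number of distinct surviving ancestors still collapses.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 100, 400

ancestor = np.arange(N)                # time-0 ancestor of each current particle
distinct = [N]
for t in range(T):
    a = rng.integers(0, N, size=N)     # multinomial resampling, uniform weights
    ancestor = ancestor[a]
    distinct.append(len(np.unique(ancestor)))

print(distinct[0], distinct[-1])       # N distinct ancestors collapse toward 1
```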
Reasons to Characterize Path Degeneracy include...
- Qualitative understanding of methods.
- Calibrating fixed-lag techniques, e.g. Doucet and Senecal (2004).
- Relationship with estimator variance (Chan et al., 2013; Lee and Whiteley, 2015).
- Understanding storage requirements (Jacob et al., 2015).
The coalescent process (Kingman, 1982)
- Let $\{R_t^{(n)}\}_{t\ge 0}$ be a partition-valued process.
- $R_0^{(n)} = \{\{1\},\dots,\{n\}\}$.
- Each pair of blocks merges at rate 1.
- The number of blocks is a "death" process of rate $\binom{k}{2}$, where $k$ is the number of blocks.
Example: n = 4
[Figure: coalescent tree for $n = 4$, with successive holding times $T_1 \sim \mathrm{Exp}(6)$, $T_2 \sim \mathrm{Exp}(3)$, $T_3 \sim \mathrm{Exp}(1)$.]
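The tree height can be simulated directly from these exponential holding times. A short check (a sketch, not from the slides) that the simulated mean matches $\mathbb{E}[T_n] = \sum_{k=2}^n 2/(k(k-1)) = 2(1 - 1/n)$:

```python
import numpy as np

rng = np.random.default_rng(0)

def coalescent_height(n, rng):
    """Total tree height: sum of Exp(k choose 2) holding times while k blocks remain."""
    t = 0.0
    for k in range(n, 1, -1):
        t += rng.exponential(1.0 / (k * (k - 1) / 2))   # scale = 1 / binom(k, 2)
    return t

n, reps = 4, 200_000
heights = [coalescent_height(n, rng) for _ in range(reps)]
print(np.mean(heights))   # E[T_n] = 2(1 - 1/n) = 1.5 for n = 4
```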
The genealogical process
- It is convenient to reverse the direction of time...
- Let $\{G_t^{(n,N)}\}_{t\in\mathbb{N}_0}$ be the genealogy of $n \le N$ particles sampled randomly from an $N$-particle SMC algorithm of interest.
- $G_0^{(n,N)} = \{\{1\},\dots,\{n\}\}$.
- $i \sim j$ in $G_t^{(n,N)}$ $\Rightarrow$ particles $i$ and $j$ have a common ancestor $t$ generations ago.
- [Figure: $G^{(2,7)}$ illustrated.]
Objective: Establish conditions under which, as $N \to \infty$, the genealogy converges to the Kingman coalescent.
[Figure: an SMC genealogy converging to a coalescent tree with holding times $T_1$, $T_2$, $T_3$.]
Rescaling time
- For $i \in \{1,\dots,N\}$ and $t \in \mathbb{N}$, let $\nu_t^{(i)}$ be the number of children of particle $i$, $t$ generations ago.
- Define
  \[ c_N(t) := \frac{1}{(N)_2} \sum_{i=1}^N (\nu_t^{(i)})_2 \approx \mathbb{E}[\mathrm{ESS}(t)^{-1}], \]
  \[ \tau_N(t) := \inf\left\{ s \ge 1 : \sum_{r=1}^s c_N(r) \ge t \right\}, \]
  \[ D_N(t) := \frac{1}{N(N)_2} \sum_{i=1}^N (\nu_t^{(i)})_2 \left( \nu_t^{(i)} + \frac{1}{N} \sum_{j \ne i} (\nu_t^{(j)})_2 \right). \]
Convergence theorem
Suppose that all assignments of offspring to parents are equally likely, and that
\[ \lim_{N\to\infty} \mathbb{E}\left[ \sum_{r=\tau_N(s)+1}^{\tau_N(t)} D_N(r) \right] = 0, \qquad \lim_{N\to\infty} \mathbb{E}[c_N(t)] = 0, \]
\[ \lim_{N\to\infty} \mathbb{E}\left[ \sum_{r=\tau_N(s)+1}^{\tau_N(t)} c_N(r)^2 \right] = 0, \qquad \mathbb{E}[\tau_N(t) - \tau_N(s)] \le C_{t,s} N. \]
Then $(G_{\tau_N(t)}^{(n,N)})_{t\ge 0}$ converges to the Kingman coalescent in the sense of finite-dimensional distributions.
Proof outline
- Consider finite-dimensional distributions.
- Apply straightforward but intricate counting arguments,
- together with bounds on elementary transition probabilities,
- to upper and lower bound the elements of the FDDs.
- Compare these with those of the coalescent.
Sketch proof
- Let $\xi$ and $\eta$ be partitions of $\{1,\dots,n\}$, with the block counts of $\eta$ in terms of the blocks of $\xi$ being $b_1,\dots,b_{|\eta|}$, i.e. $b_1 + \dots + b_{|\eta|} = |\xi|$.
- The conditional one-step transition probability of $G_t^{(n,N)}$ given family sizes is
  \[ p_{\xi\eta}(t) := \frac{1}{(N)_{|\xi|}} \sum_{i_1 \ne \dots \ne i_{|\eta|} = 1}^N (\nu_t^{(i_1)})_{b_1} \cdots (\nu_t^{(i_{|\eta|})})_{b_{|\eta|}}. \]
- FDDs:
  \[ \mathbb{P}(G_{\tau_N(t_1)}^{(n,N)} = \eta_1, \dots, G_{\tau_N(t_k)}^{(n,N)} = \eta_k \,|\, G_{\tau_N(t_0)}^{(n,N)} = \eta_0) = \mathbb{E}\left[ \prod_{d=1}^k \left\{ \prod_{r=\tau_N(t_{d-1})+1}^{\tau_N(t_d)} P_N(r) \right\}_{\eta_{d-1}\eta_d} \right]. \]
Sketch proof II
- For a single time interval,
  \[ \left\{ \prod_{r=\tau_N(t_{d-1})+1}^{\tau_N(t_d)} P_N(r) \right\}_{\eta_{d-1}\eta_d} = \sum_{\boldsymbol{\xi}} \prod_{s=\tau_N(t_{d-1})+1}^{\tau_N(t_d)} p_{\xi_{s-1}\xi_s}(s), \]
  where $\boldsymbol{\xi} = (\eta_{d-1}, \xi_{\tau_N(t_{d-1})+1}, \dots, \xi_{\tau_N(t_d)-1}, \eta_d)$.
- Each partition in $\boldsymbol{\xi}$ is either equal to its predecessor, or obtained from its predecessor by merging some subset(s) of blocks.
Sketch proof III
\[ p_{\xi\xi}(t) \approx 1 - \binom{|\xi|}{2} c_N(t). \]
If $\eta$ is formed by merging two blocks of $\xi$,
\[ c_N(t) - \binom{|\xi|-2}{2} D_N(t) \lesssim p_{\xi\eta}(t) \lesssim c_N(t). \]
If $\eta$ is formed by merging more than two blocks of $\xi$,
\[ p_{\xi\eta}(t) \lesssim \binom{|\xi|-2}{2} D_N(t). \]
Sketch proof IV
\[ \sum_{\boldsymbol{\xi}} \prod_{s=\tau_N(t_{d-1})+1}^{\tau_N(t_d)} p_{\xi_{s-1}\xi_s}(s) \approx \sum_{\alpha=1}^{|\eta_{d-1}|-|\eta_d|} \sum_{(\lambda,\mu)\in\Pi_2([\alpha])} C \times \sum_{s_1 < \dots < s_\alpha = \tau_N(t_{d-1})+1}^{\tau_N(t_d)} \left\{ \prod_{r\in\lambda} c_N(s_r) \right\} \left\{ \prod_{r\in\mu} D_N(s_r) \right\}, \]
\[ D_N(t) \approx \frac{c_N(t)}{N}, \qquad \sum_{s_1 < \dots < s_\alpha = \tau_N(t_{d-1})+1}^{\tau_N(t_d)} \prod_{r=1}^\alpha c_N(s_r) \approx \frac{(t_d - t_{d-1})^\alpha}{\alpha!}. \]
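The last approximation can be sanity-checked numerically in the simplest case of constant increments: if $c_N(r) \equiv c$ over $m$ generations with $mc = t$, the ordered sum is exactly $\binom{m}{\alpha} c^\alpha$, which tends to $t^\alpha/\alpha!$ as $m$ grows (the values $m$, $\alpha$, $t$ below are arbitrary illustrative choices).

```python
from math import comb, factorial

# Ordered sum over s_1 < ... < s_alpha of a constant product c^alpha:
# there are binom(m, alpha) such tuples, and binom(m, alpha) c^alpha -> t^alpha / alpha!.
m, alpha, t = 200, 3, 1.0
c = t / m
exact = comb(m, alpha) * c**alpha
limit = t**alpha / factorial(alpha)
print(exact, limit)   # 0.164175 vs 1/6
```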
Sketch proof V
When $\boldsymbol{\xi}$ consists of binary mergers only, i.e. $\alpha = |\eta_{d-1}| - |\eta_d|$,
\[ \sum_{\boldsymbol{\xi}} \prod_{s=\tau_N(t_{d-1})+1}^{\tau_N(t_d)} p_{\xi_{s-1}\xi_s}(s) \approx C' \sum_{s_1 < \dots < s_\alpha = \tau_N(t_{d-1})+1}^{\tau_N(t_d)} \left\{ \prod_{r=1}^\alpha c_N(s_r) \right\} \prod_{r=\tau_N(t_{d-1})+1}^{\tau_N(t_d)} \{1 - C'' c_N(r)\} \]
\[ \approx \sum_{\beta=0}^{\tau_N(t_d)-\tau_N(t_{d-1})-\alpha} C''' \sum_{s_1 < \dots < s_{\alpha+\beta} = \tau_N(t_{d-1})+1}^{\tau_N(t_d)} \prod_{r=1}^{\alpha+\beta} c_N(s_r). \]
Sketch proof VI
It turns out that the constant $C'''$ is exactly $(Q^{\alpha+\beta})_{\eta_{d-1}\eta_d}$, where $Q$ is the Kingman coalescent generator!
\[ \sum_{\boldsymbol{\xi}} \prod_{s=\tau_N(t_{d-1})+1}^{\tau_N(t_d)} p_{\xi_{s-1}\xi_s}(s) \approx \sum_{\beta=0}^{\tau_N(t_d)-\tau_N(t_{d-1})-\alpha} C''' \sum_{s_1 < \dots < s_{\alpha+\beta} = \tau_N(t_{d-1})+1}^{\tau_N(t_d)} \prod_{r=1}^{\alpha+\beta} c_N(s_r) \]
\[ \approx \sum_{\beta=0}^{\tau_N(t_d)-\tau_N(t_{d-1})-\alpha} (Q^{\alpha+\beta})_{\eta_{d-1}\eta_d} \frac{(t_d - t_{d-1})^{\alpha+\beta}}{(\alpha+\beta)!} \approx (e^{Q(t_d - t_{d-1})})_{\eta_{d-1}\eta_d}. \]
Corollary 1
The genealogy of $n$ particles sampled uniformly at random from an $N$-particle bootstrap particle filter with multinomial resampling converges to a Kingman coalescent under the time-scaling $\tau_N(t)$ whenever
\[ \frac{1}{a} \le g(y_t|x_t) \le a, \qquad \varepsilon\, h(x_t) \le p(x_{t-1}, x_t) \le \frac{1}{\varepsilon}\, h(x_t), \]
for some $0 < \varepsilon \le 1 \le a < \infty$ and some probability density $h(x)$, uniformly in time, space, and the observations.
Sketch proof
- Conditional on weights, the offspring counts have multinomial distributions with parameters $(N; w_t^{(1)}, \dots, w_t^{(N)})$.
- Upper and lower bounds on observation densities imply
  \[ \frac{\varepsilon^2}{a^2 N} \le w_t^{(i)} \le \frac{a^2}{\varepsilon^2 N}. \]
- The required upper and lower bounds follow from these bounds, standard moment calculations for multinomial distributions, and the local dependence structure of particle filters.
Corollary 2
Let $T_n$ be the total Kingman coalescent tree height of $n$ particles. Under the preceding assumptions,
\[ \frac{2\varepsilon^4 N}{a^4}\left(1 - \frac{1}{n}\right) \le \mathbb{E}[\tau_N(T_n)] \le \frac{2 a^4 N}{\varepsilon^4}\left(1 - \frac{1}{n}\right) + \frac{a^8}{\varepsilon^4}, \]
\[ \frac{N^2 \varepsilon^8}{a^8}\left(\frac{4\pi^2}{3} - 12 + O(n^{-1})\right) \le \mathrm{Var}(\tau_N(T_n)) \le \frac{N^2 a^8}{\varepsilon^8}\left(\frac{4\pi^2}{3} - 12 + O(n^{-1})\right) + O(N). \]
A numerical example
- Take the earlier HMM to be
  \[ X_{t+1} = (1-\Delta)X_t + \sqrt{\Delta}\,\xi_t, \qquad X_0 \sim N(0,1), \qquad Y_t|X_t \sim N(X_t, \sigma^2), \]
  where $\xi_t \sim N(0,1)$.
- This model violates the assumed lower bounds.
- Nevertheless, simulations using a bootstrap particle filter show that the Kingman scalings hold, even when $n = N$.
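A sketch of one such simulation: run the bootstrap filter on this model (with illustrative values $\Delta = 0.1$, $\sigma = 1$, $N = 64$, not taken from the slides), record the ancestor indices, and trace all terminal particles back until they share a common ancestor, giving one realization of the time to the MRCA.

```python
import numpy as np

rng = np.random.default_rng(0)
Delta, sigma, N, T = 0.1, 1.0, 64, 2000

# Simulate data from the model on the slide.
x_true, y = np.zeros(T + 1), np.zeros(T + 1)
x_true[0] = rng.normal()
y[0] = x_true[0] + sigma * rng.normal()
for t in range(1, T + 1):
    x_true[t] = (1 - Delta) * x_true[t - 1] + np.sqrt(Delta) * rng.normal()
    y[t] = x_true[t] + sigma * rng.normal()

# Bootstrap particle filter, recording the resampled ancestor indices.
x = rng.normal(size=N)
w = np.exp(-0.5 * ((y[0] - x) / sigma) ** 2); w /= w.sum()
A = np.empty((T, N), dtype=int)
for t in range(1, T + 1):
    a = rng.choice(N, size=N, p=w)
    A[t - 1] = a
    x = (1 - Delta) * x[a] + np.sqrt(Delta) * rng.normal(size=N)
    w = np.exp(-0.5 * ((y[t] - x) / sigma) ** 2); w /= w.sum()

# Trace the N terminal particles back until a single ancestor remains (the MRCA).
lineage = np.arange(N)
tmrca = T
for back in range(1, T + 1):
    lineage = A[T - back][lineage]
    if len(np.unique(lineage)) == 1:
        tmrca = back
        break
print(tmrca)   # one realization; of order N generations, consistent with the scaling
```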
Mean tree height
[Figure: mean rescaled tree height $\mathbb{E}[T/N]$ against sample size $n$, for $N = 128$ (left) and $N = 8192$ (right), under multinomial, residual, stratified, and systematic resampling.]
Mean tree height II
[Figure: mean rescaled tree height $\mathbb{E}[T/N]$ against $N$, for $n = 2$ (left) and $n = 2048$ (right), under multinomial, residual, stratified, and systematic resampling.]
Tree height variance
[Figure: rescaled tree height variance $\mathrm{Var}(T/N)$ against sample size $n$, for $N = 128$ (left) and $N = 8192$ (right), under multinomial, residual, stratified, and systematic resampling.]
Tree height variance II
[Figure: rescaled tree height variance $\mathrm{Var}(T/N)$ against $N$, for $n = 2$ (left) and $n = 2048$ (right), under multinomial, residual, stratified, and systematic resampling.]
Conclusions
- Genealogies of $n \ll N$ particles from $N$-particle SMC algorithms converge to the Kingman coalescent when time is measured in units of $N$, as $N \to \infty$.
- Strong technical assumptions (in effect, a compact state space), which do not seem necessary in practice.
- Predicted scalings observed in experiments for finite $N$, and seem to hold even when $n \approx N$.
- Result holds for multinomial resampling, but other schemes agree with predictions empirically.
- This result also demonstrates that the domain of attraction of the Kingman coalescent includes certain non-Markovian genealogies.
Outlook
- Some areas in which genealogical results might be interesting:
  - Variance estimation from SMC output (Lee and Whiteley, 2015).
  - Smoothing and static parameter estimation.
  - Mixing in particle Gibbs/iterated cSMC.
- Room for improvement (selected topics...):
  - Relaxing strong assumptions.
  - Incorporating other resampling schemes.
  - Obtaining stronger modes of convergence.
  - (Formal analysis of $n \approx N$.)
  - Incorporating conditional SMC.
References
H. P. Chan and T. L. Lai. A general theory of particle filters in hidden Markov models and some applications. Annals of Statistics, 41(6):2877–2904, 2013.
A. Doucet and S. Sénécal. Fixed-lag sequential Monte Carlo. In EUSIPCO, 2004.
N. J. Gordon, D. J. Salmond, and A. F. M. Smith. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings-F, 140(2):107–113, April 1993.
P. E. Jacob, L. Murray, and S. Rubenthaler. Path storage in the particle filter. Statistics and Computing, 25(2):487–496, 2015.
J. F. C. Kingman. The coalescent. Stochastic Processes and their Applications, 13(3):235–248, 1982.
J. Koskela, P. Jenkins, A. M. Johansen, and D. Spanò. Asymptotic genealogies of interacting particle systems with an application to sequential Monte Carlo. ArXiv mathematics e-print 1804.01811, April 2018.
A. Lee and N. Whiteley. Variance estimation and allocation in the particle filter. Technical Report 1509.00394, ArXiv, 2015.
Acknowledgments
- Paul Jenkins was supported in part by EPSRC grant EP/L018497/1.
- Adam Johansen was partially supported by funding from the Lloyd's Register Foundation — Alan Turing Institute Programme on Data-Centric Engineering.
- Jere Koskela was supported by EPSRC as part of the MASDOC DTC at the University of Warwick (Grant No. EP/HO23364/1), and by Deutsche Forschungsgemeinschaft (DFG) grant BL 1105/3-2.