TRANSCRIPT
Asymptotic Genealogies of Sequential Monte Carlo Algorithms
Jere Koskela, Paul Jenkins¹, Adam M. Johansen², and Dario Spanò, all at Department of Statistics, University of Warwick.
arXiv: 1804.01811v2
CIRM — 25.10.2018. Compiled: Wednesday 24th October, 2018, 22:05
¹Also at Department of Computer Science, University of Warwick. ²Also at The Alan Turing Institute.
CoSInES Launch Day, 2/11/18 at Warwick
CoSInES: a collaboration of researchers from Warwick, Bristol, Lancaster, Oxford and the Alan Turing Institute to tackle fundamental challenges in Computational and Bayesian Statistics.
Dates: The project will run from 1st October 2018 to 30th September 2023, and is primarily funded by EPSRC.
Launch: We'd like to invite anyone who would like to attend to register their interest by emailing Shital Desai ([email protected]).
Jobs! Soon: 5 four-year PDRA positions associated with the project, based at any of the 5 institutions involved in the grant. Informal enquiries to Gareth Roberts ([email protected]) are very welcome.
http://www.cosines.org
Outline
Sequential Monte Carlo
Path degeneracy
The genealogical process and convergence
A numerical example
Conclusions and outlook
Importance sampling
- Intractable target:
  \[ \mathbb{E}_\pi[f(X)] := \int f(x)\,\pi(\mathrm{d}x). \]
- Monte Carlo: let $X^{(1:N)} \sim \pi^{\otimes N}$. Then
  \[ \mathbb{E}_\pi[f(X)] \approx \frac{1}{N}\sum_{i=1}^N f(X^{(i)}). \]
- Change of measure:
  \[ \mathbb{E}_\pi[f(X)] = \mathbb{E}_\mu\left[\frac{\mathrm{d}\pi}{\mathrm{d}\mu}(X)\,f(X)\right] = \int \frac{\mathrm{d}\pi}{\mathrm{d}\mu}(x)\,f(x)\,\mu(\mathrm{d}x). \]
- Importance sampling: let $X^{(1:N)} \sim \mu^{\otimes N}$. Then
  \[ \mathbb{E}_\pi[f(X)] \approx \frac{1}{N}\sum_{i=1}^N \frac{\mathrm{d}\pi}{\mathrm{d}\mu}(X^{(i)})\,f(X^{(i)}). \]
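As a concrete sketch (not from the slides): importance sampling in Python, with hypothetical choices $\pi = N(0,1)$, $\mu = N(0, 2^2)$ and $f(x) = x^2$, so that $\mathbb{E}_\pi[f(X)] = 1$.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100_000

# Target pi = N(0,1), proposal mu = N(0, 2^2); both densities are known here.
def log_pi(x): return -0.5 * x**2 - 0.5 * np.log(2 * np.pi)
def log_mu(x): return -0.5 * (x / 2)**2 - np.log(2) - 0.5 * np.log(2 * np.pi)

x = rng.normal(0.0, 2.0, size=N)       # X^(1:N) ~ mu
w = np.exp(log_pi(x) - log_mu(x))      # Radon-Nikodym weights (d pi / d mu)(X^(i))
estimate = np.mean(w * x**2)           # (1/N) sum_i w_i f(X^(i))
print(estimate)                        # close to E_pi[X^2] = 1
```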
Sequential Monte Carlo / Interacting Particle System
1: for $i \in \{1,\dots,N\}$: sample $X_0^{(i)} \sim \mu_0$.
2: for $i \in \{1,\dots,N\}$: set
   \[ w_0^{(i)} \leftarrow \frac{\pi_0(X_0^{(i)})/\mu_0(X_0^{(i)})}{\sum_{j=1}^N \pi_0(X_0^{(j)})/\mu_0(X_0^{(j)})}. \]
3: for $t \in \{1,\dots,T\}$ do
4:   sample $(a_t^{(1)},\dots,a_t^{(N)}) \sim \mathrm{Categorical}(w_{t-1}^{(1)},\dots,w_{t-1}^{(N)})$.
5:   for $i \in \{1,\dots,N\}$: sample $X_t^{(i)} \sim \mu_t(\,\cdot\,|\,X_{t-1}^{(a_t^{(i)})})$.
6:   for $i \in \{1,\dots,N\}$: set
   \[ w_t^{(i)} \leftarrow \frac{\pi_t(X_t^{(i)}|X_{t-1}^{(a_t^{(i)})})/\mu_t(X_t^{(i)}|X_{t-1}^{(a_t^{(i)})})}{\sum_{j=1}^N \pi_t(X_t^{(j)}|X_{t-1}^{(a_t^{(j)})})/\mu_t(X_t^{(j)}|X_{t-1}^{(a_t^{(j)})})}. \]
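A minimal Python sketch of this algorithm. The densities are hypothetical choices purely for illustration ($\mu_0 = N(0,2^2)$, $\pi_0 \propto N(0,1)$, $\mu_t(\cdot|x') = N(x',1)$, and $\pi_t(x|x') \propto N(x; x', 1)\,e^{-x^2/2}$); any pair with $\pi_t \ll \mu_t$ would do. The ancestor indices $a_t^{(1:N)}$ are recorded, since they encode the genealogy studied later.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 1000, 20

# Step 1: X_0 ~ mu_0 = N(0, 2^2).
x = rng.normal(0.0, 2.0, size=N)
# Step 2: w_0 proportional to pi_0/mu_0 = exp(-x^2/2) / exp(-x^2/8), normalized.
w = np.exp(-0.5 * x**2) / np.exp(-0.125 * x**2)
w /= w.sum()

ancestry = []                              # records a_t^(1:N) for the genealogy
for t in range(1, T + 1):
    a = rng.choice(N, size=N, p=w)         # step 4: Categorical(w_{t-1})
    ancestry.append(a)
    x = rng.normal(x[a], 1.0)              # step 5: X_t ~ mu_t(.|parent)
    w = np.exp(-0.5 * x**2)                # step 6: pi_t/mu_t reduces to a potential
    w /= w.sum()
```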
Example: Hidden Markov Model
- Let $\{X_t\}_{t\ge 0}$ be a Markov process with transition density $p(x,x')$ and initial density $\pi(x)$.
- Suppose a noisy observation $Y_t$ with density $g(y|x)$ is made of each state $X_t$.
- SMC algorithms with
  \[ \pi_0(x_0) \propto \pi(x_0)\,g(y_0|x_0), \qquad \pi_t(x_t|x_{1:(t-1)}) \propto p(x_{t-1},x_t)\,g(y_t|x_t) \]
  target $\mathbb{P}(X_{0:T} \in \mathrm{d}x_{0:T} \,|\, Y_{0:T} = y_{0:T})$.
- E.g. the bootstrap particle filter: $\mu_0 = \pi$, $\mu_t(x_t|x_{1:(t-1)}) = p(x_{t-1},x_t)$, and $w_t \propto g(y_t|x_t)$.
Example: Bootstrap Particle Filter (Gordon et al., 1993)
1: for $i \in \{1,\dots,N\}$: sample $X_0^{(i)} \sim \pi$.
2: for $i \in \{1,\dots,N\}$: set
   \[ w_0^{(i)} \leftarrow \frac{g(y_0|X_0^{(i)})}{\sum_{j=1}^N g(y_0|X_0^{(j)})}. \]
3: for $t \in \{1,\dots,T\}$ do
4:   sample $(a_t^{(1)},\dots,a_t^{(N)}) \sim \mathrm{Categorical}(w_{t-1}^{(1)},\dots,w_{t-1}^{(N)})$.
5:   for $i \in \{1,\dots,N\}$: sample $X_t^{(i)} \sim p(X_{t-1}^{(a_t^{(i)})}, \cdot\,)$.
6:   for $i \in \{1,\dots,N\}$: set
   \[ w_t^{(i)} \leftarrow \frac{g(y_t|X_t^{(i)})}{\sum_{j=1}^N g(y_t|X_t^{(j)})}. \]
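As a sketch, the bootstrap filter run on a hypothetical AR(1)-plus-noise model (autoregression coefficient 0.9 and $\sigma = 0.5$ are illustrative choices, not from the slides), tracking the filtering means $\mathbb{E}[X_t \mid y_{0:t}]$:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, sigma = 500, 50, 0.5

# Simulate data: X_{t+1} = 0.9 X_t + sqrt(1-0.81) noise, Y_t = X_t + sigma noise.
x_true, y = np.zeros(T + 1), np.zeros(T + 1)
x_true[0] = rng.normal()
y[0] = x_true[0] + sigma * rng.normal()
for t in range(1, T + 1):
    x_true[t] = 0.9 * x_true[t - 1] + np.sqrt(1 - 0.9**2) * rng.normal()
    y[t] = x_true[t] + sigma * rng.normal()

def g(y_t, x):                           # observation density N(y; x, sigma^2), unnormalized
    return np.exp(-0.5 * ((y_t - x) / sigma) ** 2)

x = rng.normal(size=N)                   # step 1: X_0 ~ pi = N(0, 1)
w = g(y[0], x); w /= w.sum()             # step 2
means = [np.sum(w * x)]                  # filtering estimate of E[X_t | y_{0:t}]
for t in range(1, T + 1):
    a = rng.choice(N, size=N, p=w)       # step 4: multinomial resampling
    x = 0.9 * x[a] + np.sqrt(1 - 0.9**2) * rng.normal(size=N)   # step 5
    w = g(y[t], x); w /= w.sum()         # step 6: reweight by likelihood
    means.append(np.sum(w * x))
```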
Path degeneracy
- Suppose $T \gg 1$, and $f(X_{0:T})$ depends on every time point.
- Mergers due to resampling mean that times $t \ll T$ are estimated from $m \ll N$ paths.
- $\Rightarrow$ High-variance estimators.
- Loss of paths also means that fewer than $N \times T$ particles need to be stored, reducing memory cost.
- Aim: a priori estimates of
  \[ \mathbb{E}[T_{\mathrm{MRCA}}], \quad \mathrm{Var}(T_{\mathrm{MRCA}}), \quad \mathbb{P}(T_{\mathrm{MRCA}} > t), \]
  etc., where $T_{\mathrm{MRCA}}$ is the time to the most recent common ancestor of the surviving paths.
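A quick illustration of path degeneracy (an assumed toy setup, not from the slides): track each particle's time-0 ancestor under repeated multinomial resampling with, for simplicity, uniform weights; the number of distinct surviving ancestors still collapses.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 100, 400

ancestor = np.arange(N)                # time-0 ancestor of each current particle
distinct = [N]
for t in range(T):
    a = rng.integers(0, N, size=N)     # multinomial resampling, uniform weights
    ancestor = ancestor[a]
    distinct.append(len(np.unique(ancestor)))

print(distinct[0], distinct[-1])       # N distinct ancestors collapse toward 1
```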
Reasons to Characterize Path Degeneracy include...
- Qualitative understanding of methods.
- Calibrating fixed-lag techniques, e.g. Doucet and Senecal (2004).
- Relationship with estimator variance (Chan et al., 2013; Lee and Whiteley, 2015).
- Understanding storage requirements (Jacob et al., 2015).
The coalescent process (Kingman, 1982)
- Let $\{R_t^{(n)}\}_{t\ge 0}$ be a partition-valued process.
- $R_0^{(n)} = \{\{1\},\dots,\{n\}\}$.
- Each pair of blocks merges at rate 1.
- The number of blocks is a "death" process of rate $\binom{k}{2}$, where $k$ is the number of blocks.
Example: n = 4
[Figure: coalescent tree for $n = 4$, with successive holding times $T_1 \sim \mathrm{Exp}(6)$, $T_2 \sim \mathrm{Exp}(3)$, $T_3 \sim \mathrm{Exp}(1)$.]
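The tree height can be simulated directly from these exponential holding times. A short check (a sketch, not from the slides) that the simulated mean matches $\mathbb{E}[T_n] = \sum_{k=2}^n 2/(k(k-1)) = 2(1 - 1/n)$:

```python
import numpy as np

rng = np.random.default_rng(0)

def coalescent_height(n, rng):
    """Total tree height: sum of Exp(k choose 2) holding times while k blocks remain."""
    t = 0.0
    for k in range(n, 1, -1):
        t += rng.exponential(1.0 / (k * (k - 1) / 2))   # scale = 1 / binom(k, 2)
    return t

n, reps = 4, 200_000
heights = [coalescent_height(n, rng) for _ in range(reps)]
print(np.mean(heights))   # E[T_n] = 2(1 - 1/n) = 1.5 for n = 4
```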
The genealogical process
- It is convenient to reverse the direction of time...
- Let $\{G_t^{(n,N)}\}_{t\in\mathbb{N}_0}$ be the genealogy of $n \le N$ particles sampled randomly from an $N$-particle SMC algorithm of interest.
- $G_0^{(n,N)} = \{\{1\},\dots,\{n\}\}$.
- $i \sim j$ in $G_t^{(n,N)}$ $\Rightarrow$ particles $i$ and $j$ have a common ancestor $t$ generations ago.
- [Figure: $G^{(2,7)}$ illustrated.]
Objective: Establish conditions under which, as $N \to \infty$, the genealogy converges to the Kingman coalescent.
[Figure: an SMC genealogy converging to a coalescent tree with holding times $T_1$, $T_2$, $T_3$.]
Rescaling time
- For $i \in \{1,\dots,N\}$ and $t \in \mathbb{N}$, let $\nu_t^{(i)}$ be the number of children of particle $i$, $t$ generations ago.
- Define
  \[ c_N(t) := \frac{1}{(N)_2} \sum_{i=1}^N (\nu_t^{(i)})_2 \approx \mathbb{E}[\mathrm{ESS}(t)^{-1}], \]
  \[ \tau_N(t) := \inf\left\{ s \ge 1 : \sum_{r=1}^s c_N(r) \ge t \right\}, \]
  \[ D_N(t) := \frac{1}{N(N)_2} \sum_{i=1}^N (\nu_t^{(i)})_2 \left( \nu_t^{(i)} + \frac{1}{N} \sum_{j \ne i} (\nu_t^{(j)})_2 \right). \]
Convergence theorem
Suppose that all assignments of offspring to parents are equally likely, and that
\[ \lim_{N\to\infty} \mathbb{E}\left[ \sum_{r=\tau_N(s)+1}^{\tau_N(t)} D_N(r) \right] = 0, \qquad \lim_{N\to\infty} \mathbb{E}[c_N(t)] = 0, \]
\[ \lim_{N\to\infty} \mathbb{E}\left[ \sum_{r=\tau_N(s)+1}^{\tau_N(t)} c_N(r)^2 \right] = 0, \qquad \mathbb{E}[\tau_N(t) - \tau_N(s)] \le C_{t,s} N. \]
Then $(G_{\tau_N(t)}^{(n,N)})_{t\ge 0}$ converges to the Kingman coalescent in the sense of finite-dimensional distributions.
Proof outline
- Consider finite-dimensional distributions.
- Apply straightforward but intricate counting arguments,
- together with bounds on elementary transition probabilities,
- to upper and lower bound the elements of the FDDs.
- Compare these with those of the coalescent.
Sketch proof
- Let $\xi$ and $\eta$ be partitions of $\{1,\dots,n\}$, with the block counts of $\eta$ in terms of the blocks of $\xi$ being $b_1,\dots,b_{|\eta|}$, i.e. $b_1 + \dots + b_{|\eta|} = |\xi|$.
- The conditional one-step transition probability of $G_t^{(n,N)}$ given family sizes is
  \[ p_{\xi\eta}(t) := \frac{1}{(N)_{|\xi|}} \sum_{i_1 \ne \dots \ne i_{|\eta|} = 1}^N (\nu_t^{(i_1)})_{b_1} \cdots (\nu_t^{(i_{|\eta|})})_{b_{|\eta|}}. \]
- FDDs:
  \[ \mathbb{P}(G_{\tau_N(t_1)}^{(n,N)} = \eta_1, \dots, G_{\tau_N(t_k)}^{(n,N)} = \eta_k \,|\, G_{\tau_N(t_0)}^{(n,N)} = \eta_0) = \mathbb{E}\left[ \prod_{d=1}^k \left\{ \prod_{r=\tau_N(t_{d-1})+1}^{\tau_N(t_d)} P_N(r) \right\}_{\eta_{d-1}\eta_d} \right]. \]
Sketch proof II
- For a single time interval,
  \[ \left\{ \prod_{r=\tau_N(t_{d-1})+1}^{\tau_N(t_d)} P_N(r) \right\}_{\eta_{d-1}\eta_d} = \sum_{\boldsymbol{\xi}} \prod_{s=\tau_N(t_{d-1})+1}^{\tau_N(t_d)} p_{\xi_{s-1}\xi_s}(s), \]
  where $\boldsymbol{\xi} = (\eta_{d-1}, \xi_{\tau_N(t_{d-1})+1}, \dots, \xi_{\tau_N(t_d)-1}, \eta_d)$.
- Each partition in $\boldsymbol{\xi}$ is either equal to its predecessor, or obtained from its predecessor by merging some subset(s) of blocks.
Sketch proof III
\[ p_{\xi\xi}(t) \approx 1 - \binom{|\xi|}{2} c_N(t). \]
If $\eta$ is formed by merging two blocks of $\xi$,
\[ c_N(t) - \binom{|\xi|-2}{2} D_N(t) \lesssim p_{\xi\eta}(t) \lesssim c_N(t). \]
If $\eta$ is formed by merging more than two blocks of $\xi$,
\[ p_{\xi\eta}(t) \lesssim \binom{|\xi|-2}{2} D_N(t). \]
Sketch proof IV
\[ \sum_{\boldsymbol{\xi}} \prod_{s=\tau_N(t_{d-1})+1}^{\tau_N(t_d)} p_{\xi_{s-1}\xi_s}(s) \approx \sum_{\alpha=1}^{|\eta_{d-1}|-|\eta_d|} \sum_{(\lambda,\mu)\in\Pi_2([\alpha])} C \times \sum_{s_1 < \dots < s_\alpha = \tau_N(t_{d-1})+1}^{\tau_N(t_d)} \left\{ \prod_{r\in\lambda} c_N(s_r) \right\} \left\{ \prod_{r\in\mu} D_N(s_r) \right\}, \]
\[ D_N(t) \approx \frac{c_N(t)}{N}, \qquad \sum_{s_1 < \dots < s_\alpha = \tau_N(t_{d-1})+1}^{\tau_N(t_d)} \prod_{r=1}^\alpha c_N(s_r) \approx \frac{(t_d - t_{d-1})^\alpha}{\alpha!}. \]
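The last approximation can be sanity-checked numerically in the simplest case of constant increments: if $c_N(r) \equiv c$ over $m$ generations with $mc = t$, the ordered sum is exactly $\binom{m}{\alpha} c^\alpha$, which tends to $t^\alpha/\alpha!$ as $m$ grows (the values $m$, $\alpha$, $t$ below are arbitrary illustrative choices).

```python
from math import comb, factorial

# Ordered sum over s_1 < ... < s_alpha of a constant product c^alpha:
# there are binom(m, alpha) such tuples, and binom(m, alpha) c^alpha -> t^alpha / alpha!.
m, alpha, t = 200, 3, 1.0
c = t / m
exact = comb(m, alpha) * c**alpha
limit = t**alpha / factorial(alpha)
print(exact, limit)   # 0.164175 vs 1/6
```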
Sketch proof V
When $\boldsymbol{\xi}$ consists of binary mergers only, i.e. $\alpha = |\eta_{d-1}| - |\eta_d|$,
\[ \sum_{\boldsymbol{\xi}} \prod_{s=\tau_N(t_{d-1})+1}^{\tau_N(t_d)} p_{\xi_{s-1}\xi_s}(s) \approx C' \sum_{s_1 < \dots < s_\alpha = \tau_N(t_{d-1})+1}^{\tau_N(t_d)} \left\{ \prod_{r=1}^\alpha c_N(s_r) \right\} \prod_{r=\tau_N(t_{d-1})+1}^{\tau_N(t_d)} \{1 - C'' c_N(r)\} \]
\[ \approx \sum_{\beta=0}^{\tau_N(t_d)-\tau_N(t_{d-1})-\alpha} C''' \sum_{s_1 < \dots < s_{\alpha+\beta} = \tau_N(t_{d-1})+1}^{\tau_N(t_d)} \prod_{r=1}^{\alpha+\beta} c_N(s_r). \]
Sketch proof VI
It turns out that the constant $C'''$ is exactly $(Q^{\alpha+\beta})_{\eta_{d-1}\eta_d}$, where $Q$ is the Kingman coalescent generator!
\[ \sum_{\boldsymbol{\xi}} \prod_{s=\tau_N(t_{d-1})+1}^{\tau_N(t_d)} p_{\xi_{s-1}\xi_s}(s) \approx \sum_{\beta=0}^{\tau_N(t_d)-\tau_N(t_{d-1})-\alpha} C''' \sum_{s_1 < \dots < s_{\alpha+\beta} = \tau_N(t_{d-1})+1}^{\tau_N(t_d)} \prod_{r=1}^{\alpha+\beta} c_N(s_r) \]
\[ \approx \sum_{\beta=0}^{\tau_N(t_d)-\tau_N(t_{d-1})-\alpha} (Q^{\alpha+\beta})_{\eta_{d-1}\eta_d} \frac{(t_d - t_{d-1})^{\alpha+\beta}}{(\alpha+\beta)!} \approx (e^{Q(t_d - t_{d-1})})_{\eta_{d-1}\eta_d}. \]
Corollary 1
The genealogy of $n$ particles sampled uniformly at random from an $N$-particle bootstrap particle filter with multinomial resampling converges to a Kingman coalescent under the time-scaling $\tau_N(t)$ whenever
\[ \frac{1}{a} \le g(y_t|x_t) \le a, \qquad \varepsilon\, h(x_t) \le p(x_{t-1}, x_t) \le \frac{1}{\varepsilon}\, h(x_t), \]
for some $0 < \varepsilon \le 1 \le a < \infty$ and some probability density $h(x)$, uniformly in time, space, and the observations.
Sketch proof
- Conditional on weights, the offspring counts have multinomial distributions with parameters $(N; w_t^{(1)}, \dots, w_t^{(N)})$.
- Upper and lower bounds on observation densities imply
  \[ \frac{\varepsilon^2}{a^2 N} \le w_t^{(i)} \le \frac{a^2}{\varepsilon^2 N}. \]
- The required upper and lower bounds follow from these bounds, standard moment calculations for multinomial distributions, and the local dependence structure of particle filters.
Corollary 2
Let $T_n$ be the total Kingman coalescent tree height of $n$ particles. Under the preceding assumptions,
\[ \frac{2\varepsilon^4 N}{a^4}\left(1 - \frac{1}{n}\right) \le \mathbb{E}[\tau_N(T_n)] \le \frac{2 a^4 N}{\varepsilon^4}\left(1 - \frac{1}{n}\right) + \frac{a^8}{\varepsilon^4}, \]
\[ \frac{N^2 \varepsilon^8}{a^8}\left(\frac{4\pi^2}{3} - 12 + O(n^{-1})\right) \le \mathrm{Var}(\tau_N(T_n)) \le \frac{N^2 a^8}{\varepsilon^8}\left(\frac{4\pi^2}{3} - 12 + O(n^{-1})\right) + O(N). \]
A numerical example
- Take the earlier HMM to be
  \[ X_{t+1} = (1-\Delta)X_t + \sqrt{\Delta}\,\xi_t, \qquad X_0 \sim N(0,1), \qquad Y_t|X_t \sim N(X_t, \sigma^2), \]
  where $\xi_t \sim N(0,1)$.
- This model violates the assumed lower bounds.
- Nevertheless, simulations using a bootstrap particle filter show that the Kingman scalings hold, even when $n = N$.
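A sketch of one such simulation: run the bootstrap filter on this model (with illustrative values $\Delta = 0.1$, $\sigma = 1$, $N = 64$, not taken from the slides), record the ancestor indices, and trace all terminal particles back until they share a common ancestor, giving one realization of the time to the MRCA.

```python
import numpy as np

rng = np.random.default_rng(0)
Delta, sigma, N, T = 0.1, 1.0, 64, 2000

# Simulate data from the model on the slide.
x_true, y = np.zeros(T + 1), np.zeros(T + 1)
x_true[0] = rng.normal()
y[0] = x_true[0] + sigma * rng.normal()
for t in range(1, T + 1):
    x_true[t] = (1 - Delta) * x_true[t - 1] + np.sqrt(Delta) * rng.normal()
    y[t] = x_true[t] + sigma * rng.normal()

# Bootstrap particle filter, recording the resampled ancestor indices.
x = rng.normal(size=N)
w = np.exp(-0.5 * ((y[0] - x) / sigma) ** 2); w /= w.sum()
A = np.empty((T, N), dtype=int)
for t in range(1, T + 1):
    a = rng.choice(N, size=N, p=w)
    A[t - 1] = a
    x = (1 - Delta) * x[a] + np.sqrt(Delta) * rng.normal(size=N)
    w = np.exp(-0.5 * ((y[t] - x) / sigma) ** 2); w /= w.sum()

# Trace the N terminal particles back until a single ancestor remains (the MRCA).
lineage = np.arange(N)
tmrca = T
for back in range(1, T + 1):
    lineage = A[T - back][lineage]
    if len(np.unique(lineage)) == 1:
        tmrca = back
        break
print(tmrca)   # one realization; of order N generations, consistent with the scaling
```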
Mean tree height
[Figure: mean rescaled tree height $\mathbb{E}[T/N]$ against sample size $n$, for $N = 128$ (left) and $N = 8192$ (right), under multinomial, residual, stratified, and systematic resampling.]
Mean tree height II
[Figure: mean rescaled tree height $\mathbb{E}[T/N]$ against $N$, for $n = 2$ (left) and $n = 2048$ (right), under multinomial, residual, stratified, and systematic resampling.]
Tree height variance
[Figure: rescaled tree height variance $\mathrm{Var}(T/N)$ against sample size $n$, for $N = 128$ (left) and $N = 8192$ (right), under multinomial, residual, stratified, and systematic resampling.]
Tree height variance II
[Figure: rescaled tree height variance $\mathrm{Var}(T/N)$ against $N$, for $n = 2$ (left) and $n = 2048$ (right), under multinomial, residual, stratified, and systematic resampling.]
Conclusions
- Genealogies of $n \ll N$ particles from $N$-particle SMC algorithms converge to the Kingman coalescent when time is measured in units of $N$, as $N \to \infty$.
- Strong technical assumptions (in effect, a compact state space), which do not seem necessary in practice.
- Predicted scalings observed in experiments for finite $N$, and seem to hold even when $n \approx N$.
- Result holds for multinomial resampling, but other schemes agree with predictions empirically.
- This result also demonstrates that the domain of attraction of the Kingman coalescent includes certain non-Markovian genealogies.
Outlook
- Some areas in which genealogical results might be interesting:
  - Variance estimation from SMC output (Lee and Whiteley, 2015).
  - Smoothing and static parameter estimation.
  - Mixing in particle Gibbs/iterated cSMC.
- Room for improvement (selected topics...):
  - Relaxing strong assumptions.
  - Incorporating other resampling schemes.
  - Obtaining stronger modes of convergence.
  - (Formal analysis of $n \approx N$.)
  - Incorporating conditional SMC.
References
H. P. Chan and T. L. Lai. A general theory of particle filters in hidden Markov models and some applications. Annals of Statistics, 41(6):2877–2904, 2013.
A. Doucet and S. Sénécal. Fixed-lag sequential Monte Carlo. In EUSIPCO, 2004.
N. J. Gordon, D. J. Salmond, and A. F. M. Smith. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings-F, 140(2):107–113, April 1993.
P. E. Jacob, L. Murray, and S. Rubenthaler. Path storage in the particle filter. Statistics and Computing, 25(2):487–496, 2015.
J. F. C. Kingman. The coalescent. Stochastic Processes and their Applications, 13(3):235–248, 1982.
J. Koskela, P. Jenkins, A. M. Johansen, and D. Spanò. Asymptotic genealogies of interacting particle systems with an application to sequential Monte Carlo. ArXiv mathematics e-print 1804.01811, April 2018.
A. Lee and N. Whiteley. Variance estimation and allocation in the particle filter. Technical Report 1509.00394, ArXiv, 2015.
Acknowledgments
- Paul Jenkins was supported in part by EPSRC grant EP/L018497/1.
- Adam Johansen was partially supported by funding from the Lloyd's Register Foundation — Alan Turing Institute Programme on Data-Centric Engineering.
- Jere Koskela was supported by EPSRC as part of the MASDOC DTC at the University of Warwick (Grant No. EP/HO23364/1), and by Deutsche Forschungsgemeinschaft (DFG) grant BL 1105/3-2.