Sequential Monte Carlo and
Particle Filtering
Frank Wood
Gatsby, November 2007
Importance Sampling
• Recall:
– Let's say that we want to compute some expectation (integral), and we remember from Monte Carlo integration theory that, with samples from p(·), we can approximate this integral as follows:
$$E_p[f] = \int p(x)\,f(x)\,dx$$
$$E_p[f] \approx \frac{1}{L}\sum_{\ell=1}^{L} f(x_\ell)$$
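A minimal Python sketch (not from the slides) of this plain Monte Carlo estimate; the standard-normal choice of p(·) and the integrand f(x) = x² are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_expectation(f, sample_p, L=10_000):
    """Plain Monte Carlo: E_p[f] ~ (1/L) * sum f(x_l), with x_l ~ p(.)."""
    x = sample_p(L)
    return np.mean(f(x))

# Illustrative check: E[x^2] under a standard normal is 1.
print(mc_expectation(lambda x: x**2, rng.standard_normal))
```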
What if p() is hard to sample from?
• One solution: use importance sampling
– Choose another, easier-to-sample distribution q(·) that is similar to p(·) and do the following:
$$E_p[f] = \int p(x)\,f(x)\,dx = \int \frac{p(x)}{q(x)}\,f(x)\,q(x)\,dx \approx \frac{1}{L}\sum_{\ell=1}^{L}\frac{p(x_\ell)}{q(x_\ell)}\,f(x_\ell), \qquad x_\ell \sim q(\cdot)$$
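A hedged sketch of that importance-sampling estimator in Python; the target p = N(0, 1), the Student-t proposal q, and the test function f(x) = x² are illustrative assumptions, not anything prescribed by the slides:

```python
import numpy as np
from scipy import stats

def importance_sampling_expectation(f, p_pdf, q_pdf, sample_q, L=50_000):
    """E_p[f] ~ (1/L) * sum (p(x_l)/q(x_l)) f(x_l), with x_l ~ q(.)."""
    x = sample_q(L)
    weights = p_pdf(x) / q_pdf(x)
    return np.mean(weights * f(x))

# Illustrative: p = N(0,1), proposal q = Student-t(df=3); E_p[x^2] = 1.
p = stats.norm(0.0, 1.0)
q = stats.t(df=3)
print(importance_sampling_expectation(lambda x: x**2, p.pdf, q.pdf,
                                       lambda L: q.rvs(size=L)))
```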
I.S. with distributions known only up to a proportionality
• Importance sampling using distributions known only up to a proportionality is easy and common; a little algebra yields
$$E_p[f] \approx \frac{Z_q}{Z_p}\,\frac{1}{L}\sum_{\ell=1}^{L}\frac{\tilde p(x_\ell)}{\tilde q(x_\ell)}\,f(x_\ell) \approx \sum_{\ell=1}^{L} w_\ell\, f(x_\ell)$$
$$r_\ell = \frac{\tilde p(x_\ell)}{\tilde q(x_\ell)}, \qquad w_\ell = \frac{r_\ell}{\sum_{\ell'=1}^{L} r_{\ell'}}$$
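The same idea with un-normalized densities, as a sketch: the un-normalized Gaussian target p̃ and Laplace-kernel proposal q̃ are assumed purely for illustration; self-normalizing the weights cancels the unknown constants Z_p and Z_q:

```python
import numpy as np

rng = np.random.default_rng(2)

def self_normalized_is(f, p_tilde, q_tilde, sample_q, L=50_000):
    """Self-normalized importance sampling with un-normalized densities:
    r_l = p~(x_l)/q~(x_l), w_l = r_l / sum(r), E_p[f] ~ sum w_l f(x_l)."""
    x = sample_q(L)
    r = p_tilde(x) / q_tilde(x)          # un-normalized importance weights
    w = r / r.sum()                       # normalized weights
    return np.sum(w * f(x))

# Illustrative: p~(x) = exp(-x^2/2) (N(0,1) without its constant),
# q~(x) = exp(-|x|) (Laplace kernel); E_p[x^2] should be close to 1.
p_tilde = lambda x: np.exp(-0.5 * x**2)
q_tilde = lambda x: np.exp(-np.abs(x))
sample_q = lambda L: rng.laplace(0.0, 1.0, size=L)
print(self_normalized_is(lambda x: x**2, p_tilde, q_tilde, sample_q))
```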
A model requiring sampling
techniques
• Non-linear non-Gaussian first order Markov model
Graphical model: hidden states $x_{t-1} \to x_t \to x_{t+1}$ (hidden and of interest), each emitting an observation $y_{t-1}, y_t, y_{t+1}$.
$$p(x_{1:N}, y_{1:N}) = \prod_{i=1}^{N} p(y_i\mid x_i)\,p(x_i\mid x_{i-1})$$
Filtering distribution hard to obtain
• Often the filtering distribution is of interest
• It may not be possible to compute these integrals analytically, nor easy to sample from this distribution directly, nor even to design a good proposal distribution for importance sampling.
$$p(x_i\mid y_{1:i}) \propto \int\!\cdots\!\int p(x_{1:i}, y_{1:i})\,dx_1\cdots dx_{i-1}$$
A solution: sequential Monte Carlo
• Sample from sequence of distributions that “converge” to the distribution of interest
• This is a very general technique that can be applied to a very large number of models and in a wide variety of settings.
• Today: particle filtering for a first order Markov model
Concrete example: target tracking
• A ballistic projectile has been launched in our direction.
• We have orders to intercept the projectile with a missile and thus need to infer the projectile's current position given noisy measurements.
Problem Schematic
[Schematic: the projectile's true position (x_t, y_t); a sensor at the origin (0,0) observes the range r_t and bearing θ_t.]
Probabilistic approach
• Treat true trajectory as a sequence of latent random variables
• Specify a model and do inference to recover the position of the projectile at time t
First order Markov model
Graphical model: hidden true trajectory $x_{t-1} \to x_t \to x_{t+1}$, with noisy measurements $y_{t-1}, y_t, y_{t+1}$.
$$p(x_{t+1}\mid x_t) = a(\cdot\,;\theta_a, x_t)$$
$$p(y_t\mid x_t) = h(\cdot\,;\theta_h, x_t)$$
Posterior expectations
• We want filtering distribution samples so that we can compute expectations
$$p(x_i\mid y_{1:i}) \propto p(y_i\mid x_i)\int p(x_i\mid x_{i-1})\,p(x_{i-1}\mid y_{1:i-1})\,dx_{i-1} \propto p(y_i\mid x_i)\,p(x_i\mid y_{1:i-1})$$
$$E_p[f] = \int f(x_i)\,p(x_i\mid y_{1:i})\,dx_i$$
Write this down!
Importance sampling
• By the identity above we can make the following importance sampling identifications:
Distribution from which we want samples: $\tilde p(x_i) \propto p(y_i\mid x_i)\,p(x_i\mid y_{1:i-1})$
Proposal distribution (the posterior predictive): $q(x_i) = p(x_i\mid y_{1:i-1}) = \int p(x_i\mid x_{i-1})\,p(x_{i-1}\mid y_{1:i-1})\,dx_{i-1}$
Basis of sequential recursion
• If we start with samples from the posterior up to observation i−1,
$$\{w^{i-1}_\ell, x^{i-1}_\ell\}_{\ell=1}^{L} \sim p(x_{i-1}\mid y_{1:i-1})$$
then we can write the proposal distribution as a finite mixture model
$$q(x_i) = p(x_i\mid y_{1:i-1}) \approx \sum_{\ell=1}^{L} w^{i-1}_\ell\, p(x_i\mid x^{i-1}_\ell)$$
and draw samples accordingly:
$$\{w^i_m, x^i_m\}_{m=1}^{M} \sim q(\cdot)$$
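One possible way to code this draw from the finite-mixture proposal, shown as a sketch; the helper name sample_proposal and the Gaussian random-walk transition density are assumptions for illustration (the slides leave the transition p(x_i | x_{i−1}) abstract):

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_proposal(prev_x, prev_w, n_per_component, transition_sample):
    """Draw weighted samples from q(x_i) ~ sum_l w_l * p(x_i | x_l) by taking
    n samples from every mixture component l and giving each the weight
    w_l / n (a weighted-sample representation of q)."""
    xs, ws = [], []
    for x_prev, w_prev in zip(prev_x, prev_w):
        for _ in range(n_per_component):
            xs.append(transition_sample(x_prev))
            ws.append(w_prev / n_per_component)
    return np.array(xs), np.array(ws)

# Illustrative transition density: random walk, p(x_i | x_{i-1}) = N(x_{i-1}, 0.5^2).
transition_sample = lambda x_prev: x_prev + 0.5 * rng.standard_normal()

prev_x = rng.standard_normal(100)      # particles representing x_{i-1}
prev_w = np.full(100, 1.0 / 100)       # their weights
prop_x, prop_w = sample_proposal(prev_x, prev_w, 1, transition_sample)
```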
Samples from the proposal
distribution
• We now have samples from the proposal: $\{w^i_m, x^i_m\} \sim q(x_i)$
• And if we recall the identifications:
Distribution from which we want samples: $\tilde p(x_i) = p(y_i\mid x_i)\,p(x_i\mid y_{1:i-1})$
Proposal distribution: $q(x_i) = p(x_i\mid y_{1:i-1})$
Updating the weights completes
importance sampling
• We are left with M weighted samples from the
posterior up to observation i
$$r^i_m = \frac{\tilde p(x^i_m)}{q(x^i_m)} = p(y_i\mid x^i_m)\,w^i_m, \qquad w^i_m = \frac{r^i_m}{\sum_{m'=1}^{M} r^i_{m'}}$$
$$\{w^i_m, x^i_m\}_{m=1}^{M} \sim p(x_i\mid y_{1:i})$$
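A sketch of this weight update in Python; sis_weight_update is a hypothetical helper and the Gaussian likelihood for a scalar observation y_i is an illustrative assumption:

```python
import numpy as np

def sis_weight_update(prop_x, prop_w, likelihood):
    """r_m = p(y_i | x_m) * w_m, then w_m = r_m / sum_m r_m."""
    r = likelihood(prop_x) * prop_w       # un-normalized importance weights
    return r / r.sum()                    # normalized posterior weights

# Illustrative Gaussian likelihood for an observation y_i = 0.3 with sigma = 0.2.
y_i, sigma = 0.3, 0.2
likelihood = lambda x: np.exp(-0.5 * ((y_i - x) / sigma) ** 2)

prop_x = np.linspace(-1.0, 1.0, 5)       # stand-in proposal samples
prop_w = np.full(5, 0.2)                 # and their proposal-stage weights
post_w = sis_weight_update(prop_x, prop_w, likelihood)
```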
Intuition
• Particle filter name comes from physical interpretation of samples
Start with samples representing the
hidden state
$$\{w^{i-1}_\ell, x^{i-1}_\ell\}_{\ell=1}^{L} \sim p(x_{i-1}\mid y_{1:i-1})$$
Evolve them according to the state model
$$w^i_m \propto w^{i-1}_m, \qquad x^i_m \sim p(\cdot\mid x^{i-1}_m)$$
Re-weight them by the likelihood
$$w^i_m \propto w^i_m\, p(y_i\mid x^i_m)$$
Results in samples one step
forward
$$\{w^i_\ell, x^i_\ell\}_{\ell=1}^{L} \approx \{w^i_m, x^i_m\}_{m=1}^{M}$$
SIS Particle Filter
• The Sequential Importance Sampler (SIS) particle filter multiplicatively updates weights at every iteration, and thus most weights often become very small
• Computationally this makes little sense, as eventually low-weighted particles do not contribute to any expectations.
• A measure of particle degeneracy is “effective sample size”
• When this quantity is small, particle degeneracy is severe.
$$N_{\text{eff}} = \frac{1}{\sum_{\ell=1}^{L}\big(w^i_\ell\big)^2}$$
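The effective sample size is straightforward to compute from the normalized weights; a small sketch:

```python
import numpy as np

def effective_sample_size(w):
    """N_eff = 1 / sum_l (w_l)^2, for normalized weights w."""
    w = np.asarray(w)
    return 1.0 / np.sum(w ** 2)

print(effective_sample_size([0.25, 0.25, 0.25, 0.25]))  # 4.0: no degeneracy
print(effective_sample_size([0.97, 0.01, 0.01, 0.01]))  # ~1.06: severe degeneracy
```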
Solutions
• The Sequential Importance Re-sampling (SIR) particle filter avoids many of the problems associated with SIS pf'ing by re-sampling the posterior particle set to increase the effective sample size.
• Choosing the best possible importance density is also important, because it minimizes the variance of the weights, which maximizes N_eff.
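A sketch of the simplest such re-sampling step (multinomial, with replacement); the function name and the toy particle set are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)

def multinomial_resample(x, w, L=None):
    """Draw L particles (with replacement) in proportion to their weights and
    reset the weights to 1/L, as in the SIR particle filter."""
    L = len(x) if L is None else L
    idx = rng.choice(len(x), size=L, p=w)
    return x[idx], np.full(L, 1.0 / L)

# Illustrative: resample a small, degenerate particle set.
x = np.array([-1.0, 0.0, 1.0, 2.0])
w = np.array([0.90, 0.05, 0.03, 0.02])
new_x, new_w = multinomial_resample(x, w)
```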
Other tricks to improve pf’ing
• Integrate out everything you can
• Replicate each particle some number of times
• In discrete systems, enumerate instead of sample
• Use fancy re-sampling schemes like stratified sampling, etc.
Initial particles
$$\{w^{i-1}_\ell, x^{i-1}_\ell\}_{\ell=1}^{L} \sim p(x_{i-1}\mid y_{1:i-1})$$
Particle evolution step
$$\{w^i_m, x^i_m\}_{m=1}^{M} \sim p(x_i\mid y_{1:i-1})$$
Weighting step
$$\{w^i_m, x^i_m\}_{m=1}^{M} \sim p(x_i\mid y_{1:i})$$
Resampling step
$$\{w^i_\ell, x^i_\ell\}_{\ell=1}^{L} \sim p(x_i\mid y_{1:i})$$
Wrap-up: Pros vs. Cons
• Pros:
– Sometimes it is easier to build a “good”
particle filter sampler than an MCMC sampler
– No need to specify a convergence measure
• Cons:
– Really filtering, not smoothing
• Issues
– Computational trade-off with MCMC
Thank You
Tricks and Variants
• Reduce the dimensionality of the integrand through analytic integration
– Rao-Blackwellization
• Reduce the variance of the Monte Carlo estimator through
– Maintaining a weighted particle set
– Stratified sampling
– Over-sampling
– Optimal re-sampling
Particle filtering
• Consists of two basic elements:
– Monte Carlo integration
– Importance sampling
$$\lim_{L\to\infty}\sum_{\ell=1}^{L} w_\ell\, f(x_\ell) = \int f(x)\,p(x)\,dx, \qquad p(x) \approx \sum_{\ell=1}^{L} w_\ell\,\delta_{x_\ell}(x)$$
PF’ing: Forming the posterior predictive
The proposal distribution for importance sampling of the posterior up to observation i is this approximate posterior predictive distribution:
$$p(x_i\mid y_{1:i-1}) = \int p(x_i\mid x_{i-1})\,p(x_{i-1}\mid y_{1:i-1})\,dx_{i-1} \approx \sum_{\ell=1}^{L} w^{i-1}_\ell\, p(x_i\mid x^{i-1}_\ell)$$
$$\{w^{i-1}_\ell, x^{i-1}_\ell\}_{\ell=1}^{L} \sim p(x_{i-1}\mid y_{1:i-1}) \quad\text{(posterior up to observation } i-1\text{)}$$
Sampling the posterior predictive
• Generating samples from the posterior predictive distribution is the first place where we can introduce variance reduction techniques
• For instance, sample from each mixture component twice, so that M, the number of samples drawn, is two times L, the number of densities in the mixture model, and assign weights
$$\{w^i_m, x^i_m\}_{m=1}^{M} \sim p(x_i\mid y_{1:i-1}), \qquad p(x_i\mid y_{1:i-1}) \approx \sum_{\ell=1}^{L} w^{i-1}_\ell\, p(x_i\mid x^{i-1}_\ell)$$
$$w^i_m = \frac{w^{i-1}_\ell}{2}$$
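A sketch of this over-sampling rule, assuming (as elsewhere) a Gaussian random-walk transition density purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)

def oversample_posterior_predictive(prev_x, prev_w, transition_sample):
    """Draw two samples from each mixture component p(x_i | x_l), so M = 2L,
    and give each the weight w_m = w_l / 2."""
    prop_x = transition_sample(np.repeat(prev_x, 2))
    prop_w = np.repeat(prev_w, 2) / 2.0
    return prop_x, prop_w

# Illustrative random-walk transition, p(x_i | x_{i-1}) = N(x_{i-1}, 0.5^2).
transition_sample = lambda xp: xp + 0.5 * rng.standard_normal(xp.shape)
prop_x, prop_w = oversample_posterior_predictive(
    rng.standard_normal(100), np.full(100, 0.01), transition_sample)
```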
Not the best
• Most efficient Monte Carlo estimator of a function Γ(x)
– From survey sampling theory: Neyman
allocation
– Number drawn from each mixture density is
proportional to the weight of the mixture density times the std. dev. of the function Γ
over the mixture density
• Take home: smarter sampling possible
[Cochran 1963]
Over-sampling from the posterior
predictive distribution
$$\{w^i_m, x^i_m\}_{m=1}^{M} \sim p(x_i\mid y_{1:i-1})$$
Importance sampling the posterior
• Recall that we want samples from
$$p(x_i\mid y_{1:i}) \propto p(y_i\mid x_i)\int p(x_i\mid x_{i-1})\,p(x_{i-1}\mid y_{1:i-1})\,dx_{i-1} \propto p(y_i\mid x_i)\,p(x_i\mid y_{1:i-1})$$
• and make the following importance sampling identifications:
Distribution from which we want to sample: $\tilde p(x_i) = p(y_i\mid x_i)\,p(x_i\mid y_{1:i-1})$
Proposal distribution: $q(x_i) = p(x_i\mid y_{1:i-1}) \approx \sum_{\ell=1}^{L} w^{i-1}_\ell\, p(x_i\mid x^{i-1}_\ell)$
Sequential importance sampling
• Weighted posterior samples arise as
• Normalization of the weights takes place as before
• We are left with M weighted samples from the posterior up to observation i
$$r^i_m = \frac{\tilde p(x^i_m)}{q(x^i_m)} = p(y_i\mid x^i_m)\,w^i_m, \qquad \{w^i_m, x^i_m\} \sim q(\cdot)$$
$$w^i_m = \frac{r^i_m}{\sum_{m'=1}^{M} r^i_{m'}}$$
$$\{w^i_m, x^i_m\}_{m=1}^{M} \sim p(x_i\mid y_{1:i})$$
An alternative view
$$\{w^i_m, x^i_m\} \sim \sum_{\ell=1}^{L} w^{i-1}_\ell\, p(x_i\mid x^{i-1}_\ell)$$
$$p(x_i\mid y_{1:i-1}) \approx \sum_{m=1}^{M} w^i_m\,\delta_{x^i_m}(x_i)$$
$$p(x_i\mid y_{1:i}) \approx \sum_{m=1}^{M} p(y_i\mid x^i_m)\,w^i_m\,\delta_{x^i_m}(x_i)$$
Importance sampling from the
posterior distribution
$$\{w^i_m, x^i_m\}_{m=1}^{M} \sim p(x_i\mid y_{1:i})$$
Sequential importance re-sampling
• Down-sample L particles and weights from the collection of M particles and weights; this can be done via multinomial sampling or in a way that provably minimizes estimator variance
$$\{w^i_\ell, x^i_\ell\}_{\ell=1}^{L} \approx \{w^i_m, x^i_m\}_{m=1}^{M}$$
[Fearnhead 04]
Down-sampling the particle set
$$\{w^i_\ell, x^i_\ell\}_{\ell=1}^{L} \sim p(x_i\mid y_{1:i})$$
Recap
• Starting with (weighted) samples from the posterior up to observation i-1
• Monte Carlo integration was used to form a mixture model representation of the posterior predictive distribution
• The posterior predictive distribution was used as a proposal distribution for importance sampling of the posterior up to observation i
• M > L samples were drawn and re-weighted according to the likelihood (the importance weight), then the collection of particles was down-sampled to L weighted samples
LSSM Not alone
• Various other models are amenable to sequential inference; Dirichlet process mixture modelling is one example, and dynamic Bayes' nets are another
Rao-Blackwellization
• In models where parameters can be analytically marginalized out, or the particle state space can otherwise be collapsed, the efficiency of the particle filter can be improved by doing so
Stratified Sampling
• Sampling from a mixture density using the second (stratified) scheme below produces a more efficient Monte Carlo estimator than the first (multinomial) scheme
• Multinomial: for n = 1:K
– choose k according to π_k
– sample x_n from f_k
– set w_n equal to 1/K
• Stratified: for n = 1:K
– choose f_n
– sample x_n from f_n
– set w_n equal to π_n
$$\{w_n, x_n\}_{n=1}^{K} \sim \sum_k \pi_k\, f_k(\cdot)$$
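Both schemes, sketched side by side in Python for a toy two-component Gaussian mixture (the components and weights are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)

def multinomial_mixture_sample(pis, component_samplers, K):
    """Naive scheme: choose component k ~ pi, sample x from f_k, weight 1/K."""
    ks = rng.choice(len(pis), size=K, p=pis)
    xs = np.array([component_samplers[k]() for k in ks])
    return xs, np.full(K, 1.0 / K)

def stratified_mixture_sample(pis, component_samplers):
    """Stratified scheme: visit every component n once, sample x_n from f_n,
    and give it weight pi_n (a lower-variance weighted representation)."""
    xs = np.array([s() for s in component_samplers])
    return xs, np.array(pis)

# Illustrative two-component Gaussian mixture.
pis = [0.3, 0.7]
component_samplers = [lambda: rng.normal(-2.0, 1.0), lambda: rng.normal(3.0, 0.5)]
x_m, w_m = multinomial_mixture_sample(pis, component_samplers, K=2)
x_s, w_s = stratified_mixture_sample(pis, component_samplers)
```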
Intuition: weighted particle set
• What is the difference between these two discrete distributions over the set {a,b,c}?
– (a), (a), (b), (b), (c)
– (.4, a), (.4, b), (.2, c)
• Weighted particle representations are equally or more efficient for the same number of particles
Optimal particle re-weighting
• Next step: when down-sampling, pass all particles with weight above the threshold 1/c through without modifying their weights, where c is the unique solution to
$$\sum_{m=1}^{M} \min\{c\,w^i_m,\ 1\} = L$$
• Resample all particles with weights below 1/c using stratified sampling and give the selected particles weight 1/c
Fearnhead 2004
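A sketch of how the threshold c might be found numerically (bisection is my choice here, not something specified on the slide); the Dirichlet-distributed toy weights are an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(7)

def solve_threshold(w, L):
    """Find c such that sum_m min(c * w_m, 1) = L by bisection
    (w are normalized weights, L < M)."""
    lo, hi = 0.0, 10.0 / w.min()          # at hi every term saturates at 1
    for _ in range(200):
        c = 0.5 * (lo + hi)
        if np.sum(np.minimum(c * w, 1.0)) < L:
            lo = c
        else:
            hi = c
    return 0.5 * (lo + hi)

# Particles with c*w >= 1 are kept unchanged; the rest would be resampled
# (e.g. by stratified sampling) and given weight 1/c.
w = rng.dirichlet(np.ones(1000))          # illustrative weights, M = 1000
c = solve_threshold(w, L=200)
keep = c * w >= 1.0
print(c, keep.sum())
```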
Result is provably optimal
• In the down-sampling step
$$\{w^i_\ell, x^i_\ell\}_{\ell=1}^{L} \approx \{w^i_m, x^i_m\}_{m=1}^{M}$$
• imagine instead a "sparse" set of weights, some of which are zero:
$$\{\hat w^i_m, x^i_m\}_{m=1}^{M} \approx \{w^i_m, x^i_m\}_{m=1}^{M}$$
• Then this down-sampling algorithm is optimal w.r.t.
$$\sum_{m=1}^{M} E_{\hat w}\big[(\hat w^i_m - w^i_m)^2\big]$$
Fearnhead 2004
Problem Details
• Time and position are given in seconds and meters respectively
• Initial launch velocity and position are both unknown
• The maximum muzzle velocity of the projectile is 1000m/s
• The measurement error in the Cartesian coordinate system is N(0,10000) and N(0,500) for x and y position respectively
• The measurement error in the polar coordinate system is N(0,.001) for θ and Gamma(1,100) for r
• The kill radius of the projectile is 100m
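A sketch of how one might generate observations with this noise model; reading N(0, v) as a zero-mean Gaussian with variance v, Gamma(1, 100) as shape 1 and scale 100, and adding the Gamma noise to the true range are all assumptions on my part:

```python
import numpy as np

rng = np.random.default_rng(8)

def noisy_measurements(x, y):
    """Corrupt a true position (x, y) with the stated measurement noise."""
    # Cartesian-coordinate sensor
    x_obs = x + rng.normal(0.0, np.sqrt(10_000.0))
    y_obs = y + rng.normal(0.0, np.sqrt(500.0))
    # Polar-coordinate sensor (origin at (0, 0))
    r, theta = np.hypot(x, y), np.arctan2(y, x)
    theta_obs = theta + rng.normal(0.0, np.sqrt(0.001))
    r_obs = r + rng.gamma(shape=1.0, scale=100.0)
    return (x_obs, y_obs), (r_obs, theta_obs)

print(noisy_measurements(4_000.0, 1_500.0))
```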
Data and Support Code
http://www.gatsby.ucl.ac.uk/~fwood/pf_tutorial/
Laws of Motion
• In case you’ve forgotten:
$$\mathbf r = \big(v_0\cos\alpha\big)\,t\,\mathbf i + \Big(v_0\sin(\alpha)\,t - \tfrac{1}{2}\,g\,t^2\Big)\,\mathbf j$$
• where v₀ is the initial speed and α is the initial angle
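As a quick sketch, the same equations in Python (the launch speed and angle used in the example are arbitrary):

```python
import numpy as np

def trajectory(v0, alpha, t, g=9.81):
    """Ideal projectile position at time(s) t for initial speed v0 (m/s) and
    launch angle alpha (radians):
    r = (v0 cos(a)) t * i + (v0 sin(a) t - 0.5 g t^2) * j."""
    x = v0 * np.cos(alpha) * t
    y = v0 * np.sin(alpha) * t - 0.5 * g * t**2
    return x, y

# Illustrative launch: 800 m/s at 45 degrees.
t = np.linspace(0.0, 60.0, 7)
print(trajectory(800.0, np.deg2rad(45.0), t))
```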
Good luck!
Monte Carlo Integration
• Compute integrals for which analytical solutions are unknown
$$\int f(x)\,p(x)\,dx, \qquad p(x) \approx \sum_{\ell=1}^{L} w_\ell\,\delta_{x_\ell}(x)$$
Monte Carlo Integration
• Integral approximated as the weighted sum of function evaluations at L points
$$\int f(x)\,p(x)\,dx \approx \int f(x)\sum_{\ell=1}^{L} w_\ell\,\delta_{x_\ell}(x)\,dx = \sum_{\ell=1}^{L} w_\ell\, f(x_\ell)$$
$$p(x) \approx \sum_{\ell=1}^{L} w_\ell\,\delta_{x_\ell}(x)$$
Sampling
• To use MC integration one must be able to sample from p(x)
$$\{w_\ell, x_\ell\}_{\ell=1}^{L} \sim p(\cdot), \qquad \lim_{L\to\infty}\sum_{\ell=1}^{L} w_\ell\,\delta_{x_\ell} \to p(\cdot)$$
Theory (Convergence)
• Quality of the approximation independent of the dimensionality of the integrand
• Convergence of the integral result to the "truth" is $O(1/n^{1/2})$, from the L.L.N.'s.
• Error is independent of dimensionality of x
Bayesian Modelling
• Formal framework for the expression of modelling assumptions
• Two common definitions:
– using Bayes' rule
– marginalizing over models
$$\underbrace{p(\theta\mid x)}_{\text{Posterior}} = \frac{\overbrace{p(x\mid\theta)}^{\text{Likelihood}}\;\overbrace{p(\theta)}^{\text{Prior}}}{\underbrace{p(x)}_{\text{Evidence}}} = \frac{p(x\mid\theta)\,p(\theta)}{\int p(x\mid\theta)\,p(\theta)\,d\theta}$$
Posterior Estimation
• Often the distribution over latent random variables (parameters) is of interest
• Sometimes this is easy (conjugacy)
• Usually it is hard because computing the evidence is intractable
Conjugacy Example
$$\theta \sim \text{Beta}(\alpha,\beta): \qquad p(\theta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,\theta^{\alpha-1}(1-\theta)^{\beta-1}$$
$$x\mid\theta \sim \text{Binomial}(N,\theta): \qquad p(x\mid\theta) = \binom{N}{x}\,\theta^{x}(1-\theta)^{N-x}$$
x successes in N trials, θ probability of success
Conjugacy Continued
$$p(\theta\mid x) = \frac{p(x\mid\theta)\,p(\theta)}{\int p(x\mid\theta)\,p(\theta)\,d\theta} = \frac{1}{Z(x)}\,p(x\mid\theta)\,p(\theta) = \frac{1}{Z(x)}\,\theta^{\alpha-1+x}(1-\theta)^{\beta-1+N-x}$$
$$\theta\mid x \sim \text{Beta}(\alpha+x,\ \beta+N-x), \qquad Z(x) = \left(\frac{\Gamma(\alpha+\beta+N)}{\Gamma(\alpha+x)\,\Gamma(\beta+N-x)}\right)^{-1}$$
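A small sketch of this conjugate update; the prior hyperparameters and data in the example are arbitrary:

```python
from scipy import stats

def beta_binomial_posterior(alpha, beta, x, N):
    """theta ~ Beta(a, b), x | theta ~ Binomial(N, theta)
    implies theta | x ~ Beta(a + x, b + N - x)."""
    return stats.beta(alpha + x, beta + N - x)

# Illustrative: prior Beta(2, 2), 7 successes in 10 trials.
post = beta_binomial_posterior(2.0, 2.0, x=7, N=10)
print(post.mean())   # (a + x) / (a + b + N) = 9/14, about 0.643
```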
Non-Conjugate Models
• Easy to come up with examples
$$\sigma^2 \sim N(0,\alpha), \qquad x\mid\sigma^2 \sim N(0,\sigma^2)$$
Posterior Inference
• Posterior averages are frequently important in problem domains
– posterior predictive distribution
– evidence (as seen) for model comparison,
etc.
$$p(x_{i+1}\mid x_{1:i}) = \int p(x_{i+1}\mid\theta, x_{1:i})\,p(\theta\mid x_{1:i})\,d\theta$$
Relating the Posterior to the Posterior Predictive
$$\begin{aligned}
p(x_i\mid y_{1:i}) &= \int p(x_i, x_{1:i-1}\mid y_{1:i})\,dx_{1:i-1}\\
&\propto \int p(y_i\mid x_i, x_{1:i-1}, y_{1:i-1})\,p(x_i, x_{1:i-1}, y_{1:i-1})\,dx_{1:i-1}\\
&\propto p(y_i\mid x_i)\int p(x_i\mid x_{1:i-1}, y_{1:i-1})\,p(x_{1:i-1}\mid y_{1:i-1})\,dx_{1:i-1}\\
&\propto p(y_i\mid x_i)\int p(x_i\mid x_{i-1})\,p(x_{1:i-1}\mid y_{1:i-1})\,dx_{1:i-1}\\
&\propto p(y_i\mid x_i)\int p(x_i\mid x_{i-1})\,p(x_{i-1}\mid y_{1:i-1})\,dx_{i-1}\\
&\propto p(y_i\mid x_i)\,p(x_i\mid y_{1:i-1})
\end{aligned}$$
Importance sampling
Original distribution p: hard to sample from, easy to evaluate. Proposal distribution q: easy to sample from.
$$E_x[f(x)] = \int p(x)\,f(x)\,dx = \int \frac{p(x)}{q(x)}\,f(x)\,q(x)\,dx \approx \frac{1}{L}\sum_{\ell=1}^{L}\frac{p(x_\ell)}{q(x_\ell)}\,f(x_\ell), \qquad x_\ell \sim q(\cdot)$$
Importance weights: $r_\ell = \dfrac{p(x_\ell)}{q(x_\ell)}$
Importance sampling
un-normalized distributions
Un-normalized distribution to sample from, still hard to sample from and easy to evaluate: $p(x) = \dfrac{\tilde p(x)}{Z_p}$
Un-normalized proposal distribution, still easy to sample from: $q(x) = \dfrac{\tilde q(x)}{Z_q}$, with $x_\ell \sim q(\cdot)$
$$E_x[f(x)] \approx \frac{1}{L}\sum_{\ell=1}^{L}\frac{p(x_\ell)}{q(x_\ell)}\,f(x_\ell) \approx \frac{Z_q}{Z_p}\,\frac{1}{L}\sum_{\ell=1}^{L}\frac{\tilde p(x_\ell)}{\tilde q(x_\ell)}\,f(x_\ell)$$
New term: the ratio of normalizing constants
Normalizing the importance weights
$$E_x[f(x)] \approx \frac{Z_q}{Z_p}\,\frac{1}{L}\sum_{\ell=1}^{L}\frac{\tilde p(x_\ell)}{\tilde q(x_\ell)}\,f(x_\ell) \approx \sum_{\ell=1}^{L} w_\ell\, f(x_\ell)$$
Takes a little algebra: $\dfrac{Z_q}{Z_p} \approx \dfrac{L}{\sum_{\ell=1}^{L} r_\ell}$
Un-normalized importance weights: $r_\ell = \dfrac{\tilde p(x_\ell)}{\tilde q(x_\ell)}$
Normalized importance weights: $w_\ell = \dfrac{r_\ell}{\sum_{\ell'=1}^{L} r_{\ell'}}$
Linear State Space Model (LSSM)
• Discrete time
• First-order Markov chain
Graphical model: $x_{t-1} \to x_t \to x_{t+1}$, each emitting $y_{t-1}, y_t, y_{t+1}$.
$$x_{t+1} = a\,x_t + \epsilon, \qquad \epsilon \sim N(\mu_\epsilon, \sigma^2_\epsilon)$$
$$y_t = b\,x_t + \eta, \qquad \eta \sim N(\mu_\eta, \sigma^2_\eta)$$
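A sketch that simulates data from this LSSM; all parameter values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(9)

def simulate_lssm(T, a, b, mu_eps, sig_eps, mu_eta, sig_eta, x0=0.0):
    """Simulate x_{t+1} = a*x_t + eps, eps ~ N(mu_eps, sig_eps^2), and
    y_t = b*x_t + eta, eta ~ N(mu_eta, sig_eta^2)."""
    x = np.empty(T)
    y = np.empty(T)
    x[0] = x0
    for t in range(T):
        y[t] = b * x[t] + rng.normal(mu_eta, sig_eta)
        if t + 1 < T:
            x[t + 1] = a * x[t] + rng.normal(mu_eps, sig_eps)
    return x, y

# Illustrative parameters (not from the slides).
x, y = simulate_lssm(T=100, a=0.95, b=1.0, mu_eps=0.0, sig_eps=0.3,
                     mu_eta=0.0, sig_eta=0.5)
```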
Inferring the distributions of interest
• Many methods exist to infer these distributions
– Markov Chain Monte Carlo (MCMC)
– Variational inference
– Belief propagation
– etc.
• In this setting sequential inference is possible
because of characteristics of the model structure
and preferable due to the problem requirements
Exploiting LSSM model structure…
$$\begin{aligned}
p(x_i\mid y_{1:i}) &= \int p(x_i, x_{1:i-1}\mid y_{1:i})\,dx_{1:i-1}\\
&\propto \int p(y_i\mid x_i, x_{1:i-1}, y_{1:i-1})\,p(x_i, x_{1:i-1}, y_{1:i-1})\,dx_{1:i-1}\\
&\propto p(y_i\mid x_i)\int p(x_i\mid x_{1:i-1}, y_{1:i-1})\,p(x_{1:i-1}\mid y_{1:i-1})\,dx_{1:i-1}\\
&\propto p(y_i\mid x_i)\int p(x_i\mid x_{i-1})\,p(x_{1:i-1}\mid y_{1:i-1})\,dx_{1:i-1}\\
&\propto p(y_i\mid x_i)\int p(x_i\mid x_{i-1})\,p(x_{i-1}\mid y_{1:i-1})\,dx_{i-1}\\
&\propto p(y_i\mid x_i)\,p(x_i\mid y_{1:i-1})
\end{aligned}$$
Particle filtering
$$p(x_i\mid y_{1:i}) \propto p(y_i\mid x_i)\int p(x_i\mid x_{i-1})\,p(x_{i-1}\mid y_{1:i-1})\,dx_{i-1}$$
Exploiting Markov structure… for sequential inference
Graphical model: $x_{t-1} \to x_t \to x_{t+1}$, with observations $y_{t-1}, y_t, y_{t+1}$.
Use Bayes' rule and the conditional independence structure dictated by the first order Markov hidden variable model; the derivation is the same one given above relating the posterior to the posterior predictive.
An alternative view
$$\{w^i_m, x^i_m\} \sim \sum_{\ell=1}^{L} w^{i-1}_\ell\, p(x_i\mid x^{i-1}_\ell)$$
$$p(x_i\mid y_{1:i-1}) \approx \sum_{m=1}^{M} w^i_m\,\delta_{x^i_m}(x_i)$$
$$p(x_i\mid y_{1:i}) \approx \sum_{m=1}^{M} p(y_i\mid x^i_m)\,w^i_m\,\delta_{x^i_m}(x_i)$$
Sequential importance sampling
inference
• Start with a discrete representation of the posterior up to observation i-1
• Use Monte Carlo integration to represent the posterior predictive distribution as a finite mixture model
• Use importance sampling with the posterior predictive distribution as the proposal distribution to sample the posterior distribution up to observation i
What?
$$p(x_i\mid y_{1:i}) \propto \underbrace{p(y_i\mid x_i)}_{\text{Likelihood}}\ \underbrace{\int \underbrace{p(x_i\mid x_{i-1})}_{\text{State model}}\,p(x_{i-1}\mid y_{1:i-1})\,dx_{i-1}}_{\text{Posterior predictive distribution}}\ \propto\ p(y_i\mid x_i)\,p(x_i\mid y_{1:i-1})$$
Start with a discrete representation of the posterior $p(x_{i-1}\mid y_{1:i-1})$.
Monte Carlo integration
General setup:
$$\lim_{L\to\infty}\sum_{\ell=1}^{L} w_\ell\, f(x_\ell) = \int f(x)\,p(x)\,dx, \qquad p(x) \approx \sum_{\ell=1}^{L} w_\ell\,\delta_{x_\ell}(x)$$
As applied in this stage of the particle filter:
$$p(x_i\mid y_{1:i-1}) = \int p(x_i\mid x_{i-1})\,p(x_{i-1}\mid y_{1:i-1})\,dx_{i-1} \approx \underbrace{\sum_{\ell=1}^{L} w^{i-1}_\ell\, p(x_i\mid x^{i-1}_\ell)}_{\text{Finite mixture model}}$$
with the weighted particles $\{w^{i-1}_\ell, x^{i-1}_\ell\}$ being samples from the posterior up to observation $i-1$.