Bayesian inference for doubly-intractable distributions

Anne-Marie Lyne (anne-marie.lyne@curie.fr)

April, 2016
Joint Work
On Russian Roulette Estimates for Bayesian Inference with Doubly-Intractable Likelihoods
Anne-Marie Lyne, Mark Girolami, Yves Atchadé, Heiko Strathmann, Daniel Simpson
Statist. Sci., Volume 30, Number 4 (2015)
Talk overview
1 Doubly intractable models
2 Current Bayesian approaches
3 Our approach using Russian roulette sampling
4 Results
Motivation: Doubly-intractable models
Motivation: Exponential Random Graph Models
Used extensively in the social networks community
View the observed network as one realisation of a random variable
The probability of observing a given graph depends on certain 'local' graph properties
For example, the edge density, the number of triangles, or k-stars
Motivation: Modelling social networks
$$P(Y = y) = Z(\theta)^{-1} \exp\Big(\sum_k \theta_k g_k(y)\Big)$$

g(y) is a vector of K graph statistics
θ is a K-dimensional parameter indicating the 'importance' of each graph statistic
(Intractable) partition function or normalising term:

$$Z(\theta) = \sum_{y \in \mathcal{Y}} \exp\Big(\sum_k \theta_k g_k(y)\Big)$$
Parameter inference for doubly-intractable models
Expectations with respect to the posterior distribution:

$$\mathbb{E}_\pi[\phi(\theta)] = \int_\Theta \phi(\theta)\,\pi(\theta|y)\,d\theta \approx \frac{1}{N}\sum_{k=1}^{N} \phi(\theta_k), \qquad \theta_k \sim \pi(\theta|y)$$

Simplest function of interest:

$$\mathbb{E}_\pi[\theta] = \int_\Theta \theta\,\pi(\theta|y)\,d\theta \approx \frac{1}{N}\sum_{k=1}^{N} \theta_k, \qquad \theta_k \sim \pi(\theta|y)$$

But we need to sample from the posterior distribution...
The Metropolis-Hastings algorithm
To draw samples from a distribution π(θ):

Choose an initial θ₀, define a proposal distribution q(θ, ·), set n = 0.
Iterate the following for n = 0 ... N_iters:

1 Propose a new parameter value, θ′, from q(θ_n, ·)
2 Set θ_{n+1} = θ′ with probability α(θ_n, θ′), else set θ_{n+1} = θ_n

$$\alpha(\theta_n, \theta') = \min\left[1,\ \frac{\pi(\theta')\,q(\theta', \theta_n)}{\pi(\theta_n)\,q(\theta_n, \theta')}\right]$$

3 n = n + 1
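As a concrete illustration (not from the slides), the loop above can be sketched in Python. The standard-normal target and the Gaussian step size are arbitrary choices for the sketch:

```python
import math
import random

def metropolis_hastings(log_target, theta0, n_iters, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal,
    so the q terms cancel in the acceptance ratio alpha."""
    rng = random.Random(seed)
    theta, samples = theta0, []
    for _ in range(n_iters):
        theta_prop = theta + rng.gauss(0.0, step)   # step 1: propose from q(theta_n, .)
        log_alpha = log_target(theta_prop) - log_target(theta)
        if log_alpha >= 0 or rng.random() < math.exp(log_alpha):
            theta = theta_prop                      # step 2: accept with probability alpha
        samples.append(theta)                       # else keep theta_n
    return samples

# Illustrative target: standard normal up to a constant, so E[theta] = 0.
samples = metropolis_hastings(lambda t: -0.5 * t * t, theta0=3.0, n_iters=20000)
mean_est = sum(samples[5000:]) / 15000.0
```

With a symmetric proposal, q(θ′, θ)/q(θ, θ′) = 1, which is why only the target ratio appears in the code.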
Doubly-intractable distributions
Unfortunately, ERGMs are an example of a 'doubly-intractable' distribution:

$$\pi(\theta|y) = \frac{p(y|\theta)\,\pi(\theta)}{p(y)} = \frac{f(y;\theta)}{Z(\theta)}\,\pi(\theta)\Big/\,p(y)$$

The partition function or normalising term, Z(θ), is intractable and a function of the parameters:

$$\alpha(\theta, \theta') = \min\left(1,\ \frac{q(\theta', \theta)\,\pi(\theta')\,f(y;\theta')\,Z(\theta)}{q(\theta, \theta')\,\pi(\theta)\,f(y;\theta)\,Z(\theta')}\right)$$

As well as ERGMs, there are many other examples: Ising and Potts models, spatial models, phylogenetic models.
Current Bayesian approaches
Approaches which use some kind of approximation: pseudo-likelihoods

Exact-approximate MCMC approaches:
auxiliary variable methods, such as the Exchange algorithm (requires a perfect sample if implemented correctly)
pseudo-marginal (requires an unbiased estimate of the likelihood)

$$\alpha(\theta_n, \theta') = \min\left[1,\ \frac{\pi(\theta')\,q(\theta', \theta_n)}{\pi(\theta_n)\,q(\theta_n, \theta')}\right]$$
The Exchange algorithm (Murray et al 2006 and Møller et al 2006)
Expand the state space of our target (posterior) distribution to

$$p(x, \theta, \theta'|y) = \frac{f(y;\theta)}{Z(\theta)}\,\pi(\theta)\,q(\theta, \theta')\,\frac{f(x;\theta')}{Z(\theta')}\Big/\,p(y)$$

Gibbs sample from $q(\theta, \theta')\,\frac{f(x;\theta')}{Z(\theta')}$, i.e. propose $\theta' \sim q(\theta, \cdot)$ and draw a perfect sample $x \sim f(\cdot\,;\theta')/Z(\theta')$

Propose to swap θ ↔ θ′ using Metropolis-Hastings:

$$\alpha(\theta, \theta') = \frac{f(y;\theta')\,f(x;\theta)\,\pi(\theta')\,q(\theta', \theta)}{f(y;\theta)\,f(x;\theta')\,\pi(\theta)\,q(\theta, \theta')}$$

Note that the partition functions cancel, so Z(θ) is never evaluated.
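A sketch of the exchange step for a toy model where perfect sampling is trivial. The binary model, prior, and datum below are invented for illustration; the point is that Z(θ) = 1 + e^θ is known here only so the chain can be checked against quadrature, and it is never used by the sampler itself:

```python
import math
import random

rng = random.Random(0)
y = 1   # observed datum (invented for the sketch)

# Toy model with x in {0, 1}: f(x; theta) = exp(theta * x), Z(theta) = 1 + exp(theta).
# A perfect sample from f(.; theta)/Z(theta) is just a Bernoulli draw.
def perfect_sample(theta):
    return 1 if rng.random() < 1.0 / (1.0 + math.exp(-theta)) else 0

def log_prior(theta):        # prior: theta ~ N(0, 1)
    return -0.5 * theta * theta

theta, samples = 0.0, []
for _ in range(60000):
    theta_p = theta + rng.gauss(0.0, 1.0)       # symmetric q, so q terms cancel
    x = perfect_sample(theta_p)                 # auxiliary data at theta'
    # log of f(y;theta') f(x;theta) pi(theta') / [f(y;theta) f(x;theta') pi(theta)]
    log_alpha = (theta_p * y + theta * x + log_prior(theta_p)
                 - theta * y - theta_p * x - log_prior(theta))
    if log_alpha >= 0 or rng.random() < math.exp(log_alpha):
        theta = theta_p
    samples.append(theta)
exch_mean = sum(samples[10000:]) / 50000.0

# Reference posterior mean by brute-force quadrature (possible because Z is known here).
grid = [-10.0 + 0.01 * i for i in range(2001)]
w = [math.exp(t * y) / (1.0 + math.exp(t)) * math.exp(-0.5 * t * t) for t in grid]
ref_mean = sum(t * wi for t, wi in zip(grid, w)) / sum(w)
```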
Pseudo-marginal MCMC (Andrieu and Roberts, 2009)

We need an unbiased, positive estimate of the likelihood, p(y|θ, u), such that

$$\int p(y|\theta, u)\,p_\theta(u)\,du = p(y|\theta)$$

Define the joint distribution

$$\pi(\theta, u|y) = \frac{\pi(\theta)\,p(y|\theta, u)\,p_\theta(u)}{p(y)}$$

This integrates to one, and has the posterior as its marginal.
We can sample from this distribution!

$$\alpha(\theta, \theta') = \frac{p(y|\theta', u')\,\pi(\theta')\,p_{\theta'}(u')}{p(y|\theta, u)\,\pi(\theta)\,p_\theta(u)} \times \frac{q(\theta', \theta)\,p_\theta(u)}{q(\theta, \theta')\,p_{\theta'}(u')}$$
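A minimal pseudo-marginal sketch (not from the slides): the Gaussian toy likelihood and the mean-one lognormal noise, which stands in for a genuine unbiased estimator, are assumptions of the sketch. Note that the auxiliary u is refreshed at each proposal, and the estimate at the current state is recycled:

```python
import math
import random

rng = random.Random(1)
y = 1.0   # single observation (invented for the sketch)

def log_prior(theta):                     # prior: theta ~ N(0, 1)
    return -0.5 * theta * theta

def lik_estimate(theta):
    """Unbiased positive estimate of the likelihood N(y; theta, 1): the exact
    value times mean-one lognormal noise, standing in for a real estimator."""
    noise = math.exp(rng.gauss(-0.125, 0.5))   # E[noise] = exp(-0.125 + 0.5**2/2) = 1
    return math.exp(-0.5 * (y - theta) ** 2) * noise

theta, lik = 0.0, lik_estimate(0.0)
samples = []
for _ in range(50000):
    theta_p = theta + rng.gauss(0.0, 1.0)     # propose theta' (symmetric q)
    lik_p = lik_estimate(theta_p)             # fresh auxiliary u' at theta'
    alpha = (lik_p / lik) * math.exp(log_prior(theta_p) - log_prior(theta))
    if rng.random() < alpha:
        theta, lik = theta_p, lik_p           # recycle the accepted estimate
    samples.append(theta)
# With prior N(0, 1) and likelihood N(y; theta, 1), the posterior is N(0.5, 0.5).
post_mean = sum(samples[10000:]) / 40000.0
```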
What’s the problem?
We need an unbiased estimate of

$$\pi(\theta|y) = \frac{p(y|\theta)\,\pi(\theta)}{p(y)} = \frac{f(y;\theta)}{Z(\theta)}\,\pi(\theta)\Big/\,p(y)$$

We can unbiasedly estimate Z(θ), but if we take the reciprocal then the estimate is no longer unbiased (by Jensen's inequality):

$$\mathbb{E}[\hat{Z}(\theta)] = Z(\theta) \qquad \text{but} \qquad \mathbb{E}\left[\frac{1}{\hat{Z}(\theta)}\right] \neq \frac{1}{Z(\theta)}$$
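A quick simulation of this point; the uniform estimator of Z is an arbitrary choice for the demo:

```python
import random

rng = random.Random(0)
Z_true = 2.0

# Invented unbiased estimator of Z: Z_hat ~ Uniform[1, 3], so E[Z_hat] = 2 = Z_true.
z_hats = [rng.uniform(1.0, 3.0) for _ in range(200000)]

mean_z = sum(z_hats) / len(z_hats)                       # ~ 2.0: unbiased for Z
mean_recip = sum(1.0 / z for z in z_hats) / len(z_hats)  # ~ ln(3)/2 = 0.549..., not 1/2
```

The reciprocal's expectation is E[1/Ẑ] = (1/2)∫₁³ dz/z = ln(3)/2 ≈ 0.549, strictly above 1/Z = 0.5.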
Our approach
Construct an unbiased estimate of the likelihood, based on aseries expansion of the likelihood and stochastic truncation.
Use pseudo-marginal MCMC to sample from the desiredposterior distribution.
![Page 27: Bayesian inference for doubly-intractable distributions · Bayesian inference for doubly-intractable distributions Anne-Marie Lyne anne-marie.lyne@curie.fr April, 2016](https://reader035.vdocument.in/reader035/viewer/2022062306/5f0679b37e708231d4182dd3/html5/thumbnails/27.jpg)
Proposed methodology

Construct random variables $\{V^j_\theta,\ j \geq 0\}$ such that the series

$$\hat{\pi}(\theta|y, \{V^j\}) = \sum_{j=0}^{\infty} V^j_\theta \quad \text{has} \quad \mathbb{E}\left[\hat{\pi}(\theta|y, \{V^j\})\right] = \pi(\theta|y).$$

This infinite series then needs to be truncated unbiasedly.
This can be achieved via a number of Russian roulette schemes.
Define a random time, τ_θ, and set $u := (\tau_\theta, \{V^j_\theta,\ 0 \leq j \leq \tau_\theta\})$:

$$\hat{\pi}(\theta, u|y) = \sum_{j=0}^{\tau_\theta} V^j_\theta \quad \text{which satisfies} \quad \mathbb{E}\left[\hat{\pi}(\theta, u|y)\,\Big|\,\{V^j_\theta,\ j \geq 0\}\right] = \sum_{j=0}^{\infty} V^j_\theta$$
Implementation example
Rewrite the likelihood as an infinite series, inspired by the work of Booth (2005).
Simple manipulation (writing $\tilde{Z}(\theta)$ for a fixed approximation of $Z(\theta)$):

$$\frac{f(y;\theta)}{Z(\theta)} = \frac{f(y;\theta)}{\tilde{Z}(\theta)} \cdot \frac{1}{1 - \left[1 - \dfrac{Z(\theta)}{\tilde{Z}(\theta)}\right]} = \frac{f(y;\theta)}{\tilde{Z}(\theta)} \sum_{n=0}^{\infty} \kappa(\theta)^n$$

where

$$\kappa(\theta) = 1 - \frac{Z(\theta)}{\tilde{Z}(\theta)}$$

The series converges for |κ(θ)| < 1.
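The identity can be checked numerically with scalar stand-ins (all values below are arbitrary):

```python
# Scalar stand-ins: true Z, an over-estimate Z_tilde, and a likelihood value f(y; theta).
f_y, Z, Z_tilde = 0.7, 2.0, 3.0
kappa = 1.0 - Z / Z_tilde                    # = 1/3, so |kappa| < 1 and the series converges

# Partial sum of (f/Z_tilde) * sum_n kappa^n versus the exact value f/Z.
partial = (f_y / Z_tilde) * sum(kappa ** n for n in range(60))
exact = f_y / Z                              # = 0.35
```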
Implementation example continued
We can unbiasedly estimate each term in the series using n independent estimates $\hat{Z}_i(\theta)$ of Z(θ):

$$\frac{f(y;\theta)}{Z(\theta)} = \frac{f(y;\theta)}{\tilde{Z}(\theta)} \sum_{n=0}^{\infty} \left[1 - \frac{Z(\theta)}{\tilde{Z}(\theta)}\right]^n \approx \frac{f(y;\theta)}{\tilde{Z}(\theta)} \sum_{n=0}^{\infty} \prod_{i=1}^{n} \left[1 - \frac{\hat{Z}_i(\theta)}{\tilde{Z}(\theta)}\right]$$

The estimates $\hat{Z}_i(\theta)$ can be computed using importance sampling (IS) or sequential Monte Carlo (SMC), for example.

But we can't compute an infinite number of terms...
Unbiased estimates of infinite series
Take an infinite convergent series $S = \sum_{k=0}^{\infty} a_k$ which we would like to estimate unbiasedly.

The simplest approach: draw an integer k with probability p(K = k), where $\sum_{k=0}^{\infty} p(K = k) = 1$, then set $\hat{S} = a_k / p(K = k)$:

$$\mathbb{E}[\hat{S}] = \sum_k p(K = k)\,\frac{a_k}{p(K = k)} = S$$

(This is essentially importance sampling.)

Variance:

$$\sum_{k=0}^{\infty} \frac{a_k^2}{p(K = k)} - S^2$$
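A sketch of the single-term estimator on a series whose sum is known; the geometric series and the geometric choice of p(K = k) are illustrative:

```python
import random

rng = random.Random(0)

# Series chosen so the answer is known: S = sum_{k>=0} (1/2)**k = 2.
def a(k):
    return 0.5 ** k

r = 0.7   # p(K = k) = (1 - r) * r**k, a geometric distribution on {0, 1, 2, ...}

def single_term_estimate():
    k = 0
    while rng.random() < r:                  # draw K geometrically
        k += 1
    return a(k) / ((1 - r) * r ** k)         # a_k / p(K = k)

N = 200000
est = sum(single_term_estimate() for _ in range(N)) / N   # ~ 2.0
```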
Russian roulette
Alternative: Russian roulette.

Choose a series of probabilities, {q_n}, and draw a sequence of i.i.d. uniform random variables, $U_n \sim \mathcal{U}[0,1]$, n = 1, 2, 3, ...
Define the stopping time $\tau = \inf\{k \geq 1 : U_k \geq q_k\}$. Then

$$\hat{S}_\tau = \sum_{j=0}^{\tau-1} \frac{a_j}{\prod_{i=1}^{j} q_i}$$

is an unbiased estimate of S.

We must choose {q_n} to minimise the variance of the estimator.
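A sketch of the roulette estimator on the same known series; the constant choice q_n = 0.6 is one simple design, not a recommendation:

```python
import random

rng = random.Random(0)

# Same known series: S = sum_{j>=0} (1/2)**j = 2.
def a(j):
    return 0.5 ** j

q = 0.6   # constant roulette probabilities q_n = q (a design choice)

def roulette_estimate():
    """Add terms until the first U_k >= q_k, weighting term j by 1/prod_{i<=j} q_i."""
    s, weight, j = 0.0, 1.0, 0
    while True:
        s += a(j) / weight                   # a_j / prod_{i=1}^{j} q_i
        if rng.random() >= q:                # tau reached: stop after term j
            return s
        weight *= q                          # survived round j+1 with probability q
        j += 1

N = 200000
est = sum(roulette_estimate() for _ in range(N)) / N      # ~ 2.0
```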
Debiasing estimator, McLeish (2011), Rhee and Glynn (2012)

We want an unbiased estimate of E[Y], but we can't generate Y. We can generate approximations, Y_n, s.t. $\lim_{n\to\infty} \mathbb{E}[Y_n] = \mathbb{E}[Y]$.

$$\mathbb{E}[Y] = \mathbb{E}\left[Y_0 + \sum_{i=1}^{\infty} (Y_i - Y_{i-1})\right]$$

Define a probability distribution for N, a non-negative, integer-valued random variable. Then

$$Z = Y_0 + \sum_{i=1}^{N} \frac{Y_i - Y_{i-1}}{P(N \geq i)}$$

is an unbiased estimator of E[Y] and, under suitable conditions, has finite variance.
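A sketch of the debiasing estimator on a deterministic toy sequence: Y_n are the partial Taylor sums of e, and the geometric law for N is an illustrative choice (the factorially decaying differences make the variance finite here):

```python
import math
import random

rng = random.Random(0)

# Approximations Y_n = sum_{i=0}^{n} 1/i!, so Y_n -> e; the target is E[Y] = e.
r = 0.6   # N geometric: P(N = n) = (1 - r) * r**n, hence P(N >= i) = r**i

def debiased_estimate():
    n = 0
    while rng.random() < r:                  # draw N
        n += 1
    z = 1.0                                  # Y_0 = 1/0!
    for i in range(1, n + 1):
        z += (1.0 / math.factorial(i)) / (r ** i)   # (Y_i - Y_{i-1}) / P(N >= i)
    return z

reps = 100000
est = sum(debiased_estimate() for _ in range(reps)) / reps   # ~ e = 2.71828...
```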
Negative estimates
Often we cannot guarantee that the overall estimate will be positive.

We can use a trick from the physics literature...
Recall we have an unbiased estimate of the likelihood, p(y|θ, u). Writing σ(p) for the sign of the estimate:

$$\mathbb{E}_\pi[\phi(\theta)] = \int \phi(\theta)\,\pi(\theta|y)\,d\theta = \int\!\!\int \phi(\theta)\,\pi(\theta, u|y)\,d\theta\,du$$

$$= \frac{\int\!\!\int \phi(\theta)\,p(y|\theta, u)\,\pi(\theta)\,p_\theta(u)\,d\theta\,du}{\int\!\!\int p(y|\theta, u)\,\pi(\theta)\,p_\theta(u)\,d\theta\,du}$$

$$= \frac{\int\!\!\int \phi(\theta)\,\sigma(p)\,|p(y|\theta, u)|\,\pi(\theta)\,p_\theta(u)\,d\theta\,du}{\int\!\!\int \sigma(p)\,|p(y|\theta, u)|\,\pi(\theta)\,p_\theta(u)\,d\theta\,du}$$
Negative estimates cont.

From the last slide:

$$\mathbb{E}_\pi[\phi(\theta)] = \frac{\int\!\!\int \phi(\theta)\,\sigma(p)\,|p(y|\theta, u)|\,\pi(\theta)\,p_\theta(u)\,d\theta\,du}{\int\!\!\int \sigma(p)\,|p(y|\theta, u)|\,\pi(\theta)\,p_\theta(u)\,d\theta\,du} = \frac{\int\!\!\int \phi(\theta)\,\sigma(p)\,q(\theta, u|y)\,d\theta\,du}{\int\!\!\int \sigma(p)\,q(\theta, u|y)\,d\theta\,du}$$

We can get a Monte Carlo estimate of φ(θ) w.r.t. the posterior using samples from the 'absolute' distribution:

$$\mathbb{E}_\pi[\phi(\theta)] \approx \frac{\sum_k \phi(\theta_k)\,\sigma(p_k)}{\sum_k \sigma(p_k)}$$
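The sign-correction identity can be checked on a toy discrete example where the 'absolute' distribution can be sampled exactly; the three states and signed weights below are invented for the demo:

```python
import random

rng = random.Random(0)

# Toy discrete posterior with a signed unnormalised estimate p_hat (invented values):
thetas = [0.0, 1.0, 2.0]
p_hat = [0.5, -0.1, 0.6]     # sums to 1.0, so exact E_pi[theta] = 0*0.5 - 0.1 + 2*0.6 = 1.1

# Sample from the 'absolute' distribution q proportional to |p_hat|, keeping the signs.
abs_p = [abs(p) for p in p_hat]
total = sum(abs_p)
num = den = 0.0
N = 200000
for _ in range(N):
    u, i = rng.random() * total, 0
    while u > abs_p[i]:      # inverse-CDF draw from q
        u -= abs_p[i]
        i += 1
    sign = 1.0 if p_hat[i] >= 0 else -1.0
    num += thetas[i] * sign
    den += sign
est = num / den              # sign-corrected estimate of E_pi[theta] ~ 1.1
```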
Summary

So we can obtain an unbiased estimate of the doubly-intractable likelihood:

Draw a random integer k ∼ p(k)
Compute the k-th term of the infinite series
Combine the terms into an overall unbiased estimate of the likelihood

Plug the absolute value of this estimate into a pseudo-marginal MCMC scheme to sample from the posterior (or something close to it).
Compute expectations with respect to the posterior using the importance sampling identity.

However, the methodology is computationally costly, as it requires many low-variance estimates of the partition function.

But we can compute the estimates in parallel...
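The random-truncation step above can be sketched with a single-term estimator: draw K from a geometric distribution, evaluate only the K-th term, and divide by its selection probability. This is a minimal illustration of the unbiasedness mechanism, not the paper's tuned scheme (which uses Russian roulette stopping probabilities):

```python
import random

def single_term_estimate(term, p=0.6):
    """Unbiased single-term estimate of the series sum_{k>=0} term(k).

    Draw K geometrically, P(K = k) = (1 - p) * p**k, compute only the K-th
    term, and divide by P(K = k). Unbiased provided the terms decay fast
    enough relative to the tail of P(K) for the expectation to exist.
    """
    k = 0
    while random.random() < p:  # number of 'survivals' before stopping
        k += 1
    return term(k) / ((1 - p) * p ** k)

# Toy check: sum_{k>=0} (1/2)^(k+1) = 1, so the estimates average to 1.
random.seed(1)
n = 100_000
avg = sum(single_term_estimate(lambda k: 0.5 ** (k + 1)) for _ in range(n)) / n
```

Each estimate touches only one term, so many independent estimates can be farmed out in parallel, which is exactly the trade-off the slide points to: unbiasedness at the price of many cheap, parallelisable evaluations.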
Results: Ising models
We simulated a 40×40 grid of data points using a Gibbs sampler with Jβ = 0.2
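A minimal sketch of this kind of simulation (single-site Gibbs updates, free boundary conditions; details such as boundary handling and sweep count are assumptions, not taken from the paper):

```python
import numpy as np

def gibbs_ising(L=40, J_beta=0.2, sweeps=500, seed=0):
    """Gibbs sampler for an L x L Ising model with coupling J*beta.

    Each site is resampled from its full conditional given its (up to four)
    nearest neighbours, sweeping the grid in raster order."""
    rng = np.random.default_rng(seed)
    x = rng.choice([-1, 1], size=(L, L))
    for _ in range(sweeps):
        for i in range(L):
            for j in range(L):
                s = 0  # sum of nearest-neighbour spins
                if i > 0:
                    s += x[i - 1, j]
                if i < L - 1:
                    s += x[i + 1, j]
                if j > 0:
                    s += x[i, j - 1]
                if j < L - 1:
                    s += x[i, j + 1]
                # Full conditional: P(x_ij = +1 | neighbours).
                p_plus = 1.0 / (1.0 + np.exp(-2.0 * J_beta * s))
                x[i, j] = 1 if rng.random() < p_plus else -1
    return x

grid = gibbs_ising(L=10, sweeps=50)  # small grid for illustration
```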
Geometric construction with Russian roulette sampling
AIS
Parallel implementation using Matlab.
Example: Florentine business network

ERGM model, 16 nodes
Graph statistics in the model exponent are the number of edges, the number of 2- and 3-stars, and the number of triangles
Estimates of the normalising term were computed using SMC
Series truncation was carried out using Russian roulette
Further work

Optimise various parts of the methodology, e.g. the stopping probabilities
Compare with approximate approaches in terms of variance and computation

Thank you for listening!
References

MCMC for doubly-intractable distributions. Murray, Ghahramani, MacKay (2004)
An efficient Markov chain Monte Carlo method for distributions with intractable normalising constants. Møller, Pettitt, Berthelsen, Reeves (2004)
Unbiased Estimation with Square Root Convergence for SDE Models. Rhee, Glynn (2012)
A general method for debiasing a Monte Carlo estimator. McLeish (2011)
Unbiased Monte Carlo Estimation of the Reciprocal of an Integral. Booth (2007)