2017/09/26
TRANSCRIPT
An Introduction to (Dynamic) Nested Sampling
Josh Speagle
Introduction
Background

Bayes' Theorem:

$$\Pr(\boldsymbol{\Theta} \mid \mathbf{D}, M) = \frac{\Pr(\mathbf{D} \mid \boldsymbol{\Theta}, M)\,\Pr(\boldsymbol{\Theta} \mid M)}{\Pr(\mathbf{D} \mid M)}
\qquad\Longleftrightarrow\qquad
\text{Posterior} = \frac{\text{Likelihood} \times \text{Prior}}{\text{Evidence}}$$

In compact notation, with posterior $p$, likelihood $\mathcal{L}$, prior $\pi$, and evidence $\mathcal{Z}$:

$$p(\boldsymbol{\Theta}) = \frac{\mathcal{L}(\boldsymbol{\Theta})\,\pi(\boldsymbol{\Theta})}{\mathcal{Z}},
\qquad
\mathcal{Z} \equiv \int_{\Omega_{\boldsymbol{\Theta}}} \mathcal{L}(\boldsymbol{\Theta})\,\pi(\boldsymbol{\Theta})\,d\boldsymbol{\Theta}$$
Posterior Estimation via Sampling

Samples: $\boldsymbol{\Theta}_N, \boldsymbol{\Theta}_{N-1}, \ldots, \boldsymbol{\Theta}_2, \boldsymbol{\Theta}_1 \in \Omega_{\boldsymbol{\Theta}}$

Weights: $p_N, p_{N-1}, \ldots, p_2, p_1$

$$\Rightarrow\quad p(\boldsymbol{\Theta}) = \sum_{i=1}^{N} p_i\,\delta_D(\boldsymbol{\Theta} - \boldsymbol{\Theta}_i)$$

MCMC: weights $\{1, 1, \ldots, 1, 1\}$.

Importance Sampling: weights $\left\{\dfrac{p_N}{q_N}, \dfrac{p_{N-1}}{q_{N-1}}, \ldots, \dfrac{p_2}{q_2}, \dfrac{p_1}{q_1}\right\}$.
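To make the weighted-samples idea concrete, here is a minimal sketch (assuming NumPy; the samples and weights below are arbitrary placeholders rather than the output of any particular sampler) showing how a set $\{\boldsymbol{\Theta}_i, p_i\}$ is turned into posterior summaries:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder 1-D samples Theta_i and unnormalized importance weights p_i;
# with MCMC all of the weights would simply be equal.
samples = rng.normal(loc=1.0, scale=2.0, size=5000)
weights = rng.uniform(0.5, 1.5, size=5000)

# Normalize the weights so they sum to one.
weights = weights / weights.sum()

# Any posterior expectation E_p[f(Theta)] becomes a weighted sum over samples.
post_mean = np.sum(weights * samples)
post_var = np.sum(weights * (samples - post_mean) ** 2)
print(post_mean, np.sqrt(post_var))
```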
Nested Sampling?
Motivation: Integrating the Posterior

$$\int_{\Omega_{\boldsymbol{\Theta}}} p(\boldsymbol{\Theta})\,d\boldsymbol{\Theta}$$

Slice the posterior along iso-probability contours $\{\boldsymbol{\Theta} : p(\boldsymbol{\Theta}) = \lambda\}$ and integrate over the enclosed volume instead:

$$\int_0^{\infty} p(V)\,dV(\lambda), \qquad p(V_i) = \lambda_i$$

Here $\lambda$ plays the role of an "amplitude" and $V$ of a "volume". Because the differential volume grows as $dV \propto r^{D-1}$ while the amplitude falls, the bulk of the integral comes from an intermediate shell: the "typical set".
Motivation: Integrating the Posterior

The same trick applies to the evidence:

$$\mathcal{Z} \equiv \int_{\Omega_{\boldsymbol{\Theta}}} \mathcal{L}(\boldsymbol{\Theta})\,\pi(\boldsymbol{\Theta})\,d\boldsymbol{\Theta}$$

Define the "prior volume" enclosed by the iso-likelihood contour at level $\lambda$ (Feroz et al. 2013):

$$X(\lambda) \equiv \int_{\{\boldsymbol{\Theta} :\, \mathcal{L}(\boldsymbol{\Theta}) > \lambda\}} \pi(\boldsymbol{\Theta})\,d\boldsymbol{\Theta}$$

The evidence then becomes a one-dimensional integral,

$$\mathcal{Z} = \int_0^{\infty} X(\lambda)\,d\lambda = \int_0^1 \mathcal{L}(X)\,dX,$$

which can be approximated numerically as

$$\mathcal{Z} \approx \sum_{i=1}^{n} f(\mathcal{L}_i, \ldots)\,g(X_i, \ldots) = \sum_{i=1}^{n} w_i$$

where the terms $f(\mathcal{L}_i, \ldots)\,g(X_i, \ldots)$ can be rectangles, trapezoids, etc.: an "amplitude" times a differential "volume" $\Delta X_i$.
Motivation: Integrating the Posterior

$$p_i = \frac{w_i}{\mathcal{Z}}$$

Each importance weight $w_i = f(\mathcal{L}_i, \ldots)\,g(X_i, \ldots) \sim f(p_i, \ldots)\,g(V_i, \ldots)$ is directly proportional to the contribution of the typical set, so we get posteriors "for free".
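As a sanity check on the identity $\mathcal{Z} = \int_0^1 \mathcal{L}(X)\,dX$, here is a minimal sketch for a toy problem of my own (not from the talk) where $\mathcal{L}(X)$ is known analytically: a 1-D Gaussian likelihood with a uniform prior on $[-1, 1]$, for which the prior mass with $|\theta| < t$ is simply $X = t$:

```python
import numpy as np

def trapezoid(y, x):
    # Simple trapezoid rule (avoids relying on np.trapz/np.trapezoid naming).
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

sigma = 0.1

# Direct evidence: Z = int L(theta) pi(theta) dtheta, with pi(theta) = 1/2 on [-1, 1].
theta = np.linspace(-1.0, 1.0, 200001)
Z_direct = trapezoid(np.exp(-theta**2 / (2 * sigma**2)) * 0.5, theta)

# Same evidence as a 1-D integral over the prior volume X:
# the contour |theta| = t encloses prior mass X = t, so L(X) = exp(-X^2 / (2 sigma^2)).
X = np.linspace(0.0, 1.0, 200001)
Z_volume = trapezoid(np.exp(-X**2 / (2 * sigma**2)), X)

print(Z_direct, Z_volume)  # the two estimates should agree closely
```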
Motivation: Sampling the Posterior

Sampling directly from the likelihood $\mathcal{L}(\boldsymbol{\Theta})$ is hard. Sampling uniformly within a bound $\mathcal{L}(\boldsymbol{\Theta}) > \lambda$ is easier: each successive bound shrinks the enclosed prior volume from $X_{i-1}$ to $X_i$ to $X_{i+1}$, and so on.

Pictures from this 2010 talk by Skilling.
MCMC: Solving a Hard Problem once.
vs
Nested Sampling: Solving an Easier
Problem many times.
How Nested Sampling Works
Estimating the Prior Volume

$$\mathcal{Z} \approx \sum_{i=1}^{n} w_i, \qquad w_i = f(\mathcal{L}_i, \ldots)\,g(X_i, \ldots), \qquad X(\lambda) \equiv \int_{\{\boldsymbol{\Theta} :\, \mathcal{L}(\boldsymbol{\Theta}) > \lambda\}} \pi(\boldsymbol{\Theta})\,d\boldsymbol{\Theta}$$

"Prior Volume" (Feroz et al. 2013)

The likelihoods $\mathcal{L}_i$ are measured directly, but the prior volumes $X_i$ are not: how do we estimate them? The probability integral transform (if $X \sim f(x)$, then $F(X) \sim \mathrm{Unif}$, where $F$ is the CDF) tells us how they are distributed, provided we can sample from the constrained prior.
Estimating the Prior Volume

If each new point is drawn uniformly from the prior within the current likelihood bound, the enclosed prior volume shrinks as

$$X_{i+1} = U_i X_i, \qquad U_i \sim \mathrm{Unif}$$

so that, with $X_0 \equiv 1$,

$$X_{i+1} = \prod_{j=0}^{i} U_j, \qquad U_0, \ldots, U_i \overset{\mathrm{i.i.d.}}{\sim} \mathrm{Unif}$$

Pictures from this 2010 talk by Skilling.
Estimating the Prior Volume

$$\mathcal{Z} \approx \sum_{i=1}^{n} w_i, \qquad w_i = f(\mathcal{L}_i, \ldots)\,g(X_i, \ldots)$$

We never observe the prior volumes directly, but we know their joint distribution $\Pr(X_1, X_2, \ldots, X_i)$, whose first and second moments can be computed.
Nested Sampling Algorithm (Ideal)

Samples $\boldsymbol{\Theta}_1, \boldsymbol{\Theta}_2, \ldots, \boldsymbol{\Theta}_{N-1}, \boldsymbol{\Theta}_N$ are sequentially drawn from the constrained prior $\pi_{\mathcal{L} > \mathcal{L}_i}(\boldsymbol{\Theta})$, producing an ordered sequence of likelihoods

$$\mathcal{L}_N > \mathcal{L}_{N-1} > \cdots > \mathcal{L}_2 > \mathcal{L}_1 > 0$$
Nested Sampling Algorithm (Naïve)

1. Samples $\boldsymbol{\Theta}_1, \boldsymbol{\Theta}_2, \ldots, \boldsymbol{\Theta}_{N-1}, \boldsymbol{\Theta}_N$ are sequentially drawn from the prior: $\boldsymbol{\Theta} \sim \pi(\boldsymbol{\Theta})$.
2. A new point is only accepted if $\mathcal{L}_{i+1} > \mathcal{L}_i$.

This again yields the ordered sequence $\mathcal{L}_N > \mathcal{L}_{N-1} > \cdots > \mathcal{L}_2 > \mathcal{L}_1 > 0$.
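A minimal sketch of this naïve scheme on a toy problem of my own (a 1-D Gaussian likelihood with a uniform prior on $[-1, 1]$); the printout makes the inefficiency obvious, since the number of prior draws needed per accepted point grows rapidly:

```python
import numpy as np

rng = np.random.default_rng(1)

def loglike(theta):
    # Toy 1-D Gaussian log-likelihood (hypothetical example).
    return -0.5 * (theta / 0.1) ** 2

n_points, max_tries = 10, 200_000
logl_threshold = -np.inf

for i in range(n_points):
    # Naive step: keep drawing from the *full* prior until the new point
    # beats the current likelihood threshold.
    for attempt in range(1, max_tries + 1):
        theta = rng.uniform(-1.0, 1.0)
        if loglike(theta) > logl_threshold:
            break
    else:
        print("prior sampling became too inefficient; stopping early")
        break
    logl_threshold = loglike(theta)
    print(f"point {i + 1}: logL = {logl_threshold:.3f} after {attempt} tries")
```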
Building a Nested Sampling Algorithm
Components of the Algorithm
1. Adding more particles.
2. Knowing when to stop.
3. What to do after stopping.
Adding More Particles

Run two independent nested sampling runs, each with a single live point (Skilling 2006):

$$\mathcal{L}^{1}_{N_1} > \cdots > \mathcal{L}^{1}_{2} > \mathcal{L}^{1}_{1} > 0, \qquad X^{1}_{i+1} = \prod_{j=0}^{i} U_j$$

$$\mathcal{L}^{2}_{N_2} > \cdots > \mathcal{L}^{2}_{2} > \mathcal{L}^{2}_{1} > 0, \qquad X^{2}_{j+1} = \prod_{k=0}^{j} U_k$$

Merging the two runs gives a single ordered sequence, e.g.

$$\mathcal{L}^{1}_{N_1} > \mathcal{L}^{2}_{N_2} > \cdots > \mathcal{L}^{1}_{2} > \mathcal{L}^{2}_{2} > \mathcal{L}^{2}_{1} > \mathcal{L}^{1}_{1} > 0$$
Adding More Particles

For a single run, the log prior volume shrinks as

$$\ln X_{i+1} = \sum_{j=0}^{i} \ln U_j, \qquad U_0, \ldots, U_i \sim \mathrm{Unif}$$

For the merged run, each shrinkage factor is an order statistic: if $U_1, U_2 \sim \mathrm{Unif}$, then the larger draw $U_{(2)} \sim \mathrm{Beta}(2, 1)$, so

$$\ln X_{i+1} = \sum_{j=0}^{i} \ln T_j, \qquad T_0, \ldots, T_i \sim \mathrm{Beta}(2, 1)$$

(Skilling 2006)
Adding More Particles

$$\mathcal{L}^{1}_{N_1} > \mathcal{L}^{2}_{N_2} > \cdots > \mathcal{L}^{1}_{2} > \mathcal{L}^{2}_{2} > \mathcal{L}^{2}_{1} > \mathcal{L}^{1}_{1} > 0$$

One run with 2 "live points" = 2 runs with 1 live point. More generally, one run with K "live points" = K runs with 1 live point: at any iteration, the K points still being evolved are the live points and the discarded points are the dead points. (Skilling 2006)
Adding More Particles

$$\ln \mathbb{E}[X_{i+1}] = \sum_{j=0}^{i} \ln \mathbb{E}[T_j], \qquad T_0, \ldots, T_i \sim \mathrm{Beta}(2, 1)$$

With K live points, each shrinkage factor is the largest of K i.i.d. uniform draws, $U_{(K)} \sim \mathrm{Beta}(K, 1)$, so

$$\ln \mathbb{E}[X_{i+1}] = \sum_{j=0}^{i} \ln \mathbb{E}[T_j] = \sum_{j=0}^{i} \ln\frac{K}{K+1}, \qquad T_0, \ldots, T_i \sim \mathrm{Beta}(K, 1)$$

(Skilling 2006)
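A quick numerical check of this result is sketched below (assuming NumPy): simulate the shrinkage factors $T_j \sim \mathrm{Beta}(K, 1)$ directly and compare the simulated volumes against the analytic expectation above.

```python
import numpy as np

rng = np.random.default_rng(2)

K = 100        # number of live points
n_iter = 500   # number of nested sampling iterations
n_runs = 2000  # independent simulated runs

# T_j ~ Beta(K, 1): the largest of K i.i.d. uniform draws at each iteration.
T = rng.beta(K, 1, size=(n_runs, n_iter))
ln_X = np.cumsum(np.log(T), axis=1)            # ln X_i along each simulated run

# Slide's result: ln E[X_n] = n * ln(K / (K + 1)).
ln_E_X_sim = np.log(np.mean(np.exp(ln_X[:, -1])))
ln_E_X_analytic = n_iter * np.log(K / (K + 1))
print(ln_E_X_sim, ln_E_X_analytic)             # should agree closely

# The related result E[ln X_n] = -n / K is also commonly used in practice.
print(ln_X[:, -1].mean(), -n_iter / K)
```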
Stopping Criteria

Split the evidence into the part accumulated so far and the part remaining in the unexplored region:

$$\mathcal{Z} = \mathcal{Z}_i + \Delta\mathcal{Z}_i = \sum_{j=1}^{i} w_j + \Delta\mathcal{Z}_i$$

and stop once the accumulated fraction exceeds some threshold $f$:

$$\frac{\mathcal{Z}_i}{\mathcal{Z}_i + \Delta\mathcal{Z}_i} > f$$

The remaining evidence can be bounded by treating the unexplored region as a uniform slab at the maximum likelihood among the current live points:

$$\mathcal{Z} \lesssim \sum_{j=1}^{i} w_j + \mathcal{L}_{\max}\,X_i$$
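A minimal sketch (hypothetical inputs, assuming NumPy) of how this bound is typically used inside a sampling loop: stop once the largest possible remaining contribution, $\mathcal{L}_{\max} X_i$, adds only a negligible amount to the accumulated evidence.

```python
import numpy as np

def should_stop(weights, logl_live_max, ln_X_current, dlogz_tol=0.01):
    """Check the 'uniform slab' stopping criterion.

    weights       : importance weights w_j accumulated so far (hypothetical array)
    logl_live_max : maximum log-likelihood among the current live points
    ln_X_current  : current log prior volume ln X_i
    dlogz_tol     : tolerance on the remaining contribution to ln Z
    """
    logz_current = np.log(np.sum(weights))
    # Upper bound on the remaining evidence: L_max * X_i (the "uniform slab").
    logz_remain = logl_live_max + ln_X_current
    # Remaining contribution to ln Z, i.e. ln(Z_current + Z_remain) - ln(Z_current).
    dlogz = np.logaddexp(logz_current, logz_remain) - logz_current
    return dlogz < dlogz_tol
```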
“Recycling” the Final Set of Particles

At termination, the K remaining live points are uniformly distributed within the final prior volume $X_N$, so their enclosed volumes follow the order statistics of K uniform draws (the Rényi representation):

$$X_{N+k} = U_{(K-k+1)}\,X_N, \qquad U_1, \ldots, U_K \sim \mathrm{Unif}$$

Adding them to the run extends the ordered sequence:

$$\mathcal{L}_{N+K} > \cdots > \mathcal{L}_{N+1} > \mathcal{L}_N > \mathcal{L}_{N-1} > \cdots > \mathcal{L}_2 > \mathcal{L}_1 > 0$$
“Recycling” the Final Set of Particles

$$\ln \mathbb{E}[X_N] = \sum_{i=1}^{N} \ln\frac{K}{K+1}$$

$$\ln \mathbb{E}[X_{N+k}] = \underbrace{\sum_{i=1}^{N} \ln\frac{K}{K+1}}_{\text{Exponential shrinkage}} + \underbrace{\sum_{j=1}^{k} \ln\frac{K-j+1}{K-j+2}}_{\text{Uniform shrinkage}} = \sum_{i=1}^{N} \ln\frac{K}{K+1} + \ln\frac{K-k+1}{K+1}$$
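A minimal sketch (hypothetical inputs) of how the final K live points are appended to a finished run using the expected uniform shrinkage above:

```python
import numpy as np

def recycle_live_points(dead_points, ln_X_N, logl_live):
    """Append the final K live points to a finished run.

    dead_points : list of (ln X_i, logL_i) pairs for the dead points (hypothetical)
    ln_X_N      : log prior volume at termination
    logl_live   : log-likelihoods of the K remaining live points
    """
    K = len(logl_live)
    out = list(dead_points)
    # Sort the live points by increasing likelihood, as in the extended sequence.
    for k, logl in enumerate(np.sort(logl_live), start=1):
        # Expected volume from uniform shrinkage: X_{N+k} = X_N * (K - k + 1) / (K + 1).
        ln_X = ln_X_N + np.log((K - k + 1) / (K + 1))
        out.append((ln_X, logl))
    return out
```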
Nested Sampling Errors
Nested Sampling Uncertainties

Pictures from this 2010 talk by Skilling.

Statistical uncertainties:
• Unknown prior volumes.

Sampling uncertainties:
• Number of samples (counting).
• Discrete point estimates for contours.
• Particle path dependencies.
Sampling Error: Poisson Uncertainties

The number of steps N needed to traverse from the prior to the posterior behaves like a Poisson process:

$$\mathbb{V}[N] = \mathbb{E}[N] \sim \frac{H(\pi \,\|\, p)}{\mathbb{E}[\Delta \ln X]}$$

where the "distance" from prior to posterior is the Kullback-Leibler divergence from $\pi$ to $p$ (the "information gained"):

$$H(\pi \,\|\, p) \equiv \int_{\Omega_{\boldsymbol{\Theta}}} p(\boldsymbol{\Theta}) \ln\frac{p(\boldsymbol{\Theta})}{\pi(\boldsymbol{\Theta})}\,d\boldsymbol{\Theta}$$

The resulting uncertainty in the evidence is set by the uncertainty in the log prior volume of the typical set:

$$\mathbb{V}[\ln \mathcal{Z}] \sim \mathbb{V}[\ln X_H] \sim \mathbb{V}[N]\,\big(\mathbb{E}[\Delta \ln X]\big)^2 \sim H(\pi \,\|\, p)\,\mathbb{E}[\Delta \ln X] \sim \sum_{i=1}^{N+K} \Delta H_i\,\mathbb{E}[\Delta \ln X_i]$$

With $\mathbb{E}[\Delta \ln X_i] = 1/K$, this gives the familiar estimate $\mathbb{V}[\ln \mathcal{Z}] \approx H/K$.

(Based on Skilling 2006 and Keeton 2011.)
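A minimal sketch (hypothetical inputs, assuming NumPy) of the resulting classic error estimate, $\sigma[\ln\mathcal{Z}] \approx \sqrt{H/K}$, computed from the log-likelihoods and log importance weights of a finished run:

```python
import numpy as np

def lnz_error_estimate(logl, ln_w, K):
    """Estimate sigma[ln Z] ~ sqrt(H / K) from a finished run.

    logl : log-likelihoods of the dead points (hypothetical array)
    ln_w : log importance weights, ln w_i = logL_i + ln(dX_i)
    K    : number of live points
    """
    ln_Z = np.logaddexp.reduce(ln_w)   # ln Z = ln sum_i w_i
    p = np.exp(ln_w - ln_Z)            # normalized posterior weights p_i
    # KL divergence ("information"): H ~ sum_i p_i * ln(L_i / Z).
    H = np.sum(p * (logl - ln_Z))
    return np.sqrt(H / K)
```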
Sampling Error: Monte Carlo Noise

Posterior expectations can likewise be written as one-dimensional integrals over the prior volume:

$$\mathbb{E}_p[f(\boldsymbol{\Theta})] = \int_{\Omega_{\boldsymbol{\Theta}}} f(\boldsymbol{\Theta})\,p(\boldsymbol{\Theta})\,d\boldsymbol{\Theta} = \frac{1}{\mathcal{Z}} \int_0^1 \tilde{f}(X)\,\mathcal{L}(X)\,dX \approx \sum_{i=1}^{N+K} p_i\,f(\boldsymbol{\Theta}_i)$$

where

$$\tilde{f}(X) = \mathbb{E}_\pi\!\left[\,f(\boldsymbol{\Theta}) \mid \mathcal{L}(\boldsymbol{\Theta}) = \mathcal{L}(X)\,\right]$$

(Formalism following Higson et al. 2017 and Chopin & Robert 2010.)
Exploring Sampling Uncertainties

One run with K "live points" = K runs with 1 live point! The original run

$$\mathcal{L}_N > \mathcal{L}_{N-1} > \cdots > \mathcal{L}_2 > \mathcal{L}_1 > 0$$

can be decomposed into K independent single-particle "strands":

$$\mathcal{L}^{(\cdot)}_i = \left\{ \mathcal{L}^{(1)}_i, \mathcal{L}^{(2)}_i, \ldots \right\}$$

We would like to sample K paths from the set of all possible particle paths $P(\mathcal{L}_i, \ldots)$, but we don't have access to it. Instead, use a bootstrap estimator: resample the K strands with replacement, e.g.

$$\mathcal{L}^{(\cdot)\prime}_i = \left\{ \mathcal{L}^{(1)}_i, \mathcal{L}^{(1)}_i, \mathcal{L}^{(2)}_i, \ldots \right\},$$

then recompute the evidence and importance weights from each resampled run:

$$\mathcal{Z}' \approx \sum_{i=1}^{N'} w'_i, \qquad p'_i = \frac{w'_i}{\mathcal{Z}'}$$
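A minimal sketch of the strand bootstrap for a static run (my own simplification: because every strand of a static run spans the full likelihood range, each resampled run still has K live points throughout, so the standard $X_i \approx e^{-i/K}$ shrinkage applies):

```python
import numpy as np

rng = np.random.default_rng(3)

def bootstrap_lnz_error(strand_logl, n_boot=200):
    """Bootstrap the sampling uncertainty on ln Z by resampling strands.

    strand_logl : list of K arrays, the log-likelihood sequence of each strand
                  from a finished static run (hypothetical input)
    """
    K = len(strand_logl)
    lnz_samples = []
    for _ in range(n_boot):
        # Resample K strands with replacement and merge their dead points.
        idx = rng.integers(0, K, size=K)
        logl = np.sort(np.concatenate([strand_logl[i] for i in idx]))
        # Each resampled run still has K live points throughout, so the
        # standard shrinkage applies: X_i = exp(-i / K).
        X = np.exp(-np.arange(1, logl.size + 1) / K)
        dX = np.concatenate([[1.0], X[:-1]]) - X       # X_{i-1} - X_i with X_0 = 1
        ln_w = logl + np.log(dX)                        # w_i = L_i * dX_i
        lnz_samples.append(np.logaddexp.reduce(ln_w))   # ln Z' for this replicate
    return np.std(lnz_samples)
```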
Nested Sampling in Practice
Method 0: Sampling from the Prior

Sampling from the prior becomes exponentially more inefficient as time goes on.

Higson et al. (2017), arxiv:1704.03459
Method 1: Constrained Uniform Sampling

Proposal: bound the iso-likelihood contours in real time and sample from the newly constrained prior. (Feroz et al. 2009)

Issues:
• How to ensure the bounds always encompass the iso-likelihood contours? Bootstrapping.
• How to generate flexible bounds? Easier with a uniform (transformed) prior, e.g. ellipsoids (MultiNest) or balls/cubes (Buchner 2014).
Method 2: “Evolving” Previous Samples

Proposal: generate independent samples subject to the likelihood constraint by “evolving” copies of the current live points, e.g. via:
• Random walks (i.e. MCMC)
• Slice sampling (e.g. PolyChord)
• Random trajectories (i.e. HMC)

Issues:
• How to ensure samples are independent (thinning) and properly distributed within the likelihood constraint?
• How to generate efficient proposals?

(A constrained random-walk sketch follows below.)
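As referenced above, here is a minimal sketch of the random-walk flavor of this idea (my own simplified version, assuming the prior has already been mapped to the unit cube so that the constrained prior is uniform): copy a randomly chosen live point and take Gaussian steps, accepting only moves that stay inside the cube and above the likelihood threshold.

```python
import numpy as np

rng = np.random.default_rng(4)

def evolve_live_point(live_points, loglike, logl_threshold, n_steps=25, scale=0.1):
    """Propose a new point above the likelihood threshold by 'evolving'
    a copy of a current live point with a constrained random walk.

    live_points : (K, D) array of current live points in the unit cube
    loglike     : callable returning log L(u) for a unit-cube position u
                  (hypothetical interface for this sketch)
    """
    # Start from a copy of a randomly selected live point.
    u = live_points[rng.integers(len(live_points))].copy()
    for _ in range(n_steps):
        u_new = u + scale * rng.normal(size=u.shape)
        # Accept only moves that stay in the unit cube (uniform prior)
        # *and* satisfy the hard likelihood constraint L > L_threshold.
        if np.all((u_new > 0.0) & (u_new < 1.0)) and loglike(u_new) > logl_threshold:
            u = u_new
    return u
```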
Example: Gaussian Shells
Feroz et al. (2013)
Example: Eggbox
Feroz et al. (2013)
Summary: (Static) Nested Sampling

1. Estimates the evidence $\mathcal{Z}$.
2. Estimates the posterior $p(\boldsymbol{\Theta})$.
3. Possesses well-defined stopping criteria.
4. Combining runs improves inference.
5. Sampling and statistical uncertainties can be simulated from a single run.
Dynamic Nested Sampling
Dynamic Nested Sampling

Runs are no longer required to span the full prior. A run restricted to a likelihood range,

$$\mathcal{L}^{1}_{\max} > \mathcal{L}^{1}_{N_1} > \cdots > \mathcal{L}^{1}_{2} > \mathcal{L}^{1}_{1} > \mathcal{L}^{1}_{\min},$$

can be merged with a standard run,

$$\mathcal{L}^{2}_{N_2} > \cdots > \mathcal{L}^{2}_{2} > \mathcal{L}^{2}_{1} > 0,$$

producing a combined run whose number of live points varies with prior volume: $N_2$ live points outside the range spanned by run 1 and $N_1 + N_2$ live points inside it. The prior volume still shrinks as

$$\ln X_{i+1} = \sum_{j=1}^{i} \ln T_j$$

where the shrinkage factors depend on how the number of live points $K_i$ changes:

Constant/Increasing ($K_{i+1} \geq K_i$): $\quad T_{i+1} \sim \mathrm{Beta}(K_{i+1}, 1)$

Decreasing sequence ($K_{i+j>i} < K_i$): $\quad T_{i+1}, \ldots, T_{i+j} \sim \left\{ U_{(K_i)}, \ldots, U_{(K_i - j + 1)} \right\}$
Static Nested Sampling

$$\ln \mathbb{E}[X_{N+k}] = \underbrace{\sum_{i=1}^{N} \ln\frac{K}{K+1}}_{\text{Exponential shrinkage}} + \underbrace{\sum_{j=1}^{k} \ln\frac{K-j+1}{K-j+2}}_{\text{Uniform shrinkage}}$$
Dynamic Nested Sampling

$$\ln \mathbb{E}[X_N] = \underbrace{\sum_{i=1}^{n_1} \ln\frac{K_i}{K_i+1}}_{\text{Exponential shrinkage}} + \underbrace{\sum_{j=1}^{n_2} \ln\frac{K_{n_1}-j+1}{K_{n_1}-j+2}}_{\text{Uniform shrinkage}} + \cdots + \sum_{j=1}^{K_{\mathrm{final}}} \ln\frac{K_{\mathrm{final}}-j+1}{K_{\mathrm{final}}-j+2}$$
Benefits of Dynamic Nested Sampling

• Can accommodate new "strands" within a particular range of prior volumes without changing the overall statistical framework.
• Particles can be adaptively added until stopping criteria are reached, allowing targeted estimation.
Allocating Samples

Higson et al. (2017), arxiv:1704.03459

Evidence weight: $\quad I_{\mathcal{Z}}(i) \sim \dfrac{\mathbb{E}[\mathcal{Z}_{>i}]}{K_i}$

Posterior weight: $\quad I_{P}(i) \sim p_i$

Weight function: $\quad I(G, i) = G\,\dfrac{I_{P}(i)}{\sum_j I_{P}(j)} + (1 - G)\,\dfrac{I_{\mathcal{Z}}(i)}{\sum_j I_{\mathcal{Z}}(j)}$
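A minimal sketch of this weight function (hypothetical input arrays from a finished run, assuming NumPy):

```python
import numpy as np

def importance(G, p_i, z_remain, n_live):
    """Combined importance I(G, i) used to decide where to add new samples.

    G        : trade-off parameter (G = 1 -> pure posterior, G = 0 -> pure evidence)
    p_i      : posterior importance weights of the existing samples
    z_remain : estimated evidence remaining beyond each sample, E[Z_{>i}]
    n_live   : number of live points at each iteration, K_i
    (all arrays here are hypothetical inputs from a finished run)
    """
    I_P = np.asarray(p_i, dtype=float)
    I_Z = np.asarray(z_remain, dtype=float) / np.asarray(n_live, dtype=float)
    return G * I_P / I_P.sum() + (1.0 - G) * I_Z / I_Z.sum()
```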
Allocating Samples

[Figure: sample allocation for a 3-D correlated multivariate Normal, comparing Static, Dynamic (posterior), and Dynamic (evidence) runs.]
How Many Samples is Enough?

• Ill-posed question: it depends on the application!
• In any sampling-based approach to estimating $p(\boldsymbol{\Theta})$ with $\hat{p}(\boldsymbol{\Theta})$, how many samples are necessary?

Assume the general case: we want the D-dimensional densities $\hat{p}(\boldsymbol{\Theta}_i)$ and $p(\boldsymbol{\Theta}_i)$ to be "close", where the "true" posterior is constructed over the same "domain":

$$H(p \,\|\, \hat{p}) \equiv \int_{\Omega_{\boldsymbol{\Theta}}} \hat{p}(\boldsymbol{\Theta}) \ln\frac{\hat{p}(\boldsymbol{\Theta})}{p(\boldsymbol{\Theta})}\,d\boldsymbol{\Theta} = \sum_{i=1}^{N} \hat{p}_i \ln\frac{\hat{p}_i}{p_i}$$

We want access to $\Pr(\hat{p}_i \mid p(\boldsymbol{\Theta}))$, but we don't know $p(\boldsymbol{\Theta})$, so we use a bootstrap estimator and treat

$$H(\hat{p} \,\|\, \hat{p}') = \sum_{i=1}^{N} \hat{p}'_i \ln\frac{\hat{p}'_i}{\hat{p}_i}$$

as a random variable. A possible stopping criterion: the fractional (%) variation in H.
Dynamic Nested Sampling Summary

1. Can sample from multi-modal distributions.
2. Can simultaneously estimate the evidence $\mathcal{Z}$ and posterior $p(\boldsymbol{\Theta})$.
3. Combining independent runs improves inference ("trivially parallelizable").
4. Can simulate uncertainties (sampling and statistical) from a single run.
5. Enables adaptive sample allocation during runtime using arbitrary weight functions.
6. Possesses evidence/posterior-based stopping criteria.
Examples and Applications
Dynamic Nested Sampling with dynesty
dynesty.readthedocs.io
• Pure Python.
• Easy to use.
• Modular.
• Open source.
• Parallelizable.
• Flexible bounding/sampling methods.
• Thorough documentation!
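A minimal usage sketch (based on my reading of the dynesty documentation; the 3-D Gaussian target is a toy example of my own): define a log-likelihood and a prior transform from the unit cube, then run the sampler.

```python
import numpy as np
import dynesty

ndim = 3

def loglike(x):
    # 3-D standard-normal log-likelihood (toy example).
    return -0.5 * np.sum(x**2) - 0.5 * ndim * np.log(2.0 * np.pi)

def prior_transform(u):
    # Map the unit cube onto a uniform prior over [-10, 10] in each dimension.
    return 20.0 * u - 10.0

# Static sampler; dynesty.DynamicNestedSampler exposes a very similar interface.
sampler = dynesty.NestedSampler(loglike, prior_transform, ndim, nlive=500)
sampler.run_nested()
res = sampler.results

print(res.logz[-1], res.logzerr[-1])        # evidence estimate and its error
weights = np.exp(res.logwt - res.logz[-1])  # posterior importance weights
```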
Example: Linear Regression (Posterior)

$$\boldsymbol{\Theta} = \{a, b, \ln f\}$$

[Figures: corner plot and trace plot of the recovered posterior.]
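A minimal sketch of what the model ingredients for this example might look like (the synthetic data and prior ranges are my own assumptions; this mirrors the common "fit a line with underestimated errors" setup, where $\ln f$ inflates the quoted uncertainties):

```python
import numpy as np

# Hypothetical data: x, y, and quoted (possibly underestimated) errors yerr.
rng = np.random.default_rng(5)
x = np.sort(10.0 * rng.uniform(size=50))
yerr = 0.1 + 0.5 * rng.uniform(size=50)
y = -0.9 * x + 4.3 + yerr * rng.normal(size=50)

def loglike(theta):
    a, b, ln_f = theta
    model = a * x + b
    # Inflate the quoted errors by a fractional amount f = exp(ln_f).
    s2 = yerr**2 + (np.exp(ln_f) * model) ** 2
    return -0.5 * np.sum((y - model) ** 2 / s2 + np.log(2.0 * np.pi * s2))

def prior_transform(u):
    # Map the unit cube to uniform priors: a in [-5, 5], b in [-10, 10], ln f in [-10, 1].
    a = 10.0 * u[0] - 5.0
    b = 20.0 * u[1] - 10.0
    ln_f = 11.0 * u[2] - 10.0
    return np.array([a, b, ln_f])

# These two callables would then be passed to dynesty's (Dynamic)NestedSampler
# exactly as in the sketch above.
```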
Example: Multivariate Normal (Evidence and Errors)

$$\boldsymbol{\Theta} = \{x_1, x_2, x_3\}$$

[Figures: "summary" plots of the evidence estimates and errors for static vs. dynamic runs.]
Applications

• All results are preliminary but agree with results from MCMC methods (derived using emcee).
• Samples were allocated with 100% posterior weight and an automated stopping criterion (2% fractional error in the simulated KLD).
• dynesty was substantially (~3-6x) more efficient at generating good samples than emcee, before thinning.
Application: Modeling Galaxy SEDs

$$\boldsymbol{\Theta} = \{\ln M_*, \ln Z, \boldsymbol{\sigma}_5, \boldsymbol{\delta}_6, \boldsymbol{\alpha}_2\}, \qquad D = 15$$

With: Joel Leja, Ben Johnson, Charlie Conroy. Fig: Joel Leja.
Application: Supernovae Light Curves

$$\boldsymbol{\Theta} = \{t_e, \boldsymbol{\rho}_4, \boldsymbol{\epsilon}_3, \boldsymbol{\delta}_3, \boldsymbol{\sigma}_2\}, \qquad D = 12$$

With: James Guillochon, Kaisey Mandel. Fig: Open Supernova Catalog (LSQ12dlf), James Guillochon.
Application: Molecular Cloud Distances

$$\boldsymbol{\Theta} = \{\boldsymbol{f}_2, \boldsymbol{d}_5, \boldsymbol{c}_5, p_o\}, \qquad D = 13$$

With: Catherine Zucker, Doug Finkbeiner, Alyssa Goodman.
Dynamic Nested Sampling with dynesty
dynesty.readthedocs.io