2017/09/26
TRANSCRIPT
An Introduction to (Dynamic) Nested Sampling
Josh Speagle
Introduction
Background

Bayes' Theorem:

$$\Pr(\boldsymbol{\Theta} \mid \mathbf{D}, M) = \frac{\Pr(\mathbf{D} \mid \boldsymbol{\Theta}, M)\,\Pr(\boldsymbol{\Theta} \mid M)}{\Pr(\mathbf{D} \mid M)}
\qquad\Longleftrightarrow\qquad
\text{Posterior} = \frac{\text{Likelihood} \times \text{Prior}}{\text{Evidence}}$$

In compact notation, with posterior $p$, likelihood $\mathcal{L}$, prior $\pi$, and evidence $\mathcal{Z}$:

$$p(\boldsymbol{\Theta}) = \frac{\mathcal{L}(\boldsymbol{\Theta})\,\pi(\boldsymbol{\Theta})}{\mathcal{Z}},
\qquad
\mathcal{Z} \equiv \int_{\Omega_{\boldsymbol{\Theta}}} \mathcal{L}(\boldsymbol{\Theta})\,\pi(\boldsymbol{\Theta})\,d\boldsymbol{\Theta}$$
Posterior Estimation via Sampling

Samples: $\boldsymbol{\Theta}_N, \boldsymbol{\Theta}_{N-1}, \ldots, \boldsymbol{\Theta}_2, \boldsymbol{\Theta}_1 \in \Omega_{\boldsymbol{\Theta}}$

Weights: $p_N, p_{N-1}, \ldots, p_2, p_1$

$$\Rightarrow\quad p(\boldsymbol{\Theta}) = \sum_{i=1}^{N} p_i\,\delta_D(\boldsymbol{\Theta} - \boldsymbol{\Theta}_i)$$

MCMC: weights $\{1, 1, \ldots, 1, 1\}$.

Importance Sampling: weights $\left\{\dfrac{p_N}{q_N}, \dfrac{p_{N-1}}{q_{N-1}}, \ldots, \dfrac{p_2}{q_2}, \dfrac{p_1}{q_1}\right\}$.
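To make the weighted-samples idea concrete, here is a minimal sketch (assuming NumPy; the samples and weights below are arbitrary placeholders rather than the output of any particular sampler) showing how a set $\{\boldsymbol{\Theta}_i, p_i\}$ is turned into posterior summaries:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder 1-D samples Theta_i and unnormalized importance weights p_i;
# with MCMC all of the weights would simply be equal.
samples = rng.normal(loc=1.0, scale=2.0, size=5000)
weights = rng.uniform(0.5, 1.5, size=5000)

# Normalize the weights so they sum to one.
weights = weights / weights.sum()

# Any posterior expectation E_p[f(Theta)] becomes a weighted sum over samples.
post_mean = np.sum(weights * samples)
post_var = np.sum(weights * (samples - post_mean) ** 2)
print(post_mean, np.sqrt(post_var))
```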
Nested Sampling?
Motivation: Integrating the Posterior

$$\int_{\Omega_{\boldsymbol{\Theta}}} p(\boldsymbol{\Theta})\,d\boldsymbol{\Theta}$$

Slice the posterior along iso-probability contours $\{\boldsymbol{\Theta} : p(\boldsymbol{\Theta}) = \lambda\}$ and integrate over the enclosed volume instead:

$$\int_0^{\infty} p(V)\,dV(\lambda), \qquad p(V_i) = \lambda_i$$

Here $\lambda$ plays the role of an "amplitude" and $V$ of a "volume". Because the differential volume grows as $dV \propto r^{D-1}$ while the amplitude falls, the bulk of the integral comes from an intermediate shell: the "typical set".
Motivation: Integrating the Posterior

The same trick applies to the evidence:

$$\mathcal{Z} \equiv \int_{\Omega_{\boldsymbol{\Theta}}} \mathcal{L}(\boldsymbol{\Theta})\,\pi(\boldsymbol{\Theta})\,d\boldsymbol{\Theta}$$

Define the "prior volume" enclosed by the iso-likelihood contour at level $\lambda$ (Feroz et al. 2013):

$$X(\lambda) \equiv \int_{\{\boldsymbol{\Theta} :\, \mathcal{L}(\boldsymbol{\Theta}) > \lambda\}} \pi(\boldsymbol{\Theta})\,d\boldsymbol{\Theta}$$

The evidence then becomes a one-dimensional integral,

$$\mathcal{Z} = \int_0^{\infty} X(\lambda)\,d\lambda = \int_0^1 \mathcal{L}(X)\,dX,$$

which can be approximated numerically as

$$\mathcal{Z} \approx \sum_{i=1}^{n} f(\mathcal{L}_i, \ldots)\,g(X_i, \ldots) = \sum_{i=1}^{n} w_i$$

where the terms $f(\mathcal{L}_i, \ldots)\,g(X_i, \ldots)$ can be rectangles, trapezoids, etc.: an "amplitude" times a differential "volume" $\Delta X_i$.
Motivation: Integrating the Posterior

$$p_i = \frac{w_i}{\mathcal{Z}}$$

Each importance weight $w_i = f(\mathcal{L}_i, \ldots)\,g(X_i, \ldots) \sim f(p_i, \ldots)\,g(V_i, \ldots)$ is directly proportional to the contribution of the typical set, so we get posteriors "for free".
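As a sanity check on the identity $\mathcal{Z} = \int_0^1 \mathcal{L}(X)\,dX$, here is a minimal sketch for a toy problem of my own (not from the talk) where $\mathcal{L}(X)$ is known analytically: a 1-D Gaussian likelihood with a uniform prior on $[-1, 1]$, for which the prior mass with $|\theta| < t$ is simply $X = t$:

```python
import numpy as np

def trapezoid(y, x):
    # Simple trapezoid rule (avoids relying on np.trapz/np.trapezoid naming).
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

sigma = 0.1

# Direct evidence: Z = int L(theta) pi(theta) dtheta, with pi(theta) = 1/2 on [-1, 1].
theta = np.linspace(-1.0, 1.0, 200001)
Z_direct = trapezoid(np.exp(-theta**2 / (2 * sigma**2)) * 0.5, theta)

# Same evidence as a 1-D integral over the prior volume X:
# the contour |theta| = t encloses prior mass X = t, so L(X) = exp(-X^2 / (2 sigma^2)).
X = np.linspace(0.0, 1.0, 200001)
Z_volume = trapezoid(np.exp(-X**2 / (2 * sigma**2)), X)

print(Z_direct, Z_volume)  # the two estimates should agree closely
```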
Motivation: Sampling the Posterior

Sampling directly from the likelihood $\mathcal{L}(\boldsymbol{\Theta})$ is hard. Sampling uniformly within a bound $\mathcal{L}(\boldsymbol{\Theta}) > \lambda$ is easier: each successive bound shrinks the enclosed prior volume from $X_{i-1}$ to $X_i$ to $X_{i+1}$, and so on.

Pictures from this 2010 talk by Skilling.
MCMC: Solving a Hard Problem once.
vs
Nested Sampling: Solving an Easier
Problem many times.
How Nested Sampling Works
Estimating the Prior Volume

$$\mathcal{Z} \approx \sum_{i=1}^{n} w_i, \qquad w_i = f(\mathcal{L}_i, \ldots)\,g(X_i, \ldots), \qquad X(\lambda) \equiv \int_{\{\boldsymbol{\Theta} :\, \mathcal{L}(\boldsymbol{\Theta}) > \lambda\}} \pi(\boldsymbol{\Theta})\,d\boldsymbol{\Theta}$$

"Prior Volume" (Feroz et al. 2013)

The likelihoods $\mathcal{L}_i$ are measured directly, but the prior volumes $X_i$ are not: how do we estimate them? The probability integral transform (if $X \sim f(x)$, then $F(X) \sim \mathrm{Unif}$, where $F$ is the CDF) tells us how they are distributed, provided we can sample from the constrained prior.
Estimating the Prior Volume

If each new point is drawn uniformly from the prior within the current likelihood bound, the enclosed prior volume shrinks as

$$X_{i+1} = U_i X_i, \qquad U_i \sim \mathrm{Unif}$$

so that, with $X_0 \equiv 1$,

$$X_{i+1} = \prod_{j=0}^{i} U_j, \qquad U_0, \ldots, U_i \overset{\mathrm{i.i.d.}}{\sim} \mathrm{Unif}$$

Pictures from this 2010 talk by Skilling.
Estimating the Prior Volume

$$\mathcal{Z} \approx \sum_{i=1}^{n} w_i, \qquad w_i = f(\mathcal{L}_i, \ldots)\,g(X_i, \ldots)$$

We never observe the prior volumes directly, but we know their joint distribution $\Pr(X_1, X_2, \ldots, X_i)$, whose first and second moments can be computed.
Nested Sampling Algorithm (Ideal)

Samples $\boldsymbol{\Theta}_1, \boldsymbol{\Theta}_2, \ldots, \boldsymbol{\Theta}_{N-1}, \boldsymbol{\Theta}_N$ are sequentially drawn from the constrained prior $\pi_{\mathcal{L} > \mathcal{L}_i}(\boldsymbol{\Theta})$, producing an ordered sequence of likelihoods

$$\mathcal{L}_N > \mathcal{L}_{N-1} > \cdots > \mathcal{L}_2 > \mathcal{L}_1 > 0$$
Nested Sampling Algorithm (Naïve)

1. Samples $\boldsymbol{\Theta}_1, \boldsymbol{\Theta}_2, \ldots, \boldsymbol{\Theta}_{N-1}, \boldsymbol{\Theta}_N$ are sequentially drawn from the prior: $\boldsymbol{\Theta} \sim \pi(\boldsymbol{\Theta})$.
2. A new point is only accepted if $\mathcal{L}_{i+1} > \mathcal{L}_i$.

This again yields the ordered sequence $\mathcal{L}_N > \mathcal{L}_{N-1} > \cdots > \mathcal{L}_2 > \mathcal{L}_1 > 0$.
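A minimal sketch of this naïve scheme on a toy problem of my own (a 1-D Gaussian likelihood with a uniform prior on $[-1, 1]$); the printout makes the inefficiency obvious, since the number of prior draws needed per accepted point grows rapidly:

```python
import numpy as np

rng = np.random.default_rng(1)

def loglike(theta):
    # Toy 1-D Gaussian log-likelihood (hypothetical example).
    return -0.5 * (theta / 0.1) ** 2

n_points, max_tries = 10, 200_000
logl_threshold = -np.inf

for i in range(n_points):
    # Naive step: keep drawing from the *full* prior until the new point
    # beats the current likelihood threshold.
    for attempt in range(1, max_tries + 1):
        theta = rng.uniform(-1.0, 1.0)
        if loglike(theta) > logl_threshold:
            break
    else:
        print("prior sampling became too inefficient; stopping early")
        break
    logl_threshold = loglike(theta)
    print(f"point {i + 1}: logL = {logl_threshold:.3f} after {attempt} tries")
```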
Building a Nested Sampling Algorithm
Components of the Algorithm
1. Adding more particles.
2. Knowing when to stop.
3. What to do after stopping.
Adding More Particles

Run two independent nested sampling runs, each with a single live point (Skilling 2006):

$$\mathcal{L}^{1}_{N_1} > \cdots > \mathcal{L}^{1}_{2} > \mathcal{L}^{1}_{1} > 0, \qquad X^{1}_{i+1} = \prod_{j=0}^{i} U_j$$

$$\mathcal{L}^{2}_{N_2} > \cdots > \mathcal{L}^{2}_{2} > \mathcal{L}^{2}_{1} > 0, \qquad X^{2}_{j+1} = \prod_{k=0}^{j} U_k$$

Merging the two runs gives a single ordered sequence, e.g.

$$\mathcal{L}^{1}_{N_1} > \mathcal{L}^{2}_{N_2} > \cdots > \mathcal{L}^{1}_{2} > \mathcal{L}^{2}_{2} > \mathcal{L}^{2}_{1} > \mathcal{L}^{1}_{1} > 0$$
Adding More Particles

For a single run, the log prior volume shrinks as

$$\ln X_{i+1} = \sum_{j=0}^{i} \ln U_j, \qquad U_0, \ldots, U_i \sim \mathrm{Unif}$$

For the merged run, each shrinkage factor is an order statistic: if $U_1, U_2 \sim \mathrm{Unif}$, then the larger draw $U_{(2)} \sim \mathrm{Beta}(2, 1)$, so

$$\ln X_{i+1} = \sum_{j=0}^{i} \ln T_j, \qquad T_0, \ldots, T_i \sim \mathrm{Beta}(2, 1)$$

(Skilling 2006)
Adding More Particles

$$\mathcal{L}^{1}_{N_1} > \mathcal{L}^{2}_{N_2} > \cdots > \mathcal{L}^{1}_{2} > \mathcal{L}^{2}_{2} > \mathcal{L}^{2}_{1} > \mathcal{L}^{1}_{1} > 0$$

One run with 2 "live points" = 2 runs with 1 live point. More generally, one run with K "live points" = K runs with 1 live point: at any iteration, the K points still being evolved are the live points and the discarded points are the dead points. (Skilling 2006)
Adding More Particles

$$\ln \mathbb{E}[X_{i+1}] = \sum_{j=0}^{i} \ln \mathbb{E}[T_j], \qquad T_0, \ldots, T_i \sim \mathrm{Beta}(2, 1)$$

With K live points, each shrinkage factor is the largest of K i.i.d. uniform draws, $U_{(K)} \sim \mathrm{Beta}(K, 1)$, so

$$\ln \mathbb{E}[X_{i+1}] = \sum_{j=0}^{i} \ln \mathbb{E}[T_j] = \sum_{j=0}^{i} \ln\frac{K}{K+1}, \qquad T_0, \ldots, T_i \sim \mathrm{Beta}(K, 1)$$

(Skilling 2006)
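A quick numerical check of this result is sketched below (assuming NumPy): simulate the shrinkage factors $T_j \sim \mathrm{Beta}(K, 1)$ directly and compare the simulated volumes against the analytic expectation above.

```python
import numpy as np

rng = np.random.default_rng(2)

K = 100        # number of live points
n_iter = 500   # number of nested sampling iterations
n_runs = 2000  # independent simulated runs

# T_j ~ Beta(K, 1): the largest of K i.i.d. uniform draws at each iteration.
T = rng.beta(K, 1, size=(n_runs, n_iter))
ln_X = np.cumsum(np.log(T), axis=1)            # ln X_i along each simulated run

# Slide's result: ln E[X_n] = n * ln(K / (K + 1)).
ln_E_X_sim = np.log(np.mean(np.exp(ln_X[:, -1])))
ln_E_X_analytic = n_iter * np.log(K / (K + 1))
print(ln_E_X_sim, ln_E_X_analytic)             # should agree closely

# The related result E[ln X_n] = -n / K is also commonly used in practice.
print(ln_X[:, -1].mean(), -n_iter / K)
```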
Stopping Criteria

Split the evidence into the part accumulated so far and the part remaining in the unexplored region:

$$\mathcal{Z} = \mathcal{Z}_i + \Delta\mathcal{Z}_i = \sum_{j=1}^{i} w_j + \Delta\mathcal{Z}_i$$

and stop once the accumulated fraction exceeds some threshold $f$:

$$\frac{\mathcal{Z}_i}{\mathcal{Z}_i + \Delta\mathcal{Z}_i} > f$$

The remaining evidence can be bounded by treating the unexplored region as a uniform slab at the maximum likelihood among the current live points:

$$\mathcal{Z} \lesssim \sum_{j=1}^{i} w_j + \mathcal{L}_{\max}\,X_i$$
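A minimal sketch (hypothetical inputs, assuming NumPy) of how this bound is typically used inside a sampling loop: stop once the largest possible remaining contribution, $\mathcal{L}_{\max} X_i$, adds only a negligible amount to the accumulated evidence.

```python
import numpy as np

def should_stop(weights, logl_live_max, ln_X_current, dlogz_tol=0.01):
    """Check the 'uniform slab' stopping criterion.

    weights       : importance weights w_j accumulated so far (hypothetical array)
    logl_live_max : maximum log-likelihood among the current live points
    ln_X_current  : current log prior volume ln X_i
    dlogz_tol     : tolerance on the remaining contribution to ln Z
    """
    logz_current = np.log(np.sum(weights))
    # Upper bound on the remaining evidence: L_max * X_i (the "uniform slab").
    logz_remain = logl_live_max + ln_X_current
    # Remaining contribution to ln Z, i.e. ln(Z_current + Z_remain) - ln(Z_current).
    dlogz = np.logaddexp(logz_current, logz_remain) - logz_current
    return dlogz < dlogz_tol
```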
“Recycling” the Final Set of Particles

At termination, the K remaining live points are uniformly distributed within the final prior volume $X_N$, so their enclosed volumes follow the order statistics of K uniform draws (the Rényi representation):

$$X_{N+k} = U_{(K-k+1)}\,X_N, \qquad U_1, \ldots, U_K \sim \mathrm{Unif}$$

Adding them to the run extends the ordered sequence:

$$\mathcal{L}_{N+K} > \cdots > \mathcal{L}_{N+1} > \mathcal{L}_N > \mathcal{L}_{N-1} > \cdots > \mathcal{L}_2 > \mathcal{L}_1 > 0$$
“Recycling” the Final Set of Particles

$$\ln \mathbb{E}[X_N] = \sum_{i=1}^{N} \ln\frac{K}{K+1}$$

$$\ln \mathbb{E}[X_{N+k}] = \underbrace{\sum_{i=1}^{N} \ln\frac{K}{K+1}}_{\text{Exponential shrinkage}} + \underbrace{\sum_{j=1}^{k} \ln\frac{K-j+1}{K-j+2}}_{\text{Uniform shrinkage}} = \sum_{i=1}^{N} \ln\frac{K}{K+1} + \ln\frac{K-k+1}{K+1}$$
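A minimal sketch (hypothetical inputs) of how the final K live points are appended to a finished run using the expected uniform shrinkage above:

```python
import numpy as np

def recycle_live_points(dead_points, ln_X_N, logl_live):
    """Append the final K live points to a finished run.

    dead_points : list of (ln X_i, logL_i) pairs for the dead points (hypothetical)
    ln_X_N      : log prior volume at termination
    logl_live   : log-likelihoods of the K remaining live points
    """
    K = len(logl_live)
    out = list(dead_points)
    # Sort the live points by increasing likelihood, as in the extended sequence.
    for k, logl in enumerate(np.sort(logl_live), start=1):
        # Expected volume from uniform shrinkage: X_{N+k} = X_N * (K - k + 1) / (K + 1).
        ln_X = ln_X_N + np.log((K - k + 1) / (K + 1))
        out.append((ln_X, logl))
    return out
```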
Nested Sampling Errors
Nested Sampling Uncertainties

Pictures from this 2010 talk by Skilling.

Statistical uncertainties:
• Unknown prior volumes.

Sampling uncertainties:
• Number of samples (counting).
• Discrete point estimates for contours.
• Particle path dependencies.
Sampling Error: Poisson Uncertainties

The number of steps N needed to traverse from the prior to the posterior behaves like a Poisson process:

$$\mathbb{V}[N] = \mathbb{E}[N] \sim \frac{H(\pi \,\|\, p)}{\mathbb{E}[\Delta \ln X]}$$

where the "distance" from prior to posterior is the Kullback-Leibler divergence from $\pi$ to $p$ (the "information gained"):

$$H(\pi \,\|\, p) \equiv \int_{\Omega_{\boldsymbol{\Theta}}} p(\boldsymbol{\Theta}) \ln\frac{p(\boldsymbol{\Theta})}{\pi(\boldsymbol{\Theta})}\,d\boldsymbol{\Theta}$$

The resulting uncertainty in the evidence is set by the uncertainty in the log prior volume of the typical set:

$$\mathbb{V}[\ln \mathcal{Z}] \sim \mathbb{V}[\ln X_H] \sim \mathbb{V}[N]\,\big(\mathbb{E}[\Delta \ln X]\big)^2 \sim H(\pi \,\|\, p)\,\mathbb{E}[\Delta \ln X] \sim \sum_{i=1}^{N+K} \Delta H_i\,\mathbb{E}[\Delta \ln X_i]$$

With $\mathbb{E}[\Delta \ln X_i] = 1/K$, this gives the familiar estimate $\mathbb{V}[\ln \mathcal{Z}] \approx H/K$.

(Based on Skilling 2006 and Keeton 2011.)
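A minimal sketch (hypothetical inputs, assuming NumPy) of the resulting classic error estimate, $\sigma[\ln\mathcal{Z}] \approx \sqrt{H/K}$, computed from the log-likelihoods and log importance weights of a finished run:

```python
import numpy as np

def lnz_error_estimate(logl, ln_w, K):
    """Estimate sigma[ln Z] ~ sqrt(H / K) from a finished run.

    logl : log-likelihoods of the dead points (hypothetical array)
    ln_w : log importance weights, ln w_i = logL_i + ln(dX_i)
    K    : number of live points
    """
    ln_Z = np.logaddexp.reduce(ln_w)   # ln Z = ln sum_i w_i
    p = np.exp(ln_w - ln_Z)            # normalized posterior weights p_i
    # KL divergence ("information"): H ~ sum_i p_i * ln(L_i / Z).
    H = np.sum(p * (logl - ln_Z))
    return np.sqrt(H / K)
```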
Sampling Error: Monte Carlo Noise

Posterior expectations can likewise be written as one-dimensional integrals over the prior volume:

$$\mathbb{E}_p[f(\boldsymbol{\Theta})] = \int_{\Omega_{\boldsymbol{\Theta}}} f(\boldsymbol{\Theta})\,p(\boldsymbol{\Theta})\,d\boldsymbol{\Theta} = \frac{1}{\mathcal{Z}} \int_0^1 \tilde{f}(X)\,\mathcal{L}(X)\,dX \approx \sum_{i=1}^{N+K} p_i\,f(\boldsymbol{\Theta}_i)$$

where

$$\tilde{f}(X) = \mathbb{E}_\pi\!\left[\,f(\boldsymbol{\Theta}) \mid \mathcal{L}(\boldsymbol{\Theta}) = \mathcal{L}(X)\,\right]$$

(Formalism following Higson et al. 2017 and Chopin & Robert 2010.)
Exploring Sampling Uncertainties

One run with K "live points" = K runs with 1 live point! The original run

$$\mathcal{L}_N > \mathcal{L}_{N-1} > \cdots > \mathcal{L}_2 > \mathcal{L}_1 > 0$$

can be decomposed into K independent single-particle "strands":

$$\mathcal{L}^{(\cdot)}_i = \left\{ \mathcal{L}^{(1)}_i, \mathcal{L}^{(2)}_i, \ldots \right\}$$

We would like to sample K paths from the set of all possible particle paths $P(\mathcal{L}_i, \ldots)$, but we don't have access to it. Instead, use a bootstrap estimator: resample the K strands with replacement, e.g.

$$\mathcal{L}^{(\cdot)\prime}_i = \left\{ \mathcal{L}^{(1)}_i, \mathcal{L}^{(1)}_i, \mathcal{L}^{(2)}_i, \ldots \right\},$$

then recompute the evidence and importance weights from each resampled run:

$$\mathcal{Z}' \approx \sum_{i=1}^{N'} w'_i, \qquad p'_i = \frac{w'_i}{\mathcal{Z}'}$$
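A minimal sketch of the strand bootstrap for a static run (my own simplification: because every strand of a static run spans the full likelihood range, each resampled run still has K live points throughout, so the standard $X_i \approx e^{-i/K}$ shrinkage applies):

```python
import numpy as np

rng = np.random.default_rng(3)

def bootstrap_lnz_error(strand_logl, n_boot=200):
    """Bootstrap the sampling uncertainty on ln Z by resampling strands.

    strand_logl : list of K arrays, the log-likelihood sequence of each strand
                  from a finished static run (hypothetical input)
    """
    K = len(strand_logl)
    lnz_samples = []
    for _ in range(n_boot):
        # Resample K strands with replacement and merge their dead points.
        idx = rng.integers(0, K, size=K)
        logl = np.sort(np.concatenate([strand_logl[i] for i in idx]))
        # Each resampled run still has K live points throughout, so the
        # standard shrinkage applies: X_i = exp(-i / K).
        X = np.exp(-np.arange(1, logl.size + 1) / K)
        dX = np.concatenate([[1.0], X[:-1]]) - X       # X_{i-1} - X_i with X_0 = 1
        ln_w = logl + np.log(dX)                        # w_i = L_i * dX_i
        lnz_samples.append(np.logaddexp.reduce(ln_w))   # ln Z' for this replicate
    return np.std(lnz_samples)
```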
Nested Sampling in Practice
Method 0: Sampling from the Prior

Sampling from the prior becomes exponentially more inefficient as time goes on.

Higson et al. (2017), arxiv:1704.03459
Method 1: Constrained Uniform Sampling

Proposal: bound the iso-likelihood contours in real time and sample from the newly constrained prior. (Feroz et al. 2009)

Issues:
• How to ensure the bounds always encompass the iso-likelihood contours? Bootstrapping.
• How to generate flexible bounds? Easier with a uniform (transformed) prior, e.g. ellipsoids (MultiNest) or balls/cubes (Buchner 2014).
Method 2: “Evolving” Previous Samples

Proposal: generate independent samples subject to the likelihood constraint by “evolving” copies of the current live points, e.g. via:
• Random walks (i.e. MCMC)
• Slice sampling (e.g. PolyChord)
• Random trajectories (i.e. HMC)

Issues:
• How to ensure samples are independent (thinning) and properly distributed within the likelihood constraint?
• How to generate efficient proposals?

(A constrained random-walk sketch follows below.)
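As referenced above, here is a minimal sketch of the random-walk flavor of this idea (my own simplified version, assuming the prior has already been mapped to the unit cube so that the constrained prior is uniform): copy a randomly chosen live point and take Gaussian steps, accepting only moves that stay inside the cube and above the likelihood threshold.

```python
import numpy as np

rng = np.random.default_rng(4)

def evolve_live_point(live_points, loglike, logl_threshold, n_steps=25, scale=0.1):
    """Propose a new point above the likelihood threshold by 'evolving'
    a copy of a current live point with a constrained random walk.

    live_points : (K, D) array of current live points in the unit cube
    loglike     : callable returning log L(u) for a unit-cube position u
                  (hypothetical interface for this sketch)
    """
    # Start from a copy of a randomly selected live point.
    u = live_points[rng.integers(len(live_points))].copy()
    for _ in range(n_steps):
        u_new = u + scale * rng.normal(size=u.shape)
        # Accept only moves that stay in the unit cube (uniform prior)
        # *and* satisfy the hard likelihood constraint L > L_threshold.
        if np.all((u_new > 0.0) & (u_new < 1.0)) and loglike(u_new) > logl_threshold:
            u = u_new
    return u
```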
Example: Gaussian Shells
Feroz et al. (2013)
Example: Eggbox
Feroz et al. (2013)
Summary: (Static) Nested Sampling

1. Estimates the evidence $\mathcal{Z}$.
2. Estimates the posterior $p(\boldsymbol{\Theta})$.
3. Possesses well-defined stopping criteria.
4. Combining runs improves inference.
5. Sampling and statistical uncertainties can be simulated from a single run.
Dynamic Nested Sampling
Dynamic Nested Sampling

Runs are no longer required to span the full prior. A run restricted to a likelihood range,

$$\mathcal{L}^{1}_{\max} > \mathcal{L}^{1}_{N_1} > \cdots > \mathcal{L}^{1}_{2} > \mathcal{L}^{1}_{1} > \mathcal{L}^{1}_{\min},$$

can be merged with a standard run,

$$\mathcal{L}^{2}_{N_2} > \cdots > \mathcal{L}^{2}_{2} > \mathcal{L}^{2}_{1} > 0,$$

producing a combined run whose number of live points varies with prior volume: $N_2$ live points outside the range spanned by run 1 and $N_1 + N_2$ live points inside it. The prior volume still shrinks as

$$\ln X_{i+1} = \sum_{j=1}^{i} \ln T_j$$

where the shrinkage factors depend on how the number of live points $K_i$ changes:

Constant/Increasing ($K_{i+1} \geq K_i$): $\quad T_{i+1} \sim \mathrm{Beta}(K_{i+1}, 1)$

Decreasing sequence ($K_{i+j>i} < K_i$): $\quad T_{i+1}, \ldots, T_{i+j} \sim \left\{ U_{(K_i)}, \ldots, U_{(K_i - j + 1)} \right\}$
Static Nested Sampling

$$\ln \mathbb{E}[X_{N+k}] = \underbrace{\sum_{i=1}^{N} \ln\frac{K}{K+1}}_{\text{Exponential shrinkage}} + \underbrace{\sum_{j=1}^{k} \ln\frac{K-j+1}{K-j+2}}_{\text{Uniform shrinkage}}$$
Dynamic Nested Sampling

$$\ln \mathbb{E}[X_N] = \underbrace{\sum_{i=1}^{n_1} \ln\frac{K_i}{K_i+1}}_{\text{Exponential shrinkage}} + \underbrace{\sum_{j=1}^{n_2} \ln\frac{K_{n_1}-j+1}{K_{n_1}-j+2}}_{\text{Uniform shrinkage}} + \cdots + \sum_{j=1}^{K_{\mathrm{final}}} \ln\frac{K_{\mathrm{final}}-j+1}{K_{\mathrm{final}}-j+2}$$
Benefits of Dynamic Nested Sampling

• Can accommodate new "strands" within a particular range of prior volumes without changing the overall statistical framework.
• Particles can be adaptively added until stopping criteria are reached, allowing targeted estimation.
Allocating Samples

Higson et al. (2017), arxiv:1704.03459

Evidence weight: $\quad I_{\mathcal{Z}}(i) \sim \dfrac{\mathbb{E}[\mathcal{Z}_{>i}]}{K_i}$

Posterior weight: $\quad I_{P}(i) \sim p_i$

Weight function: $\quad I(G, i) = G\,\dfrac{I_{P}(i)}{\sum_j I_{P}(j)} + (1 - G)\,\dfrac{I_{\mathcal{Z}}(i)}{\sum_j I_{\mathcal{Z}}(j)}$
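A minimal sketch of this weight function (hypothetical input arrays from a finished run, assuming NumPy):

```python
import numpy as np

def importance(G, p_i, z_remain, n_live):
    """Combined importance I(G, i) used to decide where to add new samples.

    G        : trade-off parameter (G = 1 -> pure posterior, G = 0 -> pure evidence)
    p_i      : posterior importance weights of the existing samples
    z_remain : estimated evidence remaining beyond each sample, E[Z_{>i}]
    n_live   : number of live points at each iteration, K_i
    (all arrays here are hypothetical inputs from a finished run)
    """
    I_P = np.asarray(p_i, dtype=float)
    I_Z = np.asarray(z_remain, dtype=float) / np.asarray(n_live, dtype=float)
    return G * I_P / I_P.sum() + (1.0 - G) * I_Z / I_Z.sum()
```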
Allocating Samples

[Figure: sample allocation for a 3-D correlated multivariate Normal, comparing Static, Dynamic (posterior), and Dynamic (evidence) runs.]
How Many Samples is Enough?

• Ill-posed question: it depends on the application!
• In any sampling-based approach to estimating $p(\boldsymbol{\Theta})$ with $\hat{p}(\boldsymbol{\Theta})$, how many samples are necessary?

Assume the general case: we want the D-dimensional densities $\hat{p}(\boldsymbol{\Theta}_i)$ and $p(\boldsymbol{\Theta}_i)$ to be "close", where the "true" posterior is constructed over the same "domain":

$$H(p \,\|\, \hat{p}) \equiv \int_{\Omega_{\boldsymbol{\Theta}}} \hat{p}(\boldsymbol{\Theta}) \ln\frac{\hat{p}(\boldsymbol{\Theta})}{p(\boldsymbol{\Theta})}\,d\boldsymbol{\Theta} = \sum_{i=1}^{N} \hat{p}_i \ln\frac{\hat{p}_i}{p_i}$$

We want access to $\Pr(\hat{p}_i \mid p(\boldsymbol{\Theta}))$, but we don't know $p(\boldsymbol{\Theta})$, so we use a bootstrap estimator and treat

$$H(\hat{p} \,\|\, \hat{p}') = \sum_{i=1}^{N} \hat{p}'_i \ln\frac{\hat{p}'_i}{\hat{p}_i}$$

as a random variable. A possible stopping criterion: the fractional (%) variation in H.
Dynamic Nested Sampling Summary

1. Can sample from multi-modal distributions.
2. Can simultaneously estimate the evidence $\mathcal{Z}$ and posterior $p(\boldsymbol{\Theta})$.
3. Combining independent runs improves inference ("trivially parallelizable").
4. Can simulate uncertainties (sampling and statistical) from a single run.
5. Enables adaptive sample allocation during runtime using arbitrary weight functions.
6. Possesses evidence/posterior-based stopping criteria.
Examples and Applications
Dynamic Nested Sampling with dynesty
dynesty.readthedocs.io
• Pure Python.
• Easy to use.
• Modular.
• Open source.
• Parallelizable.
• Flexible bounding/sampling methods.
• Thorough documentation!
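A minimal usage sketch (based on my reading of the dynesty documentation; the 3-D Gaussian target is a toy example of my own): define a log-likelihood and a prior transform from the unit cube, then run the sampler.

```python
import numpy as np
import dynesty

ndim = 3

def loglike(x):
    # 3-D standard-normal log-likelihood (toy example).
    return -0.5 * np.sum(x**2) - 0.5 * ndim * np.log(2.0 * np.pi)

def prior_transform(u):
    # Map the unit cube onto a uniform prior over [-10, 10] in each dimension.
    return 20.0 * u - 10.0

# Static sampler; dynesty.DynamicNestedSampler exposes a very similar interface.
sampler = dynesty.NestedSampler(loglike, prior_transform, ndim, nlive=500)
sampler.run_nested()
res = sampler.results

print(res.logz[-1], res.logzerr[-1])        # evidence estimate and its error
weights = np.exp(res.logwt - res.logz[-1])  # posterior importance weights
```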
Example: Linear Regression (Posterior)

$$\boldsymbol{\Theta} = \{a, b, \ln f\}$$

[Figures: corner plot and trace plot of the recovered posterior.]
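A minimal sketch of what the model ingredients for this example might look like (the synthetic data and prior ranges are my own assumptions; this mirrors the common "fit a line with underestimated errors" setup, where $\ln f$ inflates the quoted uncertainties):

```python
import numpy as np

# Hypothetical data: x, y, and quoted (possibly underestimated) errors yerr.
rng = np.random.default_rng(5)
x = np.sort(10.0 * rng.uniform(size=50))
yerr = 0.1 + 0.5 * rng.uniform(size=50)
y = -0.9 * x + 4.3 + yerr * rng.normal(size=50)

def loglike(theta):
    a, b, ln_f = theta
    model = a * x + b
    # Inflate the quoted errors by a fractional amount f = exp(ln_f).
    s2 = yerr**2 + (np.exp(ln_f) * model) ** 2
    return -0.5 * np.sum((y - model) ** 2 / s2 + np.log(2.0 * np.pi * s2))

def prior_transform(u):
    # Map the unit cube to uniform priors: a in [-5, 5], b in [-10, 10], ln f in [-10, 1].
    a = 10.0 * u[0] - 5.0
    b = 20.0 * u[1] - 10.0
    ln_f = 11.0 * u[2] - 10.0
    return np.array([a, b, ln_f])

# These two callables would then be passed to dynesty's (Dynamic)NestedSampler
# exactly as in the sketch above.
```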
Example: Multivariate Normal (Evidence and Errors)

$$\boldsymbol{\Theta} = \{x_1, x_2, x_3\}$$

[Figures: "summary" plots of the evidence estimates and errors for static vs. dynamic runs.]
Applications

• All results are preliminary but agree with results from MCMC methods (derived using emcee).
• Samples were allocated with 100% posterior weight and an automated stopping criterion (2% fractional error in the simulated KLD).
• dynesty was substantially (~3-6x) more efficient at generating good samples than emcee, before thinning.
Application: Modeling Galaxy SEDs

$$\boldsymbol{\Theta} = \{\ln M_*, \ln Z, \boldsymbol{\sigma}_5, \boldsymbol{\delta}_6, \boldsymbol{\alpha}_2\}, \qquad D = 15$$

With: Joel Leja, Ben Johnson, Charlie Conroy. Fig: Joel Leja.
Application: Supernovae Light Curves

$$\boldsymbol{\Theta} = \{t_e, \boldsymbol{\rho}_4, \boldsymbol{\epsilon}_3, \boldsymbol{\delta}_3, \boldsymbol{\sigma}_2\}, \qquad D = 12$$

With: James Guillochon, Kaisey Mandel. Fig: Open Supernova Catalog (LSQ12dlf), James Guillochon.
Application: Molecular Cloud Distances

$$\boldsymbol{\Theta} = \{\boldsymbol{f}_2, \boldsymbol{d}_5, \boldsymbol{c}_5, p_o\}, \qquad D = 13$$

With: Catherine Zucker, Doug Finkbeiner, Alyssa Goodman.
Dynamic Nested Sampling with dynesty
dynesty.readthedocs.io