
An Introduction to (Dynamic) Nested Sampling

Josh Speagle

jspeagle@cfa.harvard.edu

Introduction

Background

Bayes' Theorem:

$$\Pr(\boldsymbol{\Theta} \mid \mathbf{D}, M) = \frac{\Pr(\mathbf{D} \mid \boldsymbol{\Theta}, M)\,\Pr(\boldsymbol{\Theta} \mid M)}{\Pr(\mathbf{D} \mid M)}$$

Posterior = Likelihood × Prior / Evidence

Background

In shorthand, Bayes' Theorem reads

$$p(\boldsymbol{\Theta}) = \frac{\mathcal{L}(\boldsymbol{\Theta})\,\pi(\boldsymbol{\Theta})}{\mathcal{Z}}, \qquad \mathcal{Z} \equiv \int_{\Omega_{\boldsymbol{\Theta}}} \mathcal{L}(\boldsymbol{\Theta})\,\pi(\boldsymbol{\Theta})\,d\boldsymbol{\Theta},$$

with posterior $p$, likelihood $\mathcal{L}$, prior $\pi$, and evidence $\mathcal{Z}$.

Posterior Estimation via Sampling

Samples: $\boldsymbol{\Theta}_N, \boldsymbol{\Theta}_{N-1}, \ldots, \boldsymbol{\Theta}_2, \boldsymbol{\Theta}_1 \in \Omega_{\boldsymbol{\Theta}}$

Weights: $p_N, p_{N-1}, \ldots, p_2, p_1$

$$\Rightarrow \quad p(\boldsymbol{\Theta}) = \sum_{i=1}^{N} p_i\,\delta_D(\boldsymbol{\Theta} - \boldsymbol{\Theta}_i)$$

MCMC: equal weights $1, 1, \ldots, 1, 1$.

Importance Sampling: weights $\dfrac{p_N}{q_N}, \dfrac{p_{N-1}}{q_{N-1}}, \ldots, \dfrac{p_2}{q_2}, \dfrac{p_1}{q_1}$.
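Once we have weighted samples, any posterior summary is just a weighted average. Below is a minimal sketch (plain NumPy; the array names and toy data are illustrative additions, not from the talk) showing that equal-weight MCMC samples and weighted samples are handled identically.

```python
import numpy as np

def posterior_mean_and_var(samples, weights):
    """Weighted posterior mean/variance from samples Theta_i with weights p_i."""
    w = np.asarray(weights, dtype=float)
    w /= w.sum()                                   # normalize the weights to sum to 1
    mean = np.sum(w[:, None] * samples, axis=0)
    var = np.sum(w[:, None] * (samples - mean) ** 2, axis=0)
    return mean, var

# Example: 1000 samples of a 2-D parameter vector with equal (MCMC-style) weights.
rng = np.random.default_rng(0)
samples = rng.normal(size=(1000, 2))
mcmc_weights = np.ones(1000)
mean, var = posterior_mean_and_var(samples, mcmc_weights)
```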

Nested Sampling?

Motivation: Integrating the Posterior

We want integrals over the posterior,

$$\int_{\Omega_{\boldsymbol{\Theta}}} p(\boldsymbol{\Theta})\,d\boldsymbol{\Theta}.$$

Re-express the integral over iso-probability contours $\{\boldsymbol{\Theta} : p(\boldsymbol{\Theta}) = \lambda\}$:

$$\int_{\Omega_{\boldsymbol{\Theta}}} p(\boldsymbol{\Theta})\,d\boldsymbol{\Theta} = \int \lambda\,dV(\lambda) = \int p(V)\,dV \approx \sum_i p(V_i)\,dV_i = \sum_i \lambda_i\,dV_i,$$

i.e. the "amplitude" $\lambda_i = p(V_i)$ times the differential "volume" $dV_i$.

The product $\lambda\,dV$ peaks in the "typical set": the amplitude falls away from the mode while the differential volume grows as $dV \propto r^{D-1}$.

Motivation: Integrating the Posterior

The same trick works for the evidence,

$$\mathcal{Z} \equiv \int_{\Omega_{\boldsymbol{\Theta}}} \mathcal{L}(\boldsymbol{\Theta})\,\pi(\boldsymbol{\Theta})\,d\boldsymbol{\Theta}.$$

Define the "prior volume" enclosed by the likelihood contour at level $\lambda$ (Feroz et al. 2013):

$$X(\lambda) \equiv \int_{\{\boldsymbol{\Theta}:\,\mathcal{L}(\boldsymbol{\Theta}) > \lambda\}} \pi(\boldsymbol{\Theta})\,d\boldsymbol{\Theta}.$$

Then

$$\mathcal{Z} = \int_0^{\infty} X(\lambda)\,d\lambda = \int_0^{1} \mathcal{L}(X)\,dX,$$

which can be approximated numerically as

$$\mathcal{Z} \approx \sum_{i=1}^{n} f(\mathcal{L}_i, \ldots)\,g(X_i, \ldots) \equiv \sum_{i=1}^{n} w_i,$$

where the amplitude term $f(\mathcal{L}_i, \ldots)$ and the differential volume term $g(X_i, \ldots) \approx \Delta X_i$ can be rectangles, trapezoids, etc.

Motivation: Integrating the Posterior

The importance weights also give us the posterior "for free":

$$p_i = \frac{w_i}{\mathcal{Z}}, \qquad \mathcal{Z} \approx \sum_{i=1}^{n} w_i, \qquad w_i = f(\mathcal{L}_i, \ldots)\,g(X_i, \ldots) \sim f(p_i, \ldots)\,g(V_i, \ldots),$$

so each weight is directly proportional to that sample's contribution to the typical set.
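As a concrete, hypothetical illustration of the quadrature above, here is a short NumPy sketch that takes likelihood values and associated prior volumes, forms trapezoidal weights $w_i$, and returns both the evidence estimate and the normalized importance weights $p_i = w_i/\mathcal{Z}$. The function and argument names are mine.

```python
import numpy as np

def evidence_from_volumes(logl, x):
    """Trapezoid-rule evidence estimate from likelihoods L_i at prior volumes X_i.

    logl : log-likelihoods, ordered so that x decreases from 1 toward 0.
    x    : corresponding prior volumes X_i.
    """
    like = np.exp(logl)
    # Trapezoid rule: w_i = 0.5 * (L_i + L_{i-1}) * (X_{i-1} - X_i), with X_0 = 1, L_0 = 0.
    dx = -np.diff(np.concatenate([[1.0], x]))          # shrinking volume intervals
    f = 0.5 * (like + np.concatenate([[0.0], like[:-1]]))
    w = f * dx
    z = w.sum()
    p = w / z                                          # posterior weights "for free"
    return z, p
```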

Motivation: Sampling the Posterior

Sampling directly from the likelihood $\mathcal{L}(\boldsymbol{\Theta})$ is hard.

Sampling uniformly within the bound $\mathcal{L}(\boldsymbol{\Theta}) > \lambda$ is easier: each new bound shrinks the enclosed prior volume from $X_{i-1}$ to $X_i$ to $X_{i+1}$, and so on.

(Pictures from this 2010 talk by Skilling.)

MCMC: Solving a Hard Problem once.

vs.

Nested Sampling: Solving an Easier Problem many times.

How Nested Sampling Works

Estimating the Prior Volume

$$\mathcal{Z} \approx \sum_{i=1}^{n} w_i, \qquad w_i = f(\mathcal{L}_i, \ldots)\,g(X_i, \ldots), \qquad X(\lambda) \equiv \int_{\{\boldsymbol{\Theta}:\,\mathcal{L}(\boldsymbol{\Theta}) > \lambda\}} \pi(\boldsymbol{\Theta})\,d\boldsymbol{\Theta}$$

(Feroz et al. 2013)

We can evaluate each $\mathcal{L}_i$ directly, but how do we get the prior volumes $X_i$?

Probability Integral Transform: if $X \sim f(x)$ with CDF $F$, then $F(X) \sim \text{Unif}$.

To exploit this, we need to sample from the constrained prior.

Estimating the Prior Volume

If each new point is drawn uniformly from the prior within the current likelihood bound, then the enclosed prior volume shrinks as

$$X_{i+1} = U_i X_i, \qquad U_i \sim \text{Unif},$$

so that

$$X_{i+1} = \prod_{j=0}^{i} U_j\,X_0 = \prod_{j=0}^{i} U_j, \qquad U_0, \ldots, U_i \overset{\text{i.i.d.}}{\sim} \text{Unif}, \qquad X_0 \equiv 1.$$

(Pictures from this 2010 talk by Skilling.)

Estimating the Prior Volume

We never observe the $X_i$ themselves, but the joint distribution $\Pr(X_1, X_2, \ldots, X_i)$ is known and its first and second moments can be computed, which is enough to estimate $\mathcal{Z} \approx \sum_{i=1}^{n} w_i$ with $w_i = f(\mathcal{L}_i, \ldots)\,g(X_i, \ldots)$.
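To build intuition for this statistical treatment of the volumes, here is a small simulation (my own illustrative sketch, not from the slides) of the multiplicative shrinkage $X_{i+1} = \prod_j U_j$ for a single particle.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_prior_volumes(n_iter, n_runs=1000):
    """Simulate ln X_i = sum_j ln U_j for one live point, over many independent runs."""
    u = rng.uniform(size=(n_runs, n_iter))
    return np.cumsum(np.log(u), axis=1)

log_x = simulate_prior_volumes(n_iter=100)
print("mean ln X_100:", log_x[:, -1].mean())   # close to -100, since E[ln U] = -1
```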

Nested Sampling Algorithm (Ideal)

Samples $\boldsymbol{\Theta}_1, \boldsymbol{\Theta}_2, \ldots, \boldsymbol{\Theta}_{N-1}, \boldsymbol{\Theta}_N$ are sequentially drawn from the constrained prior $\pi_{\mathcal{L} > \mathcal{L}_i}(\boldsymbol{\Theta})$, producing an ordered sequence of likelihoods

$$\mathcal{L}_N > \mathcal{L}_{N-1} > \cdots > \mathcal{L}_2 > \mathcal{L}_1 > 0.$$

Nested Sampling Algorithm (Naïve)

1. Samples $\boldsymbol{\Theta}_1 \sim \pi(\boldsymbol{\Theta}), \boldsymbol{\Theta}_2, \ldots, \boldsymbol{\Theta}_{N-1}, \boldsymbol{\Theta}_N$ are sequentially drawn from the prior $\pi(\boldsymbol{\Theta})$.

2. A new point is only accepted if $\mathcal{L}_{i+1} > \mathcal{L}_i$, which again yields $\mathcal{L}_N > \mathcal{L}_{N-1} > \cdots > \mathcal{L}_2 > \mathcal{L}_1 > 0$ (see the sketch below).
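Here is a deliberately naïve, runnable sketch of that rejection scheme for a toy 1-D Gaussian likelihood with a flat Unif(-10, 10) prior (the problem setup is my own choice for illustration). It shows why drawing from the full prior becomes rapidly inefficient as the likelihood threshold rises.

```python
import numpy as np

rng = np.random.default_rng(1)

def loglike(theta):
    """Toy 1-D Gaussian log-likelihood (illustrative choice)."""
    return -0.5 * theta**2

def naive_nested_run(n_iter=12):
    """Draw from the prior, accepting a point only if it beats the current threshold."""
    logl_thresh, chain, n_calls = -np.inf, [], 0
    for _ in range(n_iter):
        while True:
            theta = rng.uniform(-10.0, 10.0)       # draw from the (flat) prior
            n_calls += 1
            if loglike(theta) > logl_thresh:       # accept only if L_{i+1} > L_i
                logl_thresh = loglike(theta)
                chain.append(theta)
                break
    return np.array(chain), n_calls

chain, n_calls = naive_nested_run()
print(f"{len(chain)} accepted points required {n_calls} prior draws")
```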

Building a Nested Sampling Algorithm

Components of the Algorithm

1. Adding more particles.

2. Knowing when to stop.

3. What to do after stopping.

Adding More Particles

Run a second, independent single-particle run alongside the first (Skilling 2006):

$$\mathcal{L}^{1}_{N_1} > \cdots > \mathcal{L}^{1}_{2} > \mathcal{L}^{1}_{1} > 0, \qquad X^{1}_{i+1} = \prod_{j=0}^{i} U_j,$$

$$\mathcal{L}^{2}_{N_2} > \cdots > \mathcal{L}^{2}_{2} > \mathcal{L}^{2}_{1} > 0, \qquad X^{2}_{j+1} = \prod_{k=0}^{j} U_k.$$

Adding More Particles

Merging the two runs gives a single interleaved sequence of likelihoods,

$$\mathcal{L}^{1}_{N_1} > \mathcal{L}^{2}_{N_2} > \cdots > \mathcal{L}^{1}_{2} > \mathcal{L}^{2}_{2} > \mathcal{L}^{2}_{1} > \mathcal{L}^{1}_{1} > 0.$$

For a single run the volumes shrink as

$$\ln X_{i+1} = \sum_{j=0}^{i} \ln U_j, \qquad U_0, \ldots, U_i \overset{\text{i.i.d.}}{\sim} \text{Unif}.$$

For the merged run, each shrinkage factor is the larger of two uniform draws, an order statistic: if $U_1, U_2 \sim \text{Unif}$ then $U_{(2)} \sim \text{Beta}(2, 1)$, so

$$\ln X_{i+1} = \sum_{j=0}^{i} \ln T_j, \qquad T_0, \ldots, T_i \sim \text{Beta}(2, 1).$$

(Skilling 2006)

Adding More Particles

One run with 2 "live points" = 2 runs with 1 live point. More generally, one run with K "live points" = K runs with 1 live point: at each iteration the worst of the K live points is retired as a "dead point" and replaced by a new draw from the constrained prior. (Skilling 2006)

Adding More Particles

With K live points, each shrinkage factor is the largest of K uniform order statistics: if $U_1, \ldots, U_K \sim \text{Unif}$ then $U_{(K)} \sim \text{Beta}(K, 1)$, and

$$\ln \mathbb{E}[X_{i+1}] = \sum_{j=0}^{i} \ln \mathbb{E}[T_j] = \sum_{j=0}^{i} \ln \frac{K}{K+1}, \qquad T_0, \ldots, T_i \sim \text{Beta}(K, 1).$$

(Skilling 2006)
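A quick numerical check of this expectation (an illustrative sketch I have added, using NumPy's random generator):

```python
import numpy as np

rng = np.random.default_rng(7)
K, n_iter, n_runs = 100, 500, 2000

# Each step shrinks the prior volume by T ~ Beta(K, 1),
# i.e. the maximum of K i.i.d. uniform draws.
t = rng.beta(K, 1, size=(n_runs, n_iter))
x_final = np.prod(t, axis=1)

print("ln E[X_n] (simulated):", np.log(x_final.mean()))
print("ln E[X_n] (analytic): ", n_iter * np.log(K / (K + 1)))
```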

Stopping Criteria

Split the evidence into what has been accumulated and what remains:

$$\mathcal{Z} = \mathcal{Z}_i + \Delta\mathcal{Z}_i, \qquad \mathcal{Z}_i = \sum_{j=1}^{i} w_j,$$

and stop once the accumulated fraction exceeds some threshold,

$$\frac{\mathcal{Z}_i}{\mathcal{Z}_i + \Delta\mathcal{Z}_i} > f.$$

The remaining piece can be bounded by a "uniform slab" at the maximum likelihood among the K current live points:

$$\mathcal{Z} \lesssim \sum_{j=1}^{i} w_j + \mathcal{L}^{\max}_{K}\,X_i.$$
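In code this stopping rule is usually phrased in log space (dynesty's `dlogz` criterion is the same idea); the sketch below is my own paraphrase of that check, not the package's implementation.

```python
import numpy as np

def should_stop(logz_accum, logl_max_live, log_x_live, dlogz_tol=0.01):
    """Stop when the log-evidence that could still be added falls below a tolerance.

    logz_accum    : ln of the evidence accumulated so far, ln(sum_j w_j)
    logl_max_live : maximum log-likelihood among the current live points
    log_x_live    : ln of the remaining prior volume X_i
    """
    # Upper bound on the remaining contribution: a "uniform slab" L_max * X_i.
    logz_remain = logl_max_live + log_x_live
    logz_total = np.logaddexp(logz_accum, logz_remain)
    return (logz_total - logz_accum) < dlogz_tol
```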

"Recycling" the Final Set of Particles

When the run stops, the K remaining live points can themselves be added to the samples. Their enclosed prior volumes are uniform order statistics within the final volume $X_N$ (the Rényi representation):

$$X_{N+k} = \chi^{(K-k+1)}_{N+1} \sim U_{(K-k+1)}\,X_N, \qquad U_1, \ldots, U_K \sim \text{Unif},$$

extending the ordered sequence to

$$\mathcal{L}_{N+K} > \cdots > \mathcal{L}_{N+1} > \mathcal{L}_N > \mathcal{L}_{N-1} > \cdots > \mathcal{L}_2 > \mathcal{L}_1 > 0.$$

The expected volumes then combine exponential shrinkage (during the run) with uniform shrinkage (over the final live points):

$$\ln \mathbb{E}[X_N] = \sum_{i=1}^{N} \ln \frac{K}{K+1}, \qquad
\ln \mathbb{E}[X_{N+k}] = \sum_{i=1}^{N} \ln \frac{K}{K+1} + \sum_{j=1}^{k} \ln \frac{K-j+1}{K-j+2} = \sum_{i=1}^{N} \ln \frac{K}{K+1} + \ln \frac{K-k+1}{K+1}.$$

Nested Sampling Errors

Nested Sampling Uncertainties

(Pictures from this 2010 talk by Skilling.)

Statistical uncertainties:
• Unknown prior volumes.

Sampling uncertainties:
• Number of samples (counting).
• Discrete point estimates for contours.
• Particle path dependencies.

Sampling Error: Poisson Uncertainties

(Based on Skilling 2006 and Keeton 2011.)

The number of steps needed to cross from the prior to the posterior is roughly Poisson,

$$\mathbb{V}[N] = \mathbb{E}[N] \sim \frac{H(\pi \,\|\, p)}{\mathbb{E}[\Delta \ln X]},$$

where the "distance" from prior to posterior is the Kullback-Leibler divergence from $\pi$ to $p$ ("information gained"),

$$H(\pi \,\|\, p) \equiv \int_{\Omega_{\boldsymbol{\Theta}}} p(\boldsymbol{\Theta}) \ln \frac{p(\boldsymbol{\Theta})}{\pi(\boldsymbol{\Theta})}\,d\boldsymbol{\Theta}.$$

This propagates into the evidence uncertainty:

$$\mathbb{V}[\ln \mathcal{Z}] \sim \mathbb{V}[\ln X_H] \sim \mathbb{V}[N]\,\big(\mathbb{E}[\Delta \ln X]\big)^2 \sim H(\pi \,\|\, p)\,\mathbb{E}[\Delta \ln X] \sim \sum_{i=1}^{N+K} \Delta H_i\,\mathbb{E}[\Delta \ln X_i],$$

with $\mathbb{E}[\Delta \ln X_i] = 1/K$, i.e. $\mathbb{V}[\ln\mathcal{Z}] \approx H/K$.
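For reference, the classic error estimate that falls out of this argument is $\sigma[\ln\mathcal{Z}] \approx \sqrt{H/K}$. A tiny helper (my own sketch, built from the weights defined earlier rather than any particular package's bookkeeping):

```python
import numpy as np

def logz_error_estimate(logl, logw, n_live):
    """Classic sqrt(H/K) uncertainty on ln(Z) from a nested sampling run.

    logl   : log-likelihoods of the dead points
    logw   : log importance weights ln(w_i) = ln(L_i) + ln(dX_i)
    n_live : number of live points K
    """
    logz = np.logaddexp.reduce(logw)
    p = np.exp(logw - logz)                  # normalized posterior weights
    # Information H = sum_i p_i ln(L_i / Z), since p = L * pi / Z.
    h = np.sum(p * (logl - logz))
    return np.sqrt(h / n_live)
```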

Sampling Error: Monte Carlo Noise

(Formalism following Higson et al. 2017 and Chopin & Robert 2010.)

Posterior expectations can also be written as 1-D integrals over the prior volume,

$$\mathbb{E}_p[f(\boldsymbol{\Theta})] = \int_{\Omega_{\boldsymbol{\Theta}}} f(\boldsymbol{\Theta})\,p(\boldsymbol{\Theta})\,d\boldsymbol{\Theta} = \frac{1}{\mathcal{Z}} \int_0^1 \tilde{f}(X)\,\mathcal{L}(X)\,dX, \qquad \tilde{f}(X) = \mathbb{E}_\pi\!\left[f(\boldsymbol{\Theta}) \mid \mathcal{L}(\boldsymbol{\Theta}) = \mathcal{L}(X)\right],$$

and are estimated in practice with the weighted samples,

$$\mathbb{E}_p[f(\boldsymbol{\Theta})] \approx \sum_{i=1}^{N+K} p_i\,f(\boldsymbol{\Theta}_i).$$

Exploring Sampling Uncertainties

One run with K "live points" = K runs with 1 live point! The original run,

$$\mathcal{L}_N > \mathcal{L}_{N-1} > \cdots > \mathcal{L}_2 > \mathcal{L}_1 > 0,$$

can therefore be decomposed into K single-particle "strands",

$$\{\mathcal{L}^{(\cdot)}_i\} = \{\mathcal{L}^{(1)}_i\}, \{\mathcal{L}^{(2)}_i\}, \ldots$$

We would like to sample K new paths from the set of all possible paths $P(\{\mathcal{L}_i\}, \ldots)$, but we do not have access to it, so we use a bootstrap estimator: resample K strands with replacement,

$$\{\mathcal{L}^{(\cdot)}_i\}' = \{\mathcal{L}^{(1)}_i\}, \{\mathcal{L}^{(1)}_i\}, \{\mathcal{L}^{(2)}_i\}, \ldots,$$

merge them into a new run, and recompute

$$\mathcal{Z}' \approx \sum_{i=1}^{N'} w_i', \qquad p_i' = \frac{w_i'}{\mathcal{Z}'}.$$

The scatter across bootstrap realizations then quantifies the sampling uncertainty (see the sketch below).
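Here is a schematic bootstrap over strands (my own sketch; `strand_logl` is a hypothetical list of per-strand log-likelihood sequences, and the weight assignment reuses the mean shrinkage $\ln(K/(K+1))$ per step from earlier rather than any package's exact bookkeeping):

```python
import numpy as np

rng = np.random.default_rng(3)

def merged_logz(strand_logl, n_live):
    """Merge strands into one run and compute ln(Z) using the mean shrinkage per step."""
    logl = np.sort(np.concatenate(strand_logl))             # interleaved likelihood ordering
    log_x = np.arange(1, logl.size + 1) * np.log(n_live / (n_live + 1.0))
    dx = -np.diff(np.concatenate([[1.0], np.exp(log_x)]))   # shrinking volume intervals
    return np.logaddexp.reduce(logl + np.log(dx))

def bootstrap_logz_scatter(strand_logl, n_boot=100):
    """Resample strands with replacement and report the scatter in ln(Z)."""
    k = len(strand_logl)
    draws = [merged_logz([strand_logl[j] for j in rng.integers(0, k, size=k)], k)
             for _ in range(n_boot)]
    return np.std(draws)
```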

Nested Sampling In Practice

Nested Sampling In Practice

(Figures from Higson et al. 2017, arXiv:1704.03459.)

Method 0: Sampling from the Prior

Sampling from the full prior becomes exponentially more inefficient as the run goes on, since an ever-smaller fraction of the prior satisfies the likelihood constraint.

Method 1: Constrained Uniform Sampling (Feroz et al. 2009)

Proposal: Bound the iso-likelihood contours in real time and sample from the newly constrained prior.

Issues:
• How to ensure the bounds always encompass the iso-likelihood contours? Bootstrapping.
• How to generate flexible bounds? Easier with a uniform (transformed) prior; e.g. ellipsoids (MultiNest, Feroz et al. 2009) or balls/cubes (Buchner 2014). A sketch of the prior transform trick appears below.
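The "uniform (transformed) prior" trick maps the unit cube to the parameter space through the inverse CDF of each prior; bounding and sampling then happen in the cube. A short sketch of such a transform (the priors here are illustrative choices of mine, using SciPy's inverse normal CDF):

```python
import numpy as np
from scipy.stats import norm

def prior_transform(u):
    """Map a point u ~ Unif(0, 1)^3 in the unit cube to the model parameters.

    Illustrative priors: a ~ Unif(-5, 5), b ~ Normal(0, 1), c ~ LogUniform(1e-3, 1e3).
    """
    a = -5.0 + 10.0 * u[0]              # uniform prior via rescaling
    b = norm.ppf(u[1])                  # Gaussian prior via the inverse CDF
    c = 10.0 ** (-3.0 + 6.0 * u[2])     # log-uniform prior via rescaling in log10
    return np.array([a, b, c])
```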

Method 2: "Evolving" Previous Samples

Proposal: Generate independent samples subject to the likelihood constraint by "evolving" copies of current live points, e.g. via
• Random walks (i.e. MCMC)
• Slice sampling
• Random trajectories (i.e. HMC)

(e.g. PolyChord)

Issues:
• How to ensure samples are independent (thinning) and properly distributed within the likelihood constraint?
• How to generate efficient proposals?

Examples: Gaussian Shells and Eggbox likelihoods (figures from Feroz et al. 2013).

Summary: (Static) Nested Sampling

1. Estimates the evidence $\mathcal{Z}$.
2. Estimates the posterior $p(\boldsymbol{\Theta})$.
3. Possesses well-defined stopping criteria.
4. Combining runs improves inference.
5. Sampling and statistical uncertainties can be simulated from a single run.

Dynamic Nested Sampling

Dynamic Nested Sampling

(Figures from Higson et al. 2017, arXiv:1704.03459.)

Dynamic Nested Sampling

Runs no longer need to span the full likelihood range. For example, a first run with $N_1$ live points can target $\mathcal{L}^{1}_{\min} < \mathcal{L}^{1}_{1} < \cdots < \mathcal{L}^{1}_{N_1} < \mathcal{L}^{1}_{\max}$ while a second run with $N_2$ live points spans $0 < \mathcal{L}^{2}_{1} < \cdots < \mathcal{L}^{2}_{N_2}$. In the merged run, only $N_2$ live points are active outside the first run's range, and $N_1 + N_2$ are active inside it.

The volume shrinkage still follows

$$\ln X_{i+1} = \sum_{j=1}^{i} \ln T_j,$$

where the distribution of each $T$ depends on how the number of live points $K_i$ changes:

• Constant/increasing ($K_{i+1} \geq K_i$): $T_{i+1} \sim \text{Beta}(K_{i+1}, 1)$.
• Decreasing sequence ($K_{i+j} < K_i$): $T_{i+1}, \ldots, T_{i+j} \sim U_{(K_i)}, \ldots, U_{(K_i - j + 1)}$.

Static Nested Sampling combines exponential shrinkage (during the run) with uniform shrinkage (over the final live points):

$$\ln \mathbb{E}[X_{N+k}] = \sum_{i=1}^{N} \ln \frac{K}{K+1} + \sum_{j=1}^{k} \ln \frac{K-j+1}{K-j+2}.$$

Dynamic Nested Sampling generalizes this to a varying number of live points $K_i$:

$$\ln \mathbb{E}[X_N] = \sum_{i=1}^{n_1} \ln \frac{K_i}{K_i+1} + \sum_{j=1}^{n_2} \ln \frac{K_{n_1}-j+1}{K_{n_1}-j+2} + \cdots + \sum_{j=1}^{K_{\rm final}} \ln \frac{K_{\rm final}-j+1}{K_{\rm final}-j+2}.$$

Benefits of Dynamic Nested Sampling

• Can accommodate new "strands" within a particular range of prior volumes without changing the overall statistical framework.
• Particles can be adaptively added until stopping criteria are reached, allowing targeted estimation.

Allocating Samples

(Higson et al. 2017, arXiv:1704.03459)

Weight function for placing new samples at iteration $i$, trading off posterior weight against evidence weight:

$$I_{\mathcal{Z}}(i) \sim \frac{\mathbb{E}[\mathcal{Z}_{>i}]}{K_i}, \qquad I_{P}(i) \sim p_i,$$

$$I(G, i) = G\,\frac{I_P(i)}{\sum_j I_P(j)} + (1 - G)\,\frac{I_{\mathcal{Z}}(i)}{\sum_j I_{\mathcal{Z}}(j)}.$$

A sketch of this weight function in code follows below.
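A direct transcription of that weight function (my own sketch; the inputs are the per-iteration posterior weights, an estimate of the evidence remaining beyond each iteration, and the live-point counts, all hypothetical names):

```python
import numpy as np

def importance_weights(p, z_remain, k_live, g=0.8):
    """Dynamic nested sampling weight function I(G, i).

    p        : posterior weights p_i at each iteration
    z_remain : estimated evidence remaining beyond iteration i, E[Z_{>i}]
    k_live   : number of live points K_i at each iteration
    g        : trade-off between posterior (g=1) and evidence (g=0) importance
    """
    i_post = np.asarray(p, dtype=float)
    i_evid = np.asarray(z_remain, dtype=float) / np.asarray(k_live, dtype=float)
    return g * i_post / i_post.sum() + (1.0 - g) * i_evid / i_evid.sum()
```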

Allocating Samples

(Figure: sample allocation for a 3-D correlated multivariate Normal, comparing Static, Dynamic (posterior), and Dynamic (evidence) runs.)

How Many Samples is Enough?

• Ill-posed question: it depends on the application!

• In any sampling-based approach to estimating $p(\boldsymbol{\Theta})$ with $\hat{p}(\boldsymbol{\Theta})$, how many samples are necessary?

Assume the general case: we want the D-dimensional densities $p(\boldsymbol{\Theta}_i)$ (the "true" posterior constructed over the same "domain") and $\hat{p}(\boldsymbol{\Theta}_i)$ to be "close", e.g. in KL divergence:

$$H(p \,\|\, \hat{p}) \equiv \int_{\Omega_{\boldsymbol{\Theta}}} \hat{p}(\boldsymbol{\Theta}) \ln \frac{\hat{p}(\boldsymbol{\Theta})}{p(\boldsymbol{\Theta})}\,d\boldsymbol{\Theta} = \sum_{i=1}^{N} \hat{p}_i \ln \frac{\hat{p}_i}{p_i}.$$

We want access to $\Pr(\hat{p}_i \mid p(\boldsymbol{\Theta}))$, but we do not know $p(\boldsymbol{\Theta})$, so we use a bootstrap estimator: treat the weights as random variables and compare bootstrap realizations against the original,

$$H(\hat{p} \,\|\, \hat{p}') = \sum_{i=1}^{N} \hat{p}'_i \ln \frac{\hat{p}'_i}{\hat{p}_i}.$$

Possible stopping criterion: the fractional (%) variation in $H$ (see the sketch below).
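A minimal sketch of that bootstrap KL comparison (my own; it assumes the two realizations assign weights to the same set of samples, as in the strand bootstrap above):

```python
import numpy as np

def bootstrap_kl(p_hat, p_boot):
    """KL divergence between a bootstrap realization p' and the original weights p."""
    p = np.asarray(p_hat, dtype=float)
    q = np.asarray(p_boot, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    mask = q > 0                               # 0 * ln(0) terms contribute nothing
    return np.sum(q[mask] * np.log(q[mask] / p[mask]))
```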

Dynamic Nested Sampling Summary

1. Can sample from multi-modal distributions.
2. Can simultaneously estimate the evidence $\mathcal{Z}$ and posterior $p(\boldsymbol{\Theta})$.
3. Combining independent runs improves inference ("trivially parallelizable").
4. Can simulate uncertainties (sampling and statistical) from a single run.
5. Enables adaptive sample allocation during runtime using arbitrary weight functions.
6. Possesses evidence/posterior-based stopping criteria.

Examples and Applications

Dynamic Nested Sampling with dynesty

dynesty.readthedocs.io

• Pure Python.

• Easy to use.

• Modular.

• Open source.

• Parallelizable.

• Flexible bounding/sampling methods.

• Thorough documentation!
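A minimal usage sketch based on the documented dynesty interface (the toy log-likelihood and prior transform are mine, and argument names may differ slightly across versions):

```python
import numpy as np
from dynesty import DynamicNestedSampler

ndim = 3

def loglike(theta):
    """Toy 3-D standard normal log-likelihood."""
    return -0.5 * np.sum(theta**2) - 0.5 * ndim * np.log(2.0 * np.pi)

def prior_transform(u):
    """Map the unit cube to Unif(-10, 10) priors in each dimension."""
    return 20.0 * u - 10.0

sampler = DynamicNestedSampler(loglike, prior_transform, ndim)
sampler.run_nested()            # adaptively allocates samples until the stopping criteria are met
results = sampler.results       # weighted samples, ln-evidence, and diagnostics
print("ln(Z) =", results.logz[-1], "+/-", results.logzerr[-1])
```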

Example: Linear Regression (Posterior)

$\boldsymbol{\Theta} = \{a, b, \ln f\}$

(Figures: corner plot and trace plot of the fitted parameters.)

A sketch of the corresponding likelihood and prior transform follows below.
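Here is a plausible setup for that three-parameter line fit, following the standard "line with underestimated errors" demo; the data generation, the priors, and the role of $\ln f$ as a fractional underestimate of the error bars are my assumptions, not stated on the slides.

```python
import numpy as np

# Hypothetical data: y = a*x + b with noisy, underestimated error bars.
rng = np.random.default_rng(5)
x = np.sort(rng.uniform(0.0, 10.0, size=50))
yerr = 0.5 * np.ones_like(x)
y = 2.0 * x + 1.0 + rng.normal(0.0, 1.0, size=x.size)

def loglike(theta):
    """Gaussian likelihood where exp(lnf) scales the model as an extra error term."""
    a, b, lnf = theta
    model = a * x + b
    s2 = yerr**2 + np.exp(2.0 * lnf) * model**2
    return -0.5 * np.sum((y - model) ** 2 / s2 + np.log(2.0 * np.pi * s2))

def prior_transform(u):
    """Unit cube -> (a, b, lnf) with broad uniform priors (assumed)."""
    a = -10.0 + 20.0 * u[0]
    b = -10.0 + 20.0 * u[1]
    lnf = -10.0 + 11.0 * u[2]     # lnf in [-10, 1]
    return np.array([a, b, lnf])
```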

Example: Multivariate Normal (Evidence)

$\boldsymbol{\Theta} = \{x_1, x_2, x_3\}$

(Figure: "summary" plot of the run.)

Example: Multivariate Normal (Errors)

(Figure: "summary" plots comparing Static and Dynamic runs.)

Applications

• All results are preliminary but agree with results from MCMC methods (derived using emcee).
• Samples were allocated with 100% posterior weight and an automated stopping criterion (2% fractional error in the simulated KLD).
• dynesty was substantially (~3-6x) more efficient at generating good samples than emcee, before thinning.

Application: Modeling Galaxy SEDs (with Joel Leja, Ben Johnson, Charlie Conroy)

$\boldsymbol{\Theta} = \{\ln M_*, \ln Z, \boldsymbol{\sigma}_5, \boldsymbol{\delta}_6, \boldsymbol{\alpha}_2\}$, D = 15. (Fig: Joel Leja)

Application: Supernovae Light Curves (with James Guillochon, Kaisey Mandel)

$\boldsymbol{\Theta} = \{t_e, \boldsymbol{\rho}_4, \boldsymbol{\epsilon}_3, \boldsymbol{\delta}_3, \boldsymbol{\sigma}_2\}$, D = 12. (Fig: Open Supernova Catalog (LSQ12dlf), James Guillochon)

Application: Molecular Cloud Distances (with Catherine Zucker, Doug Finkbeiner, Alyssa Goodman)

$\boldsymbol{\Theta} = \{\boldsymbol{f}_2, \boldsymbol{d}_5, \boldsymbol{c}_5, p_o\}$, D = 13
