
An Introduction to (Dynamic) Nested Sampling

Josh Speagle

jspeagle@cfa.harvard.edu

Introduction

Background

Bayes' Theorem:

$$\Pr(\boldsymbol{\Theta} \mid \mathbf{D}, M) = \frac{\Pr(\mathbf{D} \mid \boldsymbol{\Theta}, M)\,\Pr(\boldsymbol{\Theta} \mid M)}{\Pr(\mathbf{D} \mid M)}$$

Posterior = Likelihood × Prior / Evidence

Background

In shorthand, Bayes' Theorem reads

$$p(\boldsymbol{\Theta}) = \frac{\mathcal{L}(\boldsymbol{\Theta})\,\pi(\boldsymbol{\Theta})}{\mathcal{Z}}, \qquad \mathcal{Z} \equiv \int_{\Omega_{\boldsymbol{\Theta}}} \mathcal{L}(\boldsymbol{\Theta})\,\pi(\boldsymbol{\Theta})\,d\boldsymbol{\Theta},$$

with posterior $p$, likelihood $\mathcal{L}$, prior $\pi$, and evidence $\mathcal{Z}$.

Posterior Estimation via Sampling

Samples: $\boldsymbol{\Theta}_N, \boldsymbol{\Theta}_{N-1}, \ldots, \boldsymbol{\Theta}_2, \boldsymbol{\Theta}_1 \in \Omega_{\boldsymbol{\Theta}}$

Weights: $p_N, p_{N-1}, \ldots, p_2, p_1$

$$\Rightarrow \quad p(\boldsymbol{\Theta}) = \sum_{i=1}^{N} p_i\,\delta_D(\boldsymbol{\Theta} - \boldsymbol{\Theta}_i)$$

MCMC: equal weights $1, 1, \ldots, 1, 1$.

Importance Sampling: weights $\dfrac{p_N}{q_N}, \dfrac{p_{N-1}}{q_{N-1}}, \ldots, \dfrac{p_2}{q_2}, \dfrac{p_1}{q_1}$.
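Once we have weighted samples, any posterior summary is just a weighted average. Below is a minimal sketch (plain NumPy; the array names and toy data are illustrative additions, not from the talk) showing that equal-weight MCMC samples and weighted samples are handled identically.

```python
import numpy as np

def posterior_mean_and_var(samples, weights):
    """Weighted posterior mean/variance from samples Theta_i with weights p_i."""
    w = np.asarray(weights, dtype=float)
    w /= w.sum()                                   # normalize the weights to sum to 1
    mean = np.sum(w[:, None] * samples, axis=0)
    var = np.sum(w[:, None] * (samples - mean) ** 2, axis=0)
    return mean, var

# Example: 1000 samples of a 2-D parameter vector with equal (MCMC-style) weights.
rng = np.random.default_rng(0)
samples = rng.normal(size=(1000, 2))
mcmc_weights = np.ones(1000)
mean, var = posterior_mean_and_var(samples, mcmc_weights)
```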

Nested Sampling?

Motivation: Integrating the Posterior

We want integrals over the posterior,

$$\int_{\Omega_{\boldsymbol{\Theta}}} p(\boldsymbol{\Theta})\,d\boldsymbol{\Theta}.$$

Re-express the integral over iso-probability contours $\{\boldsymbol{\Theta} : p(\boldsymbol{\Theta}) = \lambda\}$:

$$\int_{\Omega_{\boldsymbol{\Theta}}} p(\boldsymbol{\Theta})\,d\boldsymbol{\Theta} = \int \lambda\,dV(\lambda) = \int p(V)\,dV \approx \sum_i p(V_i)\,dV_i = \sum_i \lambda_i\,dV_i,$$

i.e. the "amplitude" $\lambda_i = p(V_i)$ times the differential "volume" $dV_i$.

The product $\lambda\,dV$ peaks in the "typical set": the amplitude falls away from the mode while the differential volume grows as $dV \propto r^{D-1}$.

Motivation: Integrating the Posterior

The same trick works for the evidence,

$$\mathcal{Z} \equiv \int_{\Omega_{\boldsymbol{\Theta}}} \mathcal{L}(\boldsymbol{\Theta})\,\pi(\boldsymbol{\Theta})\,d\boldsymbol{\Theta}.$$

Define the "prior volume" enclosed by the likelihood contour at level $\lambda$ (Feroz et al. 2013):

$$X(\lambda) \equiv \int_{\{\boldsymbol{\Theta}:\,\mathcal{L}(\boldsymbol{\Theta}) > \lambda\}} \pi(\boldsymbol{\Theta})\,d\boldsymbol{\Theta}.$$

Then

$$\mathcal{Z} = \int_0^{\infty} X(\lambda)\,d\lambda = \int_0^{1} \mathcal{L}(X)\,dX,$$

which can be approximated numerically as

$$\mathcal{Z} \approx \sum_{i=1}^{n} f(\mathcal{L}_i, \ldots)\,g(X_i, \ldots) \equiv \sum_{i=1}^{n} w_i,$$

where the amplitude term $f(\mathcal{L}_i, \ldots)$ and the differential volume term $g(X_i, \ldots) \approx \Delta X_i$ can be rectangles, trapezoids, etc.

Motivation: Integrating the Posterior

The importance weights also give us the posterior "for free":

$$p_i = \frac{w_i}{\mathcal{Z}}, \qquad \mathcal{Z} \approx \sum_{i=1}^{n} w_i, \qquad w_i = f(\mathcal{L}_i, \ldots)\,g(X_i, \ldots) \sim f(p_i, \ldots)\,g(V_i, \ldots),$$

so each weight is directly proportional to that sample's contribution to the typical set.
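As a concrete, hypothetical illustration of the quadrature above, here is a short NumPy sketch that takes likelihood values and associated prior volumes, forms trapezoidal weights $w_i$, and returns both the evidence estimate and the normalized importance weights $p_i = w_i/\mathcal{Z}$. The function and argument names are mine.

```python
import numpy as np

def evidence_from_volumes(logl, x):
    """Trapezoid-rule evidence estimate from likelihoods L_i at prior volumes X_i.

    logl : log-likelihoods, ordered so that x decreases from 1 toward 0.
    x    : corresponding prior volumes X_i.
    """
    like = np.exp(logl)
    # Trapezoid rule: w_i = 0.5 * (L_i + L_{i-1}) * (X_{i-1} - X_i), with X_0 = 1, L_0 = 0.
    dx = -np.diff(np.concatenate([[1.0], x]))          # shrinking volume intervals
    f = 0.5 * (like + np.concatenate([[0.0], like[:-1]]))
    w = f * dx
    z = w.sum()
    p = w / z                                          # posterior weights "for free"
    return z, p
```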

Motivation: Sampling the Posterior

Sampling directly from the likelihood $\mathcal{L}(\boldsymbol{\Theta})$ is hard.

Sampling uniformly within the bound $\mathcal{L}(\boldsymbol{\Theta}) > \lambda$ is easier: each new bound shrinks the enclosed prior volume from $X_{i-1}$ to $X_i$ to $X_{i+1}$, and so on.

(Pictures from this 2010 talk by Skilling.)

MCMC: Solving a Hard Problem once.

vs.

Nested Sampling: Solving an Easier Problem many times.

How Nested Sampling Works

Estimating the Prior Volume

$$\mathcal{Z} \approx \sum_{i=1}^{n} w_i, \qquad w_i = f(\mathcal{L}_i, \ldots)\,g(X_i, \ldots), \qquad X(\lambda) \equiv \int_{\{\boldsymbol{\Theta}:\,\mathcal{L}(\boldsymbol{\Theta}) > \lambda\}} \pi(\boldsymbol{\Theta})\,d\boldsymbol{\Theta}$$

(Feroz et al. 2013)

We can evaluate each $\mathcal{L}_i$ directly, but how do we get the prior volumes $X_i$?

Probability Integral Transform: if $X \sim f(x)$ with CDF $F$, then $F(X) \sim \text{Unif}$.

To exploit this, we need to sample from the constrained prior.

Estimating the Prior Volume

If each new point is drawn uniformly from the prior within the current likelihood bound, then the enclosed prior volume shrinks as

$$X_{i+1} = U_i X_i, \qquad U_i \sim \text{Unif},$$

so that

$$X_{i+1} = \prod_{j=0}^{i} U_j\,X_0 = \prod_{j=0}^{i} U_j, \qquad U_0, \ldots, U_i \overset{\text{i.i.d.}}{\sim} \text{Unif}, \qquad X_0 \equiv 1.$$

(Pictures from this 2010 talk by Skilling.)

Estimating the Prior Volume

We never observe the $X_i$ themselves, but the joint distribution $\Pr(X_1, X_2, \ldots, X_i)$ is known and its first and second moments can be computed, which is enough to estimate $\mathcal{Z} \approx \sum_{i=1}^{n} w_i$ with $w_i = f(\mathcal{L}_i, \ldots)\,g(X_i, \ldots)$.
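To build intuition for this statistical treatment of the volumes, here is a small simulation (my own illustrative sketch, not from the slides) of the multiplicative shrinkage $X_{i+1} = \prod_j U_j$ for a single particle.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_prior_volumes(n_iter, n_runs=1000):
    """Simulate ln X_i = sum_j ln U_j for one live point, over many independent runs."""
    u = rng.uniform(size=(n_runs, n_iter))
    return np.cumsum(np.log(u), axis=1)

log_x = simulate_prior_volumes(n_iter=100)
print("mean ln X_100:", log_x[:, -1].mean())   # close to -100, since E[ln U] = -1
```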

Nested Sampling Algorithm (Ideal)

Samples $\boldsymbol{\Theta}_1, \boldsymbol{\Theta}_2, \ldots, \boldsymbol{\Theta}_{N-1}, \boldsymbol{\Theta}_N$ are sequentially drawn from the constrained prior $\pi_{\mathcal{L} > \mathcal{L}_i}(\boldsymbol{\Theta})$, producing an ordered sequence of likelihoods

$$\mathcal{L}_N > \mathcal{L}_{N-1} > \cdots > \mathcal{L}_2 > \mathcal{L}_1 > 0.$$

Nested Sampling Algorithm (Naïve)

1. Samples $\boldsymbol{\Theta}_1 \sim \pi(\boldsymbol{\Theta}), \boldsymbol{\Theta}_2, \ldots, \boldsymbol{\Theta}_{N-1}, \boldsymbol{\Theta}_N$ are sequentially drawn from the prior $\pi(\boldsymbol{\Theta})$.

2. A new point is only accepted if $\mathcal{L}_{i+1} > \mathcal{L}_i$, which again yields $\mathcal{L}_N > \mathcal{L}_{N-1} > \cdots > \mathcal{L}_2 > \mathcal{L}_1 > 0$ (see the sketch below).
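Here is a deliberately naïve, runnable sketch of that rejection scheme for a toy 1-D Gaussian likelihood with a flat Unif(-10, 10) prior (the problem setup is my own choice for illustration). It shows why drawing from the full prior becomes rapidly inefficient as the likelihood threshold rises.

```python
import numpy as np

rng = np.random.default_rng(1)

def loglike(theta):
    """Toy 1-D Gaussian log-likelihood (illustrative choice)."""
    return -0.5 * theta**2

def naive_nested_run(n_iter=12):
    """Draw from the prior, accepting a point only if it beats the current threshold."""
    logl_thresh, chain, n_calls = -np.inf, [], 0
    for _ in range(n_iter):
        while True:
            theta = rng.uniform(-10.0, 10.0)       # draw from the (flat) prior
            n_calls += 1
            if loglike(theta) > logl_thresh:       # accept only if L_{i+1} > L_i
                logl_thresh = loglike(theta)
                chain.append(theta)
                break
    return np.array(chain), n_calls

chain, n_calls = naive_nested_run()
print(f"{len(chain)} accepted points required {n_calls} prior draws")
```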

Building a Nested Sampling Algorithm

Components of the Algorithm

1. Adding more particles.

2. Knowing when to stop.

3. What to do after stopping.

Adding More Particles

Run a second, independent single-particle run alongside the first (Skilling 2006):

$$\mathcal{L}^{1}_{N_1} > \cdots > \mathcal{L}^{1}_{2} > \mathcal{L}^{1}_{1} > 0, \qquad X^{1}_{i+1} = \prod_{j=0}^{i} U_j,$$

$$\mathcal{L}^{2}_{N_2} > \cdots > \mathcal{L}^{2}_{2} > \mathcal{L}^{2}_{1} > 0, \qquad X^{2}_{j+1} = \prod_{k=0}^{j} U_k.$$

Adding More Particles

Merging the two runs gives a single interleaved sequence of likelihoods,

$$\mathcal{L}^{1}_{N_1} > \mathcal{L}^{2}_{N_2} > \cdots > \mathcal{L}^{1}_{2} > \mathcal{L}^{2}_{2} > \mathcal{L}^{2}_{1} > \mathcal{L}^{1}_{1} > 0.$$

For a single run the volumes shrink as

$$\ln X_{i+1} = \sum_{j=0}^{i} \ln U_j, \qquad U_0, \ldots, U_i \overset{\text{i.i.d.}}{\sim} \text{Unif}.$$

For the merged run, each shrinkage factor is the larger of two uniform draws, an order statistic: if $U_1, U_2 \sim \text{Unif}$ then $U_{(2)} \sim \text{Beta}(2, 1)$, so

$$\ln X_{i+1} = \sum_{j=0}^{i} \ln T_j, \qquad T_0, \ldots, T_i \sim \text{Beta}(2, 1).$$

(Skilling 2006)

Adding More Particles

One run with 2 "live points" = 2 runs with 1 live point. More generally, one run with K "live points" = K runs with 1 live point: at each iteration the worst of the K live points is retired as a "dead point" and replaced by a new draw from the constrained prior. (Skilling 2006)

Adding More Particles

With K live points, each shrinkage factor is the largest of K uniform order statistics: if $U_1, \ldots, U_K \sim \text{Unif}$ then $U_{(K)} \sim \text{Beta}(K, 1)$, and

$$\ln \mathbb{E}[X_{i+1}] = \sum_{j=0}^{i} \ln \mathbb{E}[T_j] = \sum_{j=0}^{i} \ln \frac{K}{K+1}, \qquad T_0, \ldots, T_i \sim \text{Beta}(K, 1).$$

(Skilling 2006)
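A quick numerical check of this expectation (an illustrative sketch I have added, using NumPy's random generator):

```python
import numpy as np

rng = np.random.default_rng(7)
K, n_iter, n_runs = 100, 500, 2000

# Each step shrinks the prior volume by T ~ Beta(K, 1),
# i.e. the maximum of K i.i.d. uniform draws.
t = rng.beta(K, 1, size=(n_runs, n_iter))
x_final = np.prod(t, axis=1)

print("ln E[X_n] (simulated):", np.log(x_final.mean()))
print("ln E[X_n] (analytic): ", n_iter * np.log(K / (K + 1)))
```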

Stopping Criteria

Split the evidence into what has been accumulated and what remains:

$$\mathcal{Z} = \mathcal{Z}_i + \Delta\mathcal{Z}_i, \qquad \mathcal{Z}_i = \sum_{j=1}^{i} w_j,$$

and stop once the accumulated fraction exceeds some threshold,

$$\frac{\mathcal{Z}_i}{\mathcal{Z}_i + \Delta\mathcal{Z}_i} > f.$$

The remaining piece can be bounded by a "uniform slab" at the maximum likelihood among the K current live points:

$$\mathcal{Z} \lesssim \sum_{j=1}^{i} w_j + \mathcal{L}^{\max}_{K}\,X_i.$$
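In code this stopping rule is usually phrased in log space (dynesty's `dlogz` criterion is the same idea); the sketch below is my own paraphrase of that check, not the package's implementation.

```python
import numpy as np

def should_stop(logz_accum, logl_max_live, log_x_live, dlogz_tol=0.01):
    """Stop when the log-evidence that could still be added falls below a tolerance.

    logz_accum    : ln of the evidence accumulated so far, ln(sum_j w_j)
    logl_max_live : maximum log-likelihood among the current live points
    log_x_live    : ln of the remaining prior volume X_i
    """
    # Upper bound on the remaining contribution: a "uniform slab" L_max * X_i.
    logz_remain = logl_max_live + log_x_live
    logz_total = np.logaddexp(logz_accum, logz_remain)
    return (logz_total - logz_accum) < dlogz_tol
```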

"Recycling" the Final Set of Particles

When the run stops, the K remaining live points can themselves be added to the samples. Their enclosed prior volumes are uniform order statistics within the final volume $X_N$ (the Rényi representation):

$$X_{N+k} = \chi^{(K-k+1)}_{N+1} \sim U_{(K-k+1)}\,X_N, \qquad U_1, \ldots, U_K \sim \text{Unif},$$

extending the ordered sequence to

$$\mathcal{L}_{N+K} > \cdots > \mathcal{L}_{N+1} > \mathcal{L}_N > \mathcal{L}_{N-1} > \cdots > \mathcal{L}_2 > \mathcal{L}_1 > 0.$$

The expected volumes then combine exponential shrinkage (during the run) with uniform shrinkage (over the final live points):

$$\ln \mathbb{E}[X_N] = \sum_{i=1}^{N} \ln \frac{K}{K+1}, \qquad
\ln \mathbb{E}[X_{N+k}] = \sum_{i=1}^{N} \ln \frac{K}{K+1} + \sum_{j=1}^{k} \ln \frac{K-j+1}{K-j+2} = \sum_{i=1}^{N} \ln \frac{K}{K+1} + \ln \frac{K-k+1}{K+1}.$$

Nested Sampling Errors

Nested Sampling Uncertainties

(Pictures from this 2010 talk by Skilling.)

Statistical uncertainties:
• Unknown prior volumes.

Sampling uncertainties:
• Number of samples (counting).
• Discrete point estimates for contours.
• Particle path dependencies.

Sampling Error: Poisson Uncertainties

(Based on Skilling 2006 and Keeton 2011.)

The number of steps needed to cross from the prior to the posterior is roughly Poisson,

$$\mathbb{V}[N] = \mathbb{E}[N] \sim \frac{H(\pi \,\|\, p)}{\mathbb{E}[\Delta \ln X]},$$

where the "distance" from prior to posterior is the Kullback-Leibler divergence from $\pi$ to $p$ ("information gained"),

$$H(\pi \,\|\, p) \equiv \int_{\Omega_{\boldsymbol{\Theta}}} p(\boldsymbol{\Theta}) \ln \frac{p(\boldsymbol{\Theta})}{\pi(\boldsymbol{\Theta})}\,d\boldsymbol{\Theta}.$$

This propagates into the evidence uncertainty:

$$\mathbb{V}[\ln \mathcal{Z}] \sim \mathbb{V}[\ln X_H] \sim \mathbb{V}[N]\,\big(\mathbb{E}[\Delta \ln X]\big)^2 \sim H(\pi \,\|\, p)\,\mathbb{E}[\Delta \ln X] \sim \sum_{i=1}^{N+K} \Delta H_i\,\mathbb{E}[\Delta \ln X_i],$$

with $\mathbb{E}[\Delta \ln X_i] = 1/K$, i.e. $\mathbb{V}[\ln\mathcal{Z}] \approx H/K$.
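For reference, the classic error estimate that falls out of this argument is $\sigma[\ln\mathcal{Z}] \approx \sqrt{H/K}$. A tiny helper (my own sketch, built from the weights defined earlier rather than any particular package's bookkeeping):

```python
import numpy as np

def logz_error_estimate(logl, logw, n_live):
    """Classic sqrt(H/K) uncertainty on ln(Z) from a nested sampling run.

    logl   : log-likelihoods of the dead points
    logw   : log importance weights ln(w_i) = ln(L_i) + ln(dX_i)
    n_live : number of live points K
    """
    logz = np.logaddexp.reduce(logw)
    p = np.exp(logw - logz)                  # normalized posterior weights
    # Information H = sum_i p_i ln(L_i / Z), since p = L * pi / Z.
    h = np.sum(p * (logl - logz))
    return np.sqrt(h / n_live)
```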

Sampling Error: Monte Carlo Noise

(Formalism following Higson et al. 2017 and Chopin & Robert 2010.)

Posterior expectations can also be written as 1-D integrals over the prior volume,

$$\mathbb{E}_p[f(\boldsymbol{\Theta})] = \int_{\Omega_{\boldsymbol{\Theta}}} f(\boldsymbol{\Theta})\,p(\boldsymbol{\Theta})\,d\boldsymbol{\Theta} = \frac{1}{\mathcal{Z}} \int_0^1 \tilde{f}(X)\,\mathcal{L}(X)\,dX, \qquad \tilde{f}(X) = \mathbb{E}_\pi\!\left[f(\boldsymbol{\Theta}) \mid \mathcal{L}(\boldsymbol{\Theta}) = \mathcal{L}(X)\right],$$

and are estimated in practice with the weighted samples,

$$\mathbb{E}_p[f(\boldsymbol{\Theta})] \approx \sum_{i=1}^{N+K} p_i\,f(\boldsymbol{\Theta}_i).$$

Exploring Sampling Uncertainties

One run with K "live points" = K runs with 1 live point! The original run,

$$\mathcal{L}_N > \mathcal{L}_{N-1} > \cdots > \mathcal{L}_2 > \mathcal{L}_1 > 0,$$

can therefore be decomposed into K single-particle "strands",

$$\{\mathcal{L}^{(\cdot)}_i\} = \{\mathcal{L}^{(1)}_i\}, \{\mathcal{L}^{(2)}_i\}, \ldots$$

We would like to sample K new paths from the set of all possible paths $P(\{\mathcal{L}_i\}, \ldots)$, but we do not have access to it, so we use a bootstrap estimator: resample K strands with replacement,

$$\{\mathcal{L}^{(\cdot)}_i\}' = \{\mathcal{L}^{(1)}_i\}, \{\mathcal{L}^{(1)}_i\}, \{\mathcal{L}^{(2)}_i\}, \ldots,$$

merge them into a new run, and recompute

$$\mathcal{Z}' \approx \sum_{i=1}^{N'} w_i', \qquad p_i' = \frac{w_i'}{\mathcal{Z}'}.$$

The scatter across bootstrap realizations then quantifies the sampling uncertainty (see the sketch below).
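Here is a schematic bootstrap over strands (my own sketch; `strand_logl` is a hypothetical list of per-strand log-likelihood sequences, and the weight assignment reuses the mean shrinkage $\ln(K/(K+1))$ per step from earlier rather than any package's exact bookkeeping):

```python
import numpy as np

rng = np.random.default_rng(3)

def merged_logz(strand_logl, n_live):
    """Merge strands into one run and compute ln(Z) using the mean shrinkage per step."""
    logl = np.sort(np.concatenate(strand_logl))             # interleaved likelihood ordering
    log_x = np.arange(1, logl.size + 1) * np.log(n_live / (n_live + 1.0))
    dx = -np.diff(np.concatenate([[1.0], np.exp(log_x)]))   # shrinking volume intervals
    return np.logaddexp.reduce(logl + np.log(dx))

def bootstrap_logz_scatter(strand_logl, n_boot=100):
    """Resample strands with replacement and report the scatter in ln(Z)."""
    k = len(strand_logl)
    draws = [merged_logz([strand_logl[j] for j in rng.integers(0, k, size=k)], k)
             for _ in range(n_boot)]
    return np.std(draws)
```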

Nested Sampling In Practice

Nested Sampling In Practice

(Figures from Higson et al. 2017, arXiv:1704.03459.)

Method 0: Sampling from the Prior

Sampling from the full prior becomes exponentially more inefficient as the run goes on, since an ever-smaller fraction of the prior satisfies the likelihood constraint.

Method 1: Constrained Uniform Sampling (Feroz et al. 2009)

Proposal: Bound the iso-likelihood contours in real time and sample from the newly constrained prior.

Issues:
• How to ensure the bounds always encompass the iso-likelihood contours? Bootstrapping.
• How to generate flexible bounds? Easier with a uniform (transformed) prior; e.g. ellipsoids (MultiNest, Feroz et al. 2009) or balls/cubes (Buchner 2014). A sketch of the prior transform trick appears below.
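The "uniform (transformed) prior" trick maps the unit cube to the parameter space through the inverse CDF of each prior; bounding and sampling then happen in the cube. A short sketch of such a transform (the priors here are illustrative choices of mine, using SciPy's inverse normal CDF):

```python
import numpy as np
from scipy.stats import norm

def prior_transform(u):
    """Map a point u ~ Unif(0, 1)^3 in the unit cube to the model parameters.

    Illustrative priors: a ~ Unif(-5, 5), b ~ Normal(0, 1), c ~ LogUniform(1e-3, 1e3).
    """
    a = -5.0 + 10.0 * u[0]              # uniform prior via rescaling
    b = norm.ppf(u[1])                  # Gaussian prior via the inverse CDF
    c = 10.0 ** (-3.0 + 6.0 * u[2])     # log-uniform prior via rescaling in log10
    return np.array([a, b, c])
```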

Method 2: "Evolving" Previous Samples

Proposal: Generate independent samples subject to the likelihood constraint by "evolving" copies of current live points, e.g. via
• Random walks (i.e. MCMC)
• Slice sampling
• Random trajectories (i.e. HMC)

(e.g. PolyChord)

Issues:
• How to ensure samples are independent (thinning) and properly distributed within the likelihood constraint?
• How to generate efficient proposals?

Examples: Gaussian Shells and Eggbox likelihoods (figures from Feroz et al. 2013).

Summary: (Static) Nested Sampling

1. Estimates the evidence $\mathcal{Z}$.
2. Estimates the posterior $p(\boldsymbol{\Theta})$.
3. Possesses well-defined stopping criteria.
4. Combining runs improves inference.
5. Sampling and statistical uncertainties can be simulated from a single run.

Dynamic Nested Sampling

Dynamic Nested Sampling

(Figures from Higson et al. 2017, arXiv:1704.03459.)

Dynamic Nested Sampling

Runs no longer need to span the full likelihood range. For example, a first run with $N_1$ live points can target $\mathcal{L}^{1}_{\min} < \mathcal{L}^{1}_{1} < \cdots < \mathcal{L}^{1}_{N_1} < \mathcal{L}^{1}_{\max}$ while a second run with $N_2$ live points spans $0 < \mathcal{L}^{2}_{1} < \cdots < \mathcal{L}^{2}_{N_2}$. In the merged run, only $N_2$ live points are active outside the first run's range, and $N_1 + N_2$ are active inside it.

The volume shrinkage still follows

$$\ln X_{i+1} = \sum_{j=1}^{i} \ln T_j,$$

where the distribution of each $T$ depends on how the number of live points $K_i$ changes:

• Constant/increasing ($K_{i+1} \geq K_i$): $T_{i+1} \sim \text{Beta}(K_{i+1}, 1)$.
• Decreasing sequence ($K_{i+j} < K_i$): $T_{i+1}, \ldots, T_{i+j} \sim U_{(K_i)}, \ldots, U_{(K_i - j + 1)}$.

Static Nested Sampling combines exponential shrinkage (during the run) with uniform shrinkage (over the final live points):

$$\ln \mathbb{E}[X_{N+k}] = \sum_{i=1}^{N} \ln \frac{K}{K+1} + \sum_{j=1}^{k} \ln \frac{K-j+1}{K-j+2}.$$

Dynamic Nested Sampling generalizes this to a varying number of live points $K_i$:

$$\ln \mathbb{E}[X_N] = \sum_{i=1}^{n_1} \ln \frac{K_i}{K_i+1} + \sum_{j=1}^{n_2} \ln \frac{K_{n_1}-j+1}{K_{n_1}-j+2} + \cdots + \sum_{j=1}^{K_{\rm final}} \ln \frac{K_{\rm final}-j+1}{K_{\rm final}-j+2}.$$

Benefits of Dynamic Nested Sampling

• Can accommodate new "strands" within a particular range of prior volumes without changing the overall statistical framework.
• Particles can be adaptively added until stopping criteria are reached, allowing targeted estimation.

Allocating Samples

(Higson et al. 2017, arXiv:1704.03459)

Weight function for placing new samples at iteration $i$, trading off posterior weight against evidence weight:

$$I_{\mathcal{Z}}(i) \sim \frac{\mathbb{E}[\mathcal{Z}_{>i}]}{K_i}, \qquad I_{P}(i) \sim p_i,$$

$$I(G, i) = G\,\frac{I_P(i)}{\sum_j I_P(j)} + (1 - G)\,\frac{I_{\mathcal{Z}}(i)}{\sum_j I_{\mathcal{Z}}(j)}.$$

A sketch of this weight function in code follows below.
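A direct transcription of that weight function (my own sketch; the inputs are the per-iteration posterior weights, an estimate of the evidence remaining beyond each iteration, and the live-point counts, all hypothetical names):

```python
import numpy as np

def importance_weights(p, z_remain, k_live, g=0.8):
    """Dynamic nested sampling weight function I(G, i).

    p        : posterior weights p_i at each iteration
    z_remain : estimated evidence remaining beyond iteration i, E[Z_{>i}]
    k_live   : number of live points K_i at each iteration
    g        : trade-off between posterior (g=1) and evidence (g=0) importance
    """
    i_post = np.asarray(p, dtype=float)
    i_evid = np.asarray(z_remain, dtype=float) / np.asarray(k_live, dtype=float)
    return g * i_post / i_post.sum() + (1.0 - g) * i_evid / i_evid.sum()
```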

Allocating Samples

(Figure: sample allocation for a 3-D correlated multivariate Normal, comparing Static, Dynamic (posterior), and Dynamic (evidence) runs.)

How Many Samples is Enough?

• Ill-posed question: it depends on the application!

• In any sampling-based approach to estimating $p(\boldsymbol{\Theta})$ with $\hat{p}(\boldsymbol{\Theta})$, how many samples are necessary?

Assume the general case: we want the D-dimensional densities $p(\boldsymbol{\Theta}_i)$ (the "true" posterior constructed over the same "domain") and $\hat{p}(\boldsymbol{\Theta}_i)$ to be "close", e.g. in KL divergence:

$$H(p \,\|\, \hat{p}) \equiv \int_{\Omega_{\boldsymbol{\Theta}}} \hat{p}(\boldsymbol{\Theta}) \ln \frac{\hat{p}(\boldsymbol{\Theta})}{p(\boldsymbol{\Theta})}\,d\boldsymbol{\Theta} = \sum_{i=1}^{N} \hat{p}_i \ln \frac{\hat{p}_i}{p_i}.$$

We want access to $\Pr(\hat{p}_i \mid p(\boldsymbol{\Theta}))$, but we do not know $p(\boldsymbol{\Theta})$, so we use a bootstrap estimator: treat the weights as random variables and compare bootstrap realizations against the original,

$$H(\hat{p} \,\|\, \hat{p}') = \sum_{i=1}^{N} \hat{p}'_i \ln \frac{\hat{p}'_i}{\hat{p}_i}.$$

Possible stopping criterion: the fractional (%) variation in $H$ (see the sketch below).
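A minimal sketch of that bootstrap KL comparison (my own; it assumes the two realizations assign weights to the same set of samples, as in the strand bootstrap above):

```python
import numpy as np

def bootstrap_kl(p_hat, p_boot):
    """KL divergence between a bootstrap realization p' and the original weights p."""
    p = np.asarray(p_hat, dtype=float)
    q = np.asarray(p_boot, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    mask = q > 0                               # 0 * ln(0) terms contribute nothing
    return np.sum(q[mask] * np.log(q[mask] / p[mask]))
```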

Dynamic Nested Sampling Summary

1. Can sample from multi-modal distributions.
2. Can simultaneously estimate the evidence $\mathcal{Z}$ and posterior $p(\boldsymbol{\Theta})$.
3. Combining independent runs improves inference ("trivially parallelizable").
4. Can simulate uncertainties (sampling and statistical) from a single run.
5. Enables adaptive sample allocation during runtime using arbitrary weight functions.
6. Possesses evidence/posterior-based stopping criteria.

Examples and Applications

Dynamic Nested Sampling with dynesty

dynesty.readthedocs.io

• Pure Python.

• Easy to use.

• Modular.

• Open source.

• Parallelizable.

• Flexible bounding/sampling methods.

• Thorough documentation!
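A minimal usage sketch based on the documented dynesty interface (the toy log-likelihood and prior transform are mine, and argument names may differ slightly across versions):

```python
import numpy as np
from dynesty import DynamicNestedSampler

ndim = 3

def loglike(theta):
    """Toy 3-D standard normal log-likelihood."""
    return -0.5 * np.sum(theta**2) - 0.5 * ndim * np.log(2.0 * np.pi)

def prior_transform(u):
    """Map the unit cube to Unif(-10, 10) priors in each dimension."""
    return 20.0 * u - 10.0

sampler = DynamicNestedSampler(loglike, prior_transform, ndim)
sampler.run_nested()            # adaptively allocates samples until the stopping criteria are met
results = sampler.results       # weighted samples, ln-evidence, and diagnostics
print("ln(Z) =", results.logz[-1], "+/-", results.logzerr[-1])
```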

Example: Linear Regression (Posterior)

$\boldsymbol{\Theta} = \{a, b, \ln f\}$

(Figures: corner plot and trace plot of the fitted parameters.)

A sketch of the corresponding likelihood and prior transform follows below.
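Here is a plausible setup for that three-parameter line fit, following the standard "line with underestimated errors" demo; the data generation, the priors, and the role of $\ln f$ as a fractional underestimate of the error bars are my assumptions, not stated on the slides.

```python
import numpy as np

# Hypothetical data: y = a*x + b with noisy, underestimated error bars.
rng = np.random.default_rng(5)
x = np.sort(rng.uniform(0.0, 10.0, size=50))
yerr = 0.5 * np.ones_like(x)
y = 2.0 * x + 1.0 + rng.normal(0.0, 1.0, size=x.size)

def loglike(theta):
    """Gaussian likelihood where exp(lnf) scales the model as an extra error term."""
    a, b, lnf = theta
    model = a * x + b
    s2 = yerr**2 + np.exp(2.0 * lnf) * model**2
    return -0.5 * np.sum((y - model) ** 2 / s2 + np.log(2.0 * np.pi * s2))

def prior_transform(u):
    """Unit cube -> (a, b, lnf) with broad uniform priors (assumed)."""
    a = -10.0 + 20.0 * u[0]
    b = -10.0 + 20.0 * u[1]
    lnf = -10.0 + 11.0 * u[2]     # lnf in [-10, 1]
    return np.array([a, b, lnf])
```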

Example: Multivariate Normal (Evidence)

$\boldsymbol{\Theta} = \{x_1, x_2, x_3\}$

(Figure: "summary" plot of the run.)

Example: Multivariate Normal (Errors)

(Figure: "summary" plots comparing Static and Dynamic runs.)

Applications

• All results are preliminary but agree with results from MCMC methods (derived using emcee).
• Samples were allocated with 100% posterior weight and an automated stopping criterion (2% fractional error in the simulated KLD).
• dynesty was substantially (~3-6x) more efficient at generating good samples than emcee, before thinning.

Application: Modeling Galaxy SEDs (with Joel Leja, Ben Johnson, Charlie Conroy)

$\boldsymbol{\Theta} = \{\ln M_*, \ln Z, \boldsymbol{\sigma}_5, \boldsymbol{\delta}_6, \boldsymbol{\alpha}_2\}$, D = 15. (Fig: Joel Leja)

Application: Supernovae Light Curves (with James Guillochon, Kaisey Mandel)

$\boldsymbol{\Theta} = \{t_e, \boldsymbol{\rho}_4, \boldsymbol{\epsilon}_3, \boldsymbol{\delta}_3, \boldsymbol{\sigma}_2\}$, D = 12. (Fig: Open Supernova Catalog (LSQ12dlf), James Guillochon)

Application: Molecular Cloud Distances (with Catherine Zucker, Doug Finkbeiner, Alyssa Goodman)

$\boldsymbol{\Theta} = \{\boldsymbol{f}_2, \boldsymbol{d}_5, \boldsymbol{c}_5, p_o\}$, D = 13
