
An Introduction to (Dynamic) Nested Sampling

Josh Speagle
September 26, 2017


Nested Sampling is a relatively new method for estimating the Bayesian evidence (with the posterior estimated as a byproduct) that integrates over the posterior by sampling in nested "shells" of constant likelihood. Its ability to sample from complex, multi-modal distributions in a flexible yet efficient way, combined with the availability of several sampling packages, has contributed to its growing popularity in (astro)physics. In this talk I outline the basic motivation and theory behind Nested Sampling, derive various statistical properties associated with the method, and discuss how it is applied in practice. I also talk about how the overall framework can be extended in Dynamic Nested Sampling to accommodate adding samples "dynamically" during the course of a run. These samples can be allocated to maximize arbitrary objective functions, allowing Dynamic Nested Sampling to function as a posterior-oriented sampling method like MCMC, but with the added benefit of well-defined stopping criteria. I end with an application of Dynamic Nested Sampling to a variety of synthetic and real-world problems using an open-source Python package I've been developing (https://github.com/joshspeagle/dynesty/).


Transcript

  1. Background: Bayes' Theorem, $\Pr(\Theta \mid D, M) = \dfrac{\Pr(D \mid \Theta, M)\,\Pr(\Theta \mid M)}{\Pr(D \mid M)}$.
  2. Background: Bayes' Theorem, highlighting the prior $\Pr(\Theta \mid M)$.
  3. Background: Bayes' Theorem, highlighting the prior and the likelihood $\Pr(D \mid \Theta, M)$.
  4. Background: Bayes' Theorem, highlighting the prior, likelihood, and posterior $\Pr(\Theta \mid D, M)$.
  5. Background: Bayes' Theorem, highlighting the prior, likelihood, posterior, and evidence $\Pr(D \mid M)$.
  6. Posterior Estimation via Sampling: a set of samples $\{\Theta_N, \Theta_{N-1}, \dots, \Theta_2, \Theta_1\} \in \Omega_\Theta$ together with weights $\{w_N, w_{N-1}, \dots, w_2, w_1\}$ gives the estimate $P(\Theta) \approx \sum_{i=1}^{N} w_i\,\delta(\Theta - \Theta_i)$. Samples + Weights.
  7. Posterior Estimation via Sampling: MCMC returns equally weighted samples, $\{1, 1, \dots, 1, 1\}$.
  8. Posterior Estimation via Sampling: Importance Sampling returns sample-dependent weights $\{w(\Theta_N), w(\Theta_{N-1}), \dots, w(\Theta_2), w(\Theta_1)\}$.
  9. Posterior Estimation via Sampling: what are the samples and weights for Nested Sampling?
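Whatever the sampler, the posterior enters downstream analysis only through this samples-plus-weights representation. A minimal sketch of that idea (helper names and example distributions are mine, not from the talk):

```python
# A minimal sketch: posterior expectations from samples plus weights.
# Any sampler that returns {theta_i} and {w_i} estimates
#   E_P[f] ~= sum_i w_i f(theta_i) / sum_i w_i,
# with MCMC being the equal-weight special case.
import numpy as np

def weighted_expectation(samples, weights, f=lambda x: x):
    """Estimate E_P[f(theta)] from samples and (unnormalized) weights."""
    p = np.asarray(weights, dtype=float)
    p = p / p.sum()                              # normalized importance weights p_i
    vals = np.array([f(s) for s in samples])
    return np.sum(p * vals)

rng = np.random.default_rng(0)

# "MCMC-like": equal weights on draws from the posterior (here a unit Gaussian).
post = rng.normal(size=20_000)
print(weighted_expectation(post, np.ones_like(post), f=lambda x: x**2))   # ~1.0

# "Importance-sampling-like": draws from a wide N(0, 2) proposal, reweighted
# by the (unnormalized) target/proposal density ratio exp(-3 x^2 / 8).
draws = rng.normal(scale=2.0, size=20_000)
print(weighted_expectation(draws, np.exp(-0.375 * draws**2), f=lambda x: x**2))  # ~1.0
```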
  10. Motivation: Integrating the Posterior. The evidence is $Z \equiv \int_{\Omega_\Theta} \mathcal{L}(\Theta)\,\pi(\Theta)\,d\Theta$; define the "prior volume" $X(\lambda) \equiv \int_{\Theta:\,\mathcal{L}(\Theta) > \lambda} \pi(\Theta)\,d\Theta$. [Feroz et al. (2013)]
  11. Motivation: Integrating the Posterior. In terms of the prior volume, $Z = \int_0^\infty X(\lambda)\,d\lambda$.
  12. Motivation: Integrating the Posterior. Equivalently, $Z = \int_0^1 \mathcal{L}(X)\,dX$: a one-dimensional integral over the prior volume.
  13. Motivation: Integrating the Posterior. Numerically, $Z \approx \sum_{i=1}^{N} \mathcal{L}_i\,\Delta X_i$, with amplitude $\mathcal{L}_i$ and differential volume $\Delta X_i$; the $\Delta X_i$ can be rectangles, trapezoids, etc.
  14. Motivation: Integrating the Posterior. $Z \approx \sum_{i=1}^{N} w_i$ with $w_i = \mathcal{L}_i\,\Delta X_i$.
  15. Motivation: Integrating the Posterior. The weights $w_i = \mathcal{L}_i\,\Delta X_i$ are directly proportional to the typical set, and the importance weights $p_i = w_i / Z$ mean we get posteriors "for free".
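A small illustrative sketch (not from the talk; the function name and the trapezoid/rectangle choice are mine) of turning a sequence of likelihoods and prior volumes into the sum $Z \approx \sum_i \mathcal{L}_i\,\Delta X_i$:

```python
# A small sketch: estimate Z ~= sum_i L_i * dX_i from a sequence of
# (likelihood, prior volume) pairs with 1 = X_0 > X_1 > X_2 > ... > X_N.
import numpy as np

def evidence_from_volumes(log_like, volumes, rule="trapezoid"):
    """Approximate Z = int_0^1 L(X) dX given ln L_i at decreasing volumes X_i."""
    L = np.exp(np.asarray(log_like, dtype=float))
    X = np.concatenate([[1.0], np.asarray(volumes, dtype=float)])  # prepend X_0 = 1
    dX = X[:-1] - X[1:]                          # differential volumes dX_i > 0
    if rule == "trapezoid":
        L_edge = np.concatenate([[0.0], L])      # take L ~ 0 at the prior edge X_0
        amp = 0.5 * (L_edge[:-1] + L_edge[1:])   # trapezoidal amplitudes
    else:                                        # simple rectangles
        amp = L
    w = amp * dX                                 # importance weights w_i = L_i dX_i
    return w.sum(), w

# Analytic check: L(X) = exp(-X) gives Z = 1 - exp(-1) ~= 0.632.
X_grid = np.linspace(0.999, 0.001, 500)
Z_hat, w = evidence_from_volumes(-X_grid, X_grid)
print(Z_hat, 1.0 - np.exp(-1.0))
```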
  16. Motivation: Sampling the Posterior. Sampling directly from the likelihood $\mathcal{L}$ is hard. [Pictures from this 2010 talk by Skilling.]
  17. Motivation: Sampling the Posterior. Sampling uniformly within the bound $\mathcal{L} > \lambda$ is easier. [Pictures from this 2010 talk by Skilling.]
  18. Motivation: Sampling the Posterior. Sampling uniformly within the bound $\mathcal{L} > \lambda_{i-1}$ is easier.
  19. Motivation: Sampling the Posterior. Sampling uniformly within the bound $\mathcal{L} > \lambda_{i-1}$ is easier.
  20. Motivation: Sampling the Posterior. Sampling uniformly within the bound $\mathcal{L} > \lambda_{i-1}$ is easier.
  21. Motivation: Sampling the Posterior. Sampling uniformly within the bound $\mathcal{L} > \lambda_{i+1}$ is easier.
  22. Motivation: Sampling the Posterior. Sampling uniformly within the bound $\mathcal{L} > \lambda_{i+1}$ is easier. MCMC: solving a Hard Problem once, vs. Nested Sampling: solving an Easier Problem many times.
  23. Estimating the Prior Volume. $Z \approx \sum_{i=1}^{N} w_i$ with $w_i = \mathcal{L}_i\,\Delta X_i$ and $X(\lambda) \equiv \int_{\Theta:\,\mathcal{L}(\Theta) > \lambda} \pi(\Theta)\,d\Theta$ the "prior volume". [Feroz et al. (2013)]
  24. Estimating the Prior Volume. The likelihood values $\mathcal{L}_i$ are known (we evaluate them at each sample).
  25. Estimating the Prior Volume. But the differential volumes $\Delta X_i$ are not: ???
  26. Estimating the Prior Volume. Probability Integral Transform: if $\Theta \sim \pi$ (PDF), then $X = F(\Theta) \sim \mathrm{Unif}(0, 1)$, where $F$ is the corresponding CDF.
  27. Estimating the Prior Volume. Probability Integral Transform, as above: the prior volumes behave like uniform random variables. We need to sample from the constrained prior.
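To make the probability integral transform concrete, here is a hedged sketch of a "prior transform" that maps uniform draws on the unit cube to prior draws via inverse CDFs (the example distributions are arbitrary choices of mine; this is also the style in which priors are specified for samplers like dynesty, though the exact interface belongs to that package's documentation):

```python
# Sketch of the probability integral transform: if u ~ Unif(0, 1), then
# theta = F^{-1}(u) follows the distribution with CDF F. The two example
# priors below are arbitrary choices for illustration.
import numpy as np
from scipy import stats

def prior_transform(u):
    """Map a point u in the unit square to a 2-D prior:
    theta_0 ~ Uniform(-10, 10), theta_1 ~ Normal(0, 3)."""
    x0 = -10.0 + 20.0 * u[0]                        # inverse CDF of Uniform(-10, 10)
    x1 = stats.norm.ppf(u[1], loc=0.0, scale=3.0)   # inverse CDF of Normal(0, 3)
    return np.array([x0, x1])

# Uniform draws on the unit square become prior draws after the transform.
rng = np.random.default_rng(1)
print(np.array([prior_transform(u) for u in rng.uniform(size=(5, 2))]))
```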
  28. Estimating the Prior Volume. Points drawn uniformly within successive likelihood bounds give volumes that shrink as $X_{i+1} = \prod_{j=0}^{i} U_j\,X_0$ with $U_0, \dots, U_i \sim \mathrm{Unif}(0, 1)$ i.i.d. (illustrated on the posterior). [Pictures from this 2010 talk by Skilling.]
  29. Estimating the Prior Volume. $X_{i+1} = \prod_{j=0}^{i} U_j\,X_0$, $U_0, \dots, U_i \sim \mathrm{Unif}(0,1)$ i.i.d., with $X_0 \equiv 1$ (the full prior).
  30. Estimating the Prior Volume. $X_{i+1} = \prod_{j=0}^{i} U_j\,X_0$, $U_0, \dots, U_i \sim \mathrm{Unif}(0,1)$ i.i.d., $X_0 \equiv 1$.
  31. Estimating the Prior Volume. The $\Delta X_i$ in $Z \approx \sum_{i=1}^{N} w_i = \sum_i \mathcal{L}_i\,\Delta X_i$ therefore remain unknown individually (???)...
  32. Estimating the Prior Volume. ...but they follow a known joint distribution $\Pr(X_1, X_2, \dots, X_N)$ whose 1st and 2nd moments can be computed.
  33. Nested Sampling Algorithm. Build an ordered likelihood ladder $\mathcal{L}_N > \mathcal{L}_{N-1} > \dots > \mathcal{L}_2 > \mathcal{L}_1 > 0$ with associated samples $\Theta_1, \Theta_2, \dots, \Theta_{N-1}$.
  34. Nested Sampling Algorithm (Ideal). $\mathcal{L}_N > \mathcal{L}_{N-1} > \dots > \mathcal{L}_2 > \mathcal{L}_1 > 0$: samples are sequentially drawn from the constrained prior $\pi(\Theta \mid \mathcal{L} > \mathcal{L}_i)$.
  35. Nested Sampling Algorithm (Naïve). $\Theta_1 \sim \pi$: 1. Samples are sequentially drawn from the prior $\pi$. 2. A new point is only accepted if $\mathcal{L}_{i+1} > \mathcal{L}_i$.
  36. Nested Sampling Algorithm (Naïve). $\Theta_1 \sim \pi$, then $\Theta_2$: 1. Samples are sequentially drawn from the prior $\pi$. 2. A new point is only accepted if $\mathcal{L}_{i+1} > \mathcal{L}_i$.
  37. Nested Sampling Algorithm (Naïve). Continuing up the ladder to $\Theta_{N-1}$: 1. Samples are sequentially drawn from the prior $\pi$. 2. A new point is only accepted if $\mathcal{L}_{i+1} > \mathcal{L}_i$.
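A toy, deliberately inefficient sketch of the naïve scheme just described (the likelihood, prior, and loop structure are illustrative choices of mine):

```python
# Toy sketch of the naive scheme: draw from the prior, keep a point only if its
# likelihood beats the current threshold. Exponentially inefficient, but it
# produces the ordered ladder L_1 < L_2 < ... < L_N described above.
import numpy as np

rng = np.random.default_rng(2)

def log_like(theta):                    # toy 1-D Gaussian likelihood
    return -0.5 * theta**2

def sample_prior():                     # toy prior: Uniform(-10, 10)
    return rng.uniform(-10.0, 10.0)

ladder = []                             # accepted (theta, ln L) pairs
logl_current = -np.inf
n_tries = 100_000
for _ in range(n_tries):
    theta = sample_prior()
    logl = log_like(theta)
    if logl > logl_current:             # accept only strictly increasing likelihoods
        ladder.append((theta, logl))
        logl_current = logl

print(f"accepted {len(ladder)} points out of {n_tries} prior draws")
```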
  38. Components of the Algorithm: 1. Adding more particles. 2. Knowing when to stop. 3. What to do after stopping.
  39. Adding More Particles. A single run with one live point gives a ladder $\mathcal{L}_{N_1}^{(1)} > \dots > \mathcal{L}_2^{(1)} > \mathcal{L}_1^{(1)} > 0$. [Skilling (2006)]
  40. Adding More Particles. Its prior volumes shrink as $X_{i+1}^{(1)} = \prod_{j=0}^{i} U_j^{(1)}$. [Skilling (2006)]
  41. Adding More Particles. A second, independent run gives its own ladder $\mathcal{L}_{N_2}^{(2)} > \dots > \mathcal{L}_2^{(2)} > \mathcal{L}_1^{(2)} > 0$. [Skilling (2006)]
  42. Adding More Particles. With $X_{i+1}^{(1)} = \prod_{j=0}^{i} U_j^{(1)}$ and $X_{i+1}^{(2)} = \prod_{j=0}^{i} U_j^{(2)}$. [Skilling (2006)]
  43. Adding More Particles. The two runs can be merged into a single interleaved ladder, e.g. $\mathcal{L}_{N_1}^{(1)} > \mathcal{L}_{N_2}^{(2)} > \dots > \mathcal{L}_2^{(1)} > \mathcal{L}_2^{(2)} > \mathcal{L}_1^{(2)} > \mathcal{L}_1^{(1)} > 0$. [Skilling (2006)]
  44. Adding More Particles. The merged ladder $\mathcal{L}_{N_1}^{(1)} > \mathcal{L}_{N_2}^{(2)} > \dots > \mathcal{L}_1^{(2)} > \mathcal{L}_1^{(1)} > 0$. [Skilling (2006)]
  45. Adding More Particles. The merged ladder $\mathcal{L}_{N_1}^{(1)} > \mathcal{L}_{N_2}^{(2)} > \dots > \mathcal{L}_1^{(2)} > \mathcal{L}_1^{(1)} > 0$. [Skilling (2006)]
  46. Adding More Particles. The merged ladder $\mathcal{L}_{N_1}^{(1)} > \mathcal{L}_{N_2}^{(2)} > \dots > \mathcal{L}_1^{(2)} > \mathcal{L}_1^{(1)} > 0$. [Skilling (2006)]
  47. Adding More Particles. The merged ladder $\mathcal{L}_{N_1}^{(1)} > \mathcal{L}_{N_2}^{(2)} > \dots > \mathcal{L}_1^{(2)} > \mathcal{L}_1^{(1)} > 0$. [Skilling (2006)]
  48. Adding More Particles. For each individual run, $\ln X_{i+1} = \sum_{j=0}^{i} \ln U_j$ with $U_0, \dots, U_i \sim \mathrm{Unif}(0,1)$. [Skilling (2006)]
  49. Adding More Particles. For the merged run, each shrinkage factor is the larger of two uniforms: $u_1, u_2 \sim \mathrm{Unif}(0,1) \Rightarrow u_{(1)}, u_{(2)}$ (order statistics), i.e. $U_j \sim u_{(2)}$. [Skilling (2006)]
  50. Adding More Particles. Equivalently, $\ln X_{i+1} = \sum_{j=0}^{i} \ln U_j$ with $U_0, \dots, U_i \sim \mathrm{Beta}(2, 1)$. [Skilling (2006)]
  51. Adding More Particles. The merged ladder $\mathcal{L}_{N_1}^{(1)} > \mathcal{L}_{N_2}^{(2)} > \dots > \mathcal{L}_1^{(2)} > \mathcal{L}_1^{(1)} > 0$. [Skilling (2006)]
  52. Adding More Particles. One run with 2 “live points” = 2 runs with 1 live point (live points vs. dead points). [Skilling (2006)]
  53. Adding More Particles. One run with K “live points” = K runs with 1 live point (live points vs. dead points). [Skilling (2006)]
  54. Adding More Particles. $\ln X_{i+1} = \sum_{j=0}^{i} \ln U_j$ with $U_j \sim \mathrm{Beta}(2, 1)$, from the order statistics $u_1, u_2 \sim \mathrm{Unif}(0,1) \Rightarrow u_{(1)}, u_{(2)}$ (live points vs. dead points). [Skilling (2006)]
  55. Adding More Particles. In general, $\ln X_{i+1} = \sum_{j=0}^{i} \ln U_j$ with $U_0, \dots, U_i \sim \mathrm{Beta}(K, 1)$, from the order statistics $u_1, \dots, u_K \sim \mathrm{Unif}(0,1) \Rightarrow u_{(1)}, \dots, u_{(K)}$. [Skilling (2006)]
  56. Adding More Particles. Since $\mathrm{E}[\ln U_j] = -1/K$, the expected shrinkage is $\mathrm{E}[\ln X_{i+1}] = -(i+1)/K$. [Skilling (2006)]
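A quick simulation (mine, as a sanity check rather than anything from the talk) of the shrinkage statistics quoted above:

```python
# Simulation sanity check: with K live points the per-iteration shrinkage is
# t = max(u_1, ..., u_K) ~ Beta(K, 1), so E[ln t] = -1/K and E[ln X_i] = -i/K.
import numpy as np

rng = np.random.default_rng(3)
K, n_iter, n_runs = 20, 200, 1000

t = rng.uniform(size=(n_runs, n_iter, K)).max(axis=-1)   # shrinkage factor per step
ln_X = np.log(t).cumsum(axis=1)                          # ln X_i = sum_j ln t_j

print(np.log(t).mean(), -1.0 / K)                        # E[ln t]       vs  -1/K
print(ln_X[:, -1].mean(), -n_iter / K)                   # E[ln X_N]     vs  -N/K
print(ln_X[:, -1].std(), np.sqrt(n_iter) / K)            # sigma(ln X_N) ~   sqrt(N)/K
```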
  57. “Recycling” the Final Set of Particles. At termination the remaining live points $\dots, \Theta_{N+1}, \dots \sim \pi(\Theta \mid \mathcal{L} > \mathcal{L}_N)$, so their relative prior volumes $u_1, \dots, u_K \sim \mathrm{Unif}(0, 1)$. Ladder so far: $\mathcal{L}_N > \mathcal{L}_{N-1} > \dots > \mathcal{L}_2 > \mathcal{L}_1 > 0$.
  58. “Recycling” the Final Set of Particles. Adding them extends the ladder: $\mathcal{L}_{N+K} > \dots > \mathcal{L}_{N+1} > \mathcal{L}_N > \mathcal{L}_{N-1} > \dots > \mathcal{L}_2 > \mathcal{L}_1 > 0$.
  59. “Recycling” the Final Set of Particles. The k-th recycled point has prior volume $X_{N+k} = X_N\,u_{(K-k+1)}$.
  60. “Recycling” the Final Set of Particles. That is, $\dots, X_{N+k}, \dots \sim X_N\,u_{(K-k+1)}$ with $u_1, \dots, u_K \sim \mathrm{Unif}(0,1)$ and $u_{(1)} < \dots < u_{(K)}$ the order statistics.
  61. “Recycling” the Final Set of Particles. These order statistics have a convenient form: the Rényi Representation.
  62. “Recycling” the Final Set of Particles. Rényi Representation: $u_{(k)} \sim \dfrac{\sum_{j=1}^{k} E_j}{\sum_{j=1}^{K+1} E_j}$ with $E_1, \dots, E_{K+1} \sim \mathrm{Expo}(1)$.
  63. “Recycling” the Final Set of Particles. Rényi Representation, illustrated with the exponential spacings $E_1, E_2, E_3, \dots, E_{K-1}, \dots$
  64. “Recycling” the Final Set of Particles. Rényi Representation, illustrated with the exponential spacings $E_1, E_2, E_3, \dots, E_{K+1}, \dots$
  65. “Recycling” the Final Set of Particles. Putting the two pieces together: $\ln X_{N+k} = \sum_{i=1}^{N} \ln\dfrac{K}{K+1} + \sum_{j=1}^{k} \ln\dfrac{K-j+1}{K-j+2}$ (exponential shrinkage + uniform shrinkage).
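A small sketch (names and the expected-shrinkage approximation are mine) of the combined bookkeeping on slide 65:

```python
# Sketch of the combined bookkeeping: N iterations of exponential shrinkage with
# K live points, followed by uniform shrinkage for the K recycled live points.
# (Expected-shrinkage approximation; the names are mine.)
import numpy as np

def log_volumes_with_recycling(N, K):
    """Return ln X_1, ..., ln X_{N+K}."""
    # Exponential shrinkage: each iteration multiplies X by ~ K/(K+1).
    ln_X_run = np.cumsum(np.full(N, np.log(K / (K + 1.0))))
    # Uniform shrinkage: the k-th recycled point has X_{N+k} ~ X_N (K-k+1)/(K+1),
    # i.e. successive factors (K-j+1)/(K-j+2) for j = 1, ..., K.
    j = np.arange(1, K + 1)
    ln_X_live = ln_X_run[-1] + np.cumsum(np.log((K - j + 1.0) / (K - j + 2.0)))
    return np.concatenate([ln_X_run, ln_X_live])

ln_X = log_volumes_with_recycling(N=1000, K=100)
print(ln_X[999], ln_X[-1])              # ln X_N and ln X_{N+K}
```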
  66. Nested Sampling Uncertainties. Statistical uncertainties: unknown prior volumes. [Pictures from this 2010 talk by Skilling.]
  67. Nested Sampling Uncertainties. Statistical uncertainties: unknown prior volumes. Sampling uncertainties: number of samples (counting), discrete point estimates for contours, particle path dependencies. [Pictures from this 2010 talk by Skilling.]
  68. Sampling Error: Poisson Uncertainties. How many iterations $N$ does it take to cross from the prior to the posterior? Roughly $N = (\text{“distance” from prior to posterior})/\Delta\ln X$, with the “distance” and $\sigma(N)$ still to be defined. [Based on Skilling (2006) and Keeton (2011)]
  69. Sampling Error: Poisson Uncertainties. Define $H \equiv \int_{\Omega_\Theta} P(\Theta)\,\ln\dfrac{P(\Theta)}{\pi(\Theta)}\,d\Theta$: the Kullback-Leibler divergence from $\pi$ to $P$, i.e. the “information gained”. [Based on Skilling (2006) and Keeton (2011)]
  70. Sampling Error: Poisson Uncertainties. $H \equiv \int_{\Omega_\Theta} P \ln\dfrac{P}{\pi}\,d\Theta = \dfrac{1}{Z}\int_0^1 \mathcal{L}(X)\,\ln\mathcal{L}(X)\,dX - \ln Z$. [Based on Skilling (2006) and Keeton (2011)]
  71. Sampling Error: Poisson Uncertainties. The bulk of the posterior sits at $\ln X \approx -H$, so $N = H/\Delta\ln X$.
  72. Sampling Error: Poisson Uncertainties. The number of steps needed to get there is approximately Poisson, so $\sigma(N) \sim \sqrt{N}$.
  73. Sampling Error: Poisson Uncertainties. Hence $\sigma^2(\ln X) \sim N\,(\Delta\ln X)^2$.
  74. Sampling Error: Poisson Uncertainties. That is, $\sigma(\ln X) \sim \sqrt{N}\,\Delta\ln X$.
  75. Sampling Error: Poisson Uncertainties. With $K$ live points $\Delta\ln X = 1/K$, so $N \sim KH \pm \sqrt{KH}$ and $\sigma(\ln Z) \sim \sqrt{H/K}$. [Based on Skilling (2006) and Keeton (2011)]
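A hedged sketch (function name mine) of estimating the information $H$ and the rough error $\sigma(\ln Z) \sim \sqrt{H/K}$ from a finished run, in the spirit of Skilling (2006) and Keeton (2011):

```python
# Sketch: estimate the information H and the rough evidence uncertainty
# sigma(ln Z) ~ sqrt(H / K) from a run's log-likelihoods and volume intervals.
import numpy as np

def evidence_and_information(log_like, d_volumes, n_live):
    """log_like: ln L_i; d_volumes: dX_i > 0; n_live: number of live points K."""
    logL = np.asarray(log_like, dtype=float)
    logw = logL + np.log(np.asarray(d_volumes, dtype=float))   # ln w_i = ln L_i + ln dX_i
    shift = logw.max()                                         # for numerical stability
    w = np.exp(logw - shift)
    lnZ = shift + np.log(w.sum())
    p = w / w.sum()                                            # posterior weights p_i
    H = np.sum(p * logL) - lnZ                                 # H ~= (1/Z) int L ln L dX - ln Z
    return lnZ, H, np.sqrt(H / n_live)                         # ln Z, H, sigma(ln Z)
```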
  76. Sampling Error: Monte Carlo Noise. The evidence as a functional of the run: $Z = \int_{\Omega_\Theta} \mathcal{L}(\Theta)\,\pi(\Theta)\,d\Theta = \int_0^1 \mathcal{L}(X)\,dX$. [Formalism following Higson et al. (2017) and Chopin and Robert (2010)]
  77. Sampling Error: Monte Carlo Noise. $Z = \int_0^1 \mathcal{L}(X)\,dX$, where the mapping $X(\mathcal{L})$ and its inverse $\mathcal{L}(X)$ relate likelihood levels to prior volumes.
  78. Sampling Error: Monte Carlo Noise. $Z = \int_0^1 \mathcal{L}(X)\,dX$ with $X \leftrightarrow \mathcal{L}$ as above (illustrated).
  79. Sampling Error: Monte Carlo Noise. $Z = \int_0^1 \mathcal{L}(X)\,dX$ with $X \leftrightarrow \mathcal{L}$ as above (illustrated).
  80. Sampling Error: Monte Carlo Noise. In practice we use the Monte Carlo estimator $Z \approx \sum_{i=1}^{N+K} w_i$, which carries its own noise from the particular samples realized. [Formalism following Higson et al. (2017) and Chopin and Robert (2010)]
  81. Exploring Sampling Uncertainties. One run with K “live points” = K runs with 1 live point! Full ladder: $\mathcal{L}_N > \mathcal{L}_{N-1} > \dots > \mathcal{L}_2 > \mathcal{L}_1 > 0$.
  82. Exploring Sampling Uncertainties. One run with K “live points” = K runs with 1 live point! Split into $\mathcal{L}_{N_1}^{(1)} > \dots > \mathcal{L}_2^{(1)} > \mathcal{L}_1^{(1)} > 0$ and $\mathcal{L}_{N_2}^{(2)} > \dots > \mathcal{L}_2^{(2)} > \mathcal{L}_1^{(2)} > 0$, etc.
  83. Exploring Sampling Uncertainties. The original run decomposes into “strands”: $\mathcal{L}^{(\cdot)} = \{\mathcal{L}^{(1)}, \mathcal{L}^{(2)}, \dots\}$.
  84. Exploring Sampling Uncertainties. The original run decomposes into “strands”: $\mathcal{L}^{(\cdot)} = \{\mathcal{L}^{(1)}, \mathcal{L}^{(2)}, \dots\}$.
  85. Exploring Sampling Uncertainties. We would like to sample K new paths $\mathcal{L}^{(\cdot)\prime} = \{\mathcal{L}^{(1)\prime}, \mathcal{L}^{(2)\prime}, \dots\}$ from the set of all possible paths $P(\mathcal{L}, \dots)$. However, we don't have access to it.
  86. Exploring Sampling Uncertainties. Use a bootstrap estimator: resample the observed strands with replacement, e.g. $\mathcal{L}^{(\cdot)\prime} = \{\mathcal{L}^{(1)}, \mathcal{L}^{(1)}, \mathcal{L}^{(2)}, \dots\}$.
  87. Exploring Sampling Uncertainties. Each resampled run gives $Z' \approx \sum_{i} w_i'$ with $w_i' = \mathcal{L}_i'\,\Delta X_i'$, from which the scatter in the estimates can be measured.
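A simplified sketch of the strand bootstrap for a standard run (mine; a full implementation would track or simulate per-strand volumes rather than using the mean shrinkage $X_i \approx e^{-i/K}$ for the merged run):

```python
# Simplified sketch of the strand bootstrap: resample the K strands with
# replacement, merge their dead points, and recompute ln Z.
import numpy as np

def bootstrap_lnZ(strand_loglikes, n_boot=200, seed=0):
    """strand_loglikes: list of K arrays, the sorted ln L values of each strand."""
    rng = np.random.default_rng(seed)
    K = len(strand_loglikes)
    lnZ_samples = []
    for _ in range(n_boot):
        picks = rng.integers(0, K, size=K)                 # strands, with replacement
        logL = np.sort(np.concatenate([strand_loglikes[k] for k in picks]))
        N = logL.size
        X = np.exp(-np.arange(1, N + 1) / K)               # mean prior volumes
        dX = np.concatenate([[1.0], X[:-1]]) - X           # dX_i = X_{i-1} - X_i
        logw = logL + np.log(dX)
        shift = logw.max()
        lnZ_samples.append(shift + np.log(np.exp(logw - shift).sum()))
    return np.array(lnZ_samples)

# Usage: lnZ_boot = bootstrap_lnZ(strands); print(lnZ_boot.mean(), lnZ_boot.std())
```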
  88. Method 0: Sampling from the Prior. Sampling from the prior becomes exponentially more inefficient as time goes on. [Higson et al. (2017), arXiv:1704.03459]
  89. Method 1: Constrained Uniform Sampling. Proposal: bound the iso-likelihood contours in real time and sample from the newly constrained prior. [Feroz et al. (2009)]
  90. Method 1: Constrained Uniform Sampling. Issues: How do we ensure the bounds always encompass the iso-likelihood contours? How do we generate flexible bounds? [Feroz et al. (2009)]
  91. Method 1: Constrained Uniform Sampling. Issues: How do we ensure the bounds always encompass the iso-likelihood contours? (Bootstrapping.) How do we generate flexible bounds? (Easier with a uniform, i.e. transformed, prior.) [Feroz et al. (2009)]
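As a rough illustration of the bounding idea, here is a single-ellipsoid sketch (all names are mine; actual codes such as MultiNest use multiple overlapping ellipsoids plus further safeguards):

```python
# Rough single-ellipsoid sketch of constrained uniform sampling: bound the live
# points with an enlarged covariance ellipsoid and draw uniformly inside it
# until the likelihood constraint is satisfied.
import numpy as np

def sample_within_bound(live_points, log_like, logl_min, enlarge=1.5, rng=None):
    """live_points: (n_live, ndim) array; log_like: callable; logl_min: threshold."""
    rng = rng if rng is not None else np.random.default_rng()
    ndim = live_points.shape[1]
    center = live_points.mean(axis=0)
    cov = np.cov(live_points, rowvar=False)
    chol = np.linalg.cholesky(cov)
    # Mahalanobis radius that just contains the most distant live point, then expand.
    d = live_points - center
    r_max = np.sqrt(np.max(np.einsum("ij,jk,ik->i", d, np.linalg.inv(cov), d)))
    r_max *= enlarge
    while True:
        z = rng.normal(size=ndim)
        z *= rng.uniform() ** (1.0 / ndim) / np.linalg.norm(z)   # uniform in the unit ball
        theta = center + r_max * (chol @ z)                      # map ball -> ellipsoid
        if log_like(theta) > logl_min:
            return theta
```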
  92. Method 2: “Evolving” Previous Samples. Proposal: generate independent samples subject to the likelihood constraint by “evolving” copies of the current live points.
  93. Method 2: “Evolving” Previous Samples. Proposal: generate independent samples subject to the likelihood constraint by “evolving” copies of the current live points.
  94. Method 2: “Evolving” Previous Samples. Options include random walks (i.e. MCMC), slice sampling (e.g. PolyChord), and random trajectories (i.e. HMC).
  95. Method 2: “Evolving” Previous Samples. Issues: How do we ensure samples are independent (thinning) and properly distributed within the likelihood constraint? How do we generate efficient proposals? Options: random walks (i.e. MCMC), slice sampling (e.g. PolyChord), random trajectories (i.e. HMC).
  96. Summary: (Static) Nested Sampling. 1. Estimates the evidence $Z$. 2. Estimates the posterior $P(\Theta)$. 3. Possesses well-defined stopping criteria.
  97. Summary: (Static) Nested Sampling. 1. Estimates the evidence $Z$. 2. Estimates the posterior $P(\Theta)$. 3. Possesses well-defined stopping criteria. 4. Combining runs improves inference.
  98. Summary: (Static) Nested Sampling. 1. Estimates the evidence $Z$. 2. Estimates the posterior $P(\Theta)$. 3. Possesses well-defined stopping criteria. 4. Combining runs improves inference. 5. Sampling and statistical uncertainties can be simulated from a single run.
  99. Dynamic Nested Sampling. Strands no longer need to span the whole prior: one strand can cover only $\mathcal{L}_{\max}^{(1)} > \mathcal{L}_{N_1}^{(1)} > \dots > \mathcal{L}_2^{(1)} > \mathcal{L}_1^{(1)} > \mathcal{L}_{\min}^{(1)}$, while another spans $\mathcal{L}_{N_2}^{(2)} > \dots > \mathcal{L}_2^{(2)} > \mathcal{L}_1^{(2)} > 0$.
  100. Dynamic Nested Sampling. The merged run then has a varying number of live points along the ladder $\mathcal{L}_{N_2}^{(2)} > \dots > \mathcal{L}_{N_1}^{(1)} > \dots > \mathcal{L}_1^{(1)} > \mathcal{L}_2^{(2)} > \mathcal{L}_1^{(2)} > 0$ (regions with 2, 1 + 2, and 2 live points, as labeled).
  101. Dynamic Nested Sampling. The volume bookkeeping keeps the same form, $\ln X_{i+1} = \sum_{j=1}^{i+1} \ln U_j$, but the distribution of each $U_j$ now depends on the local number of live points (regions with 2, 1 + 2, and 2 live points).
  102. Dynamic Nested Sampling. Where the number of live points is constant or increasing, $K_{j+1} \geq K_j$: $U_{j+1} \sim \mathrm{Beta}(K_{j+1}, 1)$.
  103. Dynamic Nested Sampling. Where the number of live points is decreasing (points removed without replacement), $K_{j+k} < K_j$: $U_{j+1}, \dots, U_{j+k} \sim u_{(K_j)}, \dots, u_{(K_j - k + 1)}$, a decreasing sequence of uniform order statistics.
  104. Dynamic Nested Sampling. Expected volumes follow by chaining the segments, e.g. $\ln X = \sum_{j=1}^{N_1} \ln\dfrac{K_1}{K_1+1} + \sum_{j=1}^{N_2} \ln\dfrac{K_1 - j + 1}{K_1 - j + 2} + \dots + \sum_{j=1}^{N_{\rm final}} \ln\dfrac{K_{\rm final} - j + 1}{K_{\rm final} - j + 2}$ (exponential shrinkage and uniform shrinkage pieces).
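A minimal sketch (mine) of the expected-volume bookkeeping when the number of live points varies along the run, using the simple mean shrinkage of $-1/K_i$ per iteration; segments where points are removed without replacement would use the uniform-shrinkage factors above instead:

```python
# Sketch: expected log prior volumes when the number of live points varies,
# using the mean shrinkage of -1/K_i per iteration.
import numpy as np

def expected_log_volumes(n_live_per_iter):
    K = np.asarray(n_live_per_iter, dtype=float)
    return -np.cumsum(1.0 / K)              # ln X_i ~= -sum_{j<=i} 1/K_j

# Example: a baseline run with 100 live points, with 400 extra live points
# injected over an interior range of the run (as in dynamic nested sampling).
K_per_iter = np.concatenate([np.full(200, 100), np.full(300, 500), np.full(200, 100)])
ln_X = expected_log_volumes(K_per_iter)
print(ln_X[199], ln_X[499], ln_X[-1])
```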
  105. Benefits of Dynamic Nested Sampling. Can accommodate new “strands” within a particular range of prior volumes without changing the overall statistical framework. Particles can be adaptively added until stopping criteria are reached, allowing targeted estimation.
  106. Benefits of Dynamic Nested Sampling. Can accommodate new “strands” within a particular range of prior volumes without changing the overall statistical framework. Particles can be adaptively added until stopping criteria are reached, allowing targeted estimation.
  107. Sampling Uncertainties (Static). The run decomposes into “strands” $\mathcal{L}^{(\cdot)} = \{\mathcal{L}^{(1)}, \mathcal{L}^{(2)}, \dots\}$. We would like to sample K paths from the set of all possible paths $P(\mathcal{L}, \dots)$; however, we don't have access to it, so we use a bootstrap estimator, e.g. $\mathcal{L}^{(\cdot)\prime} = \{\mathcal{L}^{(1)}, \mathcal{L}^{(1)}, \mathcal{L}^{(2)}, \dots\}$.
  108. Sampling Uncertainties (Dynamic). The strands now come in two types: those with $\mathcal{L}_{\min}^{(1)} = -\infty$ (originated from the prior) and those with $\mathcal{L}_{\min}^{(2)} > -\infty$ (originated interior to the prior). Original run: $\mathcal{L}^{(\cdot)} = \{\mathcal{L}^{(1)}, \mathcal{L}^{(2)}, \dots\}$.
  109. Sampling Uncertainties (Dynamic). We would like to sample the right number of paths of each type: paths from the distribution of strands that started from the prior and paths from the distribution of strands that started interior to it.
  110. Sampling Uncertainties (Dynamic). Use a stratified bootstrap estimator: resample each type of strand separately, e.g. $\mathcal{L}^{(\cdot)\prime} = \{\mathcal{L}^{(1)}, \mathcal{L}^{(1)}, \mathcal{L}^{(2)}, \dots\}$.
  111. Dynamic Nested Sampling. Can accommodate new “strands” within a particular range of prior volumes without changing the overall statistical framework. Particles can be adaptively added until stopping criteria are reached, allowing targeted estimation.
  112. Dynamic Nested Sampling. Can accommodate new “strands” within a particular range of prior volumes without changing the overall statistical framework. Particles can be adaptively added until stopping criteria are reached, allowing targeted estimation.
  113. Allocating Samples. Weight function: $I_i(f) = f\,I_i^{\rm post} + (1 - f)\,I_i^{Z}$, a convex combination of a posterior weight and an evidence weight; new samples are allocated where $I_i$ is large. [Higson et al. (2017), arXiv:1704.03459]
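A hedged sketch of such a weight function (the posterior term uses the normalized importance weights $p_i$; the evidence term here is a simple stand-in proportional to the remaining prior volume, which differs in detail from the forms used in Higson et al. 2017 and in dynesty):

```python
# Sketch of a weight function I_i = f * I_post,i + (1 - f) * I_Z,i. The
# posterior term is the normalized importance weight p_i; the evidence term
# is a crude stand-in proportional to the remaining prior volume X_i.
import numpy as np

def allocation_weights(log_like, log_volumes, pfrac=0.8):
    logL = np.asarray(log_like, dtype=float)
    lnX = np.asarray(log_volumes, dtype=float)
    dX = np.exp(np.concatenate([[0.0], lnX[:-1]])) - np.exp(lnX)   # dX_i > 0
    logw = logL + np.log(dX)
    p = np.exp(logw - logw.max())
    p /= p.sum()                                  # posterior importance I_post,i
    z = np.exp(lnX) / np.exp(lnX).sum()           # crude evidence importance I_Z,i
    return pfrac * p + (1.0 - pfrac) * z          # I_i = f*I_post + (1-f)*I_Z
```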
  114. How Many Samples is Enough? In any sampling-based approach to estimating $P$ with $\hat{P}$, how many samples are necessary?
  115. How Many Samples is Enough? Assume the general case: we want the D-dimensional $P$ and $\hat{P}_i$ densities to be “close”, with the “true” posterior constructed over the same “domain”.
  116. How Many Samples is Enough? One measure of “close”: $H(\hat{P}\,\|\,P) \equiv \int_{\Omega_\Theta} \hat{P}(\Theta)\,\ln\dfrac{\hat{P}(\Theta)}{P(\Theta)}\,d\Theta$.
  117. How Many Samples is Enough? In terms of the importance weights, $H = \sum_{i=1}^{N} \hat{p}_i \ln\dfrac{\hat{p}_i}{p_i}$.
  118. How Many Samples is Enough? We want access to $P(H)$, but we don't know the true $P$ (or the true weights $p_i$).
  119. How Many Samples is Enough? Use a bootstrap estimator: $H' = \sum_{i=1}^{N} p_i' \ln\dfrac{p_i'}{\hat{p}_i}$.
  120. How Many Samples is Enough? $H'$ is a random variable, so its distribution can be simulated.
  121. How Many Samples is Enough? Possible stopping criterion: the fractional (%) variation in H.
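A tiny sketch of the stopping-rule idea (mine): compute $H$, or $\ln Z$, over many bootstrap realizations and stop once its fractional scatter drops below a threshold (the applications later quote 2%); the realizations themselves would come from something like the strand resampler sketched earlier.

```python
# Sketch of the stopping rule: stop when the fractional scatter of the quantity
# of interest (H here, or ln Z) across bootstrap realizations falls below a
# threshold, e.g. 2%.
import numpy as np

def should_stop(bootstrap_values, threshold=0.02):
    """bootstrap_values: H (or ln Z) estimates from bootstrap realizations."""
    vals = np.asarray(bootstrap_values, dtype=float)
    frac_scatter = vals.std() / np.abs(vals.mean())
    return frac_scatter < threshold, frac_scatter
```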
  122. Dynamic Nested Sampling Summary. 1. Can sample from multi-modal distributions. 2. Can simultaneously estimate the evidence $Z$ and posterior $P(\Theta)$. 3. Combining independent runs improves inference (“trivially parallelizable”). 4. Can simulate uncertainties (sampling and statistical) from a single run. 5. Enables adaptive sample allocation during runtime using arbitrary weight functions. 6. Possesses evidence/posterior-based stopping criteria.
  123. Dynamic Nested Sampling with dynesty (dynesty.readthedocs.io). Pure Python. Easy to use. Modular. Open source. Parallelizable. Flexible bounding/sampling methods. Thorough documentation!
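A minimal usage sketch along the lines of the dynesty documentation; keyword names and defaults may differ between dynesty versions, so treat this as indicative and check dynesty.readthedocs.io:

```python
# Minimal dynesty usage sketch on a toy problem (indicative only; see the docs).
import numpy as np
import dynesty

ndim = 3

def loglike(theta):
    return -0.5 * np.sum(theta**2)          # toy isotropic Gaussian likelihood

def prior_transform(u):
    return 20.0 * u - 10.0                  # Uniform(-10, 10) prior in each dimension

# Static sampler: estimates ln Z, with the posterior as a byproduct.
sampler = dynesty.NestedSampler(loglike, prior_transform, ndim, nlive=500)
sampler.run_nested()
res = sampler.results
print(res.logz[-1], res.logzerr[-1])        # evidence estimate and its error

# Dynamic sampler: allocates extra live points according to a weight function
# (here weighted 100% toward the posterior, as in the talk's applications).
dsampler = dynesty.DynamicNestedSampler(loglike, prior_transform, ndim)
dsampler.run_nested(wt_kwargs={"pfrac": 1.0})
dres = dsampler.results
```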
  124. Applications. All results are preliminary but agree with results from MCMC methods (derived using emcee). Samples were allocated with 100% posterior weight and an automated stopping criterion (2% fractional error in the simulated KLD). dynesty was substantially (~3-6x) more efficient at generating good samples than emcee, before thinning.
  125. Application: Modeling Galaxy SEDs. Parameters $\Theta = \{\ln M_\ast, \ln Z, \dots\}$ (plus additional groups of 5, 6, and 2 parameters), D = 15. With: Joel Leja, Ben Johnson, Charlie Conroy.
  126. Application: Supernova Light Curves. Parameters $\Theta = \{\dots\}$ (groups of 4, 3, 3, and 2 parameters), D = 12. [Fig: Open Supernova Catalog (LSQ12dlf), James Guillochon] With: James Guillochon, Kaisey Mandel.
  127. Dynamic Nested Sampling with dynesty (dynesty.readthedocs.io). Pure Python. Easy to use. Modular. Open source. Parallelizable. Flexible bounding/sampling methods. Thorough documentation!