Motivation: Integrating the Posterior

The evidence is the integral of the likelihood over the prior,
$$\mathcal{Z} = \int_{\Omega} \mathcal{L}(\Theta)\,\pi(\Theta)\,d\Theta \approx \sum_{i=1}^{N} w_i\,\mathcal{L}(\Theta_i),$$
where $w_i$ is the importance weight attached to sample $\Theta_i$.

We get posteriors "for free": each sample's posterior weight is
$$\hat p_i \propto w_i\,\mathcal{L}(\Theta_i),$$
which is directly proportional to its contribution to the typical set.
Motivation: Sampling the Posterior

Sampling directly from the likelihood $\mathcal{L}$ is hard.
Sampling uniformly within a bound $\mathcal{L} > \lambda$ is easier.

(Pictures from this 2010 talk by Skilling.)

MCMC: solving a hard problem once,
vs.
Nested Sampling: solving an easier problem many times.
Estimating the Prior Volume

$$\mathcal{Z} \approx \sum_{i=1}^{N} \mathcal{L}_i\,\Delta X_i,\qquad
X(\lambda) \equiv \int_{\Theta:\,\mathcal{L}(\Theta) > \lambda} \pi(\Theta)\,d\Theta$$

$X(\lambda)$ is the "prior volume" enclosed by the iso-likelihood contour $\mathcal{L}(\Theta) = \lambda$ (Feroz et al. 2013).

But what are the prior volumes $X_i$ associated with our samples? By the probability integral transform (a PDF pushed through its CDF is uniformly distributed), a draw $\Theta \sim \pi(\Theta)$ subject to the constraint $\mathcal{L}(\Theta) > \lambda$ has an enclosed prior volume $X \sim \mathrm{Unif}(0, X(\lambda))$. We therefore need to sample from the constrained prior.

(Pictures from this 2010 talk by Skilling.)

At each iteration the prior volume thus shrinks by a random factor,
$$X_{i+1} = t_{i+1}\,X_i,\qquad t_{i+1} \sim \mathrm{Unif}(0,1),$$
so after $i+1$ iterations
$$X_{i+1} = \left(\prod_{j=1}^{i+1} t_j\right) X_0,\qquad
t_1,\dots,t_{i+1} \overset{\mathrm{i.i.d.}}{\sim} \mathrm{Unif}(0,1),\qquad X_0 \equiv 1.$$
The dead points, weighted by $\mathcal{L}_i\,\Delta X_i$, trace out the posterior. A runnable toy version of this loop follows below.
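To make the loop concrete, here is a minimal, illustrative sketch (not the talk's implementation): a 2-D Gaussian likelihood with a flat prior on [-5, 5]², a small number of live points, and new points drawn by brute-force rejection from the constrained prior (the "Method 0" discussed later). All names and settings (loglike, K, n_iter, …) are my own choices.

```python
import numpy as np

rng = np.random.default_rng(42)
ndim, K, n_iter = 2, 25, 200        # dimensionality, live points, iterations
half_width = 5.0                    # flat prior on [-5, 5]^ndim (area 10**ndim)

def loglike(theta):
    """Normalized 2-D Gaussian log-likelihood."""
    return -0.5 * np.sum(theta**2) - 0.5 * ndim * np.log(2.0 * np.pi)

def sample_prior():
    """One uniform draw from the prior box."""
    return rng.uniform(-half_width, half_width, size=ndim)

# Initialize K live points by sampling the prior.
live = np.array([sample_prior() for _ in range(K)])
live_logl = np.array([loglike(t) for t in live])

dead_logl, ln_X = [], []
for i in range(1, n_iter + 1):
    worst = np.argmin(live_logl)            # lowest-likelihood live point "dies"
    dead_logl.append(live_logl[worst])
    ln_X.append(-i / K)                     # expected shrinkage: <ln t_i> = -1/K
    # Replace it with a draw from the *constrained* prior, L > L_worst,
    # via brute-force rejection ("Method 0" later in the talk).
    while True:
        theta = sample_prior()
        logl = loglike(theta)
        if logl > dead_logl[-1]:
            break
    live[worst], live_logl[worst] = theta, logl

# Evidence: Z ~= sum_i L_i * dX_i with dX_i = X_{i-1} - X_i (X_0 = 1);
# the contribution of the final live points is negligible here and ignored.
X = np.exp(np.array(ln_X))
dX = -np.diff(np.concatenate([[1.0], X]))
ln_Z = np.log(np.sum(np.exp(dead_logl) * dX))
print(f"ln Z ~= {ln_Z:.2f}   (analytic: ln(1/100) ~= {np.log(0.01):.2f})")
```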
Adding More Particles

Merging the dead points from independent single-live-point runs yields a single strictly ordered sequence of likelihoods,
$$\mathcal{L}^{(1)}_1,\ \mathcal{L}^{(2)}_1,\ \dots\ \longrightarrow\ \mathcal{L}_N > \mathcal{L}_{N-1} > \dots > \mathcal{L}_2 > \mathcal{L}_1 > 0,$$
so one run with 2 "live points" = 2 runs with 1 live point, and more generally one run with K "live points" = K runs with 1 live point (Skilling 2006).

With K = 2 live points the prior volume shrinks at each iteration by the larger of two uniform draws. By order statistics, if $t^{(1)}, t^{(2)} \sim \mathrm{Unif}(0,1)$ then $\max\!\big(t^{(1)}, t^{(2)}\big) \sim \mathrm{Beta}(2,1)$, so
$$\ln X_{i+1} = \sum_{j=1}^{i+1} \ln t_j,\qquad
t_1,\dots,t_{i+1} \overset{\mathrm{i.i.d.}}{\sim} \mathrm{Beta}(2,1)$$
(Skilling 2006). A numerical check follows below.
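A quick numerical check of these shrinkage statistics (illustrative, not from the talk): with K live points each shrinkage factor follows Beta(K, 1), so ln X_i should have mean −i/K and standard deviation √i/K.

```python
import numpy as np

rng = np.random.default_rng(0)
K, n_iter, n_runs = 50, 500, 2000

# Simulate many independent shrinkage sequences: t_i ~ Beta(K, 1).
t = rng.beta(K, 1, size=(n_runs, n_iter))
ln_X = np.cumsum(np.log(t), axis=1)          # ln X_i = sum_{j <= i} ln t_j

i = n_iter
print(f"mean ln X_{i}: {ln_X[:, -1].mean():.2f}   (expected -i/K:      {-i / K:.2f})")
print(f"std  ln X_{i}: {ln_X[:, -1].std():.3f}   (expected sqrt(i)/K: {np.sqrt(i) / K:.3f})")
```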
"Recycling" the Final Set of Particles

During the run, the log prior volume after N iterations is
$$\ln X_N = \sum_{i=1}^{N} \ln t_i.$$

When the run terminates, the K remaining live points can be "recycled" into the results. The k-th live point (ordered by increasing likelihood) is assigned
$$\ln X_{N+k} = \sum_{i=1}^{N} \ln t_i \;+\; \sum_{j=1}^{k} \ln\!\left(\frac{K - j + 1}{K - j + 2}\right)
= \ln X_N + \ln\!\left(\frac{K - k + 1}{K + 1}\right).$$
The first term describes the exponential shrinkage accumulated during the run; the second describes the uniform shrinkage over the final set of live points. A self-contained helper implementing this is sketched below.
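A small helper implementing the recycling step above, under the same toy assumptions as the earlier sketch (the analytic likelihood curve and all names are illustrative):

```python
import numpy as np

def evidence_with_recycling(dead_logl, ln_X_dead, live_logl):
    """Evidence estimate from the dead points plus the recycled final live points."""
    K = len(live_logl)
    k = np.arange(1, K + 1)
    # Uniform shrinkage over the final live points (ordered by increasing likelihood):
    # ln X_{N+k} = ln X_N + ln[(K - k + 1) / (K + 1)].
    ln_X_live = ln_X_dead[-1] + np.log((K - k + 1) / (K + 1))
    logl = np.concatenate([dead_logl, np.sort(live_logl)])
    X = np.exp(np.concatenate([ln_X_dead, ln_X_live]))
    dX = -np.diff(np.concatenate([[1.0], X]))
    return np.sum(np.exp(logl) * dX)

# Purely illustrative inputs: expected volumes plus the toy 2-D Gaussian /
# flat-prior likelihood curve ln L(X) = -(50/pi) X - ln(2 pi) used earlier.
rng = np.random.default_rng(1)
K, N = 25, 200
ln_X_dead = -np.arange(1, N + 1) / K
dead_logl = -(50.0 / np.pi) * np.exp(ln_X_dead) - np.log(2.0 * np.pi)
live_logl = rng.uniform(dead_logl[-1], -np.log(2.0 * np.pi), size=K)
Z = evidence_with_recycling(dead_logl, ln_X_dead, live_logl)
print(f"ln Z ~= {np.log(Z):.2f}")
```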
Nested Sampling Errors

Nested Sampling Uncertainties
(Pictures from this 2010 talk by Skilling.)

Statistical uncertainties:
• Unknown prior volumes

Sampling uncertainties:
• Number of samples (counting)
• Discrete point estimates for contours
• Particle path dependencies
Sampling Error: Poisson Uncertainties
(Based on Skilling 2006 and Keeton 2011.)

How many iterations N do we need, and what uncertainty $\Delta\ln\mathcal{Z}$ does that imply?

The "distance" from the prior to the posterior is the Kullback-Leibler divergence from $\pi$ to $P$, i.e. the "information gained":
$$H \equiv \int_{\Omega} P(\Theta)\,\ln\!\frac{P(\Theta)}{\pi(\Theta)}\,d\Theta
= \frac{1}{\mathcal{Z}}\int_0^1 \mathcal{L}(X)\,\ln\mathcal{L}(X)\,dX \;-\; \ln\mathcal{Z}.$$

Each iteration shrinks the log prior volume by $\langle\ln t_i\rangle = -1/K$, so after N iterations
$$\ln X_N \approx -\frac{N}{K} \pm \frac{\sqrt{N}}{K}.$$
The bulk of the posterior sits at $\ln X \sim -H$, so reaching it takes $N \sim KH$ iterations. The Poisson-like scatter accumulated in $\ln X$ then propagates into the evidence:
$$(\Delta\ln\mathcal{Z})^2 \sim \frac{N}{K^2} \sim \frac{H}{K}
\quad\Longrightarrow\quad
\Delta\ln\mathcal{Z} \sim \sqrt{\frac{H}{K}}.$$
A small numerical check follows below.
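An illustrative numerical check of the $\Delta\ln\mathcal{Z} \sim \sqrt{H/K}$ scaling, reusing the analytic likelihood curve L(X) of the earlier 2-D Gaussian toy problem (an assumption, not from the talk): repeat the evidence sum many times with random shrinkage factors $t_i \sim \mathrm{Beta}(K,1)$ and compare the scatter in ln Z to $\sqrt{H/K}$.

```python
import numpy as np

rng = np.random.default_rng(3)
K, n_iter, n_repeat = 25, 400, 500

def L_of_X(X):
    """Toy likelihood vs. prior volume (2-D Gaussian, flat prior on [-5, 5]^2)."""
    return np.exp(-(50.0 / np.pi) * X) / (2.0 * np.pi)

ln_Z = np.empty(n_repeat)
for r in range(n_repeat):
    t = rng.beta(K, 1, size=n_iter)                  # random shrinkage factors
    X = np.exp(np.cumsum(np.log(t)))                 # prior volumes X_i
    dX = -np.diff(np.concatenate([[1.0], X]))
    ln_Z[r] = np.log(np.sum(L_of_X(X) * dX))

# Information gained for this toy problem: H = ln(V_prior) - ln(2*pi*e) ~= 1.77.
H = np.log(100.0) - np.log(2.0 * np.pi * np.e)
print(f"scatter in ln Z: {ln_Z.std():.3f}    sqrt(H/K): {np.sqrt(H / K):.3f}")
```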
Sampling Error: Monte Carlo Noise
(Formalism following Higson et al. 2017 and Chopin & Robert 2010.)

Any posterior expectation can be written as a one-dimensional integral over the prior volume,
$$\mathbb{E}_P[f] = \int_{\Omega} f(\Theta)\,P(\Theta)\,d\Theta
= \frac{1}{\mathcal{Z}}\int_0^1 \tilde f(X)\,\mathcal{L}(X)\,dX,$$
where $\mathcal{L}(X)$ is the likelihood threshold enclosing prior volume $X$ and $\tilde f(X)$ is the average of $f(\Theta)$ over the corresponding iso-likelihood contour $\mathcal{L}(\Theta) = \mathcal{L}(X)$.

In practice this is estimated from the dead points,
$$\mathbb{E}_P[f] \approx \sum_{i=1}^{N} \hat p_i\, f(\Theta_i),\qquad
\hat p_i = \frac{\mathcal{L}_i\,\Delta\hat X_i}{\hat{\mathcal{Z}}},$$
so Monte Carlo noise enters both through the estimated prior volumes and through the particular positions $\Theta_i$ sampled on each contour.
Exploring Sampling Uncertainties

One run with K "live points" = K runs with 1 live point! The combined run produces a single ordered sequence of dead-point likelihoods,
$$\mathcal{L}_N > \mathcal{L}_{N-1} > \dots > \mathcal{L}_2 > \mathcal{L}_1 > 0,$$
which can be decomposed into K single-live-point "strands",
$$\mathcal{L}_\bullet^{(k)} = \big\{\mathcal{L}_1^{(k)}, \mathcal{L}_2^{(k)}, \dots\big\},\qquad k = 1,\dots,K.$$

We would like to sample K new paths from the set of all possible paths $P(\mathcal{L}_\bullet, \dots)$. However, we don't have access to it. Use a bootstrap estimator: resample K strands with replacement from the original run, e.g.
$$\mathcal{L}_\bullet' = \big\{\mathcal{L}_\bullet^{(1)}, \mathcal{L}_\bullet^{(1)}, \mathcal{L}_\bullet^{(2)}, \dots\big\},$$
merge them into a new run, and recompute the estimate
$$\hat f' \approx \sum_{i=1}^{N'} \hat p_i'\, f(\Theta_i'),\qquad
\hat p_i' = \frac{\mathcal{L}_i'\,\Delta\hat X_i'}{\hat{\mathcal{Z}}'}.$$
The scatter of $\hat f'$ across bootstrap realizations estimates the sampling uncertainty. A toy implementation is sketched below.
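A toy implementation of the strand bootstrap (illustrative; it reuses the analytic likelihood curve L(X) from the earlier sketches, and all names are my own):

```python
import numpy as np

rng = np.random.default_rng(7)
K, n_boot = 25, 200

def L_of_X(X):
    """Toy likelihood as a function of enclosed prior volume (2-D Gaussian, flat prior)."""
    return np.exp(-(50.0 / np.pi) * X) / (2.0 * np.pi)

def run_strand(depth=18.0):
    """One single-live-point run: record dead-point volumes until ln X < -depth."""
    ln_X, volumes = 0.0, []
    while ln_X > -depth:
        ln_X += np.log(rng.uniform())       # t ~ Unif(0, 1) for a single live point
        volumes.append(np.exp(ln_X))
    return np.array(volumes)

def ln_Z_from_strands(strands):
    """Merge strands, reassign volumes X_i = exp(-i/K), and integrate."""
    K_eff = len(strands)
    # Merge in likelihood order; since L(X) is monotone we can sort by volume.
    X_merged = np.sort(np.concatenate(strands))[::-1]
    L_vals = L_of_X(X_merged)
    X_new = np.exp(-np.arange(1, len(X_merged) + 1) / K_eff)
    dX = -np.diff(np.concatenate([[1.0], X_new]))
    return np.log(np.sum(L_vals * dX))

strands = [run_strand() for _ in range(K)]
ln_Z0 = ln_Z_from_strands(strands)

# Bootstrap: resample K strands with replacement and recompute ln Z each time.
ln_Z_boot = np.array([
    ln_Z_from_strands([strands[j] for j in rng.integers(0, K, size=K)])
    for _ in range(n_boot)
])
print(f"ln Z = {ln_Z0:.2f} +/- {ln_Z_boot.std():.2f} (strand bootstrap)")
```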
Nested Sampling In Practice
Higson et al. (2017), arXiv:1704.03459

Method 0: Sampling from the Prior
Sampling from the prior becomes exponentially more inefficient as time goes on.
Method 1: Constrained Uniform Sampling
Feroz et al. (2009)

Proposal:
Bound the iso-likelihood contours in real time and sample from the newly constrained prior.

Issues:
• How to ensure bounds always encompass the iso-likelihood contours? → Bootstrapping.
• How to generate flexible bounds? → Easier with a uniform (transformed) prior; see the sketch below.
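The "uniform (transformed) prior" remark refers to working in the unit cube: draw u ~ Unif(0, 1)^D and map it to the target prior through each parameter's inverse CDF (the probability integral transform again). A minimal sketch with illustrative parameter choices:

```python
import numpy as np
from scipy import stats

def prior_transform(u):
    """Map a point u in the unit cube to the model's prior (illustrative choices)."""
    x = np.empty_like(u)
    x[0] = 10.0 * u[0] - 5.0                            # Uniform(-5, 5)
    x[1] = stats.norm.ppf(u[1], loc=0.0, scale=2.0)     # Normal(0, 2) via inverse CDF
    x[2] = 10.0 ** (4.0 * u[2] - 2.0)                   # log-uniform on [1e-2, 1e2]
    return x

print(prior_transform(np.array([0.5, 0.5, 0.5])))       # -> [0. 0. 1.]
```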
Method 2: "Evolving" Previous Samples

Proposal:
Generate independent samples subject to the likelihood constraint by "evolving" copies of current live points, e.g. via
• Random walks (i.e. MCMC)
• Slice sampling (e.g. PolyChord)
• Random trajectories (i.e. HMC)

Issues:
• How to ensure samples are independent (thinning) and properly distributed within the likelihood constraint?
• How to generate efficient proposals?

A random-walk sketch follows below.
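A minimal sketch of the random-walk variant (illustrative names and step size; not the talk's or any package's implementation): evolve a copy of a live point with a Metropolis walk whose target is the prior truncated to L > L_min, rejecting any proposal that leaves the prior box or violates the constraint.

```python
import numpy as np

rng = np.random.default_rng(11)

def loglike(theta):
    """Toy 2-D Gaussian log-likelihood."""
    return -0.5 * np.sum(theta**2)

def in_prior(theta, half_width=5.0):
    """Flat prior support: a box of half-width 5 in each dimension."""
    return np.all(np.abs(theta) <= half_width)

def evolve(start, logl_min, n_steps=50, scale=0.5):
    """Random-walk a copy of `start` within the region {L(theta) > L_min}.

    Because the prior here is flat, simply rejecting proposals that leave the
    prior box or violate the likelihood constraint yields a valid Metropolis
    walk targeting the constrained prior. In practice n_steps must be large
    enough (thinning) for the result to be nearly independent of `start`.
    """
    theta = start.copy()
    for _ in range(n_steps):
        proposal = theta + scale * rng.normal(size=theta.shape)
        if in_prior(proposal) and loglike(proposal) > logl_min:
            theta = proposal            # accept; otherwise stay put
    return theta

new_point = evolve(start=np.array([0.3, -0.2]), logl_min=loglike(np.array([1.0, 1.0])))
print(new_point, loglike(new_point))
```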
Example: Gaussian Shells
Feroz et al. (2013)
Example: Eggbox
Feroz et al. (2013)
Summary: (Static) Nested Sampling
1. Estimates the evidence $\mathcal{Z}$.
2. Estimates the posterior $P(\Theta)$.
3. Possesses well-defined stopping criteria.
4. Combining runs improves inference.
5. Sampling and statistical uncertainties can be simulated from a single run.
Dynamic Nested Sampling
Higson et al. (2017), arXiv:1704.03459

Benefits of Dynamic Nested Sampling
• Can accommodate new "strands" within a particular range of prior volumes without changing the overall statistical framework.
• Particles can be adaptively added until stopping criteria are reached, allowing targeted estimation.
Sampling Uncertainties (Static)

As before, the original run decomposes into K "strands" $\mathcal{L}_\bullet^{(k)} = \{\mathcal{L}_1^{(k)}, \mathcal{L}_2^{(k)}, \dots\}$. We would like to sample K paths from the set of all possible paths $P(\mathcal{L}_\bullet, \dots)$, but we don't have access to it, so we use a bootstrap estimator over the strands, e.g. $\mathcal{L}_\bullet' = \{\mathcal{L}_\bullet^{(1)}, \mathcal{L}_\bullet^{(1)}, \mathcal{L}_\bullet^{(2)}, \dots\}$.

Sampling Uncertainties (Dynamic)

In a dynamic run the strands fall into two populations, distinguished by where they start:
• strands with $\mathcal{L}_{\min} = -\infty$ (originated from the unconstrained prior);
• strands with a finite starting threshold $\mathcal{L}_{\min} = \lambda$ (originated interior to the prior).
We would like to sample the appropriate number of paths from each population's set of possible paths, but again we don't have access to them. Use a stratified bootstrap estimator: resample strands with replacement within each population separately, then merge the result into a new run (see the sketch below).
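A minimal sketch of the stratified resampling step (illustrative names; strands are represented here as plain arrays):

```python
import numpy as np

rng = np.random.default_rng(13)

def stratified_resample(prior_strands, interior_strands):
    """One bootstrap realization of a dynamic run: resample each strand population separately."""
    n_p, n_i = len(prior_strands), len(interior_strands)
    boot_prior = [prior_strands[j] for j in rng.integers(0, n_p, size=n_p)]
    boot_interior = [interior_strands[j] for j in rng.integers(0, n_i, size=n_i)]
    return boot_prior + boot_interior

# Dummy strands (arrays of dead-point likelihoods), purely illustrative.
prior_strands = [np.array([0.05, 0.4, 0.8]), np.array([0.1, 0.5, 0.9])]
interior_strands = [np.array([0.6, 0.7]), np.array([0.65, 0.85]), np.array([0.7, 0.95])]
print(len(stratified_resample(prior_strands, interior_strands)))   # always 2 + 3 = 5
```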
How Many Samples is Enough?

• Ill-posed question: depends on the application!
• In any sampling-based approach to estimating the posterior P with samples $\{\Theta_i\}$, how many samples are necessary?

Assume the general case: we want the D-dimensional density estimated from our samples, $\hat P$, and the "true" posterior $P$ (constructed over the same "domain") to be "close". Measure closeness with the KL divergence,
$$H(\hat P \,\|\, P) \equiv \int_{\Omega} \hat P(\Theta)\,\ln\!\frac{\hat P(\Theta)}{P(\Theta)}\,d\Theta
\approx \sum_{i=1}^{N} \hat p_i\,\ln\!\frac{\hat p_i}{p_i}.$$

We want access to Pr(H), but we don't know the true P. Use a bootstrap estimator: resample the run, recompute the importance weights, and evaluate
$$H' = \sum_{i=1}^{N'} p_i'\,\ln\!\frac{p_i'}{p_i},$$
treating H as a random variable.

Possible stopping criterion: the fractional (%) variation in H (see the sketch below).
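An illustrative sketch of this stopping criterion (all names are my own, and only the statistical part of the uncertainty, i.e. the unknown prior volumes, is simulated here): re-draw the volumes, recompute the importance weights of the same dead points, and track the fractional scatter of the resulting KL divergence.

```python
import numpy as np

rng = np.random.default_rng(17)

def weights(logl, ln_X):
    """Normalized importance weights p_i from log-likelihoods and log-volumes."""
    dX = -np.diff(np.concatenate([[1.0], np.exp(ln_X)]))
    w = np.exp(logl) * dX
    return w / w.sum()

def kld_fractional_variation(logl, K, n_sim=200):
    """std(H') / mean(H') over simulated prior-volume realizations."""
    N = len(logl)
    p0 = weights(logl, -np.arange(1, N + 1) / K)             # baseline (expected) volumes
    H = np.empty(n_sim)
    for s in range(n_sim):
        ln_X = np.cumsum(np.log(rng.beta(K, 1, size=N)))     # t_i ~ Beta(K, 1)
        p = weights(logl, ln_X)
        H[s] = np.sum(p * np.log(p / p0))                    # KL divergence of p' from p0
    return H.std() / H.mean()

# Toy inputs: the 2-D Gaussian / flat-prior likelihood curve from the earlier sketches.
K, N = 25, 400
logl = -(50.0 / np.pi) * np.exp(-np.arange(1, N + 1) / K) - np.log(2.0 * np.pi)
print(f"fractional KLD variation: {kld_fractional_variation(logl, K):.3f}")
```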
Dynamic Nested Sampling Summary
1. Can sample from multi-modal distributions.
2. Can simultaneously estimate the evidence $\mathcal{Z}$ and posterior $P(\Theta)$.
3. Combining independent runs improves inference ("trivially parallelizable").
4. Can simulate uncertainties (sampling and statistical) from a single run.
5. Enables adaptive sample allocation during runtime using arbitrary weight functions.
6. Possesses evidence/posterior-based stopping criteria.
Examples and Applications
Dynamic Nested Sampling with dynesty
dynesty.readthedocs.io
• Pure Python.
• Easy to use.
• Modular.
• Open source.
• Parallelizable.
• Flexible bounding/sampling methods.
• Thorough documentation!
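A minimal usage sketch for dynesty on a toy problem (the model and settings are illustrative, not from the talk). The corner, trace, and summary plots on the following slides are produced with dynesty's plotting module (dyplot.cornerplot, dyplot.traceplot, dyplot.runplot).

```python
import numpy as np
import dynesty
from dynesty import plotting as dyplot

ndim = 3

def loglike(theta):
    """Toy 3-D Gaussian log-likelihood."""
    return -0.5 * np.sum(theta**2)

def prior_transform(u):
    """Map the unit cube to a Uniform(-5, 5) prior in each dimension."""
    return 10.0 * u - 5.0

# Dynamic nested sampling run with default settings.
dsampler = dynesty.DynamicNestedSampler(loglike, prior_transform, ndim)
dsampler.run_nested()
results = dsampler.results

print(results.logz[-1], results.logzerr[-1])   # evidence estimate and its error
fig, axes = dyplot.cornerplot(results)         # posterior corner plot
```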
Example: Linear Regression (Posterior)
$\Theta = \{m, b, \ln f\}$

Corner Plot
Trace Plot
Example: Multivariate Normal (Evidence)
$\Theta = \{x_1, x_2, x_3\}$

"Summary" Plot
Example: Multivariate Normal (Errors)
"Summary" Plot (Static vs. Dynamic)

Application:
• All results are preliminary but agree with results from MCMC methods (derived using emcee).
• Samples allocated with 100% posterior weight and an automated stopping criterion (2% fractional error in the simulated KLD).
• dynesty was substantially (~3-6x) more efficient at generating good samples than emcee, before thinning.
Application: Modeling Galaxy SEDs
$\Theta = \{\ln M_*, \ln Z, \dots\}$, with further parameter groups of 5, 6, and 2 (D = 15)
With: Joel Leja, Ben Johnson, Charlie Conroy
Application: Modeling Galaxy SEDs
Fig: Joel Leja
With: Joel Leja, Ben Johnson, Charlie Conroy
Application: Supernovae Light Curves
$\Theta$ = parameter groups of 4, 3, 3, and 2 (D = 12)
Fig: Open Supernova Catalog (LSQ12dlf), James Guillochon
With: James Guillochon, Kaisey Mandel