Exoplanet population inference: a tutorial

Exoplanet Population Inference: A Tutorial
Dan Foreman-Mackey
CCA@Flatiron // dfm.io

Today I'll mostly talk about transiting exoplanets*. The methods can
apply more broadly.

* this is what I know about and work on!

1 Exoplanet population inference

[Figure: planet radius [R_Earth] versus orbital period [days]; data: NASA Exoplanet Archive]

[Figure: Burke, Christiansen et al. (2015), Figure 1. Fractional completeness model for the host to Kepler-22b (KIC: 10593626) in the Q1-Q16 pipeline run using the analytic model described in Section 2.]

Take these catalogs and get the physics of planet formation
and evolution.

That's hard.


[Figure: Fulton & Petigura (2018), Figure 5. The distribution of close-in planet sizes. The top panel shows the distribution from Fulton et al. (2017) and the bottom panel is the updated distribution from this work. The solid line shows the number of planets per star with orbital periods less than 100 days as a function of planet size.]

2 What is an occurrence rate?

1 The expected number of planets per star.

2 The fraction of stars with planets.

3 The expected number of planets per star per unit
planet property.

4 etc.

None of these definitions is inherently better than the others.

But. They are all different.

They have different units.

They all depend on a specific (often unstated) definition of
"planets".

So. It can be hard to compare and understand how
they relate.
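
To make the difference between the first two definitions concrete, here is a minimal sketch. It assumes, purely for illustration, that the number of planets per star is Poisson distributed; `mean_planets_per_star` and `n_stars` are made-up values that do not come from the talk.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical values, chosen only for illustration.
mean_planets_per_star = 0.5
n_stars = 200_000

# Assumption: planet multiplicity is Poisson distributed.
n_planets = rng.poisson(mean_planets_per_star, size=n_stars)

# Definition 1: the expected number of planets per star.
planets_per_star = n_planets.mean()

# Definition 2: the fraction of stars with at least one planet.
frac_with_planets = (n_planets > 0).mean()

print(f"planets per star:             {planets_per_star:.3f}")
print(f"fraction of stars w/ planets: {frac_with_planets:.3f}")
print(f"Poisson value 1 - exp(-rate): {1 - np.exp(-mean_planets_per_star):.3f}")
```

Under this assumption the two numbers differ whenever some stars host more than one planet, so quoting "the occurrence rate" without saying which one is meant is ambiguous.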

Them*: "The occurrence rate is 10%."
Y'all: "what does it all mean?!?1?"

* including me and others in the room

[Figure: Fulton & Petigura (2018), Figure 5. The distribution of close-in planet sizes.]

What do these numbers mean?

The expected number of planets per star with a period in the range 0–100 days and radius in the given bin.

[Figure: Simulations, github.com/dfm/exostar19. Label: expected number of planets per star]

3 How to estimate an occurrence rate?

1 Inverse detection efficiency

2 Probabilistic modeling

3 Approximate Bayesian Computation

1 Inverse detection efficiency

$$N_\mathrm{expect} = \frac{1}{N_\mathrm{tot}} \sum_{j=1}^{N} \frac{1}{P_\mathrm{det}(x_j)}$$

Note: don't do this!

2 Probabilistic modeling

$$N_\mathrm{expect} = \arg\max_{N_\mathrm{expect}} \, p(N_\mathrm{obs}, \{x_j\} \mid N_\mathrm{expect}, N_\mathrm{tot})$$

3 Approximate Bayesian Computation


want: $P(q_j)$, where $q_j$ is the true number of planets

have: $n_j, x_j$, where $n_j$ is the observed number of planets and $x_j$ is the properties of the planets and the star

$P(n_j \mid x_j, q_j)$, where $n_j$ is the observed number of planets, $q_j$ is the true number of planets, and $x_j$ is the properties of the planets and the star

Start with either zero or one planet(s).

There are four options.

value of P(n_j | x_j, q_j)

                                        true number of planets
                                        q_j = 0    q_j = 1
observed number of planets   n_j = 0    1          1 - P_det(x_j)
                             n_j = 1    0          P_det(x_j)

But. We don't know the true number of planets.

Marginalize!

$$P(n_j \mid x_j) = \sum_{q_j \in \{0, 1\}} P(q_j)\, P(n_j \mid x_j, q_j) = Q\, P(n_j \mid x_j, q_j = 1) + (1 - Q)\, P(n_j \mid x_j, q_j = 0)$$

$Q$ is the parameter that we want to fit for!
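
The four-entry table and the marginalization over the true number of planets can be written out in a few lines. This is only a sketch: the function names `p_obs_given_true` and `p_obs`, and the numbers used at the end, are hypothetical and not part of the talk.

```python
def p_obs_given_true(n_obs, q_true, p_det):
    """P(n_j | x_j, q_j) for zero or one planet, with p_det = P_det(x_j)."""
    table = {
        (0, 0): 1.0,          # no planet, nothing observed
        (1, 0): 0.0,          # no planet, so nothing can be detected
        (0, 1): 1.0 - p_det,  # planet present but missed
        (1, 1): p_det,        # planet present and detected
    }
    return table[(n_obs, q_true)]


def p_obs(n_obs, p_det, Q):
    """P(n_j | x_j): marginalize over q_j, with P(q_j = 1) = Q."""
    return (
        Q * p_obs_given_true(n_obs, 1, p_det)
        + (1.0 - Q) * p_obs_given_true(n_obs, 0, p_det)
    )


# Hypothetical numbers for illustration only.
Q, p_det = 0.3, 0.4
print(p_obs(1, p_det, Q))  # equals Q * p_det
print(p_obs(0, p_det, Q))  # equals 1 - Q * p_det
```

The two marginal probabilities sum to one for any `Q` and `p_det`, which is a quick sanity check on the table.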

But. We don't know the properties of the unobserved planets.

Marginalize!

Systems with detected planets:

$$P(n_j = 1) = p(x_j)\, P(n_j = 1 \mid x_j) = p(x_j)\, Q\, P(n_j = 1 \mid x_j, q_j = 1)$$

Systems with no detected planets:

$$P(n_j = 0) = \int p(x_j)\, P(n_j = 0 \mid x_j)\, dx_j = 1 - Q \int p(x_j)\, P(n_j = 1 \mid x_j, q_j = 1)\, dx_j = 1 - Q\, P_0$$

Here $P(n_j = 1 \mid x_j, q_j = 1)$ is the detection probability.

Put it all together. An exercise for the reader…

$$Q = \frac{N_1}{N_0 + N_1}\, \frac{1}{P_0} \ne \frac{1}{N_0 + N_1} \sum_{j=1}^{N_1} \frac{1}{P_j}$$

$$P_0 = \int p(x_j)\, P(n_j = 1 \mid x_j, q_j = 1)\, dx_j$$

$Q$ is the occurrence rate, $N_1 / (N_0 + N_1)$ is the fraction of stars with observed planets, and $P_0$ is the detection probability averaged over the distribution of planet and stellar properties.

truth: 50
inverse-detection-efficiency gives: 28.5 ± 5.5
maximum-likelihood gives: 54.0 ± 10.4

see: dfm.io/posts/histogram1

Inverse detection efficiency is not the right estimator.

Instead, take the fraction of detections and divide by the
average detection efficiency*.

* averaged over the correct distribution for all planet and star properties
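
A toy simulation shows how both numbers are computed from the same catalog. Everything here is an assumption made for illustration: the single property `x`, its uniform distribution, the linear `p_det`, and the values of `Q_true` and `n_stars` are not taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup; none of these numbers come from the talk.
Q_true = 0.3        # probability that a star hosts a planet
n_stars = 200_000


def p_det(x):
    """Assumed detection probability as a function of one property x."""
    return 0.05 + 0.9 * x


# Assumption: p(x) is uniform on [0, 1] for every star.
x = rng.uniform(0.0, 1.0, size=n_stars)
has_planet = rng.random(n_stars) < Q_true
detected = has_planet & (rng.random(n_stars) < p_det(x))

n_1 = detected.sum()   # stars with a detected planet
n_0 = n_stars - n_1    # stars with no detected planet

# P_0: detection probability averaged over p(x), estimated by Monte Carlo.
P_0 = p_det(rng.uniform(0.0, 1.0, size=1_000_000)).mean()

# Fraction of detections divided by the average detection efficiency.
Q_ml = n_1 / (n_0 + n_1) / P_0

# Sum of inverse detection probabilities, computed only for comparison.
Q_ide = np.sum(1.0 / p_det(x[detected])) / (n_0 + n_1)

print(f"truth: {Q_true:.3f}  fraction/P_0: {Q_ml:.3f}  inverse-efficiency: {Q_ide:.3f}")
```

In this idealized setup the detection probability is known exactly and never approaches zero, so the two numbers may land close together; they are still different functions of the catalog, and the numbers quoted above from dfm.io/posts/histogram1 show a case where they disagree strongly.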

The key ingredient is the detection efficiency model.

[Figure: Burke, Christiansen et al. (2015), Figure 1. Fractional completeness model for the host to Kepler-22b (KIC: 10593626).]

Remember: an occurrence rate depends on a lot of
decisions!

1 Stellar sample

2 Range of planet parameters

3 Units

4 Planet multiplicity

4 Complications

1 Multiplicity

2 Uncertainties (planetary and stellar)

3 False positives

4 Heterogeneous catalogs

You end up needing to do an integral over all
the properties of all the planets and false positives that you didn't observe.

1 Mathematica™ can't do that integral.

2 Eric Agol can't do that integral.

3 MCMC can't do that integral*.

* in finite time.

This is where you use approximate Bayesian computation (ABC), a.k.a. likelihood-free
inference.

Likelihood-free inference is a method for doing rigorous inference with
stochastic models.

If you can simulate a realistic catalog then you can do inference.

The promise of "likelihood-free inference".
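
As a minimal sketch of that promise, here is rejection ABC on a toy catalog. The forward model `simulate_catalog`, the detection probability `p_det`, the uniform prior on `Q`, the summary statistic (the number of detections) and the `tolerance` are all assumptions chosen for illustration; real analyses use richer simulators and distance measures.

```python
import numpy as np

rng = np.random.default_rng(1)
n_stars = 2_000


def p_det(x):
    """Assumed detection probability (hypothetical, for illustration)."""
    return 0.05 + 0.9 * x


def simulate_catalog(Q):
    """Forward model: return the number of stars with a detected planet."""
    x = rng.uniform(0.0, 1.0, size=n_stars)
    has_planet = rng.random(n_stars) < Q
    return int(np.sum(has_planet & (rng.random(n_stars) < p_det(x))))


# Stand-in for the observed catalog, simulated with a known Q.
Q_true = 0.3
n_obs = simulate_catalog(Q_true)

# Rejection ABC: draw Q from the prior, simulate a catalog, and keep the
# draws whose summary statistic lands within `tolerance` of the observed one.
tolerance = 15
prior_draws = rng.uniform(0.0, 1.0, size=5_000)
accepted = np.array(
    [Q for Q in prior_draws if abs(simulate_catalog(Q) - n_obs) <= tolerance]
)

print(f"accepted {accepted.size} of {prior_draws.size} draws")
print(f"approximate posterior for Q: {accepted.mean():.3f} +/- {accepted.std():.3f}")
```

No likelihood function is ever written down: the only requirement is that `simulate_catalog` can be run for any candidate `Q`.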

[Figure: Hsu et al. (2019), Figure 2. Inferred occurrence rates for Kepler's DR25 planet candidates associated with high-quality FGK target stars. These rates are based on a combined detection and vetting efficiency model that was fit to flux-level planet injection tests.]

There's still lots to do!

[Figure: EPOS; Mulders et al. (2018), Figure 10. Comparison of simulated planets for the example model (blue) with detected planets (orange).]

5 Take homes

An occurrence rate needs to come with a lot of
metadata.

Comparing occurrence rates: Check the units. Check the parameter
ranges.

Don't sum the inverse detection probabilities for your planets!*

* a more reliable estimator is just as easy to compute!

If you're using a method that seems intuitive, make
sure the math checks out!

Likelihood-free inference* seems like a promising way forward.

* a.k.a. Approximate Bayesian Computation (ABC)

It's over.

Extras.

$$p(\{n_j\}, \{x_j\} \mid Q) = [1 - Q\, P_0]^{N_0} \left[ \prod_{j=1}^{N_1} Q\, p(x_j)\, P(n_j = 1 \mid x_j, q_j = 1) \right]$$

$$\log p(\{n_j\}, \{x_j\} \mid Q) = N_0 \log(1 - Q\, P_0) + N_1 \log Q + \mathrm{constant}$$

$$Q = \frac{N_1}{N_0 + N_1}\, \frac{1}{P_0} \ne \frac{1}{N_0 + N_1} \sum_{j=1}^{N_1} \frac{1}{P_j}$$
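
The step from the log-likelihood to the estimator is a single derivative. A sketch, using the same symbols as above and treating $P_0$ as a constant that does not depend on $Q$:

```latex
\frac{\partial}{\partial Q} \log p(\{n_j\}, \{x_j\} \mid Q)
  = -\frac{N_0\, P_0}{1 - Q\, P_0} + \frac{N_1}{Q} = 0
\;\Longrightarrow\;
N_1\, (1 - Q\, P_0) = N_0\, P_0\, Q
\;\Longrightarrow\;
Q = \frac{N_1}{N_0 + N_1}\, \frac{1}{P_0}
```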

[Figure: Simulations, github.com/dfm/exostar19. Labels: "truth"; fraction of stars with planets; expected number of planets per star]

[Figure: Kepler data, github.com/dfm/exostar19. 0.5 < R_P / R_Earth < 8; 10 < a / R_star < 30]

Note: this is preliminary & really just a toy… assuming: no mutual inclination; only geometric transit probability.
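
For a sense of scale, the average detection probability under the "only geometric transit probability" assumption can be sketched as follows. The geometric probability for a circular orbit, neglecting the planet's radius, is roughly R_star / a. The log-uniform distribution of a / R_star used here is an assumption made for illustration, not something stated in the talk.

```python
import numpy as np

rng = np.random.default_rng(2)

# Range quoted on the slide: 10 < a / R_star < 30.
lo, hi = 10.0, 30.0

# Assumption (not from the talk): a / R_star is log-uniform in this range.
a_over_rstar = np.exp(rng.uniform(np.log(lo), np.log(hi), size=1_000_000))

# Geometric transit probability for a circular orbit, neglecting the
# planet's radius: P_transit ~ R_star / a.
p_transit = 1.0 / a_over_rstar

# Average detection probability P_0 if geometry is the only selection effect.
P_0 = p_transit.mean()

# Closed form for the same log-uniform assumption.
P_0_exact = (1.0 / lo - 1.0 / hi) / np.log(hi / lo)

print(f"Monte Carlo P_0: {P_0:.4f}, closed form: {P_0_exact:.4f}")
```

Under these assumptions P_0 is a few percent, so dividing the observed fraction of stars with planets by P_0 scales it up substantially.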
