Exoplanet population inference: a tutorial

Slide 1

Slide 1 text

Exoplanet Population Inference: A Tutorial
Dan Foreman-Mackey
CCA@Flatiron // dfm.io

Slide 2

Slide 2 text

Today I'll mostly talk about transiting exoplanets*. The methods can apply more broadly.
* this is what I know about and work on!

Slide 3

Slide 3 text

1 Exoplanet population inference

Slide 4

Slide 4 text

[Figure: planet radius (1 to 10) versus orbital period (1 to 100 days); data: NASA Exoplanet Archive]

Slide 5

Slide 5 text

[Figure: Burke, Christiansen et al. (2015), Figure 1. Fractional completeness model for the host to Kepler-22b (KIC: 10593626) in the Q1-Q16 pipeline run using the analytic model described in Section 2.]

Slide 6

Slide 6 text

Take these catalogs and get the physics of planet formation and evolution.

Slide 7

Slide 7 text

That's hard.

Slide 8

Slide 8 text

[Figure: planet radius (1 to 10) versus orbital period (1 to 100 days); data: NASA Exoplanet Archive]

Slide 9

Slide 9 text

[Figure: Fulton & Petigura (2018), Figure 5. The distribution of close-in planet sizes. The top panel shows the distribution from Fulton et al. (2017) and the bottom panel is the updated distribution from this work. The solid line shows the number of planets per star with orbital periods less than 100 days as a function of planet size.]

Slide 10

Slide 10 text

2 What is an occurrence rate?

Slide 11

Slide 11 text

1 The expected number of planets per star.

Slide 12

Slide 12 text

2 The fraction of stars with planets.

Slide 13

Slide 13 text

3 The expected number of planets per star per unit planet property.

Slide 14

Slide 14 text

4 etc.

Slide 15

Slide 15 text

None of these definitions is inherently better than the others.

Slide 16

Slide 16 text

But. They are all different.
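To make the difference concrete, here is a tiny Python sketch using a made-up list of true planet counts per star (the numbers are purely illustrative, not from any catalog). It computes the first two definitions side by side:

```python
# Hypothetical toy sample: the true number of planets around each of 8 stars.
planets_per_star = [0, 0, 3, 1, 0, 0, 2, 0]

n_stars = len(planets_per_star)

# Definition 1: the expected (mean) number of planets per star.
expected_planets_per_star = sum(planets_per_star) / n_stars

# Definition 2: the fraction of stars with at least one planet.
fraction_with_planets = sum(1 for n in planets_per_star if n > 0) / n_stars
```

For this toy sample the first quantity is 0.75 planets per star while the second is 0.375, so quoting "the occurrence rate" without saying which one is meant is ambiguous.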

Slide 17

Slide 17 text

They have different units.

Slide 18

Slide 18 text

They all depend on a specific (often unstated) definition of "planets".

Slide 19

Slide 19 text

So. It can be hard to compare and understand how they relate.

Slide 20

Slide 20 text

Them:* "The occurrence rate is 10%."
Y'all: "what does it all mean?!?1?"
* including me and others in the room

Slide 21

Slide 21 text

Slide 22

Slide 22 text

Slide 23

Slide 23 text

Slide 24

Slide 24 text

Slide 25

Slide 25 text

Simulations: github.com/dfm/exostar19
[Figure: expected number of planets per star]

Slide 26

Slide 26 text

3 How to estimate an occurrence rate?

Slide 27

Slide 27 text

1 Inverse detection efficiency
2 Probabilistic modeling
3 Approximate Bayesian Computation

Slide 28

Slide 28 text

1 Inverse detection efficiency
$$N_\mathrm{expect} = \frac{1}{N_\mathrm{tot}} \sum_{j=1}^{N} \frac{1}{P_\mathrm{det}(x_j)}$$
Note: don't do this!

Slide 29

Slide 29 text

2 Probabilistic modeling
$$N_\mathrm{expect} = \arg\max_{N_\mathrm{expect}} \; p(N_\mathrm{obs}, \{x_j\} \mid N_\mathrm{expect}, N_\mathrm{tot})$$

Slide 30

Slide 30 text

3 Approximate Bayesian Computation

Slide 31

Slide 31 text

Slide 32

Slide 32 text

1 Inverse detection efficiency
2 Probabilistic modeling
3 Approximate Bayesian Computation

Slide 33

Slide 33 text

1 Inverse detection efficiency
2 Probabilistic modeling
3 Approximate Bayesian Computation
[The slide relates the three methods using the symbols ≈ and =.]

Slide 34

Slide 34 text

Slide 35

Slide 35 text

want: $P(q_j)$, where $q_j$ is the true number of planets
have: $n_j, x_j$, where $n_j$ is the observed number of planets and $x_j$ are the properties of the planets and the star

Slide 36

Slide 36 text

$P(n_j \mid x_j, q_j)$, where $n_j$ is the observed number of planets, $q_j$ is the true number of planets, and $x_j$ are the properties of the planets and the star

Slide 37

Slide 37 text

Start with either zero or one planet(s).

Slide 38

Slide 38 text

There are four options.

Slide 39

Slide 39 text

value of $P(n_j \mid x_j, q_j)$, with $q_j$ the true number of planets and $n_j$ the observed number of planets:
            q_j = 0    q_j = 1
n_j = 0     1          1 - P_det(x_j)
n_j = 1     0          P_det(x_j)
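The four cases can be written as a small lookup. This is a minimal Python sketch; the function name `prob_observed` and the example value of `p_det` are illustrative choices, not something taken from the slides:

```python
def prob_observed(n, q, p_det):
    """P(n | x, q) for one star that hosts either zero or one planet.

    n: observed number of planets (0 or 1)
    q: true number of planets (0 or 1)
    p_det: detection probability P_det(x) for this planet and star (hypothetical input)
    """
    if q == 0:
        # No planet exists, so nothing can be detected.
        return 1.0 if n == 0 else 0.0
    # A planet exists: it is detected with probability p_det.
    return p_det if n == 1 else 1.0 - p_det


p_det = 0.3  # made-up detection probability, for illustration only
table = {(n, q): prob_observed(n, q, p_det) for n in (0, 1) for q in (0, 1)}
```

For each value of `q`, the two entries over `n` sum to one, which is a quick sanity check on the table.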

Slide 40

Slide 40 text

But. We don't know the true number of planets.

Slide 41

Slide 41 text

Marginalize!

Slide 42

Slide 42 text

$$P(n_j \mid x_j) = \sum_{q_j \in \{0, 1\}} P(q_j)\,P(n_j \mid x_j, q_j) = Q\,P(n_j \mid x_j, q_j = 1) + (1 - Q)\,P(n_j \mid x_j, q_j = 0)$$

Slide 43

Slide 43 text

Slide 44

Slide 44 text

$$P(n_j \mid x_j) = \sum_{q_j \in \{0, 1\}} P(q_j)\,P(n_j \mid x_j, q_j) = Q\,P(n_j \mid x_j, q_j = 1) + (1 - Q)\,P(n_j \mid x_j, q_j = 0)$$
$Q$: this is the parameter that we want to fit for!
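As a worked step, substituting the four values of $P(n_j \mid x_j, q_j)$ from Slide 39 into this sum (with $Q = P(q_j = 1)$ and writing $P_\mathrm{det}(x_j)$ for $P(n_j = 1 \mid x_j, q_j = 1)$) gives:

$$
\begin{aligned}
P(n_j = 1 \mid x_j) &= Q\,P_\mathrm{det}(x_j) + (1 - Q) \cdot 0 = Q\,P_\mathrm{det}(x_j) \\
P(n_j = 0 \mid x_j) &= Q\,[1 - P_\mathrm{det}(x_j)] + (1 - Q) \cdot 1 = 1 - Q\,P_\mathrm{det}(x_j)
\end{aligned}
$$

The two probabilities sum to one for any $Q$ and $P_\mathrm{det}(x_j)$, as they must.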

Slide 45

Slide 45 text

But. We don't know the properties of the unobserved planets.

Slide 46

Slide 46 text

Marginalize!

Slide 47

Slide 47 text

systems with detected planets:
$$P(n_j = 1) = p(x_j)\,P(n_j = 1 \mid x_j) = p(x_j)\,Q\,P(n_j = 1 \mid x_j, q_j = 1)$$
systems with no planets:
$$P(n_j = 0) = \int p(x_j)\,P(n_j = 0 \mid x_j)\,\mathrm{d}x_j = 1 - Q \int p(x_j)\,P(n_j = 1 \mid x_j, q_j = 1)\,\mathrm{d}x_j = 1 - Q\,P_0$$

Slide 48

Slide 48 text

systems with detected planets:
$$P(n_j = 1) = p(x_j)\,P(n_j = 1 \mid x_j) = p(x_j)\,Q\,P(n_j = 1 \mid x_j, q_j = 1)$$
systems with no planets:
$$P(n_j = 0) = \int p(x_j)\,P(n_j = 0 \mid x_j)\,\mathrm{d}x_j = 1 - Q \int p(x_j)\,P(n_j = 1 \mid x_j, q_j = 1)\,\mathrm{d}x_j = 1 - Q\,P_0$$
detection probability: $P(n_j = 1 \mid x_j, q_j = 1)$
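The integral $P_0$ is the detection probability averaged over the population distribution $p(x_j)$, so one simple way to estimate it is by Monte Carlo. The sketch below assumes a made-up one-dimensional property `x` that is uniform on [0, 1] and a made-up detection probability `p_det(x) = x`; both are placeholders rather than a realistic survey model:

```python
import numpy as np

rng = np.random.default_rng(42)


def p_det(x):
    # Hypothetical detection probability: rises linearly with x on [0, 1].
    return x


# Assumed population distribution p(x): uniform on [0, 1].
x_samples = rng.uniform(0.0, 1.0, size=200_000)

# Monte Carlo estimate of P0 = integral of p(x) * P_det(x) dx:
# the mean detection probability over draws from p(x).
P0 = p_det(x_samples).mean()
```

For this toy choice the exact answer is 1/2, which the estimate should approach. In a real analysis the samples would have to come from the intrinsic planet and star population, not from the detected planets.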

Slide 49

Slide 49 text

Put it all together. An exercise for the reader…

Slide 50

Slide 50 text

$$Q = \frac{N_1}{N_0 + N_1}\,\frac{1}{P_0} \ne \frac{1}{N_0 + N_1} \sum_{j=1}^{N_1} \frac{1}{P_j}$$
$Q$: the occurrence rate
$N_1 / (N_0 + N_1)$: the fraction of stars with observed planets

Slide 51

Slide 51 text

$$Q = \frac{N_1}{N_0 + N_1}\,\frac{1}{P_0} \ne \frac{1}{N_0 + N_1} \sum_{j=1}^{N_1} \frac{1}{P_j}$$
$Q$: the occurrence rate
$N_1 / (N_0 + N_1)$: the fraction of stars with observed planets
$$P_0 = \int p(x_j)\,P(n_j = 1 \mid x_j, q_j = 1)\,\mathrm{d}x_j$$
$P_0$: the detection probability averaged over the distribution of planet and stellar properties

Slide 52

Slide 52 text

Slide 53

Slide 53 text

see: dfm.io/posts/histogram1

Slide 54

Slide 54 text

truth: 50
inverse-detection-efficiency gives: 28.5 ± 5.5
see: dfm.io/posts/histogram1

Slide 55

Slide 55 text

truth: 50
inverse-detection-efficiency gives: 28.5 ± 5.5
maximum-likelihood gives: 54.0 ± 10.4
see: dfm.io/posts/histogram1

Slide 56

Slide 56 text

Inverse detection efficiency is not the right estimator.

Slide 57

Slide 57 text

Instead, take the fraction of detections and divide by the average detection efficiency*.
* averaged over the correct distribution for all planet and star properties
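Both estimators take only a few lines of code. The sketch below is illustrative: the function names and every input number are made up, and `mean_det_prob` stands in for the averaged detection efficiency $P_0$, which would have to be computed over the intrinsic population rather than over the detected planets.

```python
def occurrence_rate(n_detected, n_stars, mean_det_prob):
    """Fraction of stars with detections divided by the average detection efficiency."""
    return (n_detected / n_stars) / mean_det_prob


def inverse_detection_efficiency(det_probs, n_stars):
    """Sum of 1 / P_j over the detected planets, divided by the number of stars."""
    return sum(1.0 / p for p in det_probs) / n_stars


# Made-up inputs, for illustration only.
n_stars = 100
det_probs = [0.8, 0.8, 0.8, 0.8, 0.5, 0.5, 0.5, 0.5, 0.25, 0.25]
mean_det_prob = 0.4  # assumed value of P0

q_recommended = occurrence_rate(len(det_probs), n_stars, mean_det_prob)
q_inverse = inverse_detection_efficiency(det_probs, n_stars)
```

With these made-up inputs the two functions return about 0.25 and 0.21. The numbers themselves mean nothing; the point is that the two formulas use the detection probabilities differently and need not agree on a given catalog.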

Slide 58

Slide 58 text

The key ingredient is the detection efficiency model.

Slide 59

Slide 59 text

Slide 60

Slide 60 text

Remember: an occurrence rate depends on a lot of decisions!

Slide 61

Slide 61 text

1 Stellar sample
2 Range of planet parameters
3 Units
4 Planet multiplicity

Slide 62

Slide 62 text

4 Complications

Slide 63

Slide 63 text

1 Multiplicity
2 Uncertainties (planetary and stellar)
3 False positives
4 Heterogeneous catalogs

Slide 64

Slide 64 text

You end up needing to do an integral over all the properties of all the planets and false positives that you didn't observe.

Slide 65

Slide 65 text

1 Mathematica™ can't do that integral.

Slide 66

Slide 66 text

2 Eric Agol can't do that integral.

Slide 67

Slide 67 text

3 MCMC can't do that integral*.
* in finite time.

Slide 68

Slide 68 text

This is where you use approximate Bayesian computation (ABC).

Slide 69

Slide 69 text

This is where you use likelihood-free inference (a.k.a. approximate Bayesian computation, ABC).

Slide 70

Slide 70 text

Likelihood-free inference is a method for doing rigorous inference with stochastic models.

Slide 71

Slide 71 text

The promise of "likelihood-free inference": if you can simulate it (a realistic catalog) then you can do inference.
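As a rough illustration of the idea, here is a minimal rejection-ABC sketch in Python. Rejection sampling is the simplest likelihood-free scheme and is used here only as an example; it is not claimed to be the method behind any result shown in these slides. The survey size, the constant detection probability, the observed count, the tolerance, and the flat prior on `Q` are all made-up assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

N_TOT = 1000    # number of stars searched (made up)
P_DET = 0.2     # detection probability, taken as constant here (made up)
N_OBS = 100     # number of stars with a detected planet (made up)
TOLERANCE = 2   # accept simulations within this many detections of N_OBS


def simulate_catalog(q, rng):
    """Simulate one catalog and return its summary: the number of detections."""
    has_planet = rng.random(N_TOT) < q                    # each star hosts 0 or 1 planet
    detected = has_planet & (rng.random(N_TOT) < P_DET)   # detect with probability P_DET
    return int(detected.sum())


accepted = []
for _ in range(5000):
    q_trial = rng.uniform(0.0, 1.0)   # draw Q from a flat prior
    if abs(simulate_catalog(q_trial, rng) - N_OBS) <= TOLERANCE:
        accepted.append(q_trial)      # keep draws whose simulation matches the data

accepted = np.array(accepted)
q_mean, q_std = accepted.mean(), accepted.std()
```

The accepted draws approximate the posterior for `Q`. In this toy setup they should cluster near `N_OBS / (N_TOT * P_DET)`, which is 0.5. The appeal is that `simulate_catalog` can be made as complicated as needed (multiplicity, uncertainties, false positives) without ever writing down a likelihood.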

Slide 72

Slide 72 text

[Figure: Hsu et al. (2019), Figure 2. Inferred occurrence rates for Kepler's DR25 planet candidates associated with high-quality FGK target stars. These rates are based on a combined detection and vetting efficiency model that was fit to flux-level planet injection tests.]

Slide 73

Slide 73 text

There's still lots to do!

Slide 74

Slide 74 text

[Figure: EPOS; Mulders et al. (2018), Figure 10. Comparison of simulated planets for the example model (blue) with detected planets (orange).]

Slide 75

Slide 75 text

Slide 76

Slide 76 text

5 Take homes

Slide 77

Slide 77 text

An occurrence rate needs to come with a lot of metadata.

Slide 78

Slide 78 text

Comparing occurrence rates: Check the units. Check the parameter ranges.

Slide 79

Slide 79 text

Don't sum the inverse detection probabilities for your planets!*
* a more reliable estimator is just as easy to compute!

Slide 80

Slide 80 text

If you're using a method that seems intuitive, make sure the math checks out!

Slide 81

Slide 81 text

Likelihood-free inference* seems like a promising way forward.
* a.k.a. Approximate Bayesian Computation (ABC)

Slide 82

Slide 82 text

It's over.

Slide 83

Slide 83 text

Extras.

Slide 84

Slide 84 text

$$p(\{n_j\}, \{x_j\} \mid Q) = \left[1 - Q\,P_0\right]^{N_0} \left[\prod_{j=1}^{N_1} Q\,p(x_j)\,P(n_j = 1 \mid x_j, q_j = 1)\right]$$

Slide 85

Slide 85 text

$$\log p(\{n_j\}, \{x_j\} \mid Q) = N_0 \log(1 - Q\,P_0) + N_1 \log Q + \mathrm{constant}$$

Slide 86

Slide 86 text

$$\log p(\{n_j\}, \{x_j\} \mid Q) = N_0 \log(1 - Q\,P_0) + N_1 \log Q + \mathrm{constant}$$
$$Q = \frac{N_1}{N_0 + N_1}\,\frac{1}{P_0} \ne \frac{1}{N_0 + N_1} \sum_{j=1}^{N_1} \frac{1}{P_j}$$
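The step from the log-likelihood to the expression for $Q$ is a one-line maximization. Setting the derivative with respect to $Q$ to zero:

$$
\begin{aligned}
\frac{\mathrm{d}}{\mathrm{d}Q} \log p &= -\frac{N_0\,P_0}{1 - Q\,P_0} + \frac{N_1}{Q} = 0 \\
N_1\,(1 - Q\,P_0) &= N_0\,P_0\,Q \\
N_1 &= Q\,P_0\,(N_0 + N_1) \\
Q &= \frac{N_1}{N_0 + N_1}\,\frac{1}{P_0}
\end{aligned}
$$

The second derivative, $-N_0 P_0^2 / (1 - Q P_0)^2 - N_1 / Q^2$, is negative, so this stationary point is a maximum.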

Slide 87

Slide 87 text

Simulations: github.com/dfm/exostar19
[Figure labels: "truth"; fraction of stars with planets; expected number of planets per star]

Slide 88

Slide 88 text

Note: this is preliminary & really just a toy…
assuming: no mutual inclination; only geometric transit probability
$0.5 < R_P / R_\mathrm{Earth} < 8$; $10 < a / R_\mathrm{star} < 30$
Kepler data: github.com/dfm/exostar19
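For scale: for a circular orbit and a planet much smaller than its star, the geometric transit probability is approximately $R_\mathrm{star} / a$. Whether the toy simulation uses exactly this form is an assumption here, and the function name below is illustrative. A short Python sketch evaluates it at the edges of the stated range:

```python
def geometric_transit_probability(a_over_rstar):
    """Approximate transit probability R_star / a for a circular orbit."""
    if a_over_rstar <= 1.0:
        raise ValueError("a / R_star must exceed 1 for a physical orbit")
    return 1.0 / a_over_rstar


p_inner = geometric_transit_probability(10.0)  # closest orbit in the stated range
p_outer = geometric_transit_probability(30.0)  # widest orbit in the stated range
```

Across 10 < a/R_star < 30 this approximation runs from 10% down to roughly 3%, so under geometry alone most planets in the range would not transit.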

Slide 89

Slide 89 text

$0.5 < R_P / R_\mathrm{Earth} < 8$; $10 < a / R_\mathrm{star} < 30$
Kepler data: github.com/dfm/exostar19
Note: this is preliminary & really just a toy…
assuming: no mutual inclination; only geometric transit probability