Slide 1

Slide 1 text

How to find a transiting exoplanet: data-driven discovery in the astronomical time domain
Dan Foreman-Mackey, Sagan Fellow / University of Washington
@exoplaneteer / dfm.io / github.com/dfm

Slide 2

Slide 2 text

Noise models and some more noise models

Slide 3

Slide 3 text

Let me introduce myself…

Slide 4

Slide 4 text

I build tools. And when I say "tools" I actually mean "software"…

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

Exoplanets

Slide 8

Slide 8 text

How We Find Exoplanets

Slide 9

Slide 9 text

Number of exoplanets by discovery method:
transit: 2712
radial velocity: 692
direct imaging: 52
microlensing: 40
timing: 25
Data Source: The Open Exoplanet Catalogue

Slide 10

Slide 10 text

[Figure: confirmed exoplanets versus year (2000 to 2015), by discovery method: transit, RV, microlensing, direct imaging, timing]
Data Source: The Open Exoplanet Catalogue

Slide 11

Slide 11 text

Kepler Credit: NASA

Slide 12

Slide 12 text

[Figure: planet radius [R⊕] versus orbital period [days]]
Data Source: The NASA Exoplanet Archive; Kepler DR25; 5/13/2017

Slide 13

Slide 13 text

So what?

Slide 14

Slide 14 text

The Population of Exoplanets

Slide 15

Slide 15 text

The population of exoplanets
1. occurrence rates
2. physics

Slide 16

Slide 16 text

Burke et al. (2015), The Astrophysical Journal, 809:8 (19pp), 2015 August 10

[Excerpt from the paper; parts of the page are cropped on the slide]

… the data, rises toward small planets with α2 = -1.8 and has a break near the edge of the parameter space. Given the low numbers of observed planet candidates in the smallest planet bins, the full posterior allowed behavior (1σ orange region; 3σ … Figure 6) … the occurrence rates in the smallest Rp bins. (b) The more complicated model ensures the ability to adapt to variations in the PLDF in the sensitivity analysis of Section 6.2. (c) Previous work on Kepler planet occurrence rates indicated a break in the planet population for 2.0 ≲ Rp ≲ 2.8 R⊕ (Fressin et al. 2013; Petigura et al. 2013a, 2013b; Silburt et al. 2015). (d) Finally, extending this work to a larger parameter space and for alternative target selection samples, such as the Kepler M dwarf sample where a sharp break at Rp ∼ 2.5 R⊕ is observed (Dressing & Charbonneau 2013; Burke et al. 2015), the double power law in Rp is strongly (BIC > 10) warranted.

Symptomatic of the weak evidence for a broken power law model over the 0.75 ⩽ Rp ⩽ 2.5 R⊕ range, Rbrk is not constrained within the prior Rp limits of the parameter space. When Rbrk is near the lower and upper Rp limits, α1 and α2 also become poorly constrained, respectively. To provide a more meaningful constraint on the average power law behavior for Rp in the double power law PLDF model, we introduce αavg, which we set to αavg = α1 if Rbrk ⩾ Rmid and αavg = α2 otherwise, where Rmid is the midpoint between the upper and lower limits of Rp. We find αavg = -1.54 ± 0.5 and β = -0.68 ± 0.17 for our baseline result. We use αavg as a summary statistic for the model parameters only to enable a simpler comparison of our results to independent analyses of planet occurrence rates and to approximate the behavior for the power law Rp dependence if we had used the simpler single power law model. The results for a single power law model in both Rp and Porb are equivalent to the results for the double …

Figure 7. Same as Figure 6, but marginalized over 0.75 < Rp < 2.5 R⊕ and bins of dPorb = 31.25 days.
Figure 8. Shows the underlying planet occurrence rate model. Marginalized over 50 < Porb < 300 days and bins of dRp = 0.25 R⊕ planet occurrence rates for the model parameters that maximize the likelihood (white dash line). Posterior distribution for the underlying planet occurrence rate for the median (blue solid line), 1σ region (orange region), and 3σ region (blue region). An approximate PLDF based upon results from Petigura et al. (2013a) for comparison (dash dot line).
Figure 9. Same as Figure 8, but marginalized over 0.75 < Rp < 2.5 R⊕ and bins of dPorb = 31.25 days.

Slide 17

Slide 17 text

Kepler and the Transit Method (the spacecraft) 

Slide 18

Slide 18 text

Credit: NASA/European Space Agency

Slide 19

Slide 19 text

Jupiter Credit: NASA/European Space Agency

Slide 20

Slide 20 text

Jupiter Earth Credit: NASA/European Space Agency

Slide 21

Slide 21 text

No content

Slide 22

Slide 22 text

[Figure: relative brightness [ppm] versus time since transit [days]]

Slide 23

Slide 23 text

…but this is the real world. A few problems:
1. Timing
2. Geometry
3. Spacecraft motion
4. Intrinsic brightness variation

Slide 24

Slide 24 text

…but this is the real world. A few problems:
1. Timing
2. Geometry
3. Spacecraft motion
4. Intrinsic brightness variation
Problems 1 and 2: transit probability

Slide 25

Slide 25 text

…but this is the real world. A few problems:
1. Timing
2. Geometry
3. Spacecraft motion
4. Intrinsic brightness variation
Problems 1 and 2: transit probability
Problems 3 and 4: noise!
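The geometry problem can be made concrete with the standard approximation for a circular orbit, where the transit probability is roughly the stellar radius divided by the orbital distance. The sketch below is illustrative only: the function name and the choice of a Sun-like star are assumptions, not something taken from the slides.

```python
# Geometric transit probability for a circular orbit: P ~ R_star / a.
R_SUN_M = 6.957e8       # nominal solar radius in meters
AU_M = 1.495978707e11   # astronomical unit in meters


def transit_probability(a_au, r_star_rsun=1.0):
    """Approximate probability that a randomly oriented circular orbit transits."""
    return r_star_rsun * R_SUN_M / (a_au * AU_M)


p_earth = transit_probability(1.0)    # Earth analog around a Sun-like star
p_jupiter = transit_probability(5.2)  # Jupiter analog (a ~ 5.2 AU)
print(f"Earth analog: {p_earth:.4f}, Jupiter analog: {p_jupiter:.5f}")
```

Under these assumptions only about one in a few hundred Earth analogs is aligned well enough to transit at all, and the timing problem (catching the transit while observing) comes on top of that.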

Slide 26

Slide 26 text

Credit: NASA

Slide 27

Slide 27 text

190,000 stars for 4 years at 30 minute cadence with 10^-3 pixel pointing precision
Credit: NASA

Slide 28

Slide 28 text

[Figure: planet radius [R⊕] versus orbital period [days]]
Data Source: The NASA Exoplanet Archive; Kepler DR25; 5/13/2017

Slide 29

Slide 29 text

1. Kepler (2009)
2. K2 (2014)
3. TESS (2018)
4. PLATO (2025)

Slide 30

Slide 30 text

Population Inference

Slide 31

Slide 31 text

Ingredients
1. Systematic target selection & catalog of stellar properties
2. Systematic catalog of planets
3. Quantified completeness & reliability
4. False positive rates & other effects (e.g. multiplicity)
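To see how these ingredients combine, here is a minimal sketch of the simplest possible occurrence-rate estimate: each detected planet is weighted by the inverse of its completeness, and the total is divided by the number of stars searched. This is a textbook stand-in, not the inference method used in the papers cited in this talk. The function name and all numbers are hypothetical, and it ignores false positives and reliability (ingredient 4).

```python
import numpy as np


def occurrence_rate(completeness, n_stars):
    """Inverse-detection-efficiency estimate of the number of planets per star.

    completeness: detection probability of each detected planet, in (0, 1]
                  (geometric transit probability times search recovery rate)
    n_stars:      number of stars in the systematically selected target sample
    """
    q = np.asarray(completeness, dtype=float)
    if np.any(q <= 0) or np.any(q > 1):
        raise ValueError("completeness values must lie in (0, 1]")
    return np.sum(1.0 / q) / n_stars


# Hypothetical catalog: three detections with low completeness
q_detected = [0.002, 0.004, 0.001]
rate = occurrence_rate(q_detected, n_stars=40000)
print(f"{rate:.5f} planets per star")
```

With only a handful of detections, an estimate like this is dominated by the uncertainty in the completeness values, which is why quantified completeness appears on the list.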

Slide 32

Slide 32 text

[Figure: planet radius [R⊕] versus orbital period [days]]
Data Source: The NASA Exoplanet Archive; Kepler DR25; 5/13/2017

Slide 33

Slide 33 text

Burke, et al. (2015)
Figure 1. Fractional completeness model for the host to Kepler-22b (KIC: 10593626) in the Q1-Q16 pipeline run using the analytic model described in Section 2.

Slide 34

Slide 34 text

We need
1. Fully automated methods for planet discovery
2. Rigorous methods for population inference

Slide 35

Slide 35 text

How to Find a Transiting Exoplanet

Slide 36

Slide 36 text

Science. physics data

Slide 37

Slide 37 text

Science. physics data a model

Slide 38

Slide 38 text

No content

Slide 39

Slide 39 text

No content

Slide 40

Slide 40 text

No content

Slide 41

Slide 41 text

No content

Slide 42

Slide 42 text

star

Slide 43

Slide 43 text

spacecraft star

Slide 44

Slide 44 text

detector spacecraft star

Slide 45

Slide 45 text

detector spacecraft star planet?

Slide 46

Slide 46 text

planet + star + spacecraft + detector = observation

Slide 47

Slide 47 text

planet + star + spacecraft + detector = observation
planet: PHYSICS

Slide 48

Slide 48 text

planet + star + spacecraft + detector = observation
planet: PHYSICS
star, spacecraft, detector: ????

Slide 49

Slide 49 text

The way we draw transits…

Slide 50

Slide 50 text

…and the way we should draw transits interesting boring boring

Slide 51

Slide 51 text

planet + star + spacecraft + detector = observation
planet: PHYSICS
star, spacecraft, detector: DATA-DRIVEN MODELS

Slide 52

Slide 52 text

planet + star + spacecraft + detector = observation
planet: PHYSICS
star, spacecraft, detector: DATA-DRIVEN MODELS (Gaussian Process)

Slide 53

Slide 53 text

How to find a transiting exoplanet
1. Fit & remove data-driven noise model
2. Matched filter grid search for candidate signals
3. Vet candidates to remove false alarms
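The search step can be sketched as a brute-force grid over period and phase with a box-shaped template. This is a toy under stated assumptions, not the pipeline from the talk: the noise-model step is replaced by subtracting a constant baseline, the noise is assumed white, and the function name, grid spacing, and injected signal are all hypothetical. Vetting (step 3) is not shown.

```python
import numpy as np


def box_search(t, flux, periods, duration, n_phase=50):
    """Grid search for a periodic box-shaped dip.

    Returns (best_period, best_t0, best_snr).
    """
    # Step 1 stand-in: subtract a constant baseline. A real search would
    # fit and remove a data-driven noise model (e.g. a Gaussian Process).
    resid = flux - np.median(flux)
    sigma = np.std(resid)
    best = (None, None, -np.inf)
    for period in periods:
        for t0 in np.linspace(0.0, period, n_phase, endpoint=False):
            # Time from the nearest trial midpoint, in [-period/2, period/2)
            phase = (t - t0 + 0.5 * period) % period - 0.5 * period
            in_transit = np.abs(phase) < 0.5 * duration
            n_in = in_transit.sum()
            if n_in == 0:
                continue
            # Matched filter for a box in white noise: mean in-transit depth
            # scaled by its standard error.
            depth = -np.mean(resid[in_transit])
            snr = depth * np.sqrt(n_in) / sigma
            if snr > best[2]:
                best = (period, t0, snr)
    return best


# Synthetic light curve (all numbers hypothetical)
rng = np.random.default_rng(42)
t = np.arange(0.0, 80.0, 0.02)                      # roughly 30 minute cadence
flux = 1.0 + 200e-6 * rng.normal(size=t.size)       # 200 ppm white noise
true_period, true_t0, true_duration = 7.3, 2.1, 0.2
phase = (t - true_t0 + 0.5 * true_period) % true_period - 0.5 * true_period
flux[np.abs(phase) < 0.5 * true_duration] -= 500e-6  # 500 ppm transits

periods = np.arange(5.0, 10.0, 0.05)
best_period, best_t0, best_snr = box_search(t, flux, periods, true_duration)
print(best_period, best_t0, best_snr)
```

The cost grows with the product of the period grid, the phase grid, and the number of data points, which is one reason scalable noise models matter for surveys of this size.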

Slide 54

Slide 54 text

Scalable Methods An aside...

Slide 55

Slide 55 text

Medium data; big questions…
1. Kepler: 190,000 stars; 60,000 obs. per star
2. K2: 250,000 stars; 4,000 obs. per star
3. TESS: 500,000 stars; 20,000 obs. per star
approximately…

Slide 56

Slide 56 text

Scaling of Gaussian Processes
O(N^3): Cholesky factorization

Slide 57

Slide 57 text

Scaling of Gaussian Processes
O(N^3): Cholesky factorization
O(N log^2 N): Approximate methods. Ambikasaran, DFM, et al. (2016); arXiv:1403.6015

Slide 58

Slide 58 text

Scaling of Gaussian Processes
O(N^3): Cholesky factorization
O(N log^2 N): Approximate methods. Ambikasaran, DFM, et al. (2016); arXiv:1403.6015
O(N): Exploiting structure of specific 1D kernels. DFM, et al. (submitted); arXiv:1703.09710
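For reference, the cubic baseline looks like this in practice: a dense covariance matrix is built and Cholesky-factorized every time the likelihood is evaluated. The sketch below assumes a zero-mean GP with a squared-exponential kernel; the kernel choice, function name, and data are illustrative assumptions and this is not the celerite algorithm.

```python
import numpy as np


def gp_log_likelihood(t, y, yerr, amp, ell):
    """Log-likelihood of a zero-mean GP with a squared-exponential kernel."""
    dt = t[:, None] - t[None, :]
    K = amp * np.exp(-0.5 * dt**2 / ell**2)   # dense N x N covariance
    K[np.diag_indices_from(K)] += yerr**2     # add measurement variance
    L = np.linalg.cholesky(K)                 # the O(N^3) step
    # General solver used for simplicity; a triangular solver would be O(N^2).
    alpha = np.linalg.solve(L, y)             # alpha = L^{-1} y
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    n = len(t)
    return -0.5 * (alpha @ alpha) - 0.5 * log_det - 0.5 * n * np.log(2 * np.pi)


# Hypothetical data set
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 10.0, 200))
y = np.sin(t) + 0.1 * rng.normal(size=t.size)
ll = gp_log_likelihood(t, y, 0.1, 1.0, 1.5)
print(ll)
```

Doubling N makes the factorization roughly eight times more expensive, so tens of thousands of observations per star quickly become impractical with this direct approach.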

Slide 59

Slide 59 text

[Figure: computational cost [seconds] versus number of data points [N], with curves labeled 1, 2, 4, 8, 16, 32, 64, 128, 256, compared against "direct" and O(N) scaling]
DFM, et al. (submitted); arXiv:1703.09710
github.com/dfm/celerite

Slide 60

Slide 60 text

Pause… time for some examples!

Slide 61

Slide 61 text

1. The Frequency of Jupiter Analogs

Slide 62

Slide 62 text

in collaboration with… Tim Morton (Princeton), David Hogg (NYU), Eric Agol (UW), Bernhard Schölkopf (MPIS)
DFM, et al. (2016); arXiv:1607.08237

Slide 63

Slide 63 text

[Figure: planet radius [R⊕] versus orbital period [days], periods from 1 to 100 days]
Data Source: The NASA Exoplanet Archive

Slide 64

Slide 64 text

[Figure: planet radius [R⊕] versus orbital period [days], periods from 1 to 100 days]
Data Source: The NASA Exoplanet Archive

Slide 65

Slide 65 text

[Figure: planet radius [R⊕] versus orbital period [days], periods from 1 to 10000 days]
Data Source: The NASA Exoplanet Archive

Slide 66

Slide 66 text

[Figure: planet radius [R⊕] versus orbital period [days], periods from 1 to 10000 days]
Data Source: The NASA Exoplanet Archive

Slide 67

Slide 67 text

Why Kepler? Radial velocity, microlensing, etc. better suited…

Slide 68

Slide 68 text

1. Systematic target selection & catalog of stellar properties
2. Systematic catalog of planets
3. Quantified completeness & reliability
4. False positive rates & other effects (e.g. multiplicity)

Slide 69

Slide 69 text

[Figure: planet radius [R⊕] versus orbital period [days], periods from 1 to 10000 days]
Data Source: The NASA Exoplanet Archive

Slide 70

Slide 70 text

[Figure: planet radius [R⊕] versus orbital period [days], periods from 1 to 10000 days]
DFM et al. (2016); arXiv:1607.08237
Data Source: The NASA Exoplanet Archive

Slide 71

Slide 71 text

How to find a transiting exoplanet
1. Fit & remove data-driven noise model
2. Matched filter grid search for candidate signals
3. Vet candidates to remove false alarms

Slide 72

Slide 72 text

planet + star + spacecraft + detector = observation
planet: PHYSICS
star: GAUSSIAN PROCESS
spacecraft: CAUSAL MODEL (PCA)
detector: PHOTON NOISE

Slide 73

Slide 73 text

How to find a transiting exoplanet
1. Fit & remove data-driven noise model
2. Matched filter grid search for candidate signals
3. Vet candidates to remove false alarms

Slide 74

Slide 74 text

[Figure: four example signals versus hours since event: (a) variability, KIC 7220674; (b) step, KIC 8631697; (c) box, KIC 5521451; (d) transit, KIC 8505215]
DFM, et al. (2016)

Slide 75

Slide 75 text

[Figure panels, by KIC number: 10321319, 10287723, 8505215, 6551440, 8738735, 8800954, 10187159, 3218908, 4754460, 8410697, 10842718, 11709124, 3239945, 8426957, 9306307, 10602068]
Figure 3. Sections of PDC light curve centered on each candidate (black) with the posterior-median transit model over-plotted (orange). Candidates with two transits are folded on the posterior-median …
DFM, et al. (2016)

Slide 76

Slide 76 text

1. Systematic target selection & catalog of stellar properties
2. Systematic catalog of planets
3. Quantified completeness & reliability
4. False positive rates & other effects (e.g. multiplicity)

Slide 77

Slide 77 text

nuisance boring boring

Slide 78

Slide 78 text

[Figure: grid of values over period [years] (axis ticks 3, 5, 10, 20) and RP/RJ (axis ticks 0.2, 0.5, 1.0, 2.0), color scale 0.0 to 0.6. Cell values, row by row as extracted:]
0.048 0.211 0.499 0.669 0.727 0.710 0.635
0.046 0.194 0.468 0.616 0.657 0.630 0.569
0.043 0.193 0.460 0.605 0.623 0.591 0.520
0.038 0.174 0.433 0.529 0.529 0.492 0.427
DFM, et al. (2016)

Slide 79

Slide 79 text

Occurrence rate: 2.00 ± 0.72 planets per G/K-dwarf
in range: 2 – 25 years, 0.1 – 1 RJ
DFM, et al. (2016)

Slide 80

Slide 80 text

2. EVEREST: A Noise Model for K2

Slide 81

Slide 81 text

Credit: NASA R.I.P. Kepler

Slide 82

Slide 82 text

K2
Image: Flickr user Aamir Choudhry (CC BY-NC-SA)

Slide 83

Slide 83 text

https://keplerscience.arc.nasa.gov/k2-fields.html

Slide 84

Slide 84 text

[Figure: baseline versus number of targets for TESS, K2, and Kepler]
Adapted from a similar figure by Ian Crossfield

Slide 85

Slide 85 text

[Figure: log10 g versus log10 Teff, in two panels: Kepler and K2]
Data Source: The NASA Exoplanet Archive; 5/13/2017

Slide 86

Slide 86 text

No content

Slide 87

Slide 87 text

No content

Slide 88

Slide 88 text

[Figure: EPIC 201374602; Kp = 11.5 mag. Relative brightness [ppm] versus time [BJD - 2456808]. Top panel, raw: 301 ppm. Bottom panel, residuals: 35 ppm]

Slide 89

Slide 89 text

led by… Rodrigo Luger & Ethan Kruse
Luger, et al. (2016, 2017)
Image: Flickr user Aamir Choudhry (CC BY-NC-SA)

Slide 90

Slide 90 text

planet + star + spacecraft + detector = observation
planet: PHYSICS
star: GAUSSIAN PROCESS
spacecraft: CAUSAL MODEL + PIXEL-LEVEL DECORRELATION
detector: PHOTON NOISE
inspired by: Vanderburg & Johnson (2014); Crossfield, et al. (2015); Aigrain, et al. (2015); DFM, et al. (2015); Deming, et al. (2015); + more

Slide 91

Slide 91 text

Figure credit: Rodrigo Luger Ideal Observed

Slide 92

Slide 92 text

Pixel-level decorrelation (PLD)
If the background is correctly subtracted and the astrophysical signal is multiplicative, then the fractional astrophysical contribution is equal in all pixels.
\hat{p}_n(t) = \frac{p_n(t)}{\sum_{k=1}^{N} p_k(t)}
Here p_n(t) is the pixel time series, the sum over all N pixels is the estimator for the astrophysical signal, and \hat{p}_n(t) is the estimator for the instrumental signal.
Deming, et al. (2015); Luger, et al. (2016, 2017)
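A first-order version of this idea can be sketched in a few lines: divide each pixel by the summed flux, then regress the summed flux on those normalized pixels and subtract the fit. This is a toy illustration under stated assumptions, not the EVEREST implementation: the function name is hypothetical, the fit is plain least squares with no Gaussian Process or regularization, and the three-pixel "detector" with a sinusoidal pointing drift is invented for the demonstration.

```python
import numpy as np


def pld_detrend(pixels):
    """First-order PLD on an array of shape (n_times, n_pixels).

    Returns (detrended_flux, instrumental_model).
    """
    flux = pixels.sum(axis=1)        # estimator for the astrophysical signal
    basis = pixels / flux[:, None]   # normalized pixels: instrumental estimators
    # Linear least-squares fit of the flux using the normalized pixels.
    weights, *_ = np.linalg.lstsq(basis, flux, rcond=None)
    model = basis @ weights
    return flux - model + np.median(flux), model


# Toy data: a star whose light shifts between three pixels of unequal
# sensitivity as the pointing drifts (all numbers hypothetical).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 1000)
pointing = 0.05 * np.sin(2 * np.pi * t / 0.25)
astro = np.ones_like(t)
in_transit = np.abs(t - 5.0) < 0.1
astro[in_transit] -= 1e-3                       # multiplicative transit signal
frac = np.stack([0.3 - pointing, 0.4 * np.ones_like(t), 0.3 + pointing], axis=1)
sens = np.array([0.95, 1.0, 1.05])
pixels = astro[:, None] * frac * sens + 5e-5 * rng.normal(size=frac.shape)

detrended, model = pld_detrend(pixels)
raw_scatter = np.std(pixels.sum(axis=1))
det_scatter = np.std(detrended)
print(raw_scatter, det_scatter)
```

Because the transit multiplies every pixel equally, it cancels in the normalized pixels, so the regression can remove the pointing signal without having a transit-shaped basis vector to fit.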

Slide 93

Slide 93 text

Pixel-level decorrelation (PLD)
[Figure: image panels combined by division (÷) to give (=) a result panel]
Figure credit: Rodrigo Luger; Deming, et al. (2015); Luger, et al. (2016, 2017)

Slide 94

Slide 94 text

planet + star + spacecraft + detector = observation
planet: PHYSICS
star: GAUSSIAN PROCESS
spacecraft: CAUSAL MODEL + PIXEL-LEVEL DECORRELATION
detector: PHOTON NOISE

Slide 95

Slide 95 text

Luger, et al. (2016); see also Aigrain, et al. (2015)

Slide 96

Slide 96 text

EVEREST
planet + star + spacecraft + detector = observation
planet: PHYSICS
star: GAUSSIAN PROCESS
spacecraft: CAUSAL MODEL + PIXEL-LEVEL DECORRELATION
detector: PHOTON NOISE

Slide 97

Slide 97 text

EVEREST
planet + star + spacecraft + detector = observation
planet: PHYSICS
star: GAUSSIAN PROCESS
spacecraft: CAUSAL MODEL + PIXEL-LEVEL DECORRELATION
detector: PHOTON NOISE

Slide 98

Slide 98 text

Fig. 3. Cross-validation procedure for first order PLD o… …03150 (WASP-47 e), a campaign 3 planet host. Show… …ter v in the validation set (red) and the scatter in the … (blue) as a function of …, the prior amplitude for … [caption partially cropped on the slide]
Luger, Kruse, DFM, et al. (2017)

Slide 99

Slide 99 text

Fig. 20. The same as Figure 19, but comparing the CDPP of all K2 stars to that of Kepler. EVEREST 2.0 recovers the original Kepler photometric precision down to at least Kp = 14, and past …
[Remaining caption fragments, cropped on the slide: "… Kp = 15; for campaigns 3, 4, and 8, EVEREST recovers the Kepler precision dow… of (variable) giant stars, leading to a higher average CDPP, while campaign 7 … change in the orientation of the spacecraft and excess jitter."]
Luger, Kruse, DFM, et al. (2017)

Slide 100

Slide 100 text

[Figure from the EVEREST 2.0 paper; the surrounding text is cropped on the slide]
Luger, Kruse, DFM, et al. (2017)

Slide 101

Slide 101 text

[Figure: planet radius [R⊕] versus orbital period [days]]
Data Source: The NASA Exoplanet Archive; Kepler DR25; 5/13/2017

Slide 102

Slide 102 text

[Figure: planet radius [R⊕] versus orbital period [days]]
Data Source: The NASA Exoplanet Archive; Kepler DR25; 5/13/2017
Kruse, et al. (in prep)

Slide 103

Slide 103 text

[Figure: planet radius [R⊕] versus orbital period [days]]
800 candidates, 500 new
Data Source: The NASA Exoplanet Archive; Kepler DR25; 5/13/2017
Kruse, et al. (in prep)

Slide 104

Slide 104 text

Population inference? As a function of host star properties?

Slide 105

Slide 105 text

[Figure: K2 long cadence data. Panels a, b: relative brightness versus Barycentric Julian Date − 2,457,700 [day], with transits of planets 1b to 1h marked. Panel c: relative brightness versus time from mid-transit [day] for transits 1 to 4 and the folded lightcurve. Panel d: orbital separation [AU]]
Figure 1: a, b: Long cadence K2 light curve detrended with EVEREST and with stellar variability removed. Data points are in black, and our highest likelihood transit model for all seven planets …
TRAPPIST-1h: Luger, Sestovic, Kruse, et al. (2017); arXiv:1703.04166 (embargoed)

Slide 106

Slide 106 text

3. These Noise Models are Models of Stars

Slide 107

Slide 107 text

nuisance! interesting interesting

Slide 108

Slide 108 text

led by… Ruth Angus (Columbia)
in collaboration with… Suzanne Aigrain (Oxford), Vinesh Rajpaul (Oxford), Eric Agol (UW), Sivaram Ambikasaran (Indian Inst. of Sci.)
Angus, et al. (submitted); DFM, et al. (submitted)

Slide 109

Slide 109 text

[Figure: rotation period (days) versus age (Gyr) for Coma Berenices, Praesepe, Hyades, NGC 6811, NGC 6819, the Sun, asteroseismic targets, and M67 (Esselstein, in prep)]
Figure credit: Ruth Angus

Slide 110

Slide 110 text

Angus, et al. (submitted); github.com/RuthAngus/GProtation

Slide 111

Slide 111 text

[Excerpt from Angus, et al. (submitted); the left-hand column is cropped on the slide]

… summit of the Mauna Loa volcano in Hawaii (data from Keeling and Whorf 2004) using a kernel which is the product of a periodic and a SE kernel: the QP kernel. This kernel is defined as

k_{i,j} = A \exp\left[ -\frac{(x_i - x_j)^2}{2 l^2} - \Gamma^2 \sin^2\left( \frac{\pi (x_i - x_j)}{P} \right) \right] + \sigma^2 \delta_{ij}.   (2)

It is the product of the SE kernel function, which describes the overall covariance decay, and an exponentiated, squared, sinusoidal kernel function that describes the periodic covariance structure. P can be interpreted as the rotation period of the star, and Γ controls the amplitude of the sin^2 term. If Γ is very large, only points almost exactly one period away are tightly correlated and points that are slightly more or less than one period away are very loosely correlated. If Γ is small, points separated by one period are tightly correlated, and points separated by slightly more or less are still highly correlated, although less so. In other words, large values of Γ lead to periodic variations with increasingly complex harmonic content. This kernel function allows two data points that are separated in time by one rotation period to be tightly correlated, while also allowing points separated by half a period to be weakly correlated. The additional parameter σ captures white noise by adding a term to the diagonal of the covariance matrix. This can be interpreted to represent underestimation of observational uncertainties (if the uncertainties reported on the data are too small, it will be non-zero) or it can capture any remaining "jitter," or residuals not captured by the effective GP model. We use this QP kernel function (Equation 2) to produce the GP model that fits the Kepler light curve …

[Figure panels: Kepler light curve, relative flux [ppt] versus time [days]; power spectrum, S(ω) versus ω [days^-1]; k(τ); rotation period [days]]

Angus, et al. (submitted); github.com/RuthAngus/GProtation
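The quasi-periodic kernel described in this excerpt is short to write down in code. The sketch below assumes the usual form of Equation 2, with Γ scaling the sin^2 term and σ added in quadrature on the diagonal, as the prose of the excerpt describes. The function name and the parameter values are hypothetical, and this is not the GProtation code.

```python
import numpy as np


def qp_kernel(x, amp, ell, gamma, period, sigma):
    """Quasi-periodic covariance matrix for the 1D input array x."""
    dx = x[:, None] - x[None, :]
    K = amp * np.exp(
        -dx**2 / (2.0 * ell**2)                          # SE decay
        - gamma**2 * np.sin(np.pi * dx / period) ** 2    # periodic structure
    )
    K[np.diag_indices_from(K)] += sigma**2               # white noise term
    return K


# Three times: zero lag, half a period, and one full period (period = 4)
x = np.array([0.0, 2.0, 4.0])
K = qp_kernel(x, amp=1.0, ell=20.0, gamma=1.0, period=4.0, sigma=0.1)
print(K[0, 1], K[0, 2])  # half-period versus full-period covariance
```

Points one period apart end up more strongly correlated than points half a period apart, which is the behavior that lets P be read as a rotation period.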

Slide 112

Slide 112 text

[Figure: ln(Recovered Period) versus ln(Injected Period), colored by ln(Amplitude)]
Angus, et al. (submitted); github.com/RuthAngus/GProtation

Slide 113

Slide 113 text

Open science
github.com/dfm/peerless: Jupiter analogs
github.com/rodluger/everest: K2 de-trending
github.com/RuthAngus/GProtation: GP models of rotation
github.com/dfm/celerite: fast 1D GPs

Slide 114

Slide 114 text

Summary
Build data-driven noise models and…
1. Find exoplanets
2. Learn about stars
Dan Foreman-Mackey, Sagan Fellow / University of Washington
@exoplaneteer / dfm.io / github.com/dfm