Slide 1

Slide 1 text

AN ASTRONOMER’S INTRODUCTION TO GAUSSIAN PROCESSES Dan Foreman-Mackey CCPP@NYU // github.com/dfm // @exoplaneteer // dfm.io

Slide 2

Slide 2 text

Photo: CC BY-NC-ND, Flickr user lizphung

Slide 3

Slide 3 text

github.com/dfm/gp

Slide 4

Slide 4 text

gaussianprocess.org/gpml Rasmussen & Williams

Slide 5

Slide 5 text

I write code for good & astrophysics.

Slide 6

Slide 6 text

I (probably) do data science.

Slide 7

Slide 7 text

Photo credit: James Silvester (silvesterphoto.tumblr.com). Not data science.

Slide 8

Slide 8 text

Photo: CC BY, Flickr user Marcin Wichary. Data science.

Slide 9

Slide 9 text

PHYSICS DATA Data Science p(data | physics)

Slide 10

Slide 10 text

I work with Kepler data.

Slide 11

Slide 11 text

I get really passionate about NOISE.

Slide 12

Slide 12 text

[Figure: relative flux vs. time [KBJD] for KIC 3223000]

Slide 13

Slide 13 text

Kepler 32

Slide 14

Slide 14 text

Kepler 32

Slide 15

Slide 15 text

[Figure: relative flux vs. time [KBJD] for Kepler 32]

Slide 16

Slide 16 text

HACKS ALL HACKS and I’m righteous! *

Slide 17

Slide 17 text

Why not model all the things? with, for example, a Gaussian Process

Slide 18

Slide 18 text

1. The power of correlated noise.

Slide 19

Slide 19 text

[Figure: data drawn from the line y = m x + b]

Slide 20

Slide 20 text

[Figure: the true covariance of the observations]

Slide 21

Slide 21 text

Let's assume that the noise is… independent Gaussian with known variance:
\log p(y \mid x, \sigma, \theta) = -\frac{1}{2} \sum_{n=1}^{N} \left[ \frac{[y_n - f_\theta(x_n)]^2}{\sigma_n^2} + \log 2\pi\sigma_n^2 \right]
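As a sketch, this sum can be evaluated directly in Python; the function name and the pre-computed model vector are illustrative, not from the talk:

import numpy as np

def independent_lnlike(y, yerr, model):
    # model holds f_theta(x_n) evaluated at the observed x values.
    resid = y - model
    return -0.5 * np.sum(resid**2 / yerr**2 + np.log(2*np.pi*yerr**2))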

Slide 22

Slide 22 text

Or equivalently…
\log p(y \mid x, \sigma, \theta) = -\frac{1}{2}\, r^T C^{-1} r - \frac{1}{2} \log\det C - \frac{N}{2} \log 2\pi

Slide 23

Slide 23 text

Or equivalently…
\log p(y \mid x, \sigma, \theta) = -\frac{1}{2}\, r^T C^{-1} r - \frac{1}{2} \log\det C - \frac{N}{2} \log 2\pi
where C is the data covariance.

Slide 24

Slide 24 text

Or equivalently…
\log p(y \mid x, \sigma, \theta) = -\frac{1}{2}\, r^T C^{-1} r - \frac{1}{2} \log\det C - \frac{N}{2} \log 2\pi
where C is the data covariance and r is the residual vector
r = \left[\, y_1 - f_\theta(x_1) \quad y_2 - f_\theta(x_2) \quad \cdots \quad y_n - f_\theta(x_n) \,\right]^T

Slide 25

Slide 25 text

Or equivalently…
\log p(y \mid x, \sigma, \theta) = -\frac{1}{2}\, r^T C^{-1} r - \frac{1}{2} \log\det C - \frac{N}{2} \log 2\pi
where C is the data covariance and r is the residual vector
r = \left[\, y_1 - f_\theta(x_1) \quad y_2 - f_\theta(x_2) \quad \cdots \quad y_n - f_\theta(x_n) \,\right]^T
[Figure: the data covariance matrix C]
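A minimal sketch of the same likelihood evaluated with a general covariance matrix (hypothetical helper; np.linalg.slogdet keeps the log-determinant numerically stable):

import numpy as np

def correlated_lnlike(r, C):
    # r: residual vector y - f_theta(x); C: full data covariance matrix.
    alpha = np.linalg.solve(C, r)
    sign, logdet = np.linalg.slogdet(C)
    return -0.5 * (np.dot(r, alpha) + logdet + len(r)*np.log(2*np.pi))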

Slide 26

Slide 26 text

Linear least-squares.
A = \begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{bmatrix}, \quad C = \begin{bmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{bmatrix}, \quad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
\begin{bmatrix} m \\ b \end{bmatrix} = S\, A^T C^{-1} y \quad \text{(maximum likelihood and, in this case only, the mean of the posterior)}
S = \left[ A^T C^{-1} A \right]^{-1} \quad \text{(the posterior covariance)}

Slide 27

Slide 27 text

Linear least-squares.
A = \begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{bmatrix}, \quad C = \begin{bmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{bmatrix}, \quad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
\begin{bmatrix} m \\ b \end{bmatrix} = S\, A^T C^{-1} y \quad \text{(maximum likelihood and, in this case only, the mean of the posterior)}
S = \left[ A^T C^{-1} A \right]^{-1} \quad \text{(the posterior covariance)}
assuming uniform priors.
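A small Python sketch of these generalized least-squares formulas (hypothetical function; np.vander builds the [x, 1] design matrix):

import numpy as np

def linear_least_squares(x, y, C):
    # Design matrix for the linear model y = m*x + b.
    A = np.vander(x, 2)
    S = np.linalg.inv(A.T @ np.linalg.solve(C, A))   # posterior covariance
    mb = S @ (A.T @ np.linalg.solve(C, y))           # [m, b]: posterior mean
    return mb, S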

Slide 28

Slide 28 text

[Figure: the data with the true line overplotted ("truth")]

Slide 29

Slide 29 text

[Figure: the data with the true line ("truth") and the fitted line ("posterior constraint?")]

Slide 30

Slide 30 text

[Figure: the data with the true line ("truth") and the fitted line ("posterior constraint?")]

Slide 31

Slide 31 text

But we know the true covariance matrix. [Figure: the true covariance matrix of the observations]

Slide 32

Slide 32 text

[Figure: the true covariance matrix of the observations]
\log p(y \mid x, \sigma, \theta) = -\frac{1}{2}\, r^T C^{-1} r - \frac{1}{2} \log\det C - \frac{N}{2} \log 2\pi

Slide 33

Slide 33 text

Linear least-squares.
A = \begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{bmatrix}, \quad C = \begin{bmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{bmatrix}, \quad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
\begin{bmatrix} m \\ b \end{bmatrix} = S\, A^T C^{-1} y \quad \text{(maximum likelihood and, in this case only, the mean of the posterior)}
S = \left[ A^T C^{-1} A \right]^{-1} \quad \text{(the posterior covariance)}
[Figure: the true covariance matrix of the observations]

Slide 34

Slide 34 text

Before. [Figure: the posterior constraint on the line]

Slide 35

Slide 35 text

After. [Figure: the posterior constraint on the line]

Slide 36

Slide 36 text

The responsible scientist. (Photo: CC BY-ND, Flickr user MyFWCmedia)

Slide 37

Slide 37 text

So… we’re finished, right?

Slide 38

Slide 38 text

In The Real World™, we never know the noise.

Slide 39

Slide 39 text

Just gotta model it!

Slide 40

Slide 40 text

[Figure: covariance matrix]

Slide 41

Slide 41 text

Model it!
\log p(y \mid x, \sigma, \theta, \alpha) = -\frac{1}{2}\, r^T K_\alpha^{-1} r - \frac{1}{2} \log\det K_\alpha + C
where K_{ij} = \sigma_i^2\,\delta_{ij} + k_\alpha(x_i, x_j)
and, for example, k_\alpha(x_i, x_j) = a^2 \exp\left( -\frac{[x_i - x_j]^2}{2\, l^2} \right)
A drop-in replacement for your current log-likelihood function!
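A sketch of how that covariance matrix could be assembled (hypothetical helper following the formula above; the full drop-in log-likelihood appears in a code slide later in the deck):

import numpy as np

def build_K(x, yerr, a, l):
    # K_ij = sigma_i^2 * delta_ij + a^2 * exp(-(x_i - x_j)^2 / (2 l^2))
    dx = x[:, None] - x[None, :]
    return np.diag(yerr**2) + a**2 * np.exp(-0.5 * dx**2 / l**2)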

Slide 42

Slide 42 text

It's hammer time! emcee: the MCMC Hammer // arxiv.org/abs/1202.3665 // dan.iel.fm/emcee
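A minimal emcee usage sketch for the line-plus-correlated-noise model (toy data and flat priors with arbitrary bounds; the parameter names follow the corner plot on the next slide, everything else is illustrative):

import numpy as np
import emcee

def lnprob(params, x, y, yerr):
    # Parameters: slope m, intercept b, and kernel hyperparameters ln a, ln s.
    m, b, lna, lns = params
    if not (-5.0 < lna < 5.0 and -5.0 < lns < 5.0):
        return -np.inf                       # flat priors with hard bounds
    a, s = np.exp(lna), np.exp(lns)
    resid = y - (m * x + b)                  # residuals about the line
    dx = x[:, None] - x[None, :]
    K = np.diag(yerr**2) + a * np.exp(-0.5 * dx**2 / s**2)
    sign, logdet = np.linalg.slogdet(K)
    return -0.5 * (resid @ np.linalg.solve(K, resid)
                   + logdet + len(x) * np.log(2*np.pi))

# Toy data standing in for the example from the talk.
np.random.seed(1234)
x = np.sort(np.random.uniform(-5, 5, 50))
yerr = 0.3 * np.ones_like(x)
y = 0.5 * x - 0.2 + yerr * np.random.randn(50)

ndim, nwalkers = 4, 32
p0 = np.array([0.5, -0.2, 0.0, 0.0]) + 1e-3 * np.random.randn(nwalkers, ndim)
sampler = emcee.EnsembleSampler(nwalkers, ndim, lnprob, args=(x, y, yerr))
sampler.run_mcmc(p0, 2000)
samples = sampler.get_chain(discard=500, flat=True)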

Slide 43

Slide 43 text

[Figure: corner plot of the posterior samples for m, b, ln a, and ln s]

Slide 44

Slide 44 text

[Figure: the data with the posterior constraint on the line]

Slide 45

Slide 45 text

[Figure: the data with the posterior constraint on the line]

Slide 46

Slide 46 text

Prediction?

Slide 47

Slide 47 text

[Figure: prediction overplotted on the data]

Slide 48

Slide 48 text

[Figure: prediction overplotted on the data]

Slide 49

Slide 49 text

Take a deep breath. (Photo: CC BY-SA, Flickr user kpjas)

Slide 50

Slide 50 text

2. The formal Gaussian process.

Slide 51

Slide 51 text

The model.
\log p(y \mid x, \sigma, \theta, \alpha) = -\frac{1}{2}\, [y - f_\theta(x)]^T K_\alpha(x, \sigma)^{-1} [y - f_\theta(x)] - \frac{1}{2} \log\det K_\alpha(x, \sigma) - \frac{N}{2} \log 2\pi
where [K_\alpha(x, \sigma)]_{ij} = \sigma_i^2\,\delta_{ij} + k_\alpha(x_i, x_j)
A drop-in replacement for your current log-likelihood function!

Slide 52

Slide 52 text

The model.
y \sim \mathcal{N}\left( f_\theta(x),\, K_\alpha(x, \sigma) \right)
where [K_\alpha(x, \sigma)]_{ij} = \sigma_i^2\,\delta_{ij} + k_\alpha(x_i, x_j)
A drop-in replacement for your current log-likelihood function!

Slide 53

Slide 53 text

The data are drawn from one HUGE Gaussian.* (* the dimension is the number of data points.)

Slide 54

Slide 54 text

A generative model: a probability distribution for y values,
y \sim \mathcal{N}\left( f_\theta(x),\, K_\alpha(x, \sigma) \right)
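Because the model is generative, fake datasets can be drawn directly from it; a sketch with a squared-exponential kernel (the function name and kernel choice are illustrative):

import numpy as np

def draw_gp_datasets(x, yerr, a, l, mean=None, size=1):
    # Draw y ~ N(f_theta(x), K_alpha(x, sigma)) for a fixed set of inputs x.
    dx = x[:, None] - x[None, :]
    K = np.diag(yerr**2) + a * np.exp(-0.5 * dx**2 / l**2)
    mu = np.zeros_like(x) if mean is None else mean
    return np.random.multivariate_normal(mu, K, size=size)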

Slide 55

Slide 55 text

“Likelihood” samples.
[Figure: samples drawn using the exponential squared kernel with l = 0.5, 1, 2 and the quasi-periodic kernel with (l = 2, P = 3), (l = 3, P = 3), (l = 3, P = 1)]
k_\alpha(x_i, x_j) = \exp\left( -\frac{[x_i - x_j]^2}{2\,\ell^2} \right)

Slide 56

Slide 56 text

“Likelihood” samples.
[Figure: samples drawn using the exponential squared kernel with l = 0.5, 1, 2 and the quasi-periodic kernel with (l = 2, P = 3), (l = 3, P = 3), (l = 3, P = 1)]
exponential squared: k_\alpha(x_i, x_j) = \exp\left( -\frac{[x_i - x_j]^2}{2\,\ell^2} \right)

Slide 57

Slide 57 text

“Likelihood” samples.
[Figure: samples drawn using the quasi-periodic kernel with (l = 2, P = 3), (l = 3, P = 3), (l = 3, P = 1)]
k_\alpha(x_i, x_j) = \left[ 1 + \frac{\sqrt{3}\,|x_i - x_j|}{\ell} \right] \exp\left( -\frac{\sqrt{3}\,|x_i - x_j|}{\ell} \right) \cos\left( \frac{2\pi\,|x_i - x_j|}{P} \right)

Slide 58

Slide 58 text

“Likelihood” samples.
[Figure: samples drawn using the quasi-periodic kernel with (l = 2, P = 3), (l = 3, P = 3), (l = 3, P = 1)]
quasi-periodic: k_\alpha(x_i, x_j) = \left[ 1 + \frac{\sqrt{3}\,|x_i - x_j|}{\ell} \right] \exp\left( -\frac{\sqrt{3}\,|x_i - x_j|}{\ell} \right) \cos\left( \frac{2\pi\,|x_i - x_j|}{P} \right)
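A sketch of this quasi-periodic kernel as a function of separation (assuming the Matérn-3/2 envelope written above; the helper name is illustrative):

import numpy as np

def quasi_periodic_kernel(dx, ell, P):
    # (1 + sqrt(3)|dx|/ell) * exp(-sqrt(3)|dx|/ell) * cos(2*pi*|dx|/P)
    r = np.sqrt(3.0) * np.abs(dx) / ell
    return (1.0 + r) * np.exp(-r) * np.cos(2*np.pi*np.abs(dx)/P)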

Slide 59

Slide 59 text

The conditional distribution.
y \sim \mathcal{N}\left( f_\theta(x),\, K_\alpha(x, \sigma) \right)
\begin{bmatrix} y \\ y_\star \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} f_\theta(x) \\ f_\theta(x_\star) \end{bmatrix},\, \begin{bmatrix} K_{\alpha,x,x} & K_{\alpha,x,\star} \\ K_{\alpha,\star,x} & K_{\alpha,\star,\star} \end{bmatrix} \right)
y_\star \mid y \sim \mathcal{N}\left( K_{\alpha,\star,x}\, K_{\alpha,x,x}^{-1}\, [y - f_\theta(x)] + f_\theta(x_\star),\; K_{\alpha,\star,\star} - K_{\alpha,\star,x}\, K_{\alpha,x,x}^{-1}\, K_{\alpha,x,\star} \right)
just see Rasmussen & Williams (Chapter 2)
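A sketch of the conditional (predictive) mean and covariance, assuming a zero mean function and a squared-exponential kernel (hypothetical helper; see Rasmussen & Williams for the general treatment):

import numpy as np

def gp_predict(xstar, x, y, yerr, a, l):
    # Conditional distribution of y_star given the observed y.
    def k(x1, x2):
        return a * np.exp(-0.5 * (x1[:, None] - x2[None, :])**2 / l**2)
    K = k(x, x) + np.diag(yerr**2)
    Ks = k(xstar, x)
    mu = Ks @ np.linalg.solve(K, y)
    cov = k(xstar, xstar) - Ks @ np.linalg.solve(K, Ks.T)
    return mu, cov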

Slide 60

Slide 60 text

What's the catch? Kepler = Big Data (by some definition). Note: I hate myself for this slide too…

Slide 61

Slide 61 text

Computational complexity: naïvely O(N^3), since we must compute the factorization // evaluate the log-determinant // apply the inverse.
\log p(y \mid x, \sigma, \theta, \alpha) = -\frac{1}{2}\, [y - f_\theta(x)]^T K_\alpha(x, \sigma)^{-1} [y - f_\theta(x)] - \frac{1}{2} \log\det K_\alpha(x, \sigma) - \frac{N}{2} \log 2\pi

Slide 62

Slide 62 text

import numpy as np
from scipy.linalg import cho_factor, cho_solve

def simple_gp_lnlike(x, y, yerr, a, s):
    # Covariance: measurement variance on the diagonal plus an
    # exponential-squared kernel with amplitude a and scale s.
    r = x[:, None] - x[None, :]
    C = np.diag(yerr**2) + a*np.exp(-0.5*r**2/(s*s))
    # The Cholesky factorization gives both the log-determinant and C^{-1} y.
    factor, flag = cho_factor(C)
    logdet = 2*np.sum(np.log(np.diag(factor)))
    return -0.5 * (np.dot(y, cho_solve((factor, flag), y))
                   + logdet + len(x)*np.log(2*np.pi))
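For context, a rough timing loop using simple_gp_lnlike from above reproduces the shape of the scaling plot on the next slide (synthetic data; the numbers are machine-dependent):

import time
import numpy as np

for N in [100, 300, 1000, 3000]:
    x = np.sort(np.random.uniform(0, 10, N))
    yerr = 0.1 * np.ones(N)
    y = np.sin(x) + yerr * np.random.randn(N)
    start = time.time()
    simple_gp_lnlike(x, y, yerr, 1.0, 1.0)
    print(N, time.time() - start)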

Slide 63

Slide 63 text

[Figure: log10 runtime (seconds) vs. log10 N]

Slide 64

Slide 64 text

exponential squared quasi-periodic

Slide 65

Slide 65 text

exponential squared quasi-periodic

Slide 66

Slide 66 text

exponential squared quasi-periodic (t, Σ)

Slide 67

Slide 67 text

“Aren't kernel matrices Hierarchical Off-Diagonal Low-Rank?” — no astronomer, ever

Slide 68

Slide 68 text

K^{(3)} = K_3 \times K_2 \times K_1 \times K_0 [Figure legend: full rank; low-rank; identity matrix; zero matrix] Ambikasaran, DFM, et al. (arXiv:1403.6015)

Slide 69

Slide 69 text

github.com/dfm/george

Slide 70

Slide 70 text

import numpy as np
from george import GaussianProcess, kernels

def george_lnlike(x, y, yerr, a, s):
    # Exponential-squared (RBF) kernel with amplitude a and scale s.
    kernel = a * kernels.RBFKernel(s)
    gp = GaussianProcess(kernel)
    # Factorize the covariance matrix for inputs x with uncertainties yerr.
    gp.compute(x, yerr)
    return gp.lnlikelihood(y)
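An example call with toy inputs (note: the george API shown above is the version from around the time of this talk and may differ from current releases):

import numpy as np

x = np.sort(np.random.uniform(0, 10, 500))
yerr = 0.1 * np.ones_like(x)
y = np.sin(x) + yerr * np.random.randn(len(x))
print(george_lnlike(x, y, yerr, 1.0, 1.0))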

Slide 71

Slide 71 text

[Figure: log10 runtime (seconds) vs. log10 N]

Slide 72

Slide 72 text

[Figure: log10 runtime (seconds) vs. log10 N]

Slide 73

Slide 73 text

And short cadence data? One month of data in 4 seconds.

Slide 74

Slide 74 text

3. Applications to Kepler data.

Slide 75

Slide 75 text

Parameter Recovery

Slide 76

Slide 76 text

Object: KIC 2301306 + injected super-Earth. [Figure: KIC 2301306 + injection, plotted against time since transit]

Slide 77

Slide 77 text

Object: KIC 2301306 + injected super-Earth. [Figure: KIC 2301306 + injection, plotted against time since transit]

Slide 78

Slide 78 text

Object: KIC 2301306 + injected super-Earth. [Figure: KIC 2301306 + injection, plotted against time since transit]

Slide 79

Slide 79 text

Figure generated using github.com/dfm/triangle.py (after median-filter detrending).

Slide 80

Slide 80 text

Object: KIC 2301306 + injected super-Earth. [Figure: KIC 2301306 + injection, plotted against time since transit]

Slide 81

Slide 81 text

Object: KIC 2301306 + injected super-Earth. [Figure: KIC 2301306 + injection, plotted against time since transit]

Slide 82

Slide 82 text

Figure generated using github.com/dfm/triangle.py (using a Gaussian process noise model).

Slide 83

Slide 83 text

[Figure: light curve vs. time [days]] Ambikasaran, DFM, et al. (arXiv:1403.6015)

Slide 84

Slide 84 text

[Figure: corner plot of the posterior samples for q1, q2, t0, τ, r/R⋆, f⋆, and b]

Slide 85

Slide 85 text

KOI 1474.01 with Bekki Dawson, et al.

Slide 86

Slide 86 text

No content

Slide 87

Slide 87 text

No content

Slide 88

Slide 88 text

Stellar Rotation with Ruth Angus

Slide 89

Slide 89 text

figures from Ruth Angus (Oxford)

Slide 90

Slide 90 text

figures from Ruth Angus (Oxford)

Slide 91

Slide 91 text

4. Conclusions & Summary.

Slide 92

Slide 92 text

Correlated noise matters. A Gaussian process provides a drop-in replacement likelihood function… if you can compute it.

Slide 93

Slide 93 text

Resources: gaussianprocess.org/gpml // github.com/dfm/gp // github.com/dfm/george // [email protected]