Dan Foreman-Mackey
March 10, 2014

# An Astronomer's Introduction to Gaussian Processes

A tutorial given at the Harvard Smithsonian Center for Astrophysics about using Gaussian processes to analyze Kepler light curves.


## Transcript

1. ### AN ASTRONOMER'S INTRODUCTION TO GAUSSIAN PROCESSES. Dan Foreman-Mackey, CCPP@NYU // github.com/dfm // @exoplaneteer // dfm.io

12. ### NOISE: astrophysical (star spots, stellar variability, transiting exoplanets) and instrumental (temperature variations, pointing shifts, cosmic rays).
13. ### Ruth's favourite object: KIC 3223000. [Figure: Kepler light curve, relative flux vs. time in KBJD.]
14. ### NOISE: astrophysical (star spots, stellar variability, transiting exoplanets) and instrumental (temperature variations, pointing shifts, cosmic rays).

17. ### NOISE: astrophysical (star spots, stellar variability, transiting exoplanets) and instrumental (temperature variations, pointing shifts, cosmic rays).

20. ### [Figure: data set to be fit with the linear model y = m x + b.]
21. ### The true covariance of the observations. [Figure: image of the true covariance matrix.]
22. ### Let's assume that the noise is… independent Gaussian with known variance:

$$\log p(y \mid m, b) = -\frac{1}{2} \sum_n \left[ \frac{(y_n - m\,x_n - b)^2}{\sigma_n^2} + \log 2\pi\sigma_n^2 \right]$$
23. ### Or equivalently…

$$\log p(y \mid m, b) = -\frac{1}{2}\,r^\mathrm{T} C^{-1} r - \frac{1}{2}\log\det C - \frac{N}{2}\log 2\pi$$

24. ### Same equation, annotated: the covariance matrix C is diagonal.
25. ### Same equation, annotated: C is diagonal and N is the number of data points.
26. ### Same equation, annotated: the residual vector is

$$r = \left[\, y_1 - (m\,x_1 + b) \;\;\cdots\;\; y_n - (m\,x_n + b) \,\right]^\mathrm{T}$$
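As a quick sanity check (not from the slides), here is a minimal numpy snippet with made-up data verifying that the sum form on slide 22 and the matrix form on slide 23 agree when C is diagonal:

```python
import numpy as np

# Made-up data for illustration.
rng = np.random.default_rng(42)
n = 5
x = np.sort(rng.uniform(-5, 5, n))
yerr = 0.1 + 0.4 * rng.random(n)
m, b = 0.5, -0.2
y = m * x + b + yerr * rng.normal(size=n)

# Sum form (slide 22): independent Gaussians with known variances.
r = y - (m * x + b)
lnlike_sum = -0.5 * np.sum(r**2 / yerr**2 + np.log(2 * np.pi * yerr**2))

# Matrix form (slide 23): the same quantity with a diagonal covariance C.
C = np.diag(yerr**2)
lnlike_mat = -0.5 * (r @ np.linalg.solve(C, r)
                     + np.linalg.slogdet(C)[1]
                     + n * np.log(2 * np.pi))

assert np.isclose(lnlike_sum, lnlike_mat)
```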
27. ### Linear least-squares.

$$A = \begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{bmatrix} \qquad C = \begin{bmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{bmatrix} \qquad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$

$$\begin{bmatrix} m \\ b \end{bmatrix} = S\,A^\mathrm{T} C^{-1} y \qquad S = \left[ A^\mathrm{T} C^{-1} A \right]^{-1}$$

This is the maximum-likelihood solution and, in this case only, also the mean of the posterior, with posterior covariance S.
28. ### Same, annotated: assuming uniform priors.
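In code, the closed-form solution on slide 27 is a few lines of numpy. A minimal sketch (the function name is mine):

```python
import numpy as np

def linear_least_squares(x, y, yerr):
    """Weighted linear least squares for the model y = m*x + b.

    Returns the maximum-likelihood (m, b), which under uniform priors
    is also the posterior mean, along with the posterior covariance S.
    """
    A = np.vander(x, 2)                # design matrix with columns [x, 1]
    Cinv = np.diag(1.0 / yerr**2)      # C is diagonal, so inversion is trivial
    S = np.linalg.inv(A.T @ Cinv @ A)  # posterior covariance
    w = S @ A.T @ Cinv @ y             # [m, b]
    return w, S
```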
29. ### [Figure: the data with the true line ("truth") overplotted.]
30. ### [Figure: the same data and truth line; annotation: "posterior constraint?"]
31. ### [Figure: repeat of slide 30.]
32. ### [Figure: repeat of slide 30.]
33. ### But we know the true covariance matrix. [Figure: image of the covariance matrix.]
34. ### $$\log p(y \mid m, b) = -\frac{1}{2}\,r^\mathrm{T} C^{-1} r - \frac{1}{2}\log\det C - \frac{N}{2}\log 2\pi$$
35. ### Same equation. [Figure: image of the full covariance matrix.]
36. ### Linear least-squares, with the same closed-form solution as slide 27, now using the full covariance matrix C. [Figure: image of the covariance matrix.]
37. ### Before. [Figure: posterior constraint on the line using the diagonal covariance.]
38. ### After. [Figure: posterior constraint on the line using the true covariance matrix.]

43. ### Kernel (or covariance function).

$$\log p(y \mid m, b, a, s) = -\frac{1}{2}\,r^\mathrm{T} C^{-1} r - \frac{1}{2}\log\det C - \frac{N}{2}\log 2\pi$$

where C is now a function of the model parameters; for example

$$C_{ij} = \sigma_i^2\,\delta_{ij} + a \exp\left( -\frac{(x_i - x_j)^2}{2\,s^2} \right)$$
44. ### $$\log p(m, b, a, s \mid y) = \log p(y \mid m, b, a, s) + \log p(m, b, a, s) + \text{constant}$$

It's hammer time! emcee, the MCMC Hammer: arxiv.org/abs/1202.3665 // dan.iel.fm/emcee
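For concreteness, sampling this posterior with emcee might look roughly like the sketch below. This is written against the modern emcee 3 API; the flat-prior bounds and the reuse of the simple_gp_lnlike function shown later on slide 56 are my assumptions:

```python
import numpy as np
import emcee

def log_post(params, x, y, yerr):
    # Hypothetical log-posterior: flat priors within broad bounds,
    # plus the GP likelihood evaluated on the line-subtracted residuals.
    m, b, ln_a, ln_s = params
    if np.any(np.abs(params) > 10.0):  # assumed prior bounds
        return -np.inf
    resid = y - (m * x + b)
    return simple_gp_lnlike(x, resid, yerr, np.exp(ln_a), np.exp(ln_s))

ndim, nwalkers = 4, 32
p0 = 1e-3 * np.random.randn(nwalkers, ndim)  # small ball around the origin
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_post, args=(x, y, yerr))
sampler.run_mcmc(p0, 2000)
samples = sampler.get_chain(discard=500, flat=True)  # for the corner plot
```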
45. ### [Figure: corner plot of the posterior samples for m, b, ln a, and ln s.]

49. ### The model.

$$\log p(y \mid m, b, \theta) = -\frac{1}{2}\,r^\mathrm{T} K_\theta^{-1} r - \frac{1}{2}\log\det K_\theta - \frac{N}{2}\log 2\pi$$

where $[K_\theta]_{ij} = \sigma_i^2\,\delta_{ij} + k_\theta(x_i, x_j)$. A drop-in replacement for your current log-likelihood function!
50. ### The data are drawn from one HUGE* Gaussian. (*its covariance matrix has a number of entries equal to the square of the number of data points.)
51. ### "Prior" samples. [Figure: samples drawn from the exponential-squared kernel with l = 0.5, 1, 2, and from the quasi-periodic kernel with (l, P) = (2, 3), (3, 3), (3, 1).]

$$k_\theta(r) = \exp\left( -\frac{r^2}{2\,l^2} \right)$$

52. ### Same, annotated: this is the exponential-squared kernel.
53. ### "Prior" samples. [Figure: samples from the quasi-periodic kernel with (l, P) = (2, 3), (3, 3), (3, 1).]

$$k_\theta(r) = \left[ 1 + \frac{\sqrt{3}\,r}{l} \right] \exp\left( -\frac{\sqrt{3}\,r}{l} \right) \cos\left( \frac{2\pi r}{P} \right)$$

54. ### Same, annotated: this is the quasi-periodic kernel.
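Figures like the "prior" samples on these slides can be reproduced (roughly; a sketch with my own parameter choices) by drawing from a zero-mean multivariate Gaussian whose covariance is the kernel evaluated on a grid:

```python
import numpy as np

def exp_squared(r, l):
    # Exponential-squared kernel from slide 51.
    return np.exp(-0.5 * r**2 / l**2)

def quasi_periodic(r, l, P):
    # Quasi-periodic kernel from slide 53.
    return ((1 + np.sqrt(3) * np.abs(r) / l)
            * np.exp(-np.sqrt(3) * np.abs(r) / l)
            * np.cos(2 * np.pi * r / P))

t = np.linspace(0, 10, 300)
r = t[:, None] - t[None, :]
K = quasi_periodic(r, l=3.0, P=3.0) + 1e-8 * np.eye(len(t))  # jitter for stability
samples = np.random.multivariate_normal(np.zeros(len(t)), K, size=3)
```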
55. ### $$\log p(y \mid m, b, \theta) = -\frac{1}{2}\,r^\mathrm{T} K_\theta^{-1} r - \frac{1}{2}\log\det K_\theta - \frac{N}{2}\log 2\pi$$

Computational complexity: O(N³) naïvely. Compute the LU decomposition // evaluate the log-det // apply the inverse.
56. ### A naïve implementation:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def simple_gp_lnlike(x, y, yerr, a, s):
    # Build the full N x N covariance: diagonal measurement noise plus
    # an exponential-squared kernel with amplitude a and scale s.
    r = x[:, None] - x[None, :]
    C = np.diag(yerr**2) + a * np.exp(-0.5 * r**2 / (s * s))
    # The Cholesky factorization gives both the log-determinant and
    # an efficient solve for C^{-1} y.
    factor, flag = cho_factor(C)
    logdet = np.sum(2 * np.log(np.diag(factor)))
    return -0.5 * (np.dot(y, cho_solve((factor, flag), y))
                   + logdet + len(x) * np.log(2 * np.pi))
```
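A quick usage example with synthetic inputs (everything below is made up for illustration):

```python
x = np.sort(np.random.uniform(0, 10, 50))
yerr = 0.1 * np.ones_like(x)
y = np.sin(x) + yerr * np.random.randn(len(x))
print(simple_gp_lnlike(x, y, yerr, a=1.0, s=2.0))
```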
57. ### [Figure: log10 runtime in seconds vs. log10 N for the naïve implementation.]

63. ### $K^{(3)} = K_3 \times K_2 \times K_1 \times K_0$ [Figure: factorization of the kernel matrix into factors built from full-rank, low-rank, identity, and zero blocks.] Ambikasaran, DFM, et al. (in prep)

65. ### The same likelihood with george:

```python
import numpy as np
from george import GaussianProcess, kernels

def george_lnlike(x, y, yerr, a, s):
    kernel = a * kernels.RBFKernel(s)
    gp = GaussianProcess(kernel)
    gp.compute(x, yerr)
    return gp.lnlikelihood(y)
```
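Note that the slide uses george's 2014-era API. With the current george API, the same computation would look roughly like this sketch (my best understanding of the modern interface; the kernel parameterization is an assumption, so check the george docs):

```python
import george
from george import kernels

def george_lnlike_modern(x, y, yerr, a, s):
    # In the modern API, ExpSquaredKernel takes the squared length
    # scale as its metric argument (assumed here to be s**2).
    kernel = a * kernels.ExpSquaredKernel(s**2)
    gp = george.GP(kernel)
    gp.compute(x, yerr)
    return gp.log_likelihood(y)
```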
66. ### [Figure: log10 runtime in seconds vs. log10 N.]
67. ### [Figure: log10 runtime in seconds vs. log10 N.]
