An Astronomer's Introduction to Gaussian Processes


A tutorial given at the Harvard Smithsonian Center for Astrophysics about using Gaussian processes to analyze Kepler light curves.


Dan Foreman-Mackey

March 10, 2014

Transcript

  1. AN ASTRONOMER'S INTRODUCTION TO GAUSSIAN PROCESSES. Dan Foreman-Mackey, CCPP@NYU // github.com/dfm // @exoplaneteer // dfm.io
  2. github.com/dfm/gp-tutorial

  3. I write code for good & astrophysics.

  4. I (probably) do data science.

  5. Photo credit: James Silvester (silvesterphoto.tumblr.com). Not data science.

  6. cb Flickr user Marcin Wichary. Data science.

  7. MODEL DATA Data Science

  8. MODEL DATA Data Science p (data | model)

  9. I work with Kepler data.

  10. NOISE I get really passionate about

  11. astrophysical instrumental NOISE

  12. astrophysical & instrumental NOISE: star spots, stellar variability, transiting exoplanets, temperature variations, pointing shifts, cosmic rays
  13. [Figure: relative flux vs. time (KBJD) for Ruth's favourite object, KIC 3223000]
  14. astrophysical & instrumental NOISE: star spots, stellar variability, transiting exoplanets, temperature variations, pointing shifts, cosmic rays
  15. Kepler 32

  16. Kepler 32

  17. astrophysical & instrumental NOISE: star spots, stellar variability, transiting exoplanets, temperature variations, pointing shifts, cosmic rays
  18. Why not model all the things?

  19. The power of correlated noise. 1

  20. [Figure: the data and the linear model $y = m\,x + b$]
  21. [Figure: the true covariance of the observations]
  22. Let's assume that the noise is… independent Gaussian with known variance:
      $\log p(y \mid m, b) = -\frac{1}{2} \sum_n \left[ \frac{(y_n - m\,x_n - b)^2}{\sigma_n^2} + \log 2\pi\sigma_n^2 \right]$
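
As a rough sketch of what this independent-Gaussian likelihood looks like in NumPy (the function and variable names below are illustrative, not from the slides):

      import numpy as np

      def lnlike_independent(x, y, yerr, m, b):
          # Independent Gaussian noise with known variances yerr**2:
          # -0.5 * sum_n [ (y_n - m x_n - b)^2 / sigma_n^2 + log(2 pi sigma_n^2) ]
          resid = y - (m * x + b)
          return -0.5 * np.sum((resid / yerr) ** 2 + np.log(2 * np.pi * yerr ** 2))
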
  23. Or equivalently… $\log p(y \mid m, b) = -\frac{1}{2} r^\mathrm{T} C^{-1} r - \frac{1}{2}\log\det C - \frac{N}{2}\log 2\pi$
  24. Or equivalently… $\log p(y \mid m, b) = -\frac{1}{2} r^\mathrm{T} C^{-1} r - \frac{1}{2}\log\det C - \frac{N}{2}\log 2\pi$, where $C$ is diagonal
  25. Or equivalently… $\log p(y \mid m, b) = -\frac{1}{2} r^\mathrm{T} C^{-1} r - \frac{1}{2}\log\det C - \frac{N}{2}\log 2\pi$, where $C$ is diagonal and $N = N_\mathrm{data}$
  26. Or equivalently… $\log p(y \mid m, b) = -\frac{1}{2} r^\mathrm{T} C^{-1} r - \frac{1}{2}\log\det C - \frac{N}{2}\log 2\pi$, where $C$ is diagonal, $N = N_\mathrm{data}$, and the residual vector is $r = \left[\, y_1 - (m\,x_1 + b) \;\cdots\; y_n - (m\,x_n + b) \,\right]^\mathrm{T}$
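
The matrix form can be coded up the same way; the sketch below (again with illustrative names, reusing lnlike_independent from above and some made-up toy data) checks that the two expressions agree when $C$ is diagonal:

      import numpy as np

      def lnlike_matrix(x, y, yerr, m, b):
          # The same likelihood written with a residual vector r and a
          # diagonal covariance matrix C.
          r = y - (m * x + b)
          C = np.diag(yerr ** 2)
          return -0.5 * (r @ np.linalg.solve(C, r)
                         + np.linalg.slogdet(C)[1]
                         + len(x) * np.log(2 * np.pi))

      # Toy data (purely illustrative) to check that the two forms agree:
      x = np.linspace(-5, 5, 50)
      yerr = 0.3 * np.ones_like(x)
      y = 0.5 * x + 1.0 + yerr * np.random.randn(len(x))
      assert np.allclose(lnlike_matrix(x, y, yerr, 0.5, 1.0),
                         lnlike_independent(x, y, yerr, 0.5, 1.0))
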
  27. Linear least-squares.
      $A = \begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{bmatrix}$, $C = \begin{bmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{bmatrix}$, $y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$
      $\begin{bmatrix} m \\ b \end{bmatrix} = S\,A^\mathrm{T} C^{-1} y$ is the maximum likelihood estimate and (in this case only) the mean of the posterior; $S = \left[ A^\mathrm{T} C^{-1} A \right]^{-1}$ is the posterior covariance.
  28. Linear least-squares (same equations as the previous slide), now noting: assuming uniform priors.
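
A minimal sketch of this least-squares solution, reusing the toy x, y, yerr from above; the helper name linear_least_squares is illustrative:

      import numpy as np

      def linear_least_squares(x, y, yerr):
          # [m, b] = S A^T C^{-1} y  with  S = [A^T C^{-1} A]^{-1}
          A = np.vander(x, 2)                # columns: [x, 1]
          Cinv_A = A / yerr[:, None] ** 2    # C^{-1} A, exploiting that C is diagonal
          S = np.linalg.inv(A.T @ Cinv_A)
          mb = S @ (Cinv_A.T @ y)
          return mb, S

      mb, S = linear_least_squares(x, y, yerr)
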
  29. [Figure: the data with the true line ("truth")]
  30. [Figure: the data with the true line; "posterior constraint?"]
  31. [Figure: the data with the true line; "posterior constraint?"]
  32. [Figure: the data with the true line; "posterior constraint?"]
  33. But we know the true covariance matrix. [Figure: the true covariance matrix]
  34. $\log p(y \mid m, b) = -\frac{1}{2} r^\mathrm{T} C^{-1} r - \frac{1}{2}\log\det C - \frac{N}{2}\log 2\pi$
  35. $\log p(y \mid m, b) = -\frac{1}{2} r^\mathrm{T} C^{-1} r - \frac{1}{2}\log\det C - \frac{N}{2}\log 2\pi$ [Figure: the true covariance matrix]
  36. Linear least-squares (same equations as slide 27), now using the full covariance matrix. [Figure: the true covariance matrix]
  37. Before. [Figure]
  38. After. [Figure]
  39. cbd Flickr user MyFWCmedia. The responsible scientist.

  40. So… we’re finished, right?

  41. In The Real World™, we never know the noise.

  42. Just gotta model it!

  43. $\log p(y \mid m, b, a, s) = -\frac{1}{2} r^\mathrm{T} C^{-1} r - \frac{1}{2}\log\det C - \frac{N}{2}\log 2\pi$, where $C$ is now a function of the model parameters through a kernel or covariance function, for example $C_{ij} = \sigma_i^2\,\delta_{ij} + a \exp\left( -\frac{(x_i - x_j)^2}{2\,s} \right)$
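
A possible NumPy construction of this example covariance matrix; the function name is illustrative, and s enters exactly as written in the formula above (the code on slide 56 squares its s instead):

      import numpy as np

      def build_covariance(x, yerr, a, s):
          # C_ij = sigma_i^2 delta_ij + a exp(-(x_i - x_j)^2 / (2 s))
          dx = x[:, None] - x[None, :]
          return np.diag(yerr ** 2) + a * np.exp(-0.5 * dx ** 2 / s)
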
  44. $\log p(m, b, a, s \mid y) = \log p(y \mid m, b, a, s) + \log p(m, b, a, s) + \mathrm{constant}$ … it's hammer time! emcee, the MCMC Hammer: arxiv.org/abs/1202.3665 // dan.iel.fm/emcee
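
A rough sketch of how this sampling might be set up with emcee, reusing the toy data and the build_covariance helper from the sketches above; the walker count, starting point, and prior ranges are all illustrative:

      import numpy as np
      import emcee

      def ln_posterior(params, x, y, yerr):
          m, b, ln_a, ln_s = params
          # Flat priors over (generous, illustrative) ranges for the noise parameters.
          if not (-5.0 < ln_a < 5.0 and -5.0 < ln_s < 5.0):
              return -np.inf
          C = build_covariance(x, yerr, np.exp(ln_a), np.exp(ln_s))
          r = y - (m * x + b)
          return -0.5 * (r @ np.linalg.solve(C, r)
                         + np.linalg.slogdet(C)[1]
                         + len(x) * np.log(2 * np.pi))

      ndim, nwalkers = 4, 32
      p0 = np.array([0.5, 1.0, 0.0, 0.0]) + 1e-4 * np.random.randn(nwalkers, ndim)
      sampler = emcee.EnsembleSampler(nwalkers, ndim, ln_posterior, args=(x, y, yerr))
      sampler.run_mcmc(p0, 2000)
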
  45. [Corner plot of the posterior samples of $m$, $b$, $\ln a$, and $\ln s$]
  46. [Figure]
  47. [Figure]
  48. The formal Gaussian process. 2

  49. The model: $\log p(y \mid m, b, \theta) = -\frac{1}{2} r^\mathrm{T} K_\theta^{-1} r - \frac{1}{2}\log\det K_\theta - \frac{N}{2}\log 2\pi$, where $K_{\theta,ij} = \sigma_i^2\,\delta_{ij} + k_\theta(x_i, x_j)$. A drop-in replacement for your current log-likelihood function!
  50. The data are drawn from one HUGE* Gaussian. (* its covariance matrix has as many entries as the square of the number of data points)
  51. "Prior" samples. Exponential-squared kernel: $k_\theta(r) = \exp\left( -\frac{r^2}{2\,l^2} \right)$ [Figure: samples for $l = 0.5$, $l = 1$, $l = 2$; quasi-periodic panel with $l = 2, P = 3$; $l = 3, P = 3$; $l = 3, P = 1$]
  52. "Prior" samples (exponential squared highlighted). $k_\theta(r) = \exp\left( -\frac{r^2}{2\,l^2} \right)$ [Figure: same panels as above]
  53. "Prior" samples. Quasi-periodic kernel: $k_\theta(r) = \left[ 1 + \frac{\sqrt{3}\,r}{l} \right] \exp\left( -\frac{\sqrt{3}\,r}{l} \right) \cos\left( \frac{2\pi r}{P} \right)$ [Figure: samples for $l = 2, P = 3$; $l = 3, P = 3$; $l = 3, P = 1$]
  54. "Prior" samples (quasi-periodic highlighted). $k_\theta(r) = \left[ 1 + \frac{\sqrt{3}\,r}{l} \right] \exp\left( -\frac{\sqrt{3}\,r}{l} \right) \cos\left( \frac{2\pi r}{P} \right)$ [Figure: same panels as above]
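
Both kernels are straightforward to code up, and "prior" samples like the ones plotted on these slides can be drawn with a multivariate normal. This sketch uses illustrative function names and assumes the Matérn-like factor acts on the absolute lag $|r|$:

      import numpy as np

      def exp_squared_kernel(r, l):
          # k(r) = exp(-r^2 / (2 l^2))
          return np.exp(-0.5 * r ** 2 / l ** 2)

      def quasi_periodic_kernel(r, l, P):
          # k(r) = [1 + sqrt(3) |r| / l] exp(-sqrt(3) |r| / l) cos(2 pi r / P)
          u = np.sqrt(3.0) * np.abs(r) / l
          return (1.0 + u) * np.exp(-u) * np.cos(2 * np.pi * r / P)

      # Draw one "prior" sample on a grid of times; the small jitter on the
      # diagonal is only for numerical stability.
      t = np.linspace(0, 10, 200)
      K = quasi_periodic_kernel(t[:, None] - t[None, :], l=3.0, P=3.0)
      sample = np.random.multivariate_normal(np.zeros(len(t)), K + 1e-10 * np.eye(len(t)))
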
  55. $\log p(y \mid m, b, \theta) = -\frac{1}{2} r^\mathrm{T} K_\theta^{-1} r - \frac{1}{2}\log\det K_\theta - \frac{N}{2}\log 2\pi$ Computational complexity: naïvely $\mathcal{O}(N^3)$ (compute the LU decomposition // evaluate the log-determinant // apply the inverse)
  56. import numpy as np
      from scipy.linalg import cho_factor, cho_solve

      def simple_gp_lnlike(x, y, yerr, a, s):
          # Build the pairwise distance matrix and the full covariance matrix.
          r = x[:, None] - x[None, :]
          C = np.diag(yerr**2) + a * np.exp(-0.5 * r**2 / (s * s))
          # A Cholesky factorization gives both the log-determinant and the solve.
          factor, flag = cho_factor(C)
          logdet = np.sum(2 * np.log(np.diag(factor)))
          return -0.5 * (np.dot(y, cho_solve((factor, flag), y))
                         + logdet + len(x) * np.log(2 * np.pi))
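
For instance, calling it on the toy data from the earlier sketches (parameter values arbitrary) and timing it gives a feel for the scaling shown on the next slide:

      import time

      t0 = time.time()
      print(simple_gp_lnlike(x, y, yerr, a=1.0, s=1.0))
      print("runtime: {0:.4f} s".format(time.time() - t0))
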
  57. [Figure: $\log_{10}$(runtime / seconds) vs. $\log_{10} N$]
  58. [Figure: exponential squared & quasi-periodic]

  59. [Figure: exponential squared & quasi-periodic; time series $t$ and covariance matrix $\Sigma$]

  60. [Figure: exponential squared & quasi-periodic; time series $t$ and covariance matrix $\Sigma$] Sparse, but still: $\mathcal{O}(N^{\sim 2})$.

  61. FFT?

  62. "Aren't kernel matrices Hierarchical Off-Diagonal Low-Rank?" — no astronomer ever
  63. $K^{(3)} = K_3 \times K_2 \times K_1 \times K_0$ [Figure: the factor matrices, with blocks marked as full rank, low-rank, identity matrix, or zero matrix] Ambikasaran, DFM, et al. (in prep)
  64. github.com/dfm/george

  65. import numpy as np
      from george import GaussianProcess, kernels

      def george_lnlike(x, y, yerr, a, s):
          # Same exponential-squared model, evaluated with george.
          kernel = a * kernels.RBFKernel(s)
          gp = GaussianProcess(kernel)
          gp.compute(x, yerr)
          return gp.lnlikelihood(y)
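
A possible call with the same toy data; note that this is the 2014 george API as shown on the slide (current releases differ), and the RBFKernel parametrization may not exactly match simple_gp_lnlike:

      print(george_lnlike(x, y, yerr, a=1.0, s=1.0))
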
  66. [Figure: $\log_{10}$(runtime / seconds) vs. $\log_{10} N$]
  67. [Figure: $\log_{10}$(runtime / seconds) vs. $\log_{10} N$]
  68. And short cadence data? One month of data in 4 seconds.
  69. Applications to Kepler data. 3

  70. Parameter Recovery with Sivaram Ambikasaran, et al.

  71.–80. [Image-only slides; no transcript text]