An Astronomer's Introduction to Gaussian Processes (v2)

Lecture for the Penn State summer school on "Bayesian Computing for Astronomical Data Analysis".

Dan Foreman-Mackey

June 13, 2014

Transcript

  1. [Light curve: relative flux vs. time (KBJD) for KIC 3223000]
  2. [Light curve: relative flux vs. time (KBJD) for Kepler 32]
  3. [Plot: the linear model y = m x + b]
  4. [Matrix plot] The true covariance of the observations.
  5. Let's assume that the noise is independent Gaussian with known variance:
     \log p(y | x, \sigma, \theta) = -\frac{1}{2} \sum_{n=1}^{N} \left[ \frac{[y_n - f_\theta(x_n)]^2}{\sigma_n^2} + \log 2\pi\sigma_n^2 \right]
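As a minimal sketch of this independent-Gaussian log-likelihood (the function and variable names are mine, not from the slides), assuming y, yerr, and the model values f_theta(x_n) are NumPy arrays:

      import numpy as np

      def independent_lnlike(y, yerr, model):
          # model: f_theta(x_n) evaluated at each data point
          r = y - model                      # residuals
          return -0.5 * np.sum(r ** 2 / yerr ** 2 + np.log(2 * np.pi * yerr ** 2))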
  6. Or equivalently…
     \log p(y | x, \sigma, \theta) = -\frac{1}{2} r^\mathrm{T} C^{-1} r - \frac{1}{2} \log\det C - \frac{N}{2} \log 2\pi
  7. Or equivalently…
     \log p(y | x, \sigma, \theta) = -\frac{1}{2} r^\mathrm{T} C^{-1} r - \frac{1}{2} \log\det C - \frac{N}{2} \log 2\pi
     (C is the data covariance.)
  8. Or equivalently…
     \log p(y | x, \sigma, \theta) = -\frac{1}{2} r^\mathrm{T} C^{-1} r - \frac{1}{2} \log\det C - \frac{N}{2} \log 2\pi
     (C is the data covariance; r is the residual vector
     r = \left[\, y_1 - f_\theta(x_1) \quad y_2 - f_\theta(x_2) \quad \cdots \quad y_n - f_\theta(x_n) \,\right]^\mathrm{T}.)
  9. Or equivalently…
     \log p(y | x, \sigma, \theta) = -\frac{1}{2} r^\mathrm{T} C^{-1} r - \frac{1}{2} \log\det C - \frac{N}{2} \log 2\pi
     (C is the data covariance; r is the residual vector as above.) [Matrix plot]
  10. Linear least-squares.
      A = \begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{bmatrix}, \quad C = \begin{bmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{bmatrix}, \quad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
      \begin{bmatrix} m \\ b \end{bmatrix} = S\, A^\mathrm{T} C^{-1} y \quad (maximum likelihood and, in this case only, the mean of the posterior)
      S = \left[ A^\mathrm{T} C^{-1} A \right]^{-1} \quad (posterior covariance)
  11. Linear least-squares.
      A = \begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{bmatrix}, \quad C = \begin{bmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{bmatrix}, \quad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
      \begin{bmatrix} m \\ b \end{bmatrix} = S\, A^\mathrm{T} C^{-1} y \quad (maximum likelihood and, in this case only, the mean of the posterior)
      S = \left[ A^\mathrm{T} C^{-1} A \right]^{-1} \quad (posterior covariance, assuming uniform priors)
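A sketch of these least-squares expressions in NumPy (the function name and argument layout are mine; it assumes 1-D arrays x, y, and yerr):

      import numpy as np

      def linear_least_squares(x, y, yerr):
          A = np.vander(x, 2)                  # design matrix with columns [x, 1]
          Cinv = np.diag(1.0 / yerr ** 2)      # inverse of the diagonal data covariance
          S = np.linalg.inv(np.dot(A.T, np.dot(Cinv, A)))   # posterior covariance
          mb = np.dot(S, np.dot(A.T, np.dot(Cinv, y)))      # [m, b]
          return mb, S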
  12. [Plot: the truth]
  13. [Plot: the truth; posterior constraint?]
  14. [Plot: the truth; posterior constraint?]
  15. [Matrix plot] But we know the true covariance matrix.
  16. [Matrix plot]
      \log p(y | x, \sigma, \theta) = -\frac{1}{2} r^\mathrm{T} C^{-1} r - \frac{1}{2} \log\det C - \frac{N}{2} \log 2\pi
  17. Linear least-squares. [Matrix plot]
      A = \begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{bmatrix}, \quad C = \begin{bmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{bmatrix}, \quad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
      \begin{bmatrix} m \\ b \end{bmatrix} = S\, A^\mathrm{T} C^{-1} y \quad (maximum likelihood and, in this case only, the mean of the posterior)
      S = \left[ A^\mathrm{T} C^{-1} A \right]^{-1} \quad (posterior covariance)
  18. [Plot] Before.
  19. [Plot] After.
  20. Model it!
      \log p(y | x, \sigma, \theta, \alpha) = -\frac{1}{2} r^\mathrm{T} K_\alpha^{-1} r - \frac{1}{2} \log\det K_\alpha + \mathrm{const}
      where K_{ij} = \sigma_i^2 \delta_{ij} + k_\alpha(x_i, x_j), for example k_\alpha(x_i, x_j) = a^2 \exp\left( -\frac{[x_i - x_j]^2}{2 l^2} \right)
      A drop-in replacement for your current log-likelihood function!
  21. [Corner plot: posterior over m, b, ln a, and ln s]
  22. [Plot]
  23. [Plot]
  24. [Plot]
  25. [Plot]
  26. The model.
      \log p(y | x, \sigma, \theta, \alpha) = -\frac{1}{2} [y - f_\theta(x)]^\mathrm{T} K_\alpha(x, \sigma)^{-1} [y - f_\theta(x)] - \frac{1}{2} \log\det K_\alpha(x, \sigma) - \frac{N}{2} \log 2\pi
      where [K_\alpha(x, \sigma)]_{ij} = \sigma_i^2 \delta_{ij} + k_\alpha(x_i, x_j)
      A drop-in replacement for your current log-likelihood function!
  27. The model.
      y \sim \mathcal{N}\left( f_\theta(x),\ K_\alpha(x, \sigma) \right)
      where [K_\alpha(x, \sigma)]_{ij} = \sigma_i^2 \delta_{ij} + k_\alpha(x_i, x_j)
      A drop-in replacement for your current log-likelihood function!
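One way to see what this model means is to draw samples of y directly from the N-dimensional Gaussian. This sketch is my own, using the exponential squared kernel from slide 20, a zero mean function, and a small jitter term added for numerical stability:

      import numpy as np

      def exp_squared_kernel(x1, x2, a=1.0, ell=1.0):
          # k(x_i, x_j) = a^2 exp(-(x_i - x_j)^2 / (2 ell^2))
          r = x1[:, None] - x2[None, :]
          return a ** 2 * np.exp(-0.5 * r ** 2 / ell ** 2)

      x = np.linspace(0, 10, 100)
      K = exp_squared_kernel(x, x) + 1e-10 * np.eye(len(x))  # jitter (my addition)
      samples = np.random.multivariate_normal(np.zeros(len(x)), K, size=3)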
  28. HUGE: the data are drawn from one Gaussian* (*where the dimension is the number of data points).
  29. “Likelihood” samples.
      k_\alpha(x_i, x_j) = \exp\left( -\frac{[x_i - x_j]^2}{2 \ell^2} \right)
      [Plot: draws for \ell = 0.5, 1, 2, alongside quasi-periodic draws for (\ell, P) = (2, 3), (3, 3), (3, 1)]
  30. “Likelihood” samples: exponential squared.
      k_\alpha(x_i, x_j) = \exp\left( -\frac{[x_i - x_j]^2}{2 \ell^2} \right)
      [Plot: draws for \ell = 0.5, 1, 2]
  31. “Likelihood” samples.
      k_\alpha(x_i, x_j) = \left[ 1 + \frac{\sqrt{3}\,|x_i - x_j|}{\ell} \right] \exp\left( -\frac{\sqrt{3}\,|x_i - x_j|}{\ell} \right) \cos\left( \frac{2\pi |x_i - x_j|}{P} \right)
      [Plot: draws for (\ell, P) = (2, 3), (3, 3), (3, 1)]
  32. “Likelihood” samples: quasi-periodic.
      k_\alpha(x_i, x_j) = \left[ 1 + \frac{\sqrt{3}\,|x_i - x_j|}{\ell} \right] \exp\left( -\frac{\sqrt{3}\,|x_i - x_j|}{\ell} \right) \cos\left( \frac{2\pi |x_i - x_j|}{P} \right)
      [Plot: draws for (\ell, P) = (2, 3), (3, 3), (3, 1)]
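A sketch of this quasi-periodic kernel as a Python function (my own transcription of the expression above, assuming 1-D input arrays):

      import numpy as np

      def quasi_periodic_kernel(x1, x2, ell=3.0, period=3.0):
          # [1 + sqrt(3) |dx| / ell] * exp(-sqrt(3) |dx| / ell) * cos(2 pi |dx| / period)
          dx = np.abs(x1[:, None] - x2[None, :])
          return ((1.0 + np.sqrt(3.0) * dx / ell)
                  * np.exp(-np.sqrt(3.0) * dx / ell)
                  * np.cos(2 * np.pi * dx / period))

Feeding this matrix to np.random.multivariate_normal, as in the earlier sketch, reproduces the kind of quasi-periodic draws shown on the slide.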
  33. The conditional distribution (just see Rasmussen & Williams, Chapter 2):
      y \sim \mathcal{N}\left( f_\theta(x),\ K_\alpha(x, \sigma) \right)
      \begin{bmatrix} y \\ y_\star \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} f_\theta(x) \\ f_\theta(x_\star) \end{bmatrix},\ \begin{bmatrix} K_{\alpha,x,x} & K_{\alpha,x,\star} \\ K_{\alpha,\star,x} & K_{\alpha,\star,\star} \end{bmatrix} \right)
      y_\star | y \sim \mathcal{N}\left( K_{\alpha,\star,x} K_{\alpha,x,x}^{-1} [y - f_\theta(x)] + f_\theta(x_\star),\ K_{\alpha,\star,\star} - K_{\alpha,\star,x} K_{\alpha,x,x}^{-1} K_{\alpha,x,\star} \right)
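These conditional-distribution formulas translate directly into a few lines of linear algebra. In this sketch (mine, not from the slides) the kernel argument is any function returning the cross-covariance matrix between two input arrays, and the mean function f_theta is taken to be zero:

      import numpy as np

      def gp_predict(kernel, x, y, yerr, xstar):
          Kxx = kernel(x, x) + np.diag(yerr ** 2)   # K_{alpha,x,x} plus measurement variance
          Ksx = kernel(xstar, x)                    # K_{alpha,*,x}
          Kss = kernel(xstar, xstar)                # K_{alpha,*,*}
          alpha = np.linalg.solve(Kxx, y)           # K^{-1} (y - f_theta(x)) with zero mean
          mu = np.dot(Ksx, alpha)                                  # predictive mean
          cov = Kss - np.dot(Ksx, np.linalg.solve(Kxx, Ksx.T))     # predictive covariance
          return mu, cov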
  34. What's the catch? Kepler = Big Data (by some definition). Note: I hate myself for this slide too…
  35. Computational complexity: naively O(N^3) (compute the factorization, evaluate the log-determinant, apply the inverse).
      \log p(y | x, \sigma, \theta, \alpha) = -\frac{1}{2} [y - f_\theta(x)]^\mathrm{T} K_\alpha(x, \sigma)^{-1} [y - f_\theta(x)] - \frac{1}{2} \log\det K_\alpha(x, \sigma) - \frac{N}{2} \log 2\pi
  36. import numpy as np
      from scipy.linalg import cho_factor, cho_solve

      def simple_gp_lnlike(x, y, yerr, a, s):
          r = x[:, None] - x[None, :]
          C = np.diag(yerr**2) + a*np.exp(-0.5*r**2/(s*s))
          factor, flag = cho_factor(C)
          logdet = 2*np.sum(np.log(np.diag(factor)))
          return -0.5 * (np.dot(y, cho_solve((factor, flag), y))
                         + logdet + len(x)*np.log(2*np.pi))
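To reproduce the kind of scaling curve on the next slide, one can time this function for growing N; the synthetic-data setup here is mine, not from the talk:

      import time
      import numpy as np

      for n in [100, 300, 1000, 3000]:
          x = np.sort(np.random.uniform(0, 10, n))
          yerr = 0.1 * np.ones(n)
          y = np.sin(x) + yerr * np.random.randn(n)
          t0 = time.time()
          simple_gp_lnlike(x, y, yerr, a=1.0, s=1.0)
          print(n, time.time() - t0)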
  37. [Plot: log10(runtime / seconds) versus log10 N]
  38. K^{(3)} = K_3 \times K_2 \times K_1 \times K_0
      [Figure legend: full rank; low-rank; identity matrix; zero matrix]
      Ambikasaran, DFM, et al. (arXiv:1403.6015)
  39. import numpy as np
      from george import GaussianProcess, kernels

      def george_lnlike(x, y, yerr, a, s):
          kernel = a * kernels.RBFKernel(s)
          gp = GaussianProcess(kernel)
          gp.compute(x, yerr)
          return gp.lnlikelihood(y)
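A quick usage sketch for the function above (the synthetic data are mine; the GaussianProcess and RBFKernel names follow the 2014-era george API shown on the slide):

      import numpy as np

      x = np.sort(np.random.uniform(0, 10, 500))
      yerr = 0.1 * np.ones_like(x)
      y = np.sin(x) + yerr * np.random.randn(len(x))
      print(george_lnlike(x, y, yerr, a=1.0, s=1.0))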
  40. [Plot: log10(runtime / seconds) versus log10 N]
  41. [Plot: log10(runtime / seconds) versus log10 N]
  42. [Corner plot: posterior over the transit parameters q1, q2, t0, τ, r/R⋆, f⋆, and b]