Next Generation Data Analysis in Astronomy

Dan Foreman-Mackey

June 12, 2012

Transcript

  1. Next Generation Data Analysis in Astronomy. Dan Foreman-Mackey, CCPP @ NYU
  2. Hi. I’m Dan Foreman-Mackey. @__dfm__, github.com/dfm, danfm.ca

  3. Hi. I’m Dan Foreman-Mackey and I study Physics at NYU in NYC.

  4. Hi. I’m Dan Foreman-Mackey and I study Astronomy at NYU in NYC.

  5. Hi. I’m Dan Foreman-Mackey and I mostly just write code. I'm actually just an engineer.
  6. PROJECTS. emcee: Awesomer MCMC sampling in Python (danfm.ca/emcee). The Thresher: We don’t throw away data.™ (davidwhogg.github.com/TheThresher)

  7. PROJECTS. emcee: Awesomer MCMC sampling in Python (danfm.ca/emcee). The Thresher: We don’t throw away data.™ (davidwhogg.github.com/TheThresher), with David W. Hogg. Mustache courtesy: mustachify.me
  8. emcee danfm.ca/emcee I have a function $p(\Theta)$.

  9. emcee danfm.ca/emcee I have a function $p(\Theta)$. I can evaluate it.

  10. emcee danfm.ca/emcee I have a function $p(\Theta)$. I can evaluate it. I can't calculate the functional form.

  11. emcee danfm.ca/emcee I have a function $p(\Theta)$. I can evaluate it. I can't calculate the functional form. Markov chain Monte Carlo (MCMC)
  12. emcee danfm.ca/emcee Metropolis-Hastings

  13. emcee danfm.ca/emcee Metropolis-Hastings: $\min\left(1,\ \frac{p(x')}{p(x)}\,\frac{Q(x;\,x')}{Q(x';\,x)}\right)$

  14. emcee danfm.ca/emcee Metropolis-Hastings: $\min\left(1,\ \frac{p(x')}{p(x)}\,\frac{Q(x;\,x')}{Q(x';\,x)}\right)$ Proposal $Q$: D (D-1) parameters
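
A minimal sketch of the Metropolis-Hastings rule on the slides above, assuming a Gaussian proposal Q with covariance `cov`. The function name `metropolis_hastings` and its arguments are illustrative and not part of emcee. Because this proposal is symmetric, the Q ratio equals 1 and only p(x')/p(x) remains.

```python
import numpy as np

def metropolis_hastings(lnprob, x0, cov, nsteps, rng):
    """Sample from exp(lnprob) using a symmetric Gaussian proposal."""
    x = np.array(x0, dtype=float)
    lp = lnprob(x)
    chain = np.empty((nsteps, x.size))
    naccept = 0
    for i in range(nsteps):
        # Propose x' ~ Q(x'; x). The Gaussian is symmetric, so the
        # ratio Q(x; x') / Q(x'; x) is 1 and drops out of the rule.
        x_new = rng.multivariate_normal(x, cov)
        lp_new = lnprob(x_new)
        # Accept with probability min(1, p(x') / p(x)), in log space.
        if np.log(rng.uniform()) < lp_new - lp:
            x, lp = x_new, lp_new
            naccept += 1
        chain[i] = x
    return chain, naccept / nsteps

def lnprob(x):
    # Same toy target as a unit Gaussian: ln p(x) = -x.x / 2
    return -0.5 * np.sum(x ** 2)

rng = np.random.default_rng(42)
chain, acceptance_fraction = metropolis_hastings(
    lnprob, np.zeros(2), 0.5 * np.eye(2), 5000, rng)
```

The covariance `cov` is the part that has to be tuned by hand, which is the difficulty the following slides illustrate.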
  15. emcee danfm.ca/emcee Metropolis-Hastings x y

  16. emcee danfm.ca/emcee Metropolis-Hastings x y (in an ideal world)

  17. emcee danfm.ca/emcee Metropolis-Hastings x y (in an ideal world)

  18. emcee danfm.ca/emcee Metropolis-Hastings x y (in an ideal world)

  19. emcee danfm.ca/emcee Metropolis-Hastings x y (in an ideal world) $\min\left(1,\ \frac{p(x')}{p(x)}\,\frac{Q(x;\,x')}{Q(x';\,x)}\right)$ ?
  20. emcee danfm.ca/emcee Metropolis-Hastings x y (in an ideal world)

  21. emcee danfm.ca/emcee Metropolis-Hastings x y (in an ideal world)

  22. emcee danfm.ca/emcee Metropolis-Hastings x y (in an ideal world)

  23. emcee danfm.ca/emcee Metropolis-Hastings x y (in an ideal world) $\min\left(1,\ \frac{p(x')}{p(x)}\,\frac{Q(x;\,x')}{Q(x';\,x)}\right)$ ?
  24. emcee danfm.ca/emcee Metropolis-Hastings x y (in an ideal world)

  25. emcee danfm.ca/emcee Metropolis-Hastings x y (in an ideal world)

  26. emcee danfm.ca/emcee Metropolis-Hastings x y (in an ideal world)

  27. emcee danfm.ca/emcee Metropolis-Hastings x y (in an ideal world)

  28. emcee danfm.ca/emcee Metropolis-Hastings (in the REAL world) x y

  29. emcee danfm.ca/emcee Metropolis-Hastings (in the REAL world) x y

  30. emcee danfm.ca/emcee Metropolis-Hastings (in the REAL world) x y

  31. emcee danfm.ca/emcee Metropolis-Hastings (in the REAL world) x y

  32. emcee danfm.ca/emcee Metropolis-Hastings (in the REAL world) x y

  33. emcee danfm.ca/emcee Metropolis-Hastings (in the REAL world) x y the problem: SMALL ACCEPTANCE FRACTION

  34. emcee danfm.ca/emcee Metropolis-Hastings (in the REAL world) x y the problem: LARGE ACCEPTANCE FRACTION
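
To make the two failure modes concrete, here is a small sketch that measures the acceptance fraction of a random-walk Metropolis sampler for several proposal widths. The target (a 5-D unit Gaussian), the widths, and the helper name `acceptance_fraction` are all assumptions chosen for illustration.

```python
import numpy as np

def acceptance_fraction(scale, nsteps=2000, seed=0):
    """Fraction of accepted proposals for an isotropic Gaussian step."""
    rng = np.random.default_rng(seed)
    lnprob = lambda x: -0.5 * np.sum(x ** 2)   # 5-D unit Gaussian target
    x = np.zeros(5)
    lp = lnprob(x)
    naccept = 0
    for _ in range(nsteps):
        x_new = x + scale * rng.standard_normal(x.size)
        lp_new = lnprob(x_new)
        if np.log(rng.uniform()) < lp_new - lp:
            x, lp = x_new, lp_new
            naccept += 1
    return naccept / nsteps

# Too narrow, roughly right, and too wide proposals.
fractions = {scale: acceptance_fraction(scale) for scale in (0.01, 1.0, 100.0)}
```

A very wide proposal is almost never accepted, while a very narrow one is almost always accepted but barely moves. Both leave the chain highly correlated.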
  35. x y emcee danfm.ca/emcee Metropolis-Hastings (in the REAL world)

  36. x y emcee danfm.ca/emcee Metropolis-Hastings (in the REAL world)

  37. x y emcee danfm.ca/emcee Metropolis-Hastings (in the REAL world)

  38. x y emcee danfm.ca/emcee Metropolis-Hastings (in the REAL world) Proposal: positive-definite, symmetric, D (D-1) parameters

  39. x y emcee danfm.ca/emcee Metropolis-Hastings (in the REAL world) Proposal: positive-definite, symmetric, D (D-1) parameters. D: this is the dimension of your parameter space!
  40. emcee danfm.ca/emcee Metropolis-Hastings (in the REAL world) [Plot: "how hard is MCMC" against "Scientific Awesomeness (~number of parameters)", with curves labeled "Metropolis-Hastings" and "how things should be"]
  41. emcee danfm.ca/emcee Why does all this matter?

  42. emcee danfm.ca/emcee How do you calculate the optimal proposal?

  43. emcee danfm.ca/emcee Temperature

  44. emcee danfm.ca/emcee Temperature

  45. emcee danfm.ca/emcee Temperature

  46. emcee danfm.ca/emcee Temperature

  47. emcee danfm.ca/emcee Temperature

  48. emcee danfm.ca/emcee Temperature

  49. emcee danfm.ca/emcee Temperature

  50. emcee danfm.ca/emcee Temperature

  51. emcee danfm.ca/emcee Temperature

  52. emcee danfm.ca/emcee Temperature: time

  53. emcee danfm.ca/emcee Temperature: time that should be spent interpreting your results, writing papers, finding bugs in your code
  54. emcee danfm.ca/emcee Luckily I have a solution!

  55. emcee danfm.ca/emcee Luckily I have a solution! HINT: it's up

    here...
  56. emcee danfm.ca/emcee bit.ly/mcmc-gw10 "Ensemble samplers with affine invariance" Jonathan Goodman

    Jonathan Weare Mustaches courtesy: mustachify.me
  57. emcee danfm.ca/emcee bit.ly/mcmc-gw10 "Ensemble samplers with affine invariance" Jonathan Goodman

    Jonathan Weare Mustaches courtesy: mustachify.me
  58. emcee danfm.ca/emcee bit.ly/mcmc-gw10 "Ensemble samplers with affine invariance" Jonathan Goodman

    Jonathan Weare Mustaches courtesy: mustachify.me
  59. emcee danfm.ca/emcee affine invariance

  60. emcee danfm.ca/emcee affine invariance. Affine transformation: $y = A x + b$

  61. emcee danfm.ca/emcee affine invariance. Affine transformation: $y = A x + b$. The sampler performs equally well on X and Y
  62. emcee danfm.ca/emcee Easy to sample Hard to sample

  63. emcee danfm.ca/emcee Easy to sample / Hard to sample. Affine transformation: $y = A x + b$

  64. emcee danfm.ca/emcee Easy to sample / Hard to sample. Affine transformation: $y = A x + b$. easy!
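
As an illustration of why affine invariance helps, the sketch below maps a "hard" (strongly correlated, badly scaled) Gaussian onto an "easy" unit Gaussian with a single affine transformation y = A x + b. The covariance and mean are made-up values, and A is built from a Cholesky factor purely for this example.

```python
import numpy as np

rng = np.random.default_rng(1)

# A "hard" target: correlation 0.99 and very different scales per axis.
mean = np.array([3.0, -2.0])
cov = np.array([[1.0, 0.0099],
                [0.0099, 0.0001]])
x = rng.multivariate_normal(mean, cov, size=20000)

# Affine transformation y = A x + b that undoes the scaling and correlation.
L = np.linalg.cholesky(cov)      # cov = L L^T
A = np.linalg.inv(L)
b = -A @ mean
y = x @ A.T + b

cov_y = np.cov(y, rowvar=False)  # close to the identity matrix
```

An affine-invariant sampler behaves on `x` exactly as it would on `y`, so it never needs to know `A` or `b`.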
  65. emcee danfm.ca/emcee Ensemble Samplers (in the REAL world) x y

    with affine invariance
  66. emcee danfm.ca/emcee Ensemble Samplers (in the REAL world) x y

    with affine invariance
  67. emcee danfm.ca/emcee Ensemble Samplers (in the REAL world) x y

    with affine invariance
  68. emcee danfm.ca/emcee Ensemble Samplers (in the REAL world) x y

    with affine invariance this is a walker
  69. emcee danfm.ca/emcee Ensemble Samplers (in the REAL world) x y

    with affine invariance this is a walker
  70. emcee danfm.ca/emcee Ensemble Samplers (in the REAL world) x y

    with affine invariance
  71. emcee danfm.ca/emcee Ensemble Samplers (in the REAL world) x y

    with affine invariance
  72. emcee danfm.ca/emcee Ensemble Samplers (in the REAL world) x y

    with affine invariance
  73. emcee danfm.ca/emcee Ensemble Samplers (in the REAL world) x y

    with affine invariance
  74. emcee danfm.ca/emcee Ensemble Samplers (in the REAL world) x y

    with affine invariance
  75. emcee danfm.ca/emcee Ensemble Samplers with affine invariance (in the REAL world) x y $\min\left(1,\ Z^{D-1}\,\frac{p(x')}{p(x)}\right)$

  76. emcee danfm.ca/emcee Ensemble Samplers with affine invariance (in the REAL world) x y $\min\left(1,\ Z^{D-1}\,\frac{p(x')}{p(x)}\right)$
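
A rough sketch of one sweep of the stretch move from Goodman & Weare, which is where the acceptance rule above comes from. This is a serial, one-walker-at-a-time version written for clarity. It is not emcee's implementation, and the function name `stretch_move_sweep` and the default `a=2.0` are assumptions for the example.

```python
import numpy as np

def stretch_move_sweep(walkers, lnprob, rng, a=2.0):
    """Update every walker once, in place, and return the acceptance fraction."""
    nwalkers, ndim = walkers.shape
    naccept = 0
    for k in range(nwalkers):
        # Pick a different walker j from the rest of the ensemble.
        j = rng.integers(nwalkers - 1)
        if j >= k:
            j += 1
        # Draw Z from g(z) proportional to 1/sqrt(z) on [1/a, a].
        z = ((a - 1.0) * rng.uniform() + 1.0) ** 2 / a
        # Propose a point on the line through walkers j and k.
        proposal = walkers[j] + z * (walkers[k] - walkers[j])
        # Accept with probability min(1, Z^(D-1) p(x') / p(x)).
        lnratio = (ndim - 1) * np.log(z) + lnprob(proposal) - lnprob(walkers[k])
        if np.log(rng.uniform()) < lnratio:
            walkers[k] = proposal
            naccept += 1
    return naccept / nwalkers

def lnprob(x):
    # Badly scaled Gaussian: standard deviations 1 and 0.01.
    return -0.5 * (x[0] ** 2 + (x[1] / 0.01) ** 2)

rng = np.random.default_rng(3)
walkers = rng.standard_normal((20, 2)) * np.array([1.0, 0.01])
fractions = [stretch_move_sweep(walkers, lnprob, rng) for _ in range(500)]
mean_acceptance = np.mean(fractions)
```

Note that no proposal covariance appears anywhere: the only tuning parameter is the scalar `a`, regardless of the dimension D.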
  77. emcee danfm.ca/emcee Ensemble Samplers (in the REAL world) x y

    with affine invariance
  78. emcee danfm.ca/emcee Ensemble Samplers (in the REAL world) x y

    with affine invariance
  79. emcee danfm.ca/emcee Ensemble Samplers (in the REAL world) x y

    with affine invariance
  80. emcee danfm.ca/emcee Ensemble Samplers (in the REAL world) x y

    with affine invariance
  81. emcee danfm.ca/emcee Ensemble Samplers with affine invariance (in the REAL world) x y Aside: this looks nice and parallel, eh?* (*not quite as trivial as you might hope, but possible!)
  82. emcee danfm.ca/emcee +

  83. emcee danfm.ca/emcee It's hammer time! Introducing emcee: the MCMC Hammer. arxiv.org/abs/1202.3665

  84. emcee danfm.ca/emcee get it: pip install emcee

  85. emcee danfm.ca/emcee use it:

        import numpy as np
        import emcee

        # Log-probability of the target: an isotropic unit Gaussian.
        def lnprob(x):
            return -0.5 * np.sum(x ** 2)

        ndim, nwalkers = 10, 100
        # One random starting position per walker.
        p0 = [np.random.rand(ndim) for i in range(nwalkers)]
        sampler = emcee.EnsembleSampler(nwalkers, ndim, lnprob)
        sampler.run_mcmc(p0, 1000)
  86. emcee danfm.ca/emcee DOES IT WORK? obviously it does.

  87. emcee danfm.ca/emcee [plot; both axes from -1.0 to 1.0] $\exp\left(-\frac{(x_1 - x_2)^2}{2\,\epsilon} - \frac{(x_1 + x_2)^2}{2}\right)$
  88. emcee danfm.ca/emcee The Autocorrelation Function (covariance). github.com/dfm/acor
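
For readers who have not met the autocorrelation function before, here is a small numpy sketch that estimates it for a 1-D chain, plus a crude integrated autocorrelation time. This is not the algorithm used by the acor package. The fixed summation window and both function names are assumptions made for the example.

```python
import numpy as np

def autocorrelation_function(x):
    """Normalized autocorrelation function of a 1-D chain, via the FFT."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Zero-pad to 2n so the circular correlation does not wrap around.
    f = np.fft.rfft(x - x.mean(), n=2 * n)
    acf = np.fft.irfft(f * np.conjugate(f))[:n]
    return acf / acf[0]

def integrated_time(x, window=50):
    """Rough integrated autocorrelation time using a fixed window."""
    acf = autocorrelation_function(x)
    return 1.0 + 2.0 * np.sum(acf[1:window])

rng = np.random.default_rng(7)
white = rng.standard_normal(20000)           # independent samples
correlated = np.empty(20000)                 # AR(1) chain, coefficient 0.9
correlated[0] = 0.0
for i in range(1, 20000):
    correlated[i] = 0.9 * correlated[i - 1] + rng.standard_normal()

tau_white = integrated_time(white)
tau_correlated = integrated_time(correlated)
```

The shorter the autocorrelation time, the more independent samples a chain of a given length contains, which is the basis for comparing samplers.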

  89. emcee danfm.ca/emcee [plot; both axes from -1.0 to 1.0] $\exp\left(-\frac{(x_1 - x_2)^2}{2\,\epsilon} - \frac{(x_1 + x_2)^2}{2}\right)$ The Autocorrelation Function: Metropolis-Hastings vs. emcee

  90. emcee danfm.ca/emcee [plot; both axes from -1.0 to 1.0] $\exp\left(-\frac{(x_1 - x_2)^2}{2\,\epsilon} - \frac{(x_1 + x_2)^2}{2}\right)$ The Autocorrelation Function: Metropolis-Hastings vs. emcee
  91. emcee danfm.ca/emcee Metropolis-Hastings Boom!

  92. emcee danfm.ca/emcee [plot; axis ticks from -4 to 6 and from 0 to 30] $\exp\left(-\frac{100\,(x_2 - x_1^2)^2 + (1 - x_1)^2}{20}\right)$

  93. emcee danfm.ca/emcee [plot; axis ticks from -4 to 6 and from 0 to 30] $\exp\left(-\frac{100\,(x_2 - x_1^2)^2 + (1 - x_1)^2}{20}\right)$ The Autocorrelation Function: Metropolis-Hastings vs. emcee
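
The two test densities from these slides can be written as log-probability functions in the same style as the `lnprob` example earlier in the talk. The value of `eps` is not given on the slides, so the default below is an arbitrary assumption, and the function names are illustrative.

```python
import numpy as np

def lnprob_gaussian(x, eps=0.01):
    """ln of exp(-(x1 - x2)^2 / (2 eps) - (x1 + x2)^2 / 2)."""
    x1, x2 = x
    return -(x1 - x2) ** 2 / (2.0 * eps) - (x1 + x2) ** 2 / 2.0

def lnprob_rosenbrock(x):
    """ln of exp(-(100 (x2 - x1^2)^2 + (1 - x1)^2) / 20)."""
    x1, x2 = x
    return -(100.0 * (x2 - x1 ** 2) ** 2 + (1.0 - x1) ** 2) / 20.0

peak_gaussian = lnprob_gaussian(np.array([0.0, 0.0]))      # maximum of the first density
peak_rosenbrock = lnprob_rosenbrock(np.array([1.0, 1.0]))  # maximum of the second density
```

Either function could be passed as the third argument of `emcee.EnsembleSampler(nwalkers, 2, ...)`, following the usage shown earlier.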
  94. emcee danfm.ca/emcee Remember: emcee isn't always The Right Choice™ (Brendon Brewer). Mustache courtesy: mustachify.me
  95. emcee danfm.ca/emcee emcee needs continuous parameters in a vector space, and it is not good at highly multimodal problems. Mustache courtesy: mustachify.me

  96. emcee danfm.ca/emcee emcee needs continuous parameters in a vector space, and it is not good at highly multimodal problems. What is? Mustache courtesy: mustachify.me

  97. emcee danfm.ca/emcee emcee needs continuous parameters in a vector space, and it is not good at highly multimodal problems. What is? Maybe DNest (github.com/eggplantbren/DNest3). Mustache courtesy: mustachify.me

  98. emcee danfm.ca/emcee emcee needs continuous parameters in a vector space, and it is not good at highly multimodal problems. What is? Maybe DNest (github.com/eggplantbren/DNest3), for example. Mustache courtesy: mustachify.me
  99. emcee danfm.ca/emcee It's still been pretty useful... Lang & Hogg (2011); Bovy et al. (2011); Dorman et al. (2012); Foreman-Mackey & Widrow (in prep); ... ... ... Mustaches courtesy: mustachify.me
  100. emcee: the MCMC Hammer. Check it out: paper: arxiv.org/abs/1202.3665; documentation: danfm.ca/emcee; issues/contributions: github.com/dfm/emcee
  101. Now, for a complete change of pace... (sort of)

  102. The Thresher $\propto \lambda / D$ and $\propto \lambda / r_0$ Credit: Hirsch et al. (2011) danfm.ca/thresher

  103. The Thresher $\propto \lambda / D$ danfm.ca/thresher

  104. The Thresher [stack of images, each labeled $\propto \lambda / D$] danfm.ca/thresher

  105. The Thresher [stack of images, each labeled $\propto \lambda / D$] Rank & Sort danfm.ca/thresher

  106. The Thresher [stack of images, each labeled $\propto \lambda / D$] Rank & Sort. Based on what? danfm.ca/thresher

  107. The Thresher [stack of images, each labeled $\propto \lambda / D$] Rank & Sort. Based on what? Brightest pixel? danfm.ca/thresher

  108. The Thresher [stack of images, each labeled $\propto \lambda / D$] Rank & Sort. Based on what? Brightest pixel? Seriously?* (*this actually does work surprisingly well...) danfm.ca/thresher
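  109. The Thresher [three images, each labeled $\propto \lambda / D$] danfm.ca/thresher

  110. The Thresher [three images, each labeled $\propto \lambda / D$] danfm.ca/thresher

  111. The Thresher [three images, each labeled $\propto \lambda / D$] danfm.ca/thresher

  112. The Thresher [three images, each labeled $\propto \lambda / D$] "Traditional Lucky Imaging" danfm.ca/thresher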
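
The "rank & sort by brightest pixel" idea can be sketched in a few lines. The version below ranks frames by peak value, keeps the best fraction, and shift-and-adds them on the brightest pixel. The kept fraction, the recentering rule, and the synthetic test frames are all assumptions for illustration, not details taken from the talk.

```python
import numpy as np

def traditional_lucky_imaging(frames, keep_fraction=0.1):
    """Rank frames by brightest pixel, keep the best, then shift-and-add."""
    frames = np.asarray(frames, dtype=float)
    nframes, ny, nx = frames.shape
    peaks = frames.reshape(nframes, -1).max(axis=1)
    order = np.argsort(peaks)[::-1]                  # brightest peak first
    nkeep = max(1, int(round(keep_fraction * nframes)))
    stacked = np.zeros((ny, nx))
    for idx in order[:nkeep]:
        frame = frames[idx]
        py, px = np.unravel_index(np.argmax(frame), frame.shape)
        # Shift so the brightest pixel lands at the image center.
        stacked += np.roll(frame, (ny // 2 - py, nx // 2 - px), axis=(0, 1))
    return stacked / nkeep, order[:nkeep]

# Synthetic frames: one point source with a random position and blur width.
rng = np.random.default_rng(5)
yy, xx = np.mgrid[:32, :32]
frames = []
for _ in range(50):
    cy, cx = rng.integers(12, 20, size=2)
    width = rng.uniform(1.0, 4.0)
    psf = np.exp(-0.5 * ((yy - cy) ** 2 + (xx - cx) ** 2) / width ** 2)
    frames.append(psf / psf.sum())

stacked, kept = traditional_lucky_imaging(frames, keep_fraction=0.1)
```

With `keep_fraction=0.1`, 90% of the frames never contribute to the result, which is the waste the next slide objects to.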
  113. The Thresher But We don’t throw away data™ danfm.ca/thresher

  114. The Thresher: a stack of images in, the best possible scene/PSF out. Blind deconvolution. Note: horses not included. Credit: Wikipedia danfm.ca/thresher
  115. The Thresher "Threshing is the process of loosening the edible part of cereal grain from the scaly, inedible chaff that surrounds it ... Threshing does not remove the bran from the grain." (Wikipedia) [bran: part of a complete breakfast] danfm.ca/thresher
  116. The Thresher References/Inspiration: Magain et al. (1998); Hirsch et al. (2011). Mustaches courtesy: mustachify.me danfm.ca/thresher
  117. The Thresher WHAT IS AN IMAGE? danfm.ca/thresher

  118. The Thresher $D = P \ast S + E$ (D: data, P: PSF, S: scene, E: noise) danfm.ca/thresher
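
A small forward-model sketch of D = P * S + E, assuming a Gaussian PSF, a handful of point sources, and white Gaussian noise. The convolution is done with the FFT and is therefore circular. The image size and all numerical values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(11)

# Scene S: a few point sources on an empty 64x64 sky.
scene = np.zeros((64, 64))
scene[20, 30] = 100.0
scene[40, 12] = 50.0
scene[45, 50] = 80.0

# PSF P: a normalized Gaussian centered on pixel (0, 0), with wrap-around
# distances so that the circular convolution does not shift the image.
yy, xx = np.mgrid[:64, :64]
yy = np.minimum(yy, 64 - yy)
xx = np.minimum(xx, 64 - xx)
psf = np.exp(-0.5 * (yy ** 2 + xx ** 2) / 2.0 ** 2)
psf /= psf.sum()

# P * S as a circular convolution via the FFT, then add the noise E.
model = np.fft.irfft2(np.fft.rfft2(psf) * np.fft.rfft2(scene), s=scene.shape)
noise = 0.1 * rng.standard_normal(scene.shape)
data = model + noise
```

Because the PSF is normalized, the noise-free `model` keeps the total flux of `scene` and only spreads it out.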
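  119. The Thresher $D = P \ast S + E$ (D: data, P: PSF, S: scene, E: noise). $d = \psi \cdot s + \epsilon$, $d = s' \cdot \psi' + \epsilon$ (PSF symbol lost in extraction; written here as $\psi$) danfm.ca/thresher

  120. The Thresher $D = P \ast S + E$ (D: data, P: PSF, S: scene, E: noise). $d = \psi \cdot s + \epsilon$, $d = s' \cdot \psi' + \epsilon$ (PSF symbol lost in extraction; written here as $\psi$). Matrices danfm.ca/thresher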
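
The point of writing the model with matrices is that the same convolution is linear in the scene when the PSF is held fixed, and linear in the PSF when the scene is held fixed. The 1-D sketch below checks this numerically. The helper `convolution_matrix` and the test vectors are hypothetical.

```python
import numpy as np

def convolution_matrix(kernel, n):
    """Matrix M such that M @ v equals np.convolve(kernel, v) for len(v) == n."""
    m = len(kernel)
    M = np.zeros((n + m - 1, n))
    for j in range(n):
        M[j:j + m, j] = kernel
    return M

rng = np.random.default_rng(2)
scene = rng.uniform(size=12)
psf = np.array([0.1, 0.2, 0.4, 0.2, 0.1])

# PSF as the matrix, scene as the vector: linear in the scene.
d_from_scene = convolution_matrix(psf, len(scene)) @ scene
# Scene as the matrix, PSF as the vector: linear in the PSF.
d_from_psf = convolution_matrix(scene, len(psf)) @ psf
```

Both products give the same noise-free data vector, so each half of the problem is an ordinary linear least-squares fit.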
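  121. The Thresher $d = \psi \cdot s + \epsilon$, $d = s' \cdot \psi' + \epsilon$ (PSF symbol lost in extraction; written here as $\psi$) Hipster image filters courtesy: instagr.am danfm.ca/thresher

  122. The Thresher $d = \psi \cdot s + \epsilon$, $d = s' \cdot \psi' + \epsilon$ (PSF symbol lost in extraction; written here as $\psi$) Priors/Regularization. Hipster image filters courtesy: instagr.am danfm.ca/thresher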
  123. The Thresher Those are some seriously HUGE matrices! danfm.ca/thresher

  124. The Thresher Luckily for us, linear least-squares is EASY.* (*Especially when the system is sparse.) danfm.ca/thresher
  125. The Thresher The Algorithm: (1) Use TLI to roughly align the data & get an initialization for the scene. (2) Solve the least-squares problem to get the PSF for one randomly selected image. (3) Solve the sparse least-squares problem to get the scene for the same image. Iterate with a new image. danfm.ca/thresher

  126. The Thresher The Algorithm: (1) Use TLI to roughly align the data & get an initialization for the scene. (2) Solve the least-squares problem to get the PSF for one randomly selected image. (3) Solve the sparse least-squares problem to get the scene for the same image. Iterate with a new image. Online! i.e. you can work with REALLY big datasets. danfm.ca/thresher
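
The three steps can be sketched in 1-D as an alternating least-squares loop. This is a simplified stand-in and not The Thresher's actual code: the mean image replaces the TLI initialization, a small ridge penalty stands in for the priors/regularization, and the damped scene update with step size `rate` is an assumption. All names and numerical values are hypothetical.

```python
import numpy as np

def convolution_matrix(kernel, n):
    """Matrix M such that M @ v equals np.convolve(kernel, v) for len(v) == n."""
    m = len(kernel)
    M = np.zeros((n + m - 1, n))
    for j in range(n):
        M[j:j + m, j] = kernel
    return M

def ridge_solve(M, d, lam):
    """Linear least squares with a small L2 penalty (stand-in regularizer)."""
    return np.linalg.solve(M.T @ M + lam * np.eye(M.shape[1]), M.T @ d)

def thresh(images, nscene, npsf, rng, npass=200, rate=0.1, lam=1e-3):
    # Step 1 (stand-in for TLI): initialize the scene from the mean image.
    offset = (npsf - 1) // 2
    scene = np.mean(images, axis=0)[offset:offset + nscene]
    for _ in range(npass):
        image = images[rng.integers(len(images))]   # one random image
        # Step 2: with the scene fixed, solve for this image's PSF.
        psf = ridge_solve(convolution_matrix(scene, npsf), image, lam)
        # Step 3: with that PSF fixed, solve for the scene ...
        new_scene = ridge_solve(convolution_matrix(psf, nscene), image, lam)
        # ... and take a damped step towards it before the next image.
        scene = (1.0 - rate) * scene + rate * new_scene
    return scene

# Synthetic data: one true scene seen through 30 different Gaussian PSFs.
rng = np.random.default_rng(8)
true_scene = np.zeros(40)
true_scene[[10, 18, 30]] = [5.0, 2.0, 3.0]
images = []
for _ in range(30):
    width = rng.uniform(0.8, 2.0)
    psf = np.exp(-0.5 * (np.arange(7) - 3) ** 2 / width ** 2)
    psf /= psf.sum()
    images.append(np.convolve(psf, true_scene) + 0.01 * rng.standard_normal(46))
images = np.array(images)

scene = thresh(images, nscene=40, npsf=7, rng=rng)
```

Only one image is touched per iteration, which is what makes the scheme online: the full stack never has to be held in memory at once.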
  127. The Thresher HELLO my name is Stochastic Gradient and I

    have Convergence Guarantees! danfm.ca/thresher
  128. The Thresher NGC 3603, HST (just for illustration). Credit: Wikipedia danfm.ca/thresher
  129. The Thresher [image panels; pixel axes from 0 to 500] Data courtesy: Wolfgang Brandner (MPIA/AstraLux) danfm.ca/thresher
  130. The Thresher danfm.ca/thresher

  131. The Thresher [image panels labeled "The Thresher" and "TLI"] danfm.ca/thresher

  132. THE Thresher: COMING SOON. documentation: davidwhogg.github.com/TheThresher; issues/contributions: github.com/davidwhogg/TheThresher

  133. Summary: I write code. Check it out. danfm.ca

  134. Summary: I write code. Check it out. danfm.ca Thanks! Grad

    Club?