
Next Generation Data Analysis in Astronomy

Dan Foreman-Mackey

June 12, 2012

Transcript

  1. PROJECTS
    emcee: Awesomer MCMC sampling in Python. (danfm.ca/emcee)
    The Thresher: We don’t throw away data.™ (davidwhogg.github.com/TheThresher)
  2. PROJECTS
    emcee: Awesomer MCMC sampling in Python. (danfm.ca/emcee)
    The Thresher: We don’t throw away data.™ (davidwhogg.github.com/TheThresher)
    David W. Hogg. Mustache courtesy: mustachify.me
  3. emcee danfm.ca/emcee
    $p(\Theta)$: I have a function. I can evaluate it. I can't calculate the functional form.
  4. emcee danfm.ca/emcee
    $p(\Theta)$: I have a function. I can evaluate it. I can't calculate the functional form.
    Markov chain Monte Carlo (MCMC)
  5. emcee danfm.ca/emcee
    Metropolis-Hastings: $\min\left(1, \frac{p(x')}{p(x)} \frac{Q(x; x')}{Q(x'; x)}\right)$
  6. emcee danfm.ca/emcee
    Metropolis-Hastings: $\min\left(1, \frac{p(x')}{p(x)} \frac{Q(x; x')}{Q(x'; x)}\right)$
    Proposal: D (D-1) parameters
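
The acceptance rule on slides 5 and 6 can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the talk: it assumes a symmetric Gaussian proposal, so the ratio Q(x; x') / Q(x'; x) equals 1 and drops out, and it uses a single hypothetical `step` size in place of a full proposal covariance. The names `metropolis_hastings` and `log_gaussian` are made up for this example.

```python
import math
import random


def metropolis_hastings(log_p, x0, step, n_steps, rng):
    """Sample from exp(log_p) with a symmetric Gaussian proposal.

    Because Q(x; x') = Q(x'; x) for this proposal, the Q ratio in the
    acceptance probability equals 1 and only p(x') / p(x) remains.
    """
    x = list(x0)
    lp = log_p(x)
    chain, n_accept = [], 0
    for _ in range(n_steps):
        # Propose x' by perturbing every coordinate independently.
        x_new = [xi + step * rng.gauss(0.0, 1.0) for xi in x]
        lp_new = log_p(x_new)
        # Accept with probability min(1, p(x') / p(x)), evaluated in log space.
        if math.log(1.0 - rng.random()) < lp_new - lp:
            x, lp = x_new, lp_new
            n_accept += 1
        chain.append(list(x))
    return chain, n_accept / n_steps


def log_gaussian(x):
    """Log of an unnormalized standard Gaussian."""
    return -0.5 * sum(xi * xi for xi in x)


rng = random.Random(42)
chain, acceptance = metropolis_hastings(log_gaussian, [0.0, 0.0], 1.0, 20000, rng)
mean_x = sum(s[0] for s in chain) / len(chain)
var_x = sum((s[0] - mean_x) ** 2 for s in chain) / len(chain)
print(f"acceptance fraction: {acceptance:.2f}")
print(f"mean: {mean_x:.2f}, variance: {var_x:.2f}")
```

Even in this toy case the `step` value has to be chosen by hand; with a full proposal covariance the number of such tuning choices grows quickly with the dimension D.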
  7. emcee danfm.ca/emcee
    Metropolis-Hastings (in an ideal world) [plot axes: x, y]
    $\min\left(1, \frac{p(x')}{p(x)} \frac{Q(x; x')}{Q(x'; x)}\right)$ ?
  9. emcee danfm.ca/emcee
    Metropolis-Hastings (in the REAL world) [plot axes: x, y]
    Proposal: positive-definite, symmetric; D (D-1) parameters (D: this is the dimension of your parameter space!)
  10. emcee danfm.ca/emcee
    Metropolis-Hastings (in the REAL world)
    [plot: "Scientific Awesomeness" vs. "how hard is MCMC (~number of parameters)"; curves: "Metropolis-Hastings" and "how things should be"]
  11. emcee danfm.ca/emcee
    Easy to sample vs. hard to sample. Affine transformation: $y = A x + b$. Easy!
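
A small check of the affine-transformation idea on slide 11, using the skewed Gaussian that appears later in the deck (slide 18). This is a sketch under stated assumptions: the value of `EPS` and the function names are hypothetical, and the particular map (rows of A equal to (1, -1)/sqrt(eps) and (1, 1), with b = 0) is just one choice that turns the hard-to-sample density into an isotropic one.

```python
import math

EPS = 0.01  # hypothetical value; smaller values make the density skinnier


def log_p_skewed(x1, x2, eps=EPS):
    """Log of the unnormalized, highly anisotropic 2D Gaussian."""
    return -((x1 - x2) ** 2) / (2.0 * eps) - ((x1 + x2) ** 2) / 2.0


def to_easy(x1, x2, eps=EPS):
    """Affine map y = A x + b with rows of A = (1, -1) / sqrt(eps) and (1, 1), b = 0."""
    return (x1 - x2) / math.sqrt(eps), x1 + x2


def log_p_easy(y1, y2):
    """Log of an unnormalized isotropic Gaussian in the transformed space."""
    return -0.5 * (y1 * y1 + y2 * y2)


x = (0.3, 0.25)
y = to_easy(*x)
# The two log densities agree up to the constant Jacobian factor, which is
# irrelevant for an unnormalized density.
print(log_p_skewed(*x), log_p_easy(*y))
```

A sampler whose performance is unchanged by such a map does as well on the skewed density as on the isotropic one, without anyone having to find A and b.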
  12. emcee danfm.ca/emcee
    Ensemble Samplers with affine invariance (in the REAL world) [plot axes: x, y]
    This is a walker.
  14. emcee danfm.ca/emcee
    Ensemble Samplers with affine invariance (in the REAL world) [plot axes: x, y]
    $\min\left(1, Z^{D-1} \frac{p(x')}{p(x)}\right)$
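
The acceptance probability on slide 14 matches the "stretch move" of Goodman & Weare's affine-invariant ensemble sampler. The sketch below is an illustration of that move, not emcee's source code: it assumes the serial variant (each walker is updated in turn against the current positions of the others), a stretch scale `a = 2`, and Z drawn from a density proportional to 1/sqrt(z) on [1/a, a]. All function and variable names are hypothetical.

```python
import math
import random


def stretch_move_sweep(walkers, log_p, rng, a=2.0):
    """Update every walker once, in place, and return the number of acceptances."""
    n_walkers, ndim = len(walkers), len(walkers[0])
    n_accept = 0
    for k in range(n_walkers):
        # Pick a different walker j to define the stretch direction.
        j = rng.randrange(n_walkers - 1)
        if j >= k:
            j += 1
        # Draw Z from g(z) proportional to 1 / sqrt(z) on [1/a, a].
        z = ((a - 1.0) * rng.random() + 1.0) ** 2 / a
        proposal = [xj + z * (xk - xj) for xk, xj in zip(walkers[k], walkers[j])]
        # Accept with probability min(1, Z^(D-1) p(x') / p(x)), in log space.
        log_ratio = (ndim - 1) * math.log(z) + log_p(proposal) - log_p(walkers[k])
        if math.log(1.0 - rng.random()) < log_ratio:
            walkers[k] = proposal
            n_accept += 1
    return n_accept


def log_gaussian(x):
    """Log of an unnormalized standard Gaussian."""
    return -0.5 * sum(xi * xi for xi in x)


rng = random.Random(0)
ndim, n_walkers, n_sweeps, n_burn = 3, 20, 2000, 500
walkers = [[rng.random() for _ in range(ndim)] for _ in range(n_walkers)]
samples, accepted = [], 0
for sweep in range(n_sweeps):
    accepted += stretch_move_sweep(walkers, log_gaussian, rng)
    if sweep >= n_burn:  # discard the first sweeps as burn-in
        samples.extend(list(w) for w in walkers)

acceptance = accepted / (n_sweeps * n_walkers)
mean_0 = sum(s[0] for s in samples) / len(samples)
var_0 = sum((s[0] - mean_0) ** 2 for s in samples) / len(samples)
print(f"acceptance fraction: {acceptance:.2f}")
print(f"mean: {mean_0:.2f}, variance: {var_0:.2f}")
```

The only tuning knob here is `a`, regardless of the dimension D, which is the contrast with the proposal covariance needed by Metropolis-Hastings.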
  16. emcee danfm.ca/emcee
    Ensemble Samplers with affine invariance (in the REAL world) [plot axes: x, y]
    Aside: this looks nice and parallel, eh?* (* Not quite as trivial as you might hope, but possible!)
  17. emcee danfm.ca/emcee
    Use it:

        import numpy as np
        import emcee

        # Log-probability of an unnormalized standard Gaussian.
        def lnprob(x):
            return -0.5 * np.sum(x ** 2)

        ndim, nwalkers = 10, 100
        # One random starting position per walker.
        p0 = [np.random.rand(ndim) for i in range(nwalkers)]
        sampler = emcee.EnsembleSampler(nwalkers, ndim, lnprob)
        sampler.run_mcmc(p0, 1000)
  18. emcee danfm.ca/emcee
    $\exp\left(-\frac{(x_1 - x_2)^2}{2\epsilon} - \frac{(x_1 + x_2)^2}{2}\right)$
    [plot of the density; both axes run from -1.0 to 1.0]
  19. emcee danfm.ca/emcee
    $\exp\left(-\frac{(x_1 - x_2)^2}{2\epsilon} - \frac{(x_1 + x_2)^2}{2}\right)$
    [plot of the density; both axes run from -1.0 to 1.0]
    The autocorrelation function: Metropolis-Hastings vs. emcee
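
For readers who have not met the autocorrelation function used to compare the two samplers, here is a minimal sketch of how it can be estimated from a one-dimensional chain. It is not the estimator used for the plots in the talk: the direct-sum estimate, the simple cutoff in `integrated_time`, and the toy AR(1) chain are all assumptions made for illustration.

```python
import random


def autocorrelation(chain, max_lag):
    """Normalized autocorrelation function of a 1D chain for lags 0..max_lag."""
    n = len(chain)
    mean = sum(chain) / n
    centered = [c - mean for c in chain]
    var = sum(c * c for c in centered) / n
    acf = []
    for lag in range(max_lag + 1):
        cov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        acf.append(cov / var)
    return acf


def integrated_time(acf):
    """Rough integrated autocorrelation time: 1 + 2 * sum of acf over lags >= 1.

    The sum stops at the first non-positive value, a simple cutoff chosen
    here for illustration.
    """
    tau = 1.0
    for rho in acf[1:]:
        if rho <= 0.0:
            break
        tau += 2.0 * rho
    return tau


# Toy chain: an AR(1) process x[t] = phi * x[t-1] + noise, whose exact
# autocorrelation at lag k is phi ** k.
rng = random.Random(1)
phi, x, chain = 0.9, 0.0, []
for _ in range(20000):
    x = phi * x + rng.gauss(0.0, 1.0)
    chain.append(x)

acf = autocorrelation(chain, 100)
tau = integrated_time(acf)
print(f"acf at lag 1: {acf[1]:.3f}, integrated time: {tau:.1f}")
```

A faster-decaying autocorrelation function means more nearly independent samples per evaluation of the probability function, which is what the comparison on the slide is about.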
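  21. emcee danfm.ca/emcee
    $\exp\left(-\frac{100\,(x_2 - x_1^2)^2 + (1 - x_1)^2}{20}\right)$
    [plot of the density; horizontal axis from -4 to 6, vertical axis from 0 to 30]
  22. emcee danfm.ca/emcee
    $\exp\left(-\frac{100\,(x_2 - x_1^2)^2 + (1 - x_1)^2}{20}\right)$
    [plot of the density; horizontal axis from -4 to 6, vertical axis from 0 to 30]
    The autocorrelation function: Metropolis-Hastings vs. emcee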
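
The density on slides 21 and 22 is a Rosenbrock-style "banana". Written as a log-probability in the same style as the `lnprob` example on slide 17, it might look like the sketch below; the function name is hypothetical, and the evaluation points are chosen only to show how narrow the curved ridge is.

```python
def lnprob_rosenbrock(x):
    """Log of exp(-(100 (x2 - x1^2)^2 + (1 - x1)^2) / 20)."""
    x1, x2 = x
    return -(100.0 * (x2 - x1 ** 2) ** 2 + (1.0 - x1) ** 2) / 20.0


peak = lnprob_rosenbrock((1.0, 1.0))       # maximum of the density
on_ridge = lnprob_rosenbrock((2.0, 4.0))   # on the parabola x2 = x1^2
off_ridge = lnprob_rosenbrock((2.0, 0.0))  # same x1, far from the parabola
print(peak, on_ridge, off_ridge)
```

Moving along the parabola costs very little probability while stepping off it costs a great deal, so no single linear rescaling of the axes makes this density easy for a fixed Gaussian proposal.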
  23. emcee danfm.ca/emcee
    emcee needs continuous parameters in a vector space, and it is not good at highly multimodal problems.
    Mustache courtesy: mustachify.me
  24. emcee danfm.ca/emcee
    emcee needs continuous parameters in a vector space, and it is not good at highly multimodal problems. What is?
    Mustache courtesy: mustachify.me
  25. emcee danfm.ca/emcee
    emcee needs continuous parameters in a vector space, and it is not good at highly multimodal problems. What is? Maybe DNest (github.com/eggplantbren/DNest3)
    Mustache courtesy: mustachify.me
  26. emcee danfm.ca/emcee
    emcee needs continuous parameters in a vector space, and it is not good at highly multimodal problems. What is? Maybe DNest, for example (github.com/eggplantbren/DNest3)
    Mustache courtesy: mustachify.me
  27. emcee danfm.ca/emcee
    It's still been pretty useful... Lang & Hogg (2011); Bovy et al. (2011); Dorman et al. (2012); Foreman-Mackey & Widrow (in prep); ...
    Mustaches courtesy: mustachify.me
  28. The Thresher danfm.ca/thresher
    [figure: the label "∝ λ D" repeated many times across the slide]
  29. The Thresher danfm.ca/thresher
    [figure: the label "∝ λ D" repeated many times across the slide]
    Rank & Sort
  30. The Thresher danfm.ca/thresher
    [figure: the label "∝ λ D" repeated many times across the slide]
    Rank & Sort. Based on what?
  31. The Thresher danfm.ca/thresher
    [figure: the label "∝ λ D" repeated many times across the slide]
    Rank & Sort. Based on what? Brightest pixel?
  32. The Thresher danfm.ca/thresher
    [figure: the label "∝ λ D" repeated many times across the slide]
    Rank & Sort. Based on what? Brightest pixel? Seriously?* (* This actually does work surprisingly well...)
  33. The Thresher danfm.ca/thresher
    [figure: the label "∝ λ D" repeated three times]
    "Traditional Lucky Imaging"
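
The "rank and sort by brightest pixel" recipe on slides 29 to 33 can be sketched as follows. This is an illustration only: the slides state the ranking criterion, while the shift-and-add step (centering each kept frame on its brightest pixel before averaging), the `keep_fraction` parameter, and every function name are assumptions introduced for the example.

```python
def brightest_pixel(frame):
    """Return (value, row, col) of the brightest pixel in a 2D list."""
    return max(
        (value, r, c)
        for r, row in enumerate(frame)
        for c, value in enumerate(row)
    )


def lucky_stack(frames, keep_fraction=0.1):
    """Rank frames by brightest pixel, keep the best, then shift and add."""
    ranked = sorted(frames, key=lambda f: brightest_pixel(f)[0], reverse=True)
    n_keep = max(1, int(round(keep_fraction * len(frames))))
    n_rows, n_cols = len(frames[0]), len(frames[0][0])
    center_r, center_c = n_rows // 2, n_cols // 2
    stack = [[0.0] * n_cols for _ in range(n_rows)]
    for frame in ranked[:n_keep]:
        _, peak_r, peak_c = brightest_pixel(frame)
        # Shift so the brightest pixel lands on the center; pixels that
        # fall off the edge are dropped.
        dr, dc = center_r - peak_r, center_c - peak_c
        for r in range(n_rows):
            for c in range(n_cols):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    stack[rr][cc] += frame[r][c] / n_keep
    return stack


def point_source(n, row, col, peak):
    """Toy n-by-n frame with a single bright pixel."""
    frame = [[0.0] * n for _ in range(n)]
    frame[row][col] = peak
    return frame


frames = [
    point_source(5, 1, 1, 3.0),
    point_source(5, 3, 2, 9.0),
    point_source(5, 2, 4, 6.0),
]
stack = lucky_stack(frames, keep_fraction=0.67)  # keeps the two sharpest frames
total = sum(sum(row) for row in stack)
print(stack[2][2], total)
```

The frames that do not make the cut are simply discarded here, which is the waste that The Thresher's motto, "We don't throw away data", is aimed at.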
  34. The Thresher danfm.ca/thresher
    Blind deconvolution: from a stack of images to the best possible scene/PSF.
    Note: horses not included. Credit: Wikipedia
  35. The Thresher danfm.ca/thresher
    "Threshing is the process of loosening the edible part of cereal grain from the scaly, inedible chaff that surrounds it ... Threshing does not remove the bran from the grain." (Wikipedia)
    Part of a complete breakfast.
  36. The Thresher danfm.ca/thresher
    References/Inspiration: Magain et al. (1998); Hirsch et al. (2011)
    Mustaches courtesy: mustachify.me
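  37. The Thresher danfm.ca/thresher
    $D = P \ast S + E$ (D: data, P: PSF, S: scene, E: noise)
  38. The Thresher danfm.ca/thresher
    $D = P \ast S + E$ (D: data, P: PSF, S: scene, E: noise)
    $d = \psi \cdot s + \epsilon$
    $d = s' \cdot \psi' + \epsilon$
  39. The Thresher danfm.ca/thresher
    $D = P \ast S + E$ (D: data, P: PSF, S: scene, E: noise)
    $d = \psi \cdot s + \epsilon$
    $d = s' \cdot \psi' + \epsilon$
    Matrices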
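
The step from the convolution on slide 37 to the matrix products on slides 38 and 39 rests on the fact that convolution is linear in each argument. The one-dimensional sketch below shows this; it is an illustration with hypothetical names and toy numbers, not The Thresher's code. The same convolution is written once as a matrix built from the PSF acting on the scene, and once as a matrix built from the scene acting on the PSF.

```python
def convolve(p, s):
    """Full 1D discrete convolution of two lists."""
    out = [0.0] * (len(p) + len(s) - 1)
    for i, pi in enumerate(p):
        for j, sj in enumerate(s):
            out[i + j] += pi * sj
    return out


def convolution_matrix(kernel, n_input):
    """Matrix A such that A . v equals convolve(kernel, v) when len(v) == n_input."""
    n_out = len(kernel) + n_input - 1
    matrix = [[0.0] * n_input for _ in range(n_out)]
    for i, ki in enumerate(kernel):
        for j in range(n_input):
            matrix[i + j][j] = ki
    return matrix


def matvec(matrix, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in matrix]


psf = [0.25, 0.5, 0.25]
scene = [0.0, 1.0, 0.0, 2.0]

d_direct = convolve(psf, scene)
d_from_psf_matrix = matvec(convolution_matrix(psf, len(scene)), scene)
d_from_scene_matrix = matvec(convolution_matrix(scene, len(psf)), psf)
print(d_direct)
print(d_from_psf_matrix)
print(d_from_scene_matrix)
```

With the scene held fixed the model is linear in the PSF, and with the PSF held fixed it is linear in the scene, so each half of the problem is a linear least-squares problem. The matrices are mostly zeros, which is where sparsity comes from.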
  40. The Thresher danfm.ca/thresher
    $d = \psi \cdot s + \epsilon$
    $d = s' \cdot \psi' + \epsilon$
    Hipster image filters courtesy: instagr.am
  41. The Thresher danfm.ca/thresher
    $d = \psi \cdot s + \epsilon$
    $d = s' \cdot \psi' + \epsilon$
    Priors/Regularization
    Hipster image filters courtesy: instagr.am
  42. The Thresher danfm.ca/thresher
    Luckily for us, linear least-squares is EASY.* (* Especially when the system is sparse.)
  43. The Thresher danfm.ca/thresher
    The Algorithm
    (1) Use TLI (traditional lucky imaging) to roughly align the data & get an initialization for the scene.
    (2) Solve the least-squares problem to get the PSF for one randomly selected image.
    (3) Solve the sparse least-squares problem to get the scene for the same image.
    Iterate with a new image.
  44. The Thresher danfm.ca/thresher
    The Algorithm
    (1) Use TLI (traditional lucky imaging) to roughly align the data & get an initialization for the scene.
    (2) Solve the least-squares problem to get the PSF for one randomly selected image.
    (3) Solve the sparse least-squares problem to get the scene for the same image.
    Iterate with a new image.
    Online! i.e. you can work with REALLY big datasets.
  45. The Thresher danfm.ca/thresher
    HELLO, my name is Stochastic Gradient and I have Convergence Guarantees!
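
To give a flavor of the "one randomly selected image at a time" idea on slides 43 to 45, here is a deliberately simplified one-dimensional sketch. It is not The Thresher's algorithm: it assumes the PSF of every exposure is already known (so step 2 is skipped), it replaces the sparse least-squares solve for the scene with a plain gradient step, and the toy data, step size, and all names are hypothetical.

```python
import math
import random


def convolve(p, s):
    """Full 1D discrete convolution of two lists."""
    out = [0.0] * (len(p) + len(s) - 1)
    for i, pi in enumerate(p):
        for j, sj in enumerate(s):
            out[i + j] += pi * sj
    return out


def scene_gradient(psf, scene, data):
    """Gradient of 0.5 * ||psf * scene - data||^2 with respect to the scene."""
    residual = [m - d for m, d in zip(convolve(psf, scene), data)]
    grad = [0.0] * len(scene)
    for i, pi in enumerate(psf):
        for j in range(len(scene)):
            grad[j] += pi * residual[i + j]
    return grad


def rms_error(a, b):
    """Root-mean-square difference between two equal-length lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


rng = random.Random(3)
true_scene = [0.0, 0.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0]

# Each toy exposure has its own PSF (assumed known here) and a little noise.
psfs, images = [], []
for _ in range(50):
    w = rng.uniform(0.1, 0.4)
    psf = [w, 1.0 - 2.0 * w, w]
    psfs.append(psf)
    images.append([m + rng.gauss(0.0, 0.01) for m in convolve(psf, true_scene)])

scene = [0.5] * len(true_scene)  # crude initialization
initial_error = rms_error(scene, true_scene)
step_size = 0.5
for _ in range(3000):
    n = rng.randrange(len(images))  # one randomly selected image
    grad = scene_gradient(psfs[n], scene, images[n])
    scene = [s - step_size * g for s, g in zip(scene, grad)]
final_error = rms_error(scene, true_scene)
print(f"RMS error: {initial_error:.3f} -> {final_error:.3f}")
```

Only one image is touched per update, so the full stack never has to sit in memory at once, which is the sense in which an online scheme can handle very large datasets.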
  46. The Thresher danfm.ca/thresher
    [figure: image panels with pixel axes running from 0 to 500]
    Data courtesy: Wolfgang Brandner (MPIA/AstraLux)