
Chasing exoplanets with Python and machine learning

PySciDataGre group launch event @ UGA. I presented two Python packages for exoplanet direct imaging, VIP and SODINN, to a broad and multidisciplinary audience. VIP stands for Vortex Image Processing and is an instrument-agnostic, open-source library for image processing and the detection of extrasolar planets and disks. SODINN stands for Supervised exOplanet detection via Direct Imaging with deep Neural Networks. It is a novel method for the direct detection of exoplanets in a supervised learning framework, and it offers improved sensitivity compared to state-of-the-art differential imaging approaches. In this presentation, I emphasized how these developments were enabled by the Python open-source scientific stack, an interdisciplinary data science approach, and open-source scientific software development.

Transcript

  1. Chasing exoplanets with Python and machine learning

    Carlos Alberto Gomez Gonzalez, PySciDataGre launch event, 08/03/2018
  2. Exoplanets

  3. None

  4. Mostly, we rely on indirect methods for detecting exoplanets, because

    it’s very hard to see them. This is not what they look like!
  5. Credit: NASA. https://exoplanets.nasa.gov/exep/coronagraphvideo/ VIDEOCLIP

  6. Planet hunter pipeline

    Raw astronomical images
    • Basic calibration and “cosmetics”: dark/bias subtraction, flat fielding, sky (thermal background) subtraction, bad pixel correction
    • Image recentering: center of mass, 2D Gaussian fit, DFT cross-correlation
    • Bad frames removal: image correlation, pixel statistics (specific image regions)
    • Reference PSF creation: pairwise, median, PCA, NMF, LOCI, LLSG
    • PSF reference subtraction
    • De-rotation (for ADI) or rescaling (for mSDI)
    • Image combination: mean, median, trimmed mean
    Final residual image
    • Characterization of detected companions
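
The simplest ADI post-processing path through this pipeline (median reference PSF, PSF reference subtraction, de-rotation, median combination) can be sketched in a few lines of NumPy/SciPy. This is a minimal illustration, not VIP's implementation: the function name `classic_adi`, the bilinear interpolation and the rotation sign convention (which in practice depends on the instrument and on how parallactic angles are defined) are assumptions, and the cube is assumed to be already calibrated and centered.

```python
import numpy as np
from scipy.ndimage import rotate


def classic_adi(cube, angles):
    """Median-ADI sketch (hypothetical helper, not VIP's API).

    cube   : array of shape (n_frames, ny, nx), calibrated and centered
    angles : parallactic angles in degrees, one per frame
    """
    # Reference PSF: temporal median of the cube. The quasi-static stellar PSF
    # is the same in every frame, while a companion moves with the field
    # rotation, so the median mostly keeps the former and rejects the latter.
    ref_psf = np.median(cube, axis=0)

    # PSF reference subtraction
    residuals = cube - ref_psf

    # De-rotation: undo each frame's field rotation so the sky is aligned
    # (assumed sign convention: the field was rotated by +angle).
    derotated = np.array([
        rotate(frame, -ang, reshape=False, order=1, mode="constant", cval=0.0)
        for frame, ang in zip(residuals, angles)
    ])

    # Image combination: median of the aligned residual frames
    return np.median(derotated, axis=0)
```

Swapping the median for a PCA or LLSG model of the stellar PSF changes only the reference-creation step; the subtraction, de-rotation and combination steps stay the same.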
  7. Planet hunter pipeline (as on slide 6), with its stages grouped into pre-processing and post-processing

    • Pre-processing: basic calibration and “cosmetics”, image recentering, bad frames removal
    • Post-processing: reference PSF creation, PSF reference subtraction, de-rotation (for ADI) or rescaling (for mSDI), image combination
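
Two of the pre-processing steps, recentering by center of mass and bad frames removal by image correlation, can be sketched with NumPy and SciPy as below. It assumes background-subtracted frames dominated by a positive stellar PSF (the center of mass is biased otherwise); the function names and the correlation cutoff `min_corr=0.9` are hypothetical choices, not VIP's API.

```python
import numpy as np
from scipy.ndimage import center_of_mass, shift


def recenter_com(cube):
    """Shift every frame so that its center of mass lands on the array center."""
    ny, nx = cube.shape[1:]
    cy, cx = (ny - 1) / 2, (nx - 1) / 2
    out = np.empty_like(cube, dtype=float)
    for i, frame in enumerate(cube):
        com_y, com_x = center_of_mass(frame)
        # Sub-pixel shift with bilinear interpolation; zeros fill the borders
        out[i] = shift(frame, (cy - com_y, cx - com_x), order=1, mode="constant")
    return out


def remove_bad_frames(cube, min_corr=0.9):
    """Keep frames whose Pearson correlation with the median frame is >= min_corr."""
    ref = np.median(cube, axis=0).ravel()
    corr = np.array([np.corrcoef(frame.ravel(), ref)[0, 1] for frame in cube])
    good = corr >= min_corr
    return cube[good], good
```

Comparing each frame against the median frame makes the reference robust to the few bad frames it is meant to catch.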
  8. Pre-processing. VIDEOCLIP

  9. HR8799 bcde (Marois et al. 2008-2010): one of the lucky cases!

    Final images after post-processing (several epochs)
  10. Why Python? • Well suited for science and exploratory analysis

    • High-level syntax and gentle learning curve
  11. Why Python? • Powerful open-source scientific software stack

    (J. Vanderplas, PyCon 2017 keynote)
  12. Why Python? • Becoming popular in astronomy

    • Mentions of software in astronomy publications (J. Vanderplas, PyCon 2017 keynote)
  13. Why Python? • It’s just fun! https://xkcd.com/353/

  14. Vortex Image Processing library (Gomez Gonzalez et al. 2017)

    • https://github.com/vortex-exoplanet/VIP • Available on PyPI • Documentation (Sphinx): http://vip.readthedocs.io/ • Bug tracking & interaction with users/devs
  15. Gomez Gonzalez et al. 2017 Vortex Image Processing library

  16. Gomez Gonzalez et al. 2017 Vortex Image Processing library

  17. Algo-zoo (Gomez Gonzalez et al. 2016)

  18. None
  19. Open Science & reproducibility • Open source

  20. “An article about computational result is advertising, not scholarship. The actual scholarship is the full software environment, code and data, that produced the result.” (Buckheit and Donoho, 1995)

    “Today, software is to scientific research what Galileo’s telescope was to astronomy: a tool, combining science and engineering. It lies outside the central field of principal competence among the researchers that rely on it. … it builds upon scientific progress and shapes our scientific vision.” (Pradal 2015)
  21. With great power… • Comes a great burden! • Developing and maintaining open-source code is not trivial

    • And a great responsibility… • Making sure the code is scientifically correct • And that it’s clean, free of bugs and well-documented • Best practices for scientific computing (Wilson et al. 2012) • Good enough practices in scientific computing (Wilson et al. 2016)
  22. Detection

    Image sequence → final residual image (figure: several candidate signals in the residual image, each marked “?”)
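
Deciding which “?” is a real companion needs a detection criterion. One widely used option in high-contrast imaging, not named on the slide, is the small-sample S/N of Mawet et al. (2014): the candidate's aperture flux $\bar{x}_1$ is compared with the $n_2$ non-overlapping apertures (one FWHM in diameter) at the same radius through a two-sample t-test, $\mathrm{S/N} = (\bar{x}_1 - \bar{x}_2) / (s_2 \sqrt{1 + 1/n_2})$. The sketch below assumes the star sits at the image center and uses aperture means; `aperture_mean` and `snr_small_sample` are hypothetical helper names.

```python
import numpy as np


def aperture_mean(img, yc, xc, r):
    """Mean pixel value inside a circular aperture of radius r centered on (yc, xc)."""
    yy, xx = np.indices(img.shape)
    mask = (yy - yc) ** 2 + (xx - xc) ** 2 <= r ** 2
    return img[mask].mean()


def snr_small_sample(img, yc, xc, fwhm):
    """Small-sample S/N (Mawet et al. 2014) of a candidate at (yc, xc).

    Assumes the star is at the image center and that the candidate radius
    leaves room for at least 3 apertures on the circle.
    """
    cy, cx = (img.shape[0] - 1) / 2, (img.shape[1] - 1) / 2
    rad = np.hypot(yc - cy, xc - cx)
    theta0 = np.arctan2(yc - cy, xc - cx)
    # Number of apertures whose centers are at least one FWHM apart on the circle
    n_ap = int(np.floor(np.pi / np.arcsin(fwhm / (2 * rad))))
    thetas = theta0 + 2 * np.pi * np.arange(n_ap) / n_ap
    fluxes = np.array([
        aperture_mean(img, cy + rad * np.sin(t), cx + rad * np.cos(t), fwhm / 2)
        for t in thetas
    ])
    x1, noise = fluxes[0], fluxes[1:]  # candidate aperture, then the n2 noise apertures
    n2 = noise.size
    return (x1 - noise.mean()) / (noise.std(ddof=1) * np.sqrt(1 + 1 / n2))
```

The $\sqrt{1 + 1/n_2}$ factor accounts for the small number of independent noise samples available close to the star, which is why this statistic is preferred over a plain Gaussian S/N at small separations.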
  23. Detection VIDEOCLIP

  24. “Essentially, all models are wrong, but some are useful.” (George Box)

    “…if the model is going to be wrong anyway, why not see if you can get the computer to ‘quickly’ learn a model from the data, rather than have a human laboriously derive a model from a lot of thought.” (Peter Norvig)
  25. Textbook machine learning

    • Supervised: regression, classification • Unsupervised: dimensionality reduction (e.g., projection onto PC 1, PC 2), clustering
  26. Image (model PSF) subtraction vs. supervised detection (SODINN)

    Noisy and unlabelled images → data transformation + adequate (ML) model → astonishing results
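  27. Supervised learning (Goodfellow et al. 2016)

    • Given a labeled dataset $(x_i, y_i)_{i=1,\dots,n}$, the goal is to learn a function $f : \mathcal{X} \to \mathcal{Y}$ that maps the input samples to the labels:

    $$\min_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} L\big(y_i, f(x_i)\big) + \lambda\, \Omega(f)$$

    where $L$ is the loss function and $\Omega(f)$ a regularization term weighted by $\lambda$.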
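
To make this objective concrete, here is a small NumPy sketch of one instance of it: binary logistic regression, with $L$ the logistic (cross-entropy) loss, $f(x) = \sigma(w^T x + b)$ and $\Omega(f) = \lVert w \rVert^2$, minimized by plain gradient descent. The learning rate, iteration count and $\lambda$ are arbitrary illustrative values.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logreg(X, y, lam=0.1, lr=0.1, n_iter=2000):
    """Minimize (1/n) * sum_i L(y_i, f(x_i)) + lam * ||w||^2 by gradient descent.

    L is the logistic loss, f(x) = sigmoid(w.x + b), labels y in {0, 1}.
    """
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    for _ in range(n_iter):
        err = sigmoid(X @ w + b) - y           # d(loss)/d(logit) for each sample
        grad_w = X.T @ err / n + 2 * lam * w   # empirical risk term + regularizer
        grad_b = err.mean()                    # the bias is not regularized
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict_proba(X, w, b):
    """Probability of the positive class for each row of X."""
    return sigmoid(X @ w + b)
```

Larger values of `lam` shrink the weights toward zero, trading a higher training loss for a simpler, less overfit $f$.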
  28. SODINN: supervised detection of exoplanets (Gomez Gonzalez et al. 2018)

    (a) Input cube (N frames) and PSF → N × P_ann matrix (one annulus) → SVD low-rank approximation, k levels → residuals, back to image space → MLAR samples X, with labels y ∈ {0, 1}
    (b) X and y split into train/test/validation sets → convolutional LSTM layer (kernel = 3×3, filters = 40) → 3D max pooling (size = 2×2×2) → convolutional LSTM layer (kernel = 2×2, filters = 80) → 3D max pooling (size = 2×2×2) → dense layer (units = 128), ReLU activation + dropout → output dense layer (units = 1), sigmoid activation → probability of the positive class
    (c) Input cube → MLAR patches → trained classifier → binary map (probability threshold = 0.9)
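
The layer stack in panel (b) can be sketched in Keras (TensorFlow required) as below, treating each MLAR sample as a short sequence whose "time" axis runs over the k approximation levels. This is not the SODINN source code: the input size (8 levels of 16 × 16 pixel patches), the default "valid" padding, the dropout rate, the optimizer and the loss are assumptions chosen so that the shapes work out.

```python
import tensorflow as tf

layers = tf.keras.layers


def build_sodinn_like_model(n_levels=8, patch_size=16, dropout=0.3):
    """Hypothetical Keras version of the slide's layer stack.

    Input: (n_levels, patch_size, patch_size, 1), i.e. one MLAR sample.
    Output: p(c+ | MLAR sample).
    """
    model = tf.keras.Sequential([
        layers.Input(shape=(n_levels, patch_size, patch_size, 1)),
        # (8, 16, 16, 1) -> (8, 14, 14, 40)
        layers.ConvLSTM2D(filters=40, kernel_size=(3, 3), return_sequences=True),
        # -> (4, 7, 7, 40)
        layers.MaxPooling3D(pool_size=(2, 2, 2)),
        # -> (4, 6, 6, 80)
        layers.ConvLSTM2D(filters=80, kernel_size=(2, 2), return_sequences=True),
        # -> (2, 3, 3, 80)
        layers.MaxPooling3D(pool_size=(2, 2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(dropout),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```

The convolutional LSTM layers let the network exploit both the spatial structure of each residual patch and how that structure evolves from one approximation level to the next.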
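  29. Multi-level Low-rank Approximation Residual (MLAR) samples

    (a) $M \in \mathbb{R}^{n \times p}$, $M = U \Sigma V^T = \sum_{i=1}^{n} \sigma_i u_i v_i^T$; residuals $\text{res} = M - M B_k^T B_k$, where $B_k$ holds the first $k$ right singular vectors (the first $k$ rows of $V^T$). Choosing $k$ based on the explained variance ratio.
    (b) Generating a labeled dataset: classes $c^+$ and $c^-$, labels $y \in \{c^-, c^+\}$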
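
Generating MLAR samples boils down to one SVD per annulus. The NumPy sketch below computes $\text{res}_k = M - M B_k^T B_k$ for several values of $k$ and picks those $k$ from cumulative explained-variance thresholds. It assumes $M$ is the $n \times p$ matrix of one annulus (frames × pixels), takes the explained variance from the squared singular values of $M$ without mean-centering, and uses hypothetical threshold values; cropping the residuals into patches around each pixel is left out.

```python
import numpy as np


def choose_levels(M, var_thresholds=(0.5, 0.7, 0.9)):
    """Smallest k whose cumulative explained-variance ratio reaches each threshold.

    Assumption: variance taken from the squared singular values of M itself.
    """
    s = np.linalg.svd(M, compute_uv=False)
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)
    return [int(np.searchsorted(ratio, t)) + 1 for t in var_thresholds]


def mlar_residuals(M, ks):
    """Stack of residuals res_k = M - M B_k^T B_k, one per k in ks.

    M   : (n, p) matrix, e.g. n frames x p pixels of one annulus
    B_k : first k right singular vectors of M, i.e. the first k rows of V^T
    """
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    residuals = []
    for k in ks:
        B_k = Vt[:k]                             # shape (k, p)
        residuals.append(M - M @ B_k.T @ B_k)    # remove the rank-k approximation
    return np.stack(residuals)                   # shape (len(ks), n, p)
```

Stacking the residuals for increasing $k$ gives each sample its "multi-level" axis: a real companion tends to persist across levels, while speckle residuals change as more of the stellar PSF is removed.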
  30. Training a model

    • Goal: to make predictions on new samples by training a classifier $f : \mathcal{X} \to \mathcal{Y}$, with $\hat{y} = p(c^+ \mid \text{MLAR sample})$
    • SODIRF: random forest
    • SODINN: convolutional LSTM deep neural network (same layer stack as on slide 28: X and y split into train/test/validation sets, two convolutional LSTM layers each followed by 3D max pooling, a 128-unit dense layer with ReLU activation + dropout, and a sigmoid output unit)
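
A SODIRF-like classifier can be sketched with scikit-learn's `RandomForestClassifier`. To stay self-contained, the example trains on hypothetical toy data (flattened 7 × 7 noise patches, with a Gaussian blob standing in for an injected companion in the positive class) rather than real MLAR samples. `predict_proba` then gives $\hat{y} = p(c^+ \mid \text{sample})$, which is thresholded at 0.9 as on slide 28. Patch size, blob amplitude and forest size are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def toy_patches(n, rng, size=7, amp=5.0, positive=False):
    """Hypothetical stand-in for MLAR samples: flattened Gaussian-noise patches,
    plus a centered Gaussian blob (a fake companion) for the positive class."""
    yy, xx = np.mgrid[:size, :size] - (size - 1) / 2
    patches = rng.normal(size=(n, size, size))
    if positive:
        patches += amp * np.exp(-(yy ** 2 + xx ** 2) / 2.0)  # sigma = 1 pixel
    return patches.reshape(n, -1)


rng = np.random.default_rng(0)
X = np.vstack([toy_patches(300, rng), toy_patches(300, rng, positive=True)])
y = np.r_[np.zeros(300), np.ones(300)]  # c- = 0, c+ = 1

# Hold out a test set (a validation set could be split off the same way)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)[:, 1]  # column 1 = positive class, i.e. y_hat
detections = proba >= 0.9              # binary decision at the slide's threshold
test_acc = clf.score(X_te, y_te)
print(f"test accuracy: {test_acc:.3f}, detections: {detections.sum()} / {len(y_te)}")
```

A random forest needs flat feature vectors, so it ignores the spatial and multi-level structure of the samples; that structure is what the convolutional LSTM network is designed to exploit.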
  31. Performance assessment

    (figure: observations scored by a good and a bad classifier; a threshold splits them into true positives, true negatives, false positives and false negatives)
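
Given classifier probabilities and ground-truth labels (for instance, from injected fake companions), the four outcomes in the figure and the resulting true and false positive rates can be counted with a few lines of NumPy; sweeping the threshold traces a ROC-style curve. The function names are illustrative.

```python
import numpy as np


def confusion_at_threshold(proba, labels, threshold):
    """Return (TP, FP, FN, TN) counts for predictions proba >= threshold."""
    pred = np.asarray(proba) >= threshold
    truth = np.asarray(labels).astype(bool)
    tp = int(np.sum(pred & truth))
    fp = int(np.sum(pred & ~truth))
    fn = int(np.sum(~pred & truth))
    tn = int(np.sum(~pred & ~truth))
    return tp, fp, fn, tn


def roc_points(proba, labels, thresholds):
    """(FPR, TPR) pairs for each threshold in the sweep."""
    points = []
    for t in thresholds:
        tp, fp, fn, tn = confusion_at_threshold(proba, labels, t)
        tpr = tp / (tp + fn) if tp + fn else 0.0
        fpr = fp / (fp + tn) if fp + tn else 0.0
        points.append((fpr, tpr))
    return points


proba = np.array([0.95, 0.8, 0.4, 0.2, 0.92, 0.1])
labels = np.array([1, 1, 1, 0, 0, 0])
print(confusion_at_threshold(proba, labels, 0.9))  # (1, 1, 2, 2)
```

Raising the threshold trades true positives for fewer false positives; a good classifier keeps a high TPR even at a low FPR.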
  32. Data-driven performance assessment

  33. None
  34. Academic DS: the intersection of computer science, machine learning & stats, and domain knowledge. Not easy to get here!
  35. Open (academic data) science

    Transforming science: • Cross-/interdisciplinary research (science with the CS, ML and AI fields) • Ensuring the use of robust statistical approaches and well-suited metrics • Integrating cutting-edge AI developments • http://jakevdp.github.io/blog/2014/08/22/hacking-academia/
  36. Open (academic data) science

    • Open peer-review • Code (and supporting data) release • Code publishing: The Journal of Open Source Software (https://joss.theoj.org/), The Journal of Open Research Software (https://openresearchsoftware.metajnl.com/) • Knowledge sharing (non-refereed publications) • Data challenges (benchmark datasets)
  37. And finally… Y U NO USE PYTHON!

  38. ¡Gracias! (Thank you!) carlos.gomez@univ-grenoble-alpes.fr carlgogo carlosalbertogomezgonzalez https://carlgogo.github.io/