Data Science in astro image processing: looking for exoplanets using machine learning

Talk presented at the “Data science in the Alps” workshop. The program of this meeting was composed of a mixture of talks about methodological research and about various scientific applications of data science.

https://data-institute.univ-grenoble-alpes.fr/news-and-events/workshop-data-science-in-the-alps-732959.htm?RH=10277933017461015

Transcript

  1. Data science in astro image processing: looking for exoplanets using

    machine learning Carlos Alberto Gomez Gonzalez Data Science in the Alps, 20/03/2018
  2. Communication and team skills Domain knowledge Computer science Machine learning

    & Stats (Academic) Data Science
  3. Communication and team skills Domain knowledge Computer science Machine learning

    & Stats (Academic) Data Science
  4. Exoplanets

  5. 5

  6. Mostly, we rely on indirect methods for detecting exoplanets, because

    it’s very hard to see them. This is not what they look like!
  7. Credit: NASA, http://planetquest.jpl.nasa.gov Animation

  8. SPHERE, Vigan et al. 2015 Very Large Telescope (VLT), Chile

  9. Credit: NASA, https://exoplanets.nasa.gov/exep/coronagraphvideo/ Videoclip

  10. fair amount of image processing !

  11. Raw astronomical images

    → Basic calibration and “cosmetics”
      • Dark/bias subtraction
      • Flat fielding
      • Sky or thermal background subtraction
      • Bad pixel correction
    → Image recentering
    → Bad frames removal
    → Sequence of calibrated images
    → PSF modeling
      • Median
      • Pairwise, ANDROMEDA
      • LOCI
      • PCA, NMF
      • LLSG
    → Model PSF subtraction
    → De-rotation (ADI) or rescaling (mSDI)
    → Image combination
    → Detection on final residual image
    → Characterization of detected companions
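The basic calibration steps listed on this slide can be illustrated with a minimal NumPy sketch. This is not the VIP implementation: the function name `calibrate_frame`, its arguments and the 3x3 median patching of bad pixels are assumptions made here for illustration, and sky or thermal background subtraction is left out.

```python
import numpy as np

def calibrate_frame(raw, dark, flat, bad_pixel_mask):
    """Dark-subtract, flat-field and patch bad pixels in one 2D frame.

    Hypothetical helper for illustration only (not part of VIP).
    """
    bad_pixel_mask = np.asarray(bad_pixel_mask, dtype=bool)
    # Remove the additive detector signal (dark current and bias).
    frame = raw.astype(float) - dark
    # Normalize the flat to unit mean so the division keeps the flux scale.
    frame = frame / (flat / np.mean(flat))
    # Replace each flagged pixel by the median of its good 3x3 neighbours.
    fixed = frame.copy()
    for y, x in zip(*np.nonzero(bad_pixel_mask)):
        y0, y1 = max(y - 1, 0), min(y + 2, frame.shape[0])
        x0, x1 = max(x - 1, 0), min(x + 2, frame.shape[1])
        window = frame[y0:y1, x0:x1]
        good = ~bad_pixel_mask[y0:y1, x0:x1]
        if good.any():
            fixed[y, x] = np.median(window[good])
    return fixed
```

In a real pipeline the dark and flat frames would themselves be master frames built from many exposures; here they are simply passed in as arrays.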
  12. Calibrated images (two animations): bright synthetic planet, and 100x fainter synthetic planet;

    each animation is labelled “starts here”
  13. HR8799 bcde (Marois et al. 2008-2010) One of the lucky

    cases! Final images after post-processing (several epochs) post-proc. Animation
  14. Communication and team skills Domain knowledge Computer science Machine learning

    & Stats (Academic) Data Science
  15. Communication and team skills Domain knowledge Computer science Machine learning

    & Stats (Academic) Data Science
  16. J. Vanderplas, PyCon 2017 keynote

  17. Vortex Image Processing library (Gomez Gonzalez et al. 2017)

    • Available on PyPI
    • Documentation: http://vip.readthedocs.io/
    • https://github.com/vortex-exoplanet/VIP
    • Bug tracking & interaction with users/devs
  18. • Continuous integration (Travis CI)

    • Python 2/3 compatibility
    • Automated testing (Pytest)
  19. Algo-ZOO Gomez Gonzalez et al. 2016

  20. None
  21. Open Science & reproducibility Open source

  22. “An article about computational result is advertising, not scholarship. The actual scholarship is the full software environment, code and data, that produced the result.” Buckheit and Donoho, 1995

    “Today, software is to scientific research what Galileo’s telescope was to astronomy: a tool, combining science and engineering. It lies outside the central field of principal competence among the researchers that rely on it. … it builds upon scientific progress and shapes our scientific vision.” Pradal 2015
  23. With great power…

    • Comes a great burden!
    • Developing and maintaining open-source code is not trivial.
    • And a great responsibility…
    • Making sure the code is scientifically correct
    • and that it’s readable, free of bugs and well-documented

    Best practices for scientific computing (Wilson et al. 2012)
    Good enough practices in scientific computing (Wilson et al. 2016)
  24. Communication and team skills Domain knowledge Computer science Machine learning

    & Stats (Academic) Data Science
  25. Communication and team skills Domain knowledge Computer science Machine learning

    & Stats (Academic) Data Science
  26. Detection: image sequence and final residual image, with several “?” markers

    on the residual image (Animation)
  27. Animation

  28. “Essentially, all models are wrong, but some are useful.” George Box

    “…if the model is going to be wrong anyway, why not see if you can get the computer to ‘quickly’ learn a model from the data, rather than have a human laboriously derive a model from a lot of thought.” Peter Norvig
  29. Textbook Machine Learning

    • Supervised: Regression, Classification
    • Unsupervised: Dimensionality reduction (axes: PC 1, PC 2), Clustering
  30. Image (model PSF) subtraction Supervised detection (SODINN) noisy and unlabelled

    images data transformation + adequate (ML) model astonishing results
  31. Supervised learning (Goodfellow et al. 2016)

    • The goal is to learn a function $f : X \to Y$ that maps the input samples to the labels, given a labeled dataset $(x_i, y_i)_{i=1,\dots,n}$:

    $$\min_{f \in \mathcal{F}} \; \frac{1}{n} \sum_{i=1}^{n} L(y_i, f(x_i)) + \lambda \Omega(f)$$
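As a concrete instance of the regularized objective on this slide, the sketch below fits a linear model by gradient descent. The choices here are assumptions for illustration and do not come from the talk: the function class is linear, L is the squared error, Ω is the squared L2 norm of the weights, and the data are synthetic.

```python
import numpy as np

def objective(w, X, y, lam):
    """Mean squared loss over the dataset plus an L2 penalty on the weights."""
    residuals = X @ w - y
    return np.mean(residuals ** 2) + lam * np.sum(w ** 2)

def fit_linear(X, y, lam=0.1, lr=0.05, n_steps=2000):
    """Minimize the regularized empirical risk with plain gradient descent."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_steps):
        # Gradient of (1/n) * sum (x_i.w - y_i)^2 + lam * ||w||^2.
        grad = (2.0 / n) * X.T @ (X @ w - y) + 2.0 * lam * w
        w -= lr * grad
    return w

# Synthetic labeled dataset (x_i, y_i), i = 1..n.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=200)

w_hat = fit_linear(X, y)
```

Increasing `lam` shrinks the fitted weights towards zero, which is the role of the λΩ(f) term: it trades training error for a simpler model.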
  32. Deep neural networks

    Forward and backward passes: Input X → 1st Layer (data transformation) → 2nd Layer (data transformation) → … → Nth Layer (data transformation) → Predictions Y’. The loss function compares the predictions Y’ with the input labels Y and returns a loss score; the optimizer uses the loss score to update the weights of each layer.

    $$f(x) = \sigma_k(A_k \, \sigma_{k-1}(A_{k-1} \cdots \sigma_2(A_2 \, \sigma_1(A_1 x)) \cdots))$$
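The nested formula for f(x) on this slide is just a loop over layers. The sketch below shows the forward pass only; the layer sizes, the random weights and the choice of ReLU and sigmoid activations are assumptions for illustration, and bias terms are omitted as in the formula.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, activations):
    """Apply sigma_k(A_k ... sigma_1(A_1 x)), one layer at a time."""
    out = x
    for A, sigma in zip(weights, activations):
        out = sigma(A @ out)
    return out

# Three layers: 4 inputs -> 8 units -> 8 units -> 1 output.
rng = np.random.default_rng(0)
weights = [
    rng.normal(size=(8, 4)),
    rng.normal(size=(8, 8)),
    rng.normal(size=(1, 8)),
]
activations = [relu, relu, sigmoid]

x = rng.normal(size=4)
prediction = forward(x, weights, activations)
```

Training would add the backward pass: compute the loss between `prediction` and the label, then let an optimizer update each `A` from the gradient of that loss.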
  33. Supervised detection of exoplanets (Gomez Gonzalez et al. 2018)

    (a) Input cube, N frames, and PSF → N x Pann matrix → SVD low-rank approximation levels → k residuals, back to image space → X: MLAR samples; y: Labels (0, 1)
    (b) X and y to train/test/validation sets → Convolutional LSTM layer, kernel=(3x3), filters=40 → 3d Max pooling, size=(2x2x2) → Convolutional LSTM layer, kernel=(2x2), filters=80 → 3d Max pooling, size=(2x2x2) → Dense layer, units=128, ReLU activation + dropout → Output dense layer, units=1, Sigmoid activation
    (c) Input cube → MLAR patches → Trained classifier → Probability of positive class → Binary map, probability threshold = 0.9
  34. Generating a labeled dataset

    Multi-level Low-rank Approximation Residual (MLAR) samples; choosing k based on the explained variance ratio:

    $$M \in \mathbb{R}^{n \times p}, \qquad M = U \Sigma V^T = \sum_{i=1}^{n} \sigma_i u_i v_i^T, \qquad \mathrm{res} = M - M B_k^T B_k$$

    Panels (a), (b): C+ and C- samples. Labels: $y \in \{c^-, c^+\}$
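The residual res = M − M B_k^T B_k can be sketched in a few lines of NumPy. The slide does not define B_k; the code assumes it holds the first k right singular vectors of M as rows. The function names, the variance levels passed in, and the absence of mean-centering are likewise assumptions made for illustration, not the SODINN implementation.

```python
import numpy as np

def explained_variance_ratio(singular_values):
    """Fraction of the total variance carried by each singular component."""
    var = singular_values ** 2
    return var / var.sum()

def low_rank_residual(M, k):
    """Return M - M B_k^T B_k, with B_k the first k right singular vectors."""
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    B_k = Vt[:k]  # shape (k, p), assumed definition of B_k
    return M - M @ B_k.T @ B_k

def residuals_at_levels(M, variance_levels):
    """One residual per cumulative explained-variance level."""
    s = np.linalg.svd(M, compute_uv=False)
    cevr = np.cumsum(explained_variance_ratio(s))
    residuals = []
    for level in variance_levels:
        # Smallest k whose cumulative explained variance reaches the level.
        k = min(int(np.searchsorted(cevr, level)) + 1, len(s))
        residuals.append(low_rank_residual(M, k))
    return residuals
```

For example, `residuals_at_levels(M, [0.90, 0.95, 0.99])` would return three residual matrices, each obtained with a larger k than the previous one; stacking such residuals is the idea behind a multi-level sample.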
  35. Training a classifier

    Goal: to make predictions on new samples, $f : X \to Y$, $\hat{y} = p(c^+ \mid \text{MLAR sample})$

    • SODIRF: Random forest
    • SODINN: convolutional LSTM deep neural network

    Training a model: X and y to train/test/validation sets → Convolutional LSTM layer, kernel=(3x3), filters=40 → 3d Max pooling, size=(2x2x2) → Convolutional LSTM layer, kernel=(2x2), filters=80 → 3d Max pooling, size=(2x2x2) → Dense layer, units=128, ReLU activation + dropout → Output dense layer, units=1, Sigmoid activation
  36. Making Predictions (real data, HR8799 system)

    (c) Input cube → MLAR patches → Trained classifier → Probability of positive class → Binary map, probability threshold = 0.9
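The last step on this slide, turning the probability map into a binary map, is a simple threshold. The sketch below uses the 0.9 value from the slide; the function names and the inclusive comparison (`>=`) are assumptions, and the probability values are made up for illustration.

```python
import numpy as np

def binary_detection_map(prob_map, threshold=0.9):
    """Flag pixels whose positive-class probability reaches the threshold."""
    return (np.asarray(prob_map) >= threshold).astype(np.uint8)

def candidate_positions(binary_map):
    """Return the (row, col) coordinates of the flagged pixels."""
    return [tuple(int(v) for v in pos) for pos in np.argwhere(binary_map)]

# Toy probability map standing in for the classifier output.
prob_map = np.array([
    [0.10, 0.95, 0.20],
    [0.90, 0.50, 0.05],
])
detections = binary_detection_map(prob_map)
positions = candidate_positions(detections)
```

Raising the threshold gives fewer false positives at the cost of missing fainter companions, which is the trade-off explored in the performance assessment.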
  37. Performance assessment: good classifier vs. bad classifier

    Observations are split by a threshold into True Positive, True Negative, False Positive and False Negative
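The four outcomes named on this slide can be counted directly from classifier scores and true labels. The following is a minimal sketch in plain Python; the function names and the toy scores are assumptions for illustration.

```python
def confusion_counts(scores, labels, threshold):
    """Count true/false positives and negatives at a given threshold."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif not predicted_positive and label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def rates(tp, fp, tn, fn):
    """True positive rate and false positive rate (0.0 if undefined)."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr

scores = [0.95, 0.80, 0.40, 0.92, 0.10]
labels = [1, 1, 0, 0, 0]
tp, fp, tn, fn = confusion_counts(scores, labels, threshold=0.9)
tpr, fpr = rates(tp, fp, tn, fn)
```

Sweeping the threshold and plotting the true positive rate against the false positive rate gives a ROC curve; a good classifier reaches a high true positive rate while the false positive rate stays low.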
  38. Data-driven performance assessment

  39. None
  40. Communication and team skills Domain knowledge Computer science Machine learning

    & Stats (Academic) Data Science not easy to get here!
  41. Communication and team skills Domain knowledge Computer science Machine learning

    & Stats (Academic) Data Science not easy to get here!
  42. Open (academic) data science

    • Cross/inter-disciplinary research (Science with CS, ML, AI fields)
    • To integrate cutting-edge AI developments
    • Ensuring the use of robust statistical approaches and well-suited metrics
    • Open peer-review

    http://jakevdp.github.io/blog/2014/08/22/hacking-academia/
  43. Open (academic) data science

    • Code (and supporting data) release
    • Code publishing:
      • The Journal of Open Source Software (https://joss.theoj.org/)
      • The Journal of Open Research Software (https://openresearchsoftware.metajnl.com/)
    • Knowledge sharing
    • Data challenges (benchmark datasets)
    • Chance to transform science!!!
  44. Interdisciplinarity: challenges

    • Interdisciplinary expertise isn’t yet properly recognized:
      • inadequate metrics and assessment mechanisms for promotion
    • No clear paths/protocols for establishing collaborations (multidisciplinarity)
    • It is often not trivial to navigate and integrate knowledge from different disciplines
    • Never-ending impostor syndrome

    https://www.nature.com/articles/s41599-017-0039-7
    http://blog.fperez.org/2013/11/an-ambitious-experiment-in-data-science.html
    https://www.space.com/39420-becoming-astrophysicist-keeps-getting-tougher.html
  45. ¡Gracias! carlos.gomez@univ-grenoble-alpes.fr carlgogo carlosalbertogomezgonzalez https://carlgogo.github.io/