Exoplanet direct imaging meets data science


Talk given at NASA Ames Research Center (ARC) and Stanford in June 2018. The slides present my Data Science approach to the problem of direct detection of exoplanets, leveraging recent developments in Machine/Deep Learning and practices from the open-source Python community.


Transcript

  1. I have a very particular set of skills About me

    http://carlgogo.github.io/ (Home) carlgogo (Github) carlgogo (Speakerdeck) carlosalbertogomezgonzalez (Linkedin)
  2. Power of direct imaging: HR 8799 (Marois et al. 2010; Konopacky et

    al. 2013; scale: 20 AU ≈ 0.5”); Bowler 2016; Milli et al. 2016
  3. Ground-based HCI: improving angular resolution and reducing the contrast and

    dynamic range. Seeing-limited image → AO-corrected image (wavefront control) → coronagraphic image (coronagraphy) → post-processed image (observing techniques, image post-processing)
  4. Raw astronomical images → sequence of calibrated images. Basic calibration and

    “cosmetics”: • Dark/bias subtraction • Flat fielding • Sky or thermal background subtraction • Bad pixel correction. Image recentering, bad frames removal, image combination. PSF modeling: • Median • Pairwise, ANDROMEDA • LOCI • PCA, NMF • LLSG. Model PSF subtraction. Detection on residual frame or detection map. Characterization of “detected” companions: (R_1, θ_1, F_1), (R_2, θ_2, F_2), (R_3, θ_3, F_3), (R_4, θ_4, F_4)
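
The post-processing block of this pipeline, with the simplest PSF model (the median of the sequence), can be sketched in a few lines. This is an illustrative NumPy/SciPy toy rather than the talk's code; the function name, the derotation sign convention and the final combination choice are assumptions:

    import numpy as np
    from scipy.ndimage import rotate

    def median_adi(cube, parallactic_angles):
        """Toy median-ADI reduction: model the stellar PSF as the median of
        the pupil-stabilized frames, subtract it, derotate and combine."""
        psf_model = np.median(cube, axis=0)        # static speckle model
        residuals = cube - psf_model               # model PSF subtraction
        derotated = np.array([rotate(frame, -ang, reshape=False)
                              for frame, ang in zip(residuals, parallactic_angles)])
        return np.median(derotated, axis=0)        # final residual frame
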
  5. Angular differential imaging (ADI): sequence of calibrated images

    with a bright synthetic planet, showing signal and noise [animation]
  6. Model PSF subtraction: • Pairwise subtraction (correlated reference) • Median of references •

    Least-squares combination of references • Low-rank approximation (SVD, PCA, NMF, …) by learning a basis from references. For the i-th frame, d_i − m_i = res_i, with the model m_i built from the reference frames (discarded frames excluded)
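
The low-rank variant can be illustrated with a plain SVD. A toy sketch under simplifying assumptions (no rotation-threshold frame selection, no annular processing; names are mine):

    import numpy as np

    def lowrank_psf_subtraction(cube, ncomp):
        """Toy PCA-style PSF subtraction: learn a rank-`ncomp` basis B from
        the frames themselves and subtract each frame's projection onto it,
        i.e. res_i = d_i - m_i."""
        n, ny, nx = cube.shape
        M = cube.reshape(n, ny * nx)             # one flattened frame per row
        M = M - M.mean(axis=0)                   # center the data
        _, _, Vt = np.linalg.svd(M, full_matrices=False)
        B = Vt[:ncomp]                           # truncated basis (principal components)
        model = M @ B.T @ B                      # low-rank model m_i of each frame
        return (M - model).reshape(n, ny, nx)
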
  7. Vortex Image Processing (VIP) library (Gomez Gonzalez et al. 2017) • Available on PyPI • https://github.com/vortex-exoplanet/VIP • Documentation (Sphinx): http://vip.readthedocs.io/

    • Jupyter tutorial • Python 2/3 compatibility • Continuous integration and automated testing (pytest)
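
A hedged usage example: the module layout and signatures have moved around between VIP releases, so the exact paths below are an assumption to be checked against the documentation of the installed version:

    # Check http://vip.readthedocs.io/ for the installed version; the paths
    # below follow an older (0.9-era) layout and may differ in newer releases.
    import vip_hci as vip

    cube = vip.fits.open_fits('cube.fits')           # ADI cube (n_frames, ny, nx)
    angles = vip.fits.open_fits('angles.fits')       # parallactic angles

    residual = vip.pca.pca(cube, angles, ncomp=10)   # full-frame PCA PSF subtraction
    snr = vip.metrics.snrmap(residual, fwhm=4)       # S/N map of the residual frame
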
  8. VIP: ADI algorithmic zoo. Marois et al. 2007; Lafrenière et

    al. 2007; Mugnier et al. 2009, Cantalloube 2015; Soummer et al. 2012, Amara & Quanz 2012; Absil et al. 2013; Marois et al. 2014; Gomez Gonzalez et al. 2016, 2017
  9. API

  10. API

  11. “An article about computational result is advertising, not scholarship. The

    actual scholarship is the full software environment, code and data, that produced the result.” Buckheit and Donoho, 1995
  12. Detection on the final residual image of an image sequence: which blobs are

    speckles (?) and which are the real planet or synthetic planets? [animation]
  13. Detection: from the image sequence and final residual image to a detection map, via the

    S/N metric or a matched filter (Mugnier et al. 2009, Cantalloube 2015; Ruffio et al. 2017) [animation]
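
As a rough illustration of an S/N metric on a residual frame, here is a deliberately simplified sketch; the standard approach compares resolution-element apertures at the same separation and uses small-sample statistics, which this toy omits:

    import numpy as np

    def annulus_snr(residual, x, y, fwhm):
        """Crude S/N at pixel (x, y): mean signal in a FWHM-sized aperture
        compared with the scatter of pixels at the same radial separation."""
        ny, nx = residual.shape
        cy, cx = (ny - 1) / 2, (nx - 1) / 2
        yy, xx = np.indices(residual.shape)
        r = np.hypot(xx - cx, yy - cy)
        r_test = np.hypot(x - cx, y - cy)
        annulus = residual[np.abs(r - r_test) < fwhm / 2]         # same-separation pixels
        aperture = residual[np.hypot(xx - x, yy - y) < fwhm / 2]  # test aperture
        return (aperture.mean() - annulus.mean()) / annulus.std()
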
  14. “Essentially, all models are wrong, but some are useful.” George

    Box “…if the model is going to be wrong anyway, why not see if you can get the computer to ‘quickly’ learn a model from the data, rather than have a human laboriously derive a model from a lot of thought.” Peter Norvig
  15. Machine learning in a nutshell. Unsupervised: dimensionality reduction, clustering and

    reinforcement learning, density estimation. Supervised: regression, classification
  16. Supervised learning. Training set (labeled data): (x_i, y_i), i = 1, …, n

    (e.g. chihuahua vs. muffin). Learn f : X → Y as f̂ = argmin over f_θ, θ ∈ Θ of ∑_{i=1}^{n} L(y_i, f_θ(x_i)) + g(θ). Ingredients: network architecture, loss function and regularization, optimization
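
A toy instance of this recipe, with a linear model standing in for f_θ, binary cross-entropy as L and an L2 penalty as g(θ); names and hyperparameters are illustrative:

    import numpy as np

    def fit_logistic(X, y, lam=1e-2, lr=0.1, n_iter=500):
        """Gradient descent on mean binary cross-entropy + (lam/2)*||theta||^2.
        X: (n, d) features, y: (n,) labels in {0, 1}; returns theta."""
        n, d = X.shape
        theta = np.zeros(d)
        for _ in range(n_iter):
            p = 1.0 / (1.0 + np.exp(-X @ theta))     # f_theta(x_i)
            grad = X.T @ (p - y) / n + lam * theta   # gradient of loss + penalty
            theta -= lr * grad
        return theta
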
  17. ▸ Activations. Input X → 1st layer (data transformation) → 2nd layer

    (data transformation) → … → Nth layer (data transformation)
  18. ▸ Maxpooling ▸ Dropout. Input X → 1st layer (data transformation)

    → 2nd layer (data transformation) → … → Nth layer (data transformation)
  19. Input X → 1st layer → 2nd layer → … → Nth layer (data

    transformations, each with its weights) → predictions Ŷ. The loss function compares Ŷ with the labels Y to give a loss score; the optimizer performs the weight update
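
Slides 17 to 19 map directly onto a few lines of Keras; the architecture, patch size and hyperparameters below are placeholders, not the network from the talk:

    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        layers.Input(shape=(64, 64, 1)),           # assumed input patch size
        layers.Conv2D(16, 3, activation='relu'),   # 1st layer (data transformation)
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation='relu'),   # 2nd layer
        layers.MaxPooling2D(),
        layers.Dropout(0.25),
        layers.Flatten(),
        layers.Dense(64, activation='relu'),
        layers.Dense(1, activation='sigmoid'),     # predictions Ŷ
    ])
    # the loss compares Ŷ with the labels Y; the optimizer performs the weight updates
    model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['accuracy'])
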
  20. Supervised detection? HCI sequences have no labels, but lots of noise

    instead. Single ADI dataset of N calibrated images: we can inject companions, but • the high contrast is an issue • N is too small …
  21. Supervised detection? HCI sequences have no labels, but lots of noise instead.

    Survey scenario with M image sequences (observed stars): we can inject companions and potentially reduce the dynamic range, but • M is still too small • it is computationally expensive …
  22. Labeling data: Multi-level Low-rank Approximation Residual (MLAR) samples. For the data matrix

    M ∈ R^{n×p}, M = U Σ V^T = ∑_{i=1}^{n} σ_i u_i v_i^T; explained variance ratio of component j: σ̂_j² / ∑_i σ̂_i²; residual samples res = M − M B_k^T B_k for truncated bases B_k
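
The SVD machinery behind these residual samples can be sketched in NumPy; the ranks used, and the subsequent cropping and stacking of residual frames into image-sequence samples, are assumptions made for illustration:

    import numpy as np

    def multilevel_residuals(M, ranks=(1, 2, 5, 10)):
        """For a matrix M (n flattened frames x p pixels): SVD, explained
        variance ratios, and residuals after removing truncated bases B_k."""
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        evr = s**2 / np.sum(s**2)             # explained variance ratio per component
        residuals = {}
        for k in ranks:
            Bk = Vt[:k]                       # rank-k basis
            residuals[k] = M - M @ Bk.T @ Bk  # res = M - M B_k^T B_k
        return evr, residuals
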
  23. Labeling data. Two classes of samples, y ∈ {c−,

    c+}: positive (c+) and negative (c−) MLAR samples (Sample 1, Sample 2, Sample 3, Sample 4, …)
  24. Neural network. Goal: predictions on new samples, f : X

    → Y, ŷ = p(c+ | MLAR sample). SGD with a binary cross-entropy loss: L = −∑_n [ y_n ln(ŷ_n) + (1 − y_n) ln(1 − ŷ_n) ]
  25. Neural network. Goal: predictions on new samples, f : X

    → Y, ŷ = p(c+ | MLAR sample). SGD with a binary cross-entropy loss: L = −∑_n [ y_n ln(ŷ_n) + (1 − y_n) ln(1 − ŷ_n) ]. Architecture: convolutional LSTM (CLSTM) and convolutions
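
An illustrative Keras stack combining a convolutional LSTM with 2D convolutions for such samples; the shapes, layer sizes and optimizer settings are placeholders rather than the exact architecture shown in the talk:

    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        layers.Input(shape=(8, 32, 32, 1)),           # (levels, ny, nx, channel), assumed
        layers.ConvLSTM2D(16, 3, activation='tanh'),  # convolutional LSTM over the levels
        layers.Conv2D(32, 3, activation='relu'),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(1, activation='sigmoid'),        # ŷ = p(c+ | MLAR sample)
    ])
    model.compile(optimizer='sgd', loss='binary_crossentropy')
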
  26. Performance assessment: injection of synthetic companions in an ADI sequence (S/N = 3.2,

    5.9, 1.3, 2.7) and SODINN’s output
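
A toy version of the injection step: nearest-pixel placement, no sub-pixel interpolation or flux normalization, no edge handling, and the parallactic-angle sign convention is instrument dependent:

    import numpy as np

    def inject_companion(cube, parallactic_angles, psf, r, theta_deg, flux):
        """Add a scaled PSF template at a position that rotates with the
        parallactic angle in each frame of an ADI cube (template assumed to
        fit fully inside the frame)."""
        out = cube.copy()
        ny, nx = cube.shape[1:]
        cy, cx = (ny - 1) / 2, (nx - 1) / 2
        ph, pw = psf.shape
        for i, pa in enumerate(parallactic_angles):
            ang = np.deg2rad(theta_deg - pa)              # on-detector position angle
            y0 = int(round(cy + r * np.sin(ang))) - ph // 2
            x0 = int(round(cx + r * np.cos(ang))) - pw // 2
            out[i, y0:y0 + ph, x0:x0 + pw] += flux * psf
        return out
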
  27. Next steps (Gomez Gonzalez, in prep.): • Cheaper labeled data

    generation strategies • Generative models for data augmentation • Exploitation of the temporal dimension • Deeper networks • Generalization (in the context of an HCI survey)