Slide 1

Slide 1 text


Slide 2

Slide 2 text

OUTLINE
1. Introduction
2. State-of-the-art differential imaging post-processing techniques
3. Supervised learning applied to HCI
4. Conclusions

Slide 3

Slide 3 text


Slide 4

Slide 4 text

THREE DECADES DETECTING EXOPLANETS
• PSR B1257+12 b,c
• 51 Peg b
• HD 209458 b
• 2MASSW J1207334-393254 b
• HR8799 b,c,d
• HR8799 e, beta Pic b
• 51 Eri b
25 Jan 2018

Slide 5

Slide 5 text


Slide 6

Slide 6 text

INDIRECT DETECTION METHODS
• Radial velocity (Mayor and Queloz 1995)
• Transit (Charbonneau et al. 2000)

Slide 7

Slide 7 text

Well, not really: directly imaged exoplanets look like this (Macintosh et al. 2015)

Slide 8

Slide 8 text

HR8799 bcde (Marois et al. 2008-2010)

Slide 9

Slide 9 text

Beta Pictoris b (Lagrange et al. 2009)

Slide 10

Slide 10 text

POWER OF DIRECT OBSERVATIONS
[Figures: Milli et al. 2016; Konopacky et al. 2013; Bowler 2016; Marois et al. 2010 (HR8799, L’ band, planets b, c, d, e; scale bar: 20 AU, 0.5”)]

Slide 11

Slide 11 text

DIRECT IMAGING IS CHALLENGING
(1) High (planet-to-star) contrast: $10^{-6}$ to $10^{-10}$
(2) Angular separation
(3) Image degradation

Slide 12

Slide 12 text

GROUND-BASED EXOPLANET IMAGING
Very Large Telescope (VLT); SPHERE, Vigan et al. 2015

Slide 13

Slide 13 text

GROUND-BASED HCI
Improving angular resolution + reducing the contrast and dynamic range
• Image sequence: seeing-limited image → AO-corrected image → coronagraphic image → post-processed image
• Techniques: wavefront control, coronagraphy, observing techniques, image post-processing

Slide 14

Slide 14 text


Slide 15

Slide 15 text


Slide 16

Slide 16 text

SEA OF SPECKLES
Keck/NIRC2 image sequence (videoclip)

Slide 17

Slide 17 text

STATE-OF-THE-ART IMAGE PROCESSING FOR HCI
Raw astronomical images
• Basic calibration and “cosmetics”
  • Dark/bias subtraction
  • Flat fielding
  • Sky (thermal background) subtraction
  • Bad pixel correction
• Image recentering
  • Center of mass
  • 2d Gaussian fit
  • DFT cross-correlation
• Bad frames removal
  • Image correlation
  • Pixel statistics (specific image regions)
• Reference PSF creation
  • Pairwise
  • Median
  • PCA, NMF
  • LOCI
  • LLSG
• PSF reference subtraction
• De-rotation (for ADI) or rescaling (for mSDI)
• Image combination
  • Mean, median, trimmed mean
Final residual image
• Characterization of detected companions

Slide 18

Slide 18 text

VORTEX IMAGE PROCESSING (VIP) LIBRARY
• VIP: open-source Python library for reproducible and robust data reduction, providing a wide collection of pre- and post-processing algorithms for HCI data processing
• Three observing techniques: angular, reference-star, and multi-spectral differential imaging
• Mature ADI processing; RDI and mSDI are work in progress
Gomez Gonzalez et al. 2017

Slide 19

Slide 19 text

VORTEX IMAGE PROCESSING (VIP) LIBRARY
• 50k+ lines of code, 1+7 contributors
• 279 commits, 64 PRs, 48 closed issues, 12 releases
• Growing community of users
• > 10 papers published/submitted citing VIP
• Documentation: + Jupyter tutorial
• Open-science & reproducibility (Jupyter workflows/pipelines)
Gomez Gonzalez et al. 2017

Slide 20

Slide 20 text

STATE-OF-THE-ART IMAGE PROCESSING FOR HCI
(same pipeline as on slide 17)
Let’s focus on these stages:
• Model PSF subtraction
• Detection
• Performance assessment
• Characterization

Slide 21

Slide 21 text

MODEL PSF SUBTRACTION: MEDIAN FRAME
Angular differential imaging (Marois et al. 2006), for a time sequence of frames A_i:
• B = median(A_i)
• C_i = A_i - B
• D_i = de-rotation(C_i)
• E = median(D_i)
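
The four steps of median-frame ADI can be sketched in a few lines. This is a minimal illustration, not the VIP implementation: the function name `median_adi`, the bilinear interpolation, and the sign of the de-rotation angle are assumptions (the sign convention depends on the instrument and on how the parallactic angles are defined).

```python
import numpy as np
from scipy.ndimage import rotate

def median_adi(cube, angles):
    """Median-frame ADI sketch.

    cube   : array (n_frames, ny, nx), the sequence A_i
    angles : parallactic angle of each frame, in degrees (assumed convention)
    """
    cube = np.asarray(cube, dtype=float)
    model = np.median(cube, axis=0)            # B = median(A_i)
    residuals = cube - model                   # C_i = A_i - B
    derotated = np.stack([                     # D_i = de-rotation(C_i)
        rotate(frame, -angle, reshape=False, order=1)
        for frame, angle in zip(residuals, angles)
    ])
    return np.median(derotated, axis=0)        # E = median(D_i)
```

A quasi-static stellar halo ends up in B and cancels in C_i, while a companion moves across the frames with the field rotation and therefore largely survives the median subtraction.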

Slide 22

Slide 22 text

MODEL PSF SUBTRACTION: LOCI
Lafreniere et al. 2007, for a time sequence of frames A_i:
• B_i = loci_approx(A_i)
• C_i = A_i - B_i
• D_i = de-rotation(C_i)
• E = median(D_i)
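
The `loci_approx` step builds the model of each frame as a least-squares linear combination of the other frames. The sketch below is a deliberately simplified, whole-frame version: the actual LOCI algorithm works on local zones and only uses reference frames with enough field rotation, and neither refinement is implemented here.

```python
import numpy as np

def loci_approx(matrix, i):
    """Least-squares model B_i of frame i built from all the other frames.

    matrix : array (n_frames, n_pixels), one vectorized frame per row
    """
    target = matrix[i]
    refs = np.delete(matrix, i, axis=0)                  # every frame except i
    # Coefficients c minimizing || target - c @ refs ||^2
    coeffs, *_ = np.linalg.lstsq(refs.T, target, rcond=None)
    return coeffs @ refs
```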

Slide 23

Slide 23 text

MODEL PSF SUBTRACTION: PCA
Low-rank approximation by basis truncation (Soummer et al. 2012, Amara & Quanz 2012), for a time sequence of frames A_i:
• B_i = pca_approx(A_i)
• C_i = A_i - B_i
• D_i = de-rotation(C_i)
• E = median(D_i)
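
The low-rank approximation with a truncated basis can be sketched with an SVD. This is an illustrative full-frame version; the name `pca_residuals` and the mean-centering step are assumptions, and de-rotation and median combination would follow as in the median-frame case.

```python
import numpy as np

def pca_residuals(matrix, k):
    """Return C_i = A_i - B_i, with B_i the projection on the first k PCs.

    matrix : array (n_frames, n_pixels), one vectorized frame per row
    """
    centered = matrix - matrix.mean(axis=0)           # assumed: temporal mean removed
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k]                                    # truncated basis, (k, n_pixels)
    model = centered @ basis.T @ basis                # low-rank approximation
    return centered - model
```

Increasing k removes more of the speckle pattern but also more of the companion signal, which is why the number of principal components has to be tuned.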

Slide 24

Slide 24 text

MODEL PSF SUBTRACTION: ADI-NMF
Non-negative matrix factorization (NMF) for ADI (Gomez Gonzalez et al. 2017)
[Figure: non-negative components vs. principal components]

Slide 25

Slide 25 text

OTHER OBSERVING TECHNIQUES: RDI, SDI
• Reference datasets: annular RDI-PCA + standardization + frame correlation
• Spectrally dispersed datasets (n x w x p x p, with n the number of frames and w the number of λ): multi-stage PCA for multiple-channel SDI + ADI
[Figure: S/N map]

Slide 26

Slide 26 text

STATE-OF-THE-ART DETECTION
VLT/NACO ADI sequence (videoclip) → final residual frame
#±%!&@% speckles!!!
[Figure: final residual frame with candidate blobs marked “?”]

Slide 27

Slide 27 text

STATE-OF-THE-ART DETECTION
Planet and speckle show a different behavior when increasing the number of PCs (Mawet et al. 2014)
[Figure labels: planet, speckle; 7.8, 7.0]

Slide 28

Slide 28 text

STATE-OF-THE-ART DETECTION
Many ways of obtaining final residual images
[Figure: corresponding S/N maps]
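
S/N maps of this kind are commonly computed with the small-sample two-sample t-test definition of Mawet et al. 2014, where the flux in a test resolution element is compared with the fluxes in the other resolution elements at the same separation. The sketch below assumes the aperture fluxes have already been measured; the function and argument names are illustrative.

```python
import math
import statistics

def snr_small_sample(test_flux, background_fluxes):
    """S/N of one resolution element against n2 background elements.

    S/N = (x1 - mean(x2)) / (s2 * sqrt(1 + 1/n2)), with s2 the sample
    standard deviation of the background fluxes.
    """
    n2 = len(background_fluxes)
    mean_bg = statistics.mean(background_fluxes)
    std_bg = statistics.stdev(background_fluxes)      # uses n2 - 1
    return (test_flux - mean_bg) / (std_bg * math.sqrt(1 + 1 / n2))
```

The factor sqrt(1 + 1/n2) penalizes small separations, where only a handful of resolution elements are available to estimate the noise.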

Slide 29

Slide 29 text

STATE-OF-THE-ART PERFORMANCE ASSESSMENT
• (Planet-to-star) contrast
• Standard deviation of the flux in resolution elements

Slide 30

Slide 30 text

STATE-OF-THE-ART PERFORMANCE ASSESSMENT
• Star photometry
• PSF template
• (Planet-to-star) contrast
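
The ingredients listed on these two slides combine into one point of a contrast curve. The sketch below assumes the usual recipe: the noise at a given separation, multiplied by a detection factor (5 sigma here, an assumption), corrected for the algorithm throughput (typically measured by injecting the PSF template) and normalized by the star photometry. All names are illustrative.

```python
def contrast_at_separation(noise_std, star_flux, throughput, sigma=5.0):
    """Planet-to-star contrast detectable at `sigma` for one separation.

    noise_std  : standard deviation of the flux in resolution elements
    star_flux  : aperture photometry of the unsaturated star
    throughput : fraction of companion flux surviving the post-processing
    """
    return sigma * noise_std / (throughput * star_flux)
```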

Slide 31

Slide 31 text

STATE-OF-THE-ART PERFORMANCE ASSESSMENT
• 50% completeness as a function of the separation
• Strong assumption about noise statistics (related to the FPR)
• Not the best tool for assessing the performance of detection algorithms

Slide 32

Slide 32 text

CHARACTERIZATION WITH THE NEGFC TECHNIQUE
(R, Theta, Flux) estimation by optimizing a function of merit computed on an aperture in the residual frame(s)
Lagrange et al. 2010, Marois et al. 2010, Wertz et al. 2016
[Figure: two residual frames with color bars; scale bar 0".3; compass E, N]

Slide 33

Slide 33 text

LLSG
• Low-rank plus sparse decomposition applied to HCI
• Local Low-rank plus Sparse plus Gaussian noise (LLSG) decomposition for ADI sequences
• Based on SSGoDec (Zhou 2011, Zhou & Tao 2013)
Gomez Gonzalez et al. 2016

Slide 34

Slide 34 text

LLSG
[Figure: S/N ~17 vs. S/N ~51; soft thresh] (Gomez Gonzalez et al. 2016)
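
The "soft thresh" label refers to the soft-thresholding operator, which is how low-rank plus sparse decompositions of this family typically obtain the sparse term. A minimal sketch of the operator alone (not of the full LLSG decomposition) is given here; the threshold name `lam` is an assumption.

```python
import numpy as np

def soft_threshold(x, lam):
    """Shrink every entry of x towards zero by lam; entries below lam become 0."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)
```

Applied to the residual left after removing the low-rank term, this keeps only the entries that stand out above the threshold, which is where a faint companion is expected to end up.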

Slide 35

Slide 35 text

PERFORMANCE ASSESSMENT (Gomez Gonzalez et al. 2016)

              H1    H0
Detection     TP    FP
Null result   FN    TN
              TPR   FPR
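
The two rates in the bottom row follow directly from the columns of the table. A short sketch with made-up counts:

```python
def detection_rates(tp, fp, fn, tn):
    """Return (TPR, FPR) from the four cells of the confusion matrix."""
    tpr = tp / (tp + fn)    # fraction of real companions (H1) that are detected
    fpr = fp / (fp + tn)    # fraction of noise-only cases (H0) flagged as detections
    return tpr, fpr

# Hypothetical counts, for illustration only
tpr, fpr = detection_rates(tp=8, fp=1, fn=2, tn=99)
```

Sweeping the detection threshold and plotting TPR against FPR gives a ROC curve, which allows comparing detection algorithms without assuming a particular noise distribution.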

Slide 36

Slide 36 text

DICTIONARY LEARNING
• Dictionary learning for generalizing the task of image approximation (reference PSF) in terms of a “basis”
• X are overlapping patches from the reference frames
[Figure: dictionary]

Slide 37

Slide 37 text

SPARSE CODING
• Orthogonal Matching Pursuit
• T is a matrix of vectorized overlapping patches from the target frames
• k as function of the separation
[Figure, panels (a), (b), (c): patch - reconstruction (linear combination of k atoms of the dictionary) = residuals]
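
Orthogonal Matching Pursuit approximates a patch with a linear combination of at most k dictionary atoms, chosen greedily. The sketch below is a textbook version for a single vectorized patch, assuming unit-norm atoms stored as columns; it is not the code used for the results in the talk.

```python
import numpy as np

def omp(dictionary, patch, k):
    """Sparse-code `patch` with at most k atoms.

    dictionary : array (n_pixels, n_atoms), columns assumed to have unit norm
    patch      : array (n_pixels,)
    Returns the sparse code (n_atoms,) and the residual patch.
    """
    patch = np.asarray(patch, dtype=float)
    residual = patch.copy()
    support, coeffs = [], np.zeros(0)
    for _ in range(k):
        correlations = np.abs(dictionary.T @ residual)
        if support:
            correlations[support] = 0.0               # never pick an atom twice
        support.append(int(np.argmax(correlations)))
        atoms = dictionary[:, support]
        # Re-fit all selected atoms jointly (the "orthogonal" step)
        coeffs, *_ = np.linalg.lstsq(atoms, patch, rcond=None)
        residual = patch - atoms @ coeffs
    code = np.zeros(dictionary.shape[1])
    code[support] = coeffs
    return code, residual
```

The residual is what remains after subtracting the reconstruction from the patch, so a small k models the speckle pattern coarsely and a large k risks absorbing companion signal.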

Slide 38

Slide 38 text

3. SUPERVISED LEARNING APPLIED TO HCI
a.k.a. detecting exoplanets isn’t about residual images after all

Slide 39

Slide 39 text

MACHINE LEARNING IN A NUTSHELL
Construction of algorithms that can learn from and make predictions on data
• Supervised: regression, classification
• Unsupervised: dimensionality reduction, clustering
[Figure axes: PC 1, PC 2]

Slide 40

Slide 40 text

“Essentially, all models are wrong, but some are useful.” (George Box)
“…if the model is going to be wrong anyway, why not see if you can get the computer to ‘quickly’ learn a model from the data, rather than have a human laboriously derive a model from a lot of thought.” (Peter Norvig)

Slide 41

Slide 41 text

SUPERVISED LEARNING (Goodfellow et al. 2016)
• The goal is to learn a function that maps the input samples to the labels, given a labeled dataset: $f : X \to Y$, $(x_i, y_i)_{i=1,\dots,n}$
• The function is found by minimizing a regularized empirical risk: $\min_{f \in F} \frac{1}{n} \sum_{i=1}^{n} L(y_i, f(x_i)) + \lambda \Omega(f)$
• Two types of problems: classification (y takes values in a finite set of classes/categories) and regression (y is a real value)
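
To make the objective concrete, here is a sketch that evaluates it for one particular choice of its parts: a linear model f(x) = w·x, a squared-error loss L, and an L2 penalty Ω(f) = ||w||². These choices are assumptions made for illustration; the slide leaves L, Ω and the function class F generic.

```python
import numpy as np

def regularized_risk(w, X, y, lam):
    """(1/n) * sum_i L(y_i, f(x_i)) + lam * Omega(f) for a linear model."""
    predictions = X @ w                           # f(x_i) = w . x_i
    data_term = np.mean((y - predictions) ** 2)   # L: squared error (assumed)
    penalty = lam * np.sum(w ** 2)                # Omega: squared L2 norm (assumed)
    return data_term + penalty
```

Training amounts to searching for the w that minimizes this value; λ sets the trade-off between fitting the labeled data and keeping the model simple.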

Slide 42

Slide 42 text

NEURAL NETWORKS
• Perceptron (Rosenblatt 1958)
• Activation functions: step, sigmoid, tanh, ReLU
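
The perceptron and the activation functions named on the slide are short enough to write out. This is a minimal NumPy sketch (tanh is available directly as `np.tanh`); the weights in the usage lines are chosen by hand to act as a logical AND.

```python
import numpy as np

def step(z):
    return np.where(z >= 0, 1.0, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(z, 0.0)

def perceptron(x, w, b):
    """Weighted sum of the inputs followed by a step activation."""
    return step(np.dot(w, x) + b)

# Hand-picked weights: fires only when both inputs are 1
w, b = np.array([1.0, 1.0]), -1.5
both_on = perceptron(np.array([1.0, 1.0]), w, b)
one_on = perceptron(np.array([1.0, 0.0]), w, b)
```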

Slide 43

Slide 43 text

DEEP NEURAL NETWORKS
• DNNs can be understood as a composition of simple linear operations and non-linearities: $f(x) = \sigma_k(A_k \sigma_{k-1}(A_{k-1} \dots \sigma_2(A_2 \sigma_1(A_1 x)) \dots))$
• Layered representations
• Forward and backward passes
[Diagram: input X → 1st layer (data transformation) → 2nd layer (data transformation) → … → Nth layer (data transformation) → predictions Y’. The loss function compares Y’ with the input labels Y and returns a loss score; the optimizer uses the loss score to update the weights of each layer.]
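
The forward pass is literally the nested composition in the formula. The sketch below uses two small layers with hand-written weights, purely for illustration; biases and the backward pass are left out.

```python
import numpy as np

def forward(x, layers):
    """Evaluate sigma_k(A_k ... sigma_1(A_1 x)) for layers = [(A_1, sigma_1), ...]."""
    for weights, activation in layers:
        x = activation(weights @ x)
    return x

def relu(z):
    return np.maximum(z, 0.0)

# Illustrative weights, not a trained network
layers = [
    (np.array([[1.0, -1.0], [0.0, 2.0]]), relu),   # A_1, sigma_1
    (np.array([[1.0, 1.0]]), np.tanh),             # A_2, sigma_2
]
output = forward(np.array([1.0, 2.0]), layers)
```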

Slide 44

Slide 44 text

SUPERVISED DETECTION OF EXOPLANETS
SODINN schematic representation (Gomez Gonzalez et al. 2018)
(a) Labeled data generation: input cube (N frames) and PSF → N x P_ann matrix → k SVD low-rank approximation levels → k residuals, back to image space → MLAR patches. X: MLAR samples; y: labels (0, 1)
(b) Training: X and y split into train/test/validation sets, then:
• Convolutional LSTM layer, kernel=(3x3), filters=40
• 3d max pooling, size=(2x2x2)
• Convolutional LSTM layer, kernel=(2x2), filters=80
• 3d max pooling, size=(2x2x2)
• Dense layer, units=128, ReLU activation + dropout
• Output dense layer, units=1, sigmoid activation → probability of positive class
(c) Prediction: input cube → MLAR patches → trained classifier → binary map (probability threshold = 0.9)

Slide 45

Slide 45 text

SUPERVISED DETECTION OF EXOPLANETS
No labeled HCI data!
Single ADI dataset, using n calibrated images:
• low S/N
• n too small
…

Slide 46

Slide 46 text

SUPERVISED DETECTION OF EXOPLANETS
No labeled HCI data!
m ADI data cubes (survey):
• better S/N
• m needs to be large enough
…

Slide 47

Slide 47 text

GENERATING A LABELED DATASET
Step 1: Multi-level Low-rank Approximation Residual (MLAR) samples
• Data matrix: $M \in \mathbb{R}^{n \times p}$
• SVD: $M = U \Sigma V^T = \sum_{i=1}^{n} \sigma_i u_i v_i^T$
• Explained variance ratio: $\hat{\sigma}_j^2 / \sum_i \hat{\sigma}_i^2$
• Residuals: $res = M - M B_k^T B_k$
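
The formulas of Step 1 translate almost directly into NumPy. The sketch below computes the explained variance ratio from the singular values and the residuals for several truncation levels k, taking B_k as the first k right singular vectors. How the levels k are picked from the explained variance ratio, and how patches are then cut from the residuals, is not shown; the function names are illustrative.

```python
import numpy as np

def explained_variance_ratio(singular_values):
    """sigma_j^2 / sum_i sigma_i^2 for every singular value."""
    variances = singular_values ** 2
    return variances / variances.sum()

def mlar_residuals(M, ks):
    """Residuals M - M B_k^T B_k for each k in ks.

    M  : array (n, p), one vectorized frame (or annulus) per row
    ks : iterable of truncation levels
    Returns an array (len(ks), n, p) and the explained variance ratio.
    """
    _, s, vt = np.linalg.svd(M, full_matrices=False)
    residuals = []
    for k in ks:
        B_k = vt[:k]                                # (k, p) truncated basis
        residuals.append(M - M @ B_k.T @ B_k)
    return np.stack(residuals), explained_variance_ratio(s)
```

Stacking the residuals of one patch across the k levels gives a small 3d sample, which is the kind of input a convolutional LSTM network can consume as a sequence.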

Slide 48

Slide 48 text

LABELED DATASET
Labels: $y \in \{c^-, c^+\}$
[Figure, panels (a) and (b): example samples of the two classes, C+ and C-; Sample 1, Sample 2, Sample 3, Sample 4, …]

Slide 49

Slide 49 text

TRAINING A DISCRIMINATIVE MODEL
Step 2: training a classifier $f : X \to Y$
• SODIRF: random forest
• SODINN: convolutional LSTM deep neural network, with a stochastic gradient descent optimizer and a binary cross-entropy cost function
Goal, to make predictions on new samples: $\hat{y} = p(c^+ \mid \text{MLAR sample})$
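
The binary cross-entropy cost named on the slide compares the predicted probabilities of the positive class with the 0/1 labels. A minimal NumPy sketch follows; the clipping constant `eps` is an assumption added to avoid log(0).

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean of -[y log(p) + (1 - y) log(1 - p)] over the samples."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    losses = y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob)
    return float(-np.mean(losses))
```

An uninformative classifier that always answers 0.5 scores log(2) ≈ 0.693; the optimizer updates the network weights to push the value towards zero.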

Slide 50

Slide 50 text

MAKING PREDICTIONS
Step 3: $\hat{y} = p(c^+ \mid \text{MLAR sample})$
[Figure: LBT/LMIRcam HR8799]
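
Once every pixel has a predicted probability, the detection map is obtained by thresholding. The sketch below uses the 0.9 threshold quoted in the SODINN schematic; whether the comparison is strict or inclusive is an assumption.

```python
import numpy as np

def binary_map(prob_map, threshold=0.9):
    """1 where the predicted probability of c+ reaches the threshold, else 0."""
    return (np.asarray(prob_map) >= threshold).astype(int)

# Made-up probabilities, for illustration only
detections = binary_map([[0.10, 0.95], [0.90, 0.50]])
```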

Slide 51

Slide 51 text

TEST WITH INJECTED COMPANIONS
Injection of 4 fake companions in VLT/SPHERE V471 Tau
4 companions, you say…?

Slide 52

Slide 52 text

TEST WITH INJECTED COMPANIONS
Injection of 4 fake companions in VLT/SPHERE V471 Tau
Panels (a) to (d): S/N=3.2, S/N=5.9, S/N=1.3, S/N=2.7
oh…

Slide 53

Slide 53 text

TEST WITH INJECTED COMPANIONS
Injection of 4 fake companions in VLT/SPHERE V471 Tau
Panels (a) to (d): S/N=3.2, S/N=5.9, S/N=1.3, S/N=2.7
SODINN’s output

Slide 54

Slide 54 text

PERFORMANCE ASSESSMENT
Behavior of a binary classifier in a signal detection theory context
[Figure: distributions of the observations for a good classifier and a bad classifier; the threshold splits them into true negatives and false negatives on one side, false positives and true positives on the other]

Slide 55

Slide 55 text

PERFORMANCE ASSESSMENT
[Figure, panels (a), (b), (c); labels: no detection | 1 FP, no detection | 0 FP, detection | 0 FP, detection | 0 FP, detection | 0 FP]

Slide 56

Slide 56 text

PERFORMANCE ASSESSMENT
[Figure, panels (a) to (e); label sets:
• no detection | 1 FP, no detection | 0 FP, detection | 0 FP, detection | 0 FP, detection | 0 FP
• no detection | 76 FP, detection | 81 FP, detection | 81 FP, detection | 97 FP, detection | 5 FP
• no detection | 4 FP, no detection | 4 FP, detection | 7 FP, detection | 5 FP, detection | 2 FP]

Slide 57

Slide 57 text


Slide 58

Slide 58 text


Slide 59

Slide 59 text


Slide 60

Slide 60 text

PERSPECTIVES
• Construction of benchmark HCI datasets
• Community and Kaggle-like data challenges
• Improving SODINN:
  • lighter DNN architecture
  • include flux and sub-px into the model
  • extend to ADI+SDI
  • avoid patches (on full images)
  • new (cheaper) labeled data generation strategy
  • extended structures (disks)
• Application of more ML methods to HCI
• Detecting an Earth-like exoplanet through HCI

Slide 61

Slide 61 text

(ACADEMIC) DATA SCIENCE
Transforming science:
• Cross and inter-disciplinary research (collaboration with CS, ML, AI fields)
• Ensuring the use of robust statistical approaches and well-suited metrics
• Integrating cutting-edge AI developments
• Code release (open-source development)
• Knowledge sharing (non-refereed publications)
• Data challenges & benchmark datasets

Slide 62

Slide 62 text

No content

Slide 63

Slide 63 text

carlgogo carlosalbertogomezgonzalez
Thank you!