Slide 1

Slide 1 text

Exoplanet direct imaging data challenge
Carlos Alberto Gomez Gonzalez
Paris-Saclay Center for Data Science, 11/04/2018

Slide 2

Slide 2 text

Grenoble Alpes Data Institute
■ WP1: Data Science for Earth, Space and Environmental Sciences
■ WP2: Data Science for Life Sciences
■ WP3: Massive and Rich Data for Humanities
■ WP4: Data Science, Social Media and Social Sciences
■ WP5: Data Governance, Data Protection and Privacy
WP0: Coordination
MSTIC - Mathematics, Information and Communication Sciences and Technologies
CBS - Chemistry, Biology, Health
PAGE - Particle physics, Astrophysics, Geosciences, Environment and ecology
SHS - Humanities and Social Sciences
PSS - Social Sciences
USMB - Univ Savoie Mont Blanc
19 labs involved, covering 5 research domains
IDEX cross-disciplinary program
• 1.7 million euros
• From 2017 to 2020

Slide 3

Slide 3 text

Grenoble Alpes Data Institute
• Data Challenge:
  • epigenetic & High-dimension Mediation Data Challenge,
  • audio-visual diarization,
  • cancer research,
  • home-made framework (codalab / jupyter)?
• Data Science in the Alpes (March 20)
• R in Grenoble group.
• PySciDataGre group.
• Data club.
• Data Carpentry.

Slide 4

Slide 4 text

What do I do
• Interdisciplinary research.
• Exoplanetary science and astrophysics with CS & ML.
• Integrating cutting-edge ML developments.
• Ensuring the use of robust statistical approaches and well-suited metrics.
• Open-source development.
• Data challenges.

Slide 5

Slide 5 text

Mostly, we rely on indirect methods for detecting exoplanets … it’s very hard to “see” them

Slide 6

Slide 6 text

Credit: NASA, http://planetquest.jpl.nasa.gov ANIMATION

Slide 7

Slide 7 text

Power of direct imaging
[Figure: HR8799, L’ band, planets b, c, d, e labelled; scale: 20 AU, 0.5”. Image credits: Milli et al. 2016; Konopacky et al. 2013; Bowler 2016; Marois et al. 2010]

Slide 8

Slide 8 text

SPHERE, Vigan et al. 2015 Very Large Telescope (VLT), Chile

Slide 9

Slide 9 text

Raw astronomical images
Basic calibration and “cosmetics”
  • Dark/bias subtraction
  • Flat fielding
  • Sky or thermal background subtraction
  • Bad pixel correction
Image recentering
Bad frames removal
Sequence of calibrated images
PSF modeling
  • Median
  • Pairwise, ANDROMEDA
  • LOCI
  • PCA, NMF
  • LLSG
Model PSF subtraction
De-rotation (ADI) or rescaling (mSDI)
Image combination
Detection on final residual image
Characterization of detected companions
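
As a rough illustration of the simplest option in the PSF modeling list (the median), here is a minimal sketch of median-based ADI on a toy cube. It is not VIP's API: the function name `median_adi`, the toy data, and the sign convention used for de-rotation are all assumptions made for this example. It uses only NumPy and `scipy.ndimage.rotate`.

```python
import numpy as np
from scipy.ndimage import rotate


def median_adi(cube, angles):
    """Median-PSF ADI sketch (hypothetical helper, not a library function).

    cube   : array of shape (n_frames, ny, nx), calibrated and recentered frames
    angles : rotation angle of the field in each frame, in degrees
    """
    # PSF modeling: the median over time keeps the quasi-static speckles,
    # while a companion that moves from frame to frame is mostly rejected.
    psf_model = np.median(cube, axis=0)
    # Model PSF subtraction.
    residuals = cube - psf_model
    # De-rotation: undo the field rotation so the companion lines up.
    # The sign of the angle is an assumption tied to how the toy data is built.
    derotated = [rotate(frame, -angle, reshape=False, order=1)
                 for frame, angle in zip(residuals, angles)]
    # Image combination: median of the aligned residual frames.
    return np.median(derotated, axis=0)


# Toy data: static speckle pattern plus a Gaussian "planet" that rotates
# around the image center (all values are made up for illustration).
rng = np.random.default_rng(0)
size, n_frames = 41, 30
angles = np.linspace(0.0, 60.0, n_frames)
speckles = rng.normal(0.0, 10.0, (size, size))
yy, xx = np.mgrid[:size, :size]
planet = 5.0 * np.exp(-((yy - 20) ** 2 + (xx - 30) ** 2) / (2 * 1.5 ** 2))
cube = np.stack([speckles + rotate(planet, a, reshape=False, order=1)
                 for a in angles])

final = median_adi(cube, angles)
peak = np.unravel_index(np.argmax(final), final.shape)
print("brightest pixel in the final residual image:", peak)
```

In this toy case the speckle pattern is perfectly static, so the median removes it entirely; real speckles vary in time, which is what motivates the more elaborate PSF models in the list.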

Slide 10

Slide 10 text

Angular differential imaging - bright synthetic planet
[Animation labels: calib. images, planet, speckle noise]

Slide 11

Slide 11 text

HR8799 bcde (Marois et al. 2008-2010)
One of the lucky cases!
Final images after post-processing (several epochs)
[Animation label: post-proc.]

Slide 12

Slide 12 text

Vortex Image Processing library (VIP), Gomez Gonzalez et al. 2017
• Available on PyPI
• https://github.com/vortex-exoplanet/VIP
• Documentation (Sphinx): http://vip.readthedocs.io/
• Jupyter tutorial
• Python 2/3 compatibility
• Continuous integration (Travis CI) and automated testing (pytest)

Slide 13

Slide 13 text

Algo-ZOO
[Figure: algorithm overview, labelled with: Marois et al. 2007; Lafrenière et al. 2007; Mugnier et al. 2009; Soummer et al. 2012; Amara & Quanz 2012; Absil et al. 2013; Marois et al. 2014; Hagelberg et al. 2015; Cantalloube et al. 2015; Gomez Gonzalez et al. 2016; Gomez Gonzalez et al. 2017]

Slide 14

Slide 14 text

Open Science & reproducibility Open source

Slide 15

Slide 15 text

“Today, software is to scientific research what Galileo’s telescope was to astronomy: a tool, combining science and engineering. It lies outside the central field of principal competence among the researchers that rely on it. … it builds upon scientific progress and shapes our scientific vision.” Pradal et al. 2015

Slide 16

Slide 16 text

[Figure labels: Image sequence; Final residual image; Detection; candidates marked “?”; Speckles (?); Real planet; Synth. planet]

Slide 17

Slide 17 text

ANIMATION

Slide 18

Slide 18 text

“Essentially, all models are wrong, but some are useful.” George Box
“…if the model is going to be wrong anyway, why not see if you can get the computer to ‘quickly’ learn a model from the data, rather than have a human laboriously derive a model from a lot of thought.” Peter Norvig

Slide 19

Slide 19 text

ML in a nutshell
Supervised: Regression, Classification
Unsupervised: Dimensionality reduction, Clustering
Density estimation and reinforcement learning
[Figure axes: PC 1, PC 2]

Slide 20

Slide 20 text

Supervised learning
Given a labeled dataset, the goal is to learn a mapping from the input samples to the labels.
Goodfellow et al. 2016
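
The formulas on this slide did not survive extraction. In the usual textbook notation (the symbols below are an assumption, not recovered from the slide), the labeled dataset and the mapping to be learned can be written as:

```latex
\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{N}, \qquad
f : \mathcal{X} \to \mathcal{Y}, \qquad
f(x_i) \approx y_i \quad \text{for } i = 1, \dots, N
```

Here $x_i$ are the input samples, $y_i$ their labels, and $f$ the learned mapping.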

Slide 21

Slide 21 text

Deep neural networks

Slide 22

Slide 22 text

Supervised detection of exoplanets Gomez Gonzalez et al. 2018

Slide 23

Slide 23 text

[Diagram labels: Model PSF subtraction; combined residuals; Supervised detection (SODINN); noisy and unlabelled images; data transformation + adequate (ML) model; higher sensitivity]

Slide 24

Slide 24 text

Data-driven performance assessment

Slide 25

Slide 25 text

No need to reinvent the wheel :)

Slide 26

Slide 26 text

Data challenges
Old school
• Small committee takes care of most of the planning.
• Main organizer takes care of logistics/leaderboard.
• Main organizer writes a review-type paper.
Open science
• Community effort.
• Using a robust framework for creating data challenges.
• Hands-on sessions.
• Workshop for analyzing results and learning from different approaches.

Slide 27

Slide 27 text

Exoplanet DI challenge: Motivation
• Low detection rate so far - observational bias or reality?
• Several instruments/surveys with large databases.
• New instruments coming online in the next years.
• ~13 years of image processing techniques.
• Discovering new techniques!

Slide 28

Slide 28 text

Exoplanet DI challenge
• Benchmark datasets, metrics, sub-challenges.
• Finding the right platform (RAMP?).
• Avoid excluding “old” code/pipelines: IDL, MATLAB, etc.
Planning:
• benchmark datasets
• sub-challenges
• metrics
Kick-off one-day session (RAMP?)
Submission of results
• Final leaderboard
• Comparison of results/approaches
Workshop: image processing for exoplanet DI

Slide 29

Slide 29 text

Exoplanet DI challenge
• Possible sub-challenges.
• Each observing technique: different data format/dimensionality, variability to exploit.
• Detection & characterization.
• Focus on exoplanets (excluding disks?).
ADI (NACO, SPHERE/IRDIS, NIRC2, LMIRcam, etc.)
ADI + mSDI (SPHERE/IFS, GPI, etc.)
RDI, other techniques?

Slide 30

Slide 30 text

Thank you! [email protected] carlgogo carlosalbertogomezgonzalez https://carlgogo.github.io/