Slide 1

Slide 1 text

Accurate, interpretable photometric redshifts with Gaussian Processes
encoding physics in machine learning algorithms
Boris Leistedt, @ixkael, www.ixkael.com
NASA Einstein Fellow @ CCPP, New York University

Slide 2

Slide 2 text

observational systematics are the next frontier
accuracy (good methods)
precision (good data)

Slide 3

Slide 3 text

Galaxy Surveys
Rich space of models (early universe, gravity, particles, dark matter, etc.) and observables (galaxy clustering, lensing, etc.)

Slide 4

Slide 4 text

experimental landscape

Slide 5

Slide 5 text

spectroscopic: SEDs, types, redshifts; shallow, no shear

Slide 6

Slide 6 text

photometric: CCD images; deep, shear; no types, no redshifts

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

Imaging surveys: challenges
spatial systematics: almost resolved! See Elsner, Leistedt & Peiris: arXiv:1609.03577, 1509.08933, 1507.05647, 1404.6530
photometric redshifts, intrinsic alignments, covariance matrices, blending, etc.: methodological & theoretical breakthroughs needed

Slide 9

Slide 9 text

20 billion galaxies
17 billion stars
7 trillion sources detected in single epochs
30 trillion forced photometry
10 million alerts per night

Slide 10

Slide 10 text

photometric redshifts

Slide 11

Slide 11 text

Redshift: Doppler shift of electromagnetic radiation due to expansion of the universe = indication of distance
$f_\nu(\lambda_{\rm obs}, z) = \frac{(1+z)}{4\pi D_L^2(z)} \, L_\nu\!\left(\frac{\lambda_{\rm obs}}{1+z}\right)$
[Plots: comoving distance [Mpc] vs. redshift z; clumpiness of matter, $\sigma_8$, vs. redshift z]
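The flux relation on this slide can be sketched numerically. The example below is a minimal illustration only: it assumes a flat LambdaCDM cosmology with H0 = 70 km/s/Mpc and Omega_m = 0.3 (values not given in the talk), and `toy_sed` is a hypothetical rest-frame SED made up for the demonstration.

```python
import numpy as np

C_KM_S = 299792.458  # speed of light in km/s


def luminosity_distance(z, h0=70.0, omega_m=0.3, n_steps=2048):
    """Luminosity distance in Mpc, assuming flat LambdaCDM (assumed parameters)."""
    zs = np.linspace(0.0, z, n_steps)
    inv_e = 1.0 / np.sqrt(omega_m * (1.0 + zs) ** 3 + (1.0 - omega_m))
    dz = zs[1] - zs[0]
    # Trapezoid rule for the comoving distance integral.
    comoving = (C_KM_S / h0) * dz * (inv_e.sum() - 0.5 * (inv_e[0] + inv_e[-1]))
    return (1.0 + z) * comoving


def observed_flux_density(lam_obs, z, rest_sed, **cosmo):
    """f_nu(lam_obs, z) = (1 + z) / (4 pi D_L^2) * L_nu(lam_obs / (1 + z))."""
    d_l = luminosity_distance(z, **cosmo)
    return (1.0 + z) * rest_sed(lam_obs / (1.0 + z)) / (4.0 * np.pi * d_l ** 2)


# Hypothetical rest-frame SED: a single bump centred at 4000 Angstrom.
toy_sed = lambda lam: np.exp(-0.5 * ((lam - 4000.0) / 300.0) ** 2)

lam_obs = np.linspace(3000.0, 9000.0, 601)
flux = observed_flux_density(lam_obs, 0.5, toy_sed)
```

In this toy setup the rest-frame feature at 4000 Angstrom is observed at 4000 x (1 + z), which is the basic signal photometric redshifts exploit.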

Slide 12

Slide 12 text

animation

Slide 13

Slide 13 text

State of the art
DES SV data (arXiv:1507.05909)
KIDS data (arXiv:1606.05338)
Ongoing surveys don’t meet photo-z requirements

Slide 14

Slide 14 text

template fitting
physical model, probabilistic
need template set, hard to capture data complexity, sensitive to priors
template set (CWW)
likelihood function: $p(\{\hat{F}_b\} | z, t) = \prod_b \mathcal{N}\big(\hat{F}_b, F^{\rm mod}_b(z, t), \sigma_{\hat{F}_b}\big)$
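As a rough illustration of the template-fitting likelihood, the sketch below evaluates the product of per-band Gaussians on a grid of redshifts and templates. It assumes the model fluxes have already been tabulated in an array `flux_model` (a hypothetical name), and it treats the flux uncertainty as a standard deviation, which is an assumption here.

```python
import numpy as np


def log_likelihood_grid(flux_obs, flux_err, flux_model):
    """Log of prod_b N(F_obs_b; F_mod_b(z, t), sigma_b) on a (z, t) grid.

    flux_obs, flux_err: arrays of shape (n_bands,)
    flux_model: array of shape (n_z, n_templates, n_bands)
    Returns an array of shape (n_z, n_templates).
    """
    resid = (flux_obs - flux_model) / flux_err
    log_norm = -0.5 * np.log(2.0 * np.pi * flux_err ** 2)
    # Independent bands: the product over b becomes a sum of logs.
    return np.sum(log_norm - 0.5 * resid ** 2, axis=-1)


# Synthetic stand-in for a tabulated template grid (not real templates).
rng = np.random.default_rng(0)
flux_model = rng.uniform(1.0, 10.0, size=(50, 3, 5))
flux_obs = flux_model[17, 2].copy()   # noiseless "observation"
flux_err = np.full(5, 0.1)

log_like = log_likelihood_grid(flux_obs, flux_err, flux_model)
best_z_index, best_template = np.unravel_index(np.argmax(log_like), log_like.shape)
```

Working in log space avoids underflow when many bands or small errors make the individual Gaussian factors tiny.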

Slide 15

Slide 15 text

machine learning
captures data complexity, very flexible
no physical model, solves for flux=>z, cannot extrapolate
not probabilistic
requires representative training data

Slide 16

Slide 16 text

Will never have representative spectroscopic data
Galaxy SED models are not precise enough
Only deep spectroscopic & many-band surveys available
True PDFs needed with data and model uncertainties
Machine learning constrained by physics of the problem?

Slide 17

Slide 17 text

Data-driven, interpretable photometric redshifts trained on heterogeneous and unrepresentative data arXiv:1612.00847 with David Hogg (NYU)

Slide 18

Slide 18 text

NEW METHOD: DELIGHT™
Leistedt & Hogg (arXiv:1612.00847), github.com/ixkael/Delight
Concept: implicitly fitting and redshifting SEDs to each training galaxy for pairwise comparison with target galaxies = machine learning + template fitting
Probabilistic, physical, and data driven
Interpretable model & PDFs. Flexibility via parameters.
Use much more data than existing methods: heterogeneous combination of spectroscopic or deeper photometric data
Fast to (re-)train/apply. No need to store tabulated PDFs.

Slide 19

Slide 19 text

Idea:
Target set: photometric survey
Training set: many-band or spectroscopic set = deeper, heterogeneous version of target
No complete physical model for galaxy spectra => construct spectra compatible with training set
training galaxies, ‘target’ galaxy
$p(z | \{\hat{F}_b\}) \propto \int dt \, p(\{\hat{F}_b\} | z, t) \, p(z, t) = \sum_i w_i \, p(\{\hat{F}_b\} | z, t_i)$
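The sum over training galaxies can be sketched as a weighted mixture evaluated on a redshift grid. This is only an illustration under stated assumptions: `log_like` is a hypothetical precomputed array of log p(fluxes | z, t_i) for each training galaxy i, the weights are strictly positive, and the result is normalized with a simple trapezoid rule.

```python
import numpy as np


def redshift_posterior(log_like, weights, z_grid):
    """p(z | fluxes) proportional to sum_i w_i p(fluxes | z, t_i), on z_grid.

    log_like: array (n_z, n_train); weights: array (n_train,), all > 0.
    """
    log_terms = log_like + np.log(np.asarray(weights))[None, :]
    # Subtract the global maximum before exponentiating, for stability.
    post = np.exp(log_terms - log_terms.max()).sum(axis=1)
    dz = np.diff(z_grid)
    norm = np.sum(0.5 * dz * (post[1:] + post[:-1]))  # trapezoid rule
    return post / norm


# Toy case: two training galaxies whose likelihoods peak at z = 0.5 and 1.5.
z_grid = np.linspace(0.0, 2.0, 201)
centres = np.array([0.5, 1.5])
log_like = -0.5 * ((z_grid[:, None] - centres[None, :]) / 0.05) ** 2
pdf = redshift_posterior(log_like, np.array([0.75, 0.25]), z_grid)
```

Each training galaxy contributes one mode to the redshift PDF, so multimodal posteriors arise naturally when several training galaxies are compatible with the target fluxes.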

Slide 20

Slide 20 text

The crazy intractable way
Explore all SEDs compatible with training galaxy (noisy fluxes + spec-z) via MCMC
Fit fluxes with explicit SED, indirectly predict fluxes at other redshifts

Slide 21

Slide 21 text

The elegant efficient way
Directly fit for training galaxy in flux-redshift space + force the fit to correspond to underlying SEDs
Fit fluxes with latent SED, directly predict fluxes at other redshifts

Slide 22

Slide 22 text

Slide 23

Slide 23 text

Gaussian processes
$f \sim \mathcal{GP} \iff p(f(\vec{x}), f(\vec{x}'))$ is Gaussian $\forall \, \vec{x}, \vec{x}'$
characterized by mean and kernel:
$m(\vec{x}) = E[f(\vec{x})]$
$k(\vec{x}, \vec{x}') = E[(f(\vec{x}) - m(\vec{x}))(f(\vec{x}') - m(\vec{x}'))]$
for Gaussian likelihood, posterior/predictions tractable
see Rasmussen & Williams (2006)
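To make the "tractable posterior" statement concrete, here is a minimal sketch of standard GP regression with Gaussian noise, following the usual Cholesky-based recipe described in Rasmussen & Williams. The squared-exponential kernel and its parameter values are assumptions chosen for illustration; they are not taken from the talk.

```python
import numpy as np


def sq_exp_kernel(xa, xb, amplitude=1.0, length=1.0):
    """Squared-exponential kernel k(x, x') between two 1-D input arrays."""
    d = xa[:, None] - xb[None, :]
    return amplitude ** 2 * np.exp(-0.5 * (d / length) ** 2)


def gp_predict(x_train, y_train, noise_std, x_test,
               mean_fn=lambda x: np.zeros_like(x), **kern):
    """Posterior mean and covariance of f(x_test) given noisy observations."""
    k_tt = sq_exp_kernel(x_train, x_train, **kern)
    k_tt = k_tt + noise_std ** 2 * np.eye(x_train.size)
    k_st = sq_exp_kernel(x_test, x_train, **kern)
    k_ss = sq_exp_kernel(x_test, x_test, **kern)
    chol = np.linalg.cholesky(k_tt)
    # alpha = (K + sigma^2 I)^{-1} (y - m), via two triangular solves.
    resid = y_train - mean_fn(x_train)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, resid))
    v = np.linalg.solve(chol, k_st.T)
    mean = mean_fn(x_test) + k_st @ alpha
    cov = k_ss - v.T @ v
    return mean, cov


x_train = np.linspace(0.0, 5.0, 8)
y_train = np.sin(x_train)
x_test = np.linspace(0.0, 5.0, 50)
mean, cov = gp_predict(x_train, y_train, 0.1, x_test)
```

The `mean_fn` argument is where a physically motivated mean function could be plugged in, with the kernel modelling residuals around it.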

Slide 24

Slide 24 text

Fitting with GPs = using priors over functions
Modelling correlated signal and/or noise
Choice of kernel is key (captures correlations)

Slide 25

Slide 25 text

Slide 26

Slide 26 text

Photo-z gaussian process
GP with physical mean function and residuals
Fitting and predicting photometric fluxes while capturing the physics of redshifts
Analytically tractable under simple assumptions
if SED model is: $L_\nu(\lambda) \sim \mathcal{GP}\Big(\sum_k \alpha_k T^k_\nu(\lambda), \, k(\lambda, \lambda')\Big)$ (templates, residuals)
then the fluxes: $F(b, z) \sim \mathcal{GP}\Big(\mu^F(b, z), \, k^F(b, b', z, z')\Big)$ (mean flux and covariance)
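The step from a GP on the SED to a GP on the fluxes follows because band fluxes are linear functionals of the SED, and linear maps of Gaussians stay Gaussian. The sketch below shows this on a discretized wavelength grid. It is a toy version under several assumptions that are not from the talk: top-hat filters, band fluxes taken as plain averages of the SED over the band, the (1+z)/(4 pi D_L^2) prefactor dropped, and a squared-exponential residual kernel with made-up parameters.

```python
import numpy as np


def sed_kernel(lam_a, lam_b, amplitude=0.1, length=200.0):
    """Assumed squared-exponential kernel for SED residuals."""
    d = lam_a[:, None] - lam_b[None, :]
    return amplitude ** 2 * np.exp(-0.5 * (d / length) ** 2)


def tophat_weights(lam_rest, z, band_lo, band_hi):
    """Averaging weights of a top-hat band at redshift z, on the rest-frame grid."""
    lam_obs = lam_rest * (1.0 + z)
    inside = ((lam_obs >= band_lo) & (lam_obs <= band_hi)).astype(float)
    if inside.sum() == 0:
        raise ValueError("band not covered by the wavelength grid")
    return inside / inside.sum()


def flux_gp(lam_rest, mean_sed, bands, redshifts, **kern):
    """Mean and covariance of F(b, z) implied by L ~ GP(mean_sed, k).

    Rows are ordered with redshift as the outer index and band as the inner one.
    """
    w = np.array([tophat_weights(lam_rest, z, lo, hi)
                  for z in redshifts for (lo, hi) in bands])
    mean = w @ mean_sed                                   # mu^F(b, z)
    cov = w @ sed_kernel(lam_rest, lam_rest, **kern) @ w.T  # k^F(b, b', z, z')
    return mean, cov


lam_rest = np.linspace(1000.0, 10000.0, 901)
templates = np.vstack([np.ones_like(lam_rest), lam_rest / 5000.0])  # toy templates
alpha = np.array([1.0, 0.0])                                        # toy coefficients
bands = [(4000.0, 5000.0), (6000.0, 7000.0)]
mu_f, k_f = flux_gp(lam_rest, alpha @ templates, bands, [0.0, 0.5])
```

Because the same latent SED feeds every (band, redshift) pair, the off-diagonal entries of `k_f` carry the correlations that let fluxes observed at one redshift constrain predicted fluxes at another.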

Slide 27

Slide 27 text

No content

Slide 28

Slide 28 text

G10 / COSMOS data
training: deep SUBARU/HST bands with spectroscopic redshifts
target: ugriz SDSS bands
training/target: 10k/10k objects

Slide 29

Slide 29 text

unrepresentative training set with different bands & noise

Slide 30

Slide 30 text

a closer look at two PDFs…

Slide 31

Slide 31 text

No content

Slide 32

Slide 32 text

No content

Slide 33

Slide 33 text

Conclusions
Imaging surveys: diverse science (fundamental physics, astrophysics); systematics limited, require exquisite photo-z’s
DELIGHT (GITHUB.COM/IXKAEL/DELIGHT): data-driven method with physics & machine learning; delivers accurate, interpretable redshift probabilities
What’s next? robust redshifts with deep, diverse training sets; generative model for galaxy fluxes, redshifts, & types