Data-driven models of the Milky Way in the Gaia era

Boris Leistedt (@ixkael, www.ixkael.com)
NASA Einstein Fellow, New York University

Road map
1. Context: the Milky Way and the Gaia mission
2. Interlude: hierarchical probabilistic models
3. Applications: Gaia DR1
   - High-precision color-magnitude diagrams with Gaia
   - Calibration of red clump star standard candles
   - Evidence for unresolved binary and ternary sequences
4. The (near) future: Gaia DR2

Happy collaborators
‣ Lauren Anderson (Flatiron)
‣ David Hogg (NYU/Flatiron)
‣ Keith Hawkins (Columbia)
‣ Jo Bovy (Toronto/Flatiron)
‣ Axel Widmark (Stockholm)
‣ Adrian Price-Whelan (Princeton)

Gaia sprints (http://gaia.lol)
Full week of sprinting/hacking on concrete, achievable projects, in a room full of experts.
- October 2016 in NYC
- July 2017 at MPIA Heidelberg
- June 2018 in NYC
Dozens of papers and new collaborations!

The Gaia mission
‣ Successor to Hipparcos.
‣ Micro-arcsecond global astrometry for 1+ billion stars, complete to 20th mag: correlated positions, proper motions, parallaxes, and apparent magnitudes (3 broad photometric bands).
‣ Approximately 70 visits over a 5-year period.
‣ Radial velocities (NIR medium-resolution, λ/Δλ = 11k, integral-field spectrograph) down to G_RVS ≈ 16 mag.
‣ Powerful synergies with other surveys (2MASS, WISE, SDSS, etc.).
www.cosmos.esa.int/web/gaia/science-performance

The numbers
‣ Catalogue: ∼1 billion stars; 0.34×10^6 to V = 10 mag; 26×10^6 to V = 15 mag; 250×10^6 to V = 18 mag; 1000×10^6 to V = 20 mag; complete to about 20 mag
‣ Sky density: mean density ∼25 000 stars deg^-2; max density ∼3×10^6 stars deg^-2
‣ Accuracies: median parallax errors: 7 μas at 10 mag; 20–25 μas at 15 mag; 200–300 μas at 20 mag
‣ Distance accuracies (from preliminary Galaxy model estimates): 3 million better than 1 per cent; 5 million better than 2 per cent; 10 million better than 5 per cent; 30 million better than 10 per cent
‣ Tangential velocity accuracies (from Galaxy models): 5 million better than 0.5 km s^-1; 10 million better than 1 km s^-1; 25 million better than 3 km s^-1; 40 million better than 5 km s^-1; 60 million better than 10 km s^-1
‣ Radial velocity accuracies: 1–10 km s^-1 to V = 16–17 mag, depending on spectral type
‣ Photometry: to V = 20 mag in broadband light, and spectrally-dispersed light, with some 20 independent spectral samples between 330 and 1000 nm
Source: https://www.astro.umd.edu/~olling/Papers/GAIA3_IN_all_info_sheets.pdf

Science goals 1
‣ The Galaxy: tests of hierarchical structure formation models; star formation history; chemical evolution; inner bulge/bar dynamics; disc/halo interactions; dynamical evolution; nature of the warp; star cluster disruption; dynamics of spiral structure; distribution of dust; distribution of dark matter; detection of tidally disrupted debris; Galaxy rotation curve; disc mass profile
‣ Star formation and evolution: in situ luminosity function; dynamics of star forming regions; luminosity function for pre-main sequence stars; rapid evolutionary phases; complete and detailed local census down to single brown dwarfs; identification/dating of oldest halo white dwarfs; age census; census of binaries and multiple stars
‣ Distance scale and reference frame: parallax calibration of all distance scale indicators; absolute luminosities of Cepheids; distance to the Magellanic Clouds; definition of the local, kinematically non-rotating metric
Source: https://www.astro.umd.edu/~olling/Papers/GAIA3_IN_all_info_sheets.pdf

Science goals 2
‣ Local Group and beyond: rotational parallaxes for Local Group galaxies; kinematical separation of stellar populations; galaxy orbits and cosmological history; zero proper motion quasar survey; cosmological acceleration of Solar System; photometry of galaxies; detection of supernovae
‣ Solar System: deep and uniform detection of minor planets; taxonomy and evolution; inner Trojans; Kuiper Belt Objects; disruption of Oort Cloud
‣ Extra-solar planetary systems: complete census of large planets to 200–500 pc; orbital characteristics of several thousand systems
‣ Fundamental physics: γ to ∼5×10^-7; β to 3×10^-4 – 3×10^-5; solar J_2 to 10^-7 – 10^-8; Ġ/G to 10^-12 – 10^-13 yr^-1; constraints on gravitational wave energy for 10^-12 < f < 4×10^-9 Hz; constraints on Ω_M and Ω_Λ from quasar microlensing
‣ Specific objects: 10^6 – 10^7 resolved galaxies; 10^5 extragalactic supernovae; 500 000 quasars; 10^5 – 10^6 (new) solar system objects; 50 000 brown dwarfs; 3000 extra-solar planets; 200 000 disc white dwarfs; 200 microlensed events; 10^7 resolved binaries within 250 pc

My goals: detailed 3D+ Milky Way models
‣ 3D stellar density and potential
‣ Dynamics: full phase-space
‣ 3D dust and extinction law
‣ Correlation between phase-space and stellar parameters
‣ Robust to stellar models => internal construction from Gaia data only (data-driven)

Methodological challenges
Correct and full exploitation of Gaia = difficult regime for data analysis and inference
‣ Huge data set with heteroskedastic errors + selection effects (e.g., magnitudes, parallaxes, proper motions)
‣ Constraining power of the data exceeds quality of existing physical models (e.g., 3D density, etc.). Worse: using those models can bias the analysis.
‣ Let’s develop flexible “data-driven” models (e.g., non-parametric) which will inform physical models.

Gaia Data Release 1

Gaia Data Release 1
https://www.cosmos.esa.int/web/gaia/dr1
‣ Positions and G magnitudes for all sources
‣ TGAS: astrometric solution (including parallaxes, proper motions, and G magnitude) for 2 million objects. Most have 2MASS and APASS magnitudes.

Astrometric solution
[Figure: right ascension, declination, parallax, and proper motion; Tycho/Hipparcos, Gaia DR1, Gaia DR2+]
=> correlated magnitudes, parallaxes, and proper motions

Stellar distances
Broad, non-Gaussian distance pdf for parallax SNR < 10… which is most of Gaia TGAS!
How to improve the distances?
[Figure: parallax-only distance likelihoods $\mathcal{N}(d^{-1} - \hat{\varpi};\, \sigma_\varpi^2)$ versus distance, for parallax SNR = 8.8, 15, 1.5, and 1.5]
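The non-Gaussian shape can be reproduced with a few lines of code. The sketch below evaluates the parallax-only posterior on a distance grid under a uniform prior. The units (parallax in mas, distance in kpc), the grid limits, and the function names are assumptions made for illustration only.

```python
import math

def parallax_only_posterior(plx_obs, plx_err, distances):
    """Normalized posterior weights on a distance grid for a uniform prior.

    Assumed units: parallax in mas and distance in kpc, so parallax = 1 / d.
    """
    weights = [math.exp(-0.5 * ((1.0 / d - plx_obs) / plx_err) ** 2)
               for d in distances]
    total = sum(weights)
    return [w / total for w in weights]

def mean_and_mode(distances, weights):
    """Posterior mean and the grid point carrying the largest weight."""
    mean = sum(d * w for d, w in zip(distances, weights))
    mode = max(zip(weights, distances))[1]
    return mean, mode

# Distance grid from 0.01 to 5 kpc; the upper edge acts as a hard prior cutoff.
grid = [0.01 + 0.001 * k for k in range(4991)]
results = {}
for snr in (15.0, 1.5):
    plx = 2.0  # mas, i.e. a star at 0.5 kpc
    post = parallax_only_posterior(plx, plx / snr, grid)
    results[snr] = mean_and_mode(grid, post)
    print(f"parallax SNR {snr}: mode {results[snr][1]:.2f} kpc, "
          f"mean {results[snr][0]:.2f} kpc")
```

At high SNR the mean and mode nearly coincide. At low SNR the likelihood does not vanish at large distance, so the posterior mean is set largely by where the grid is cut off.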

Gaia DR1 color-magnitude diagrams
Improved stellar distance estimates
‣ Leistedt & Hogg, ApJ 2017 (arXiv:1703.08112)
  Data: Gaia TGAS cross-matched with APASS.
  Method: full hierarchical inference via Gibbs sampling.
‣ Anderson, Hogg, Leistedt, Price-Whelan, Bovy, ApJ 2017 (arXiv:1706.05055)
  Data: Gaia TGAS cross-matched with 2MASS.
  Method: deconvolution and empirical Bayes.

TGAS-APASS data
There is distance information in magnitudes!
$M_V = m_V - 5 \log_{10}\left(\frac{d}{10\,\mathrm{pc}}\right)$
[Figure: color-magnitude diagrams, $M_V$ versus $B - V$. Panels: data (point estimates); data (subsample, with errors); model (posterior mean); model (posterior stddev)]

Heteroskedastic errors
[Figure: color SNR versus parallax SNR; magnitude SNR versus parallax SNR; color-magnitude diagram ($M_V$ versus $B - V$) colored by mean parallax SNR]

Stellar models
Let’s incorporate CMD information in distances
[Figure: $M_V$ versus $B - V$. Panels: few objects with errors; density based on noisy data; stellar models]

P: parameter(s) of interest. D: data. M: model under consideration.
Likelihood: probability of generating the data D with parameters P under model M. => Mechanism to forward-model data given the model or its parameters (without prior beliefs about their values).
Priors: knowledge about parameters P under model M before looking at the data D. From theory, previous data, intuition, etc.
Posterior: joint PDF on the N parameters of interest given the data D and under the model M, $p(\Theta_1 = \theta_1, \cdots, \Theta_N = \theta_N \mid D, M)$. Tedious to write/use for large and/or hierarchical models!
Bayes theorem:
$\underbrace{p(P \mid D, M)}_{\text{posterior}} = \underbrace{p(D \mid P, M)}_{\text{likelihood}} \times \underbrace{p(P \mid M)}_{\text{prior}} \,/\, \underbrace{p(D \mid M)}_{\text{evidence}}$

Distance information
Magnitude, color, and parallax likelihoods
‣ Uniform distance priors: p(d) = cst
‣ Parallax information alone: $\mathcal{N}(d^{-1} - \hat{\varpi};\, \sigma_\varpi^2)$
‣ Posterior distribution per object, for fixed color-magnitude models (sum over stellar models t):
$p(d \mid \hat{\varpi}, \{f_t, M_t, C_t\}, \hat{m}, \hat{C}) \propto \sum_t f_t\, \mathcal{N}(\hat{m} - 5 \log_{10} d - M_t;\, \sigma_m^2)\, \mathcal{N}(\hat{C} - C_t;\, \sigma_C^2)\, \mathcal{N}(d^{-1} - \hat{\varpi};\, \sigma_\varpi^2)$
Having a CMD model improves distance estimates!
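The per-object posterior above can be evaluated directly on a distance grid. The sketch below does this for a toy case. Everything numerical in it is hypothetical: the two stellar models, the observed values, the error bars, and the unit convention (distance in pc, parallax in mas, distance modulus 5 log10(d / 10 pc)).

```python
import math

def normal_pdf(x, var):
    """Zero-mean Gaussian density, matching the N(a - b; var) notation."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)

def parallax_term(d_pc, plx_obs, plx_var):
    # Assumed units: parallax in mas, distance in pc, so parallax = 1000 / d.
    return normal_pdf(1000.0 / d_pc - plx_obs, plx_var)

def cmd_term(d_pc, mag_obs, mag_var, col_obs, col_var, models):
    # models: list of (f_t, M_t, C_t) = weight, absolute magnitude, color.
    dist_mod = 5.0 * math.log10(d_pc / 10.0)
    return sum(f * normal_pdf(mag_obs - dist_mod - M, mag_var)
                 * normal_pdf(col_obs - C, col_var) for f, M, C in models)

def normalize(values):
    total = sum(values)
    return [v / total for v in values]

def mean_std(grid, weights):
    mean = sum(d * w for d, w in zip(grid, weights))
    var = sum((d - mean) ** 2 * w for d, w in zip(grid, weights))
    return mean, math.sqrt(var)

grid = list(range(50, 3001))                    # distances in pc
models = [(0.7, 4.8, 0.65), (0.3, 0.5, 1.00)]   # hypothetical dwarf and giant
plx_obs, plx_var = 2.0, 1.0 ** 2                # mas; parallax SNR = 2
mag_obs = 4.8 + 5.0 * math.log10(500.0 / 10.0)  # a dwarf placed at 500 pc
col_obs, mag_var, col_var = 0.65, 0.05 ** 2, 0.05 ** 2

plx_only = normalize([parallax_term(d, plx_obs, plx_var) for d in grid])
with_cmd = normalize([parallax_term(d, plx_obs, plx_var)
                      * cmd_term(d, mag_obs, mag_var, col_obs, col_var, models)
                      for d in grid])
print("parallax only: mean %.0f pc, std %.0f pc" % mean_std(grid, plx_only))
print("with CMD     : mean %.0f pc, std %.0f pc" % mean_std(grid, with_cmd))
```

The stellar models here have zero intrinsic width, as in the formula, so the shrinkage in this toy case is driven entirely by the photometric errors.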

A first look at parallaxes
[Figure: per-object distributions versus parallax, comparing the parallax likelihood, the posterior with stellar models, and the posterior with noisy CMD density]

The need for data-driven models
Are stellar models accurate enough to deliver unbiased distances? Requires understanding selection effects, population models, etc.
Can we construct a CMD model directly from the data to improve distance estimates?
[Figure: color-magnitude diagrams, $M_V$ versus $B - V$. Panels: data (point estimates); data (subsample, with errors); model (posterior mean); model (posterior stddev)]

A simple 1D analogy
‣ Gaussian distributions + (homoskedastic) Gaussian noise:
  $x_i \sim \mathcal{N}(m, V)$, $y_i \sim \mathcal{N}(x_i, \sigma^2)$
‣ Errors known. Want to estimate m and V. Can use estimators (no need to access/estimate the x’s):
  $\hat{\mu} = \frac{1}{N} \sum_i y_i$, $\hat{V} + \sigma^2 = \frac{1}{N-1} \sum_i (y_i - \hat{\mu})^2$
‣ Simple case of deconvolution. But cannot write estimators with multiple Gaussians and heteroskedastic noise!
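The estimators of the 1D analogy can be checked on simulated data. In this sketch the true values, the sample size, and the seed are arbitrary choices for illustration.

```python
import random

def deconvolved_moments(ys, noise_var):
    """Estimate the population mean and variance from noisy data.

    The sample variance of y estimates V + sigma^2, so the known noise
    variance is subtracted to recover V.
    """
    n = len(ys)
    mu_hat = sum(ys) / n
    sample_var = sum((y - mu_hat) ** 2 for y in ys) / (n - 1)
    return mu_hat, sample_var - noise_var

rng = random.Random(0)
m_true, V_true, sigma = 1.0, 4.0, 1.5
xs = [rng.gauss(m_true, V_true ** 0.5) for _ in range(20000)]  # latent values
ys = [rng.gauss(x, sigma) for x in xs]                         # noisy data
mu_hat, V_hat = deconvolved_moments(ys, sigma ** 2)
print(f"mu_hat = {mu_hat:.3f}, V_hat = {V_hat:.3f}")
```

For small samples or large noise the estimate of V can come out negative, which is one practical reason to prefer a full probabilistic treatment.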

Probabilistic graphical models
‣ Circles: parameters/variables of interest, to be estimated
‣ Dots: fixed parameters/constants
‣ Arrows: direct dependency
‣ Plate: identical independent samples
‣ Explicit representation of a model: captures parameters and dependencies.
‣ By writing all conditional distributions, one can write the full posterior.
See book + Coursera: Probabilistic Graphical Models, by D. Koller.
[Figure: graphical model; node labels m, V]

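Hierarchical probabilistic models 101
‣ Mixture model for density of true x’s (which are latent parameters):
  $p(x_i \mid \vec{\alpha}, \vec{\mu}, \vec{\sigma}) = \sum_{b=1}^{B} \alpha_b\, \mathcal{N}(x_i \mid \mu_b, \sigma_b^2)$
‣ Heteroskedastic noise:
  $p(y_i \mid x_i, \sigma_i) = \mathcal{N}(y_i \mid x_i, \sigma_i^2)$
‣ Posterior distribution (= deconvolution!), where the amplitudes $f_b$ are the $\alpha_b$ above:
  $p(\vec{f}, \vec{\mu}, \vec{\sigma} \mid \{y_i, \sigma_i\}) \propto p(\vec{f}, \vec{\mu}, \vec{\sigma}) \prod_{i=1}^{N} \sum_{b=1}^{B} f_b \int \mathrm{d}x_i\, \mathcal{N}(x_i \mid \mu_b, \sigma_b^2)\, \mathcal{N}(y_i \mid x_i, \sigma_i^2)$
See http://ixkael.com for tutorials and code for Bayesian hierarchical models, uncertainty shrinkage, selection effects, etc.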
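The integral over each latent x_i has a closed form: a Gaussian convolved with a Gaussian is a Gaussian whose variance is the sum of the two. The sketch below uses that standard result for the marginal likelihood and checks it against brute-force quadrature. The mixture parameters, the data point, and the function names are made up for the check.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def marginal_likelihood(y, noise_std, amps, means, stds):
    """p(y_i | mixture) with the latent x_i integrated out analytically."""
    return sum(a * normal_pdf(y, mu, s ** 2 + noise_std ** 2)
               for a, mu, s in zip(amps, means, stds))

def log_likelihood(ys, noise_stds, amps, means, stds):
    """Log-likelihood of all objects, each with its own noise level."""
    return sum(math.log(marginal_likelihood(y, e, amps, means, stds))
               for y, e in zip(ys, noise_stds))

def marginal_by_quadrature(y, noise_std, amps, means, stds,
                           lo=-20.0, hi=20.0, n=20001):
    """Brute-force check: Riemann sum of the integral over x_i."""
    dx = (hi - lo) / (n - 1)
    total = 0.0
    for k in range(n):
        x = lo + k * dx
        density = sum(a * normal_pdf(x, mu, s ** 2)
                      for a, mu, s in zip(amps, means, stds))
        total += density * normal_pdf(y, x, noise_std ** 2) * dx
    return total

amps, means, stds = [0.3, 0.7], [-1.0, 2.0], [0.5, 1.0]
analytic = marginal_likelihood(0.4, 0.8, amps, means, stds)
numeric = marginal_by_quadrature(0.4, 0.8, amps, means, stds)
print(f"analytic {analytic:.8f}  quadrature {numeric:.8f}")
```

Because the marginalization is analytic, only the mixture parameters remain to be sampled or optimized, however many objects there are.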

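TGAS PGM and model
Absolute magnitude: $M_V = m_V - 5 \log_{10}\left(\frac{d}{10\,\mathrm{pc}}\right)$
Parallax & magnitude likelihoods:
$p(\hat{\varpi} \mid d, \sigma_\varpi) = \mathcal{N}(\hat{\varpi} - 1/d;\, \sigma_\varpi^2)$
$p(\hat{\vec{m}} \mid d, \vec{C}, M, \Sigma_{\hat{\vec{m}}}) = \mathcal{N}(\hat{\vec{m}} - \vec{m}(d, \vec{C}, M);\, \Sigma_{\hat{\vec{m}}})$
Posterior distribution now tying all objects together, with CMD
[Figure: graphical model with nodes 3D dust, MW, CMD, $r_i$, $C_i$, $A_i$, $M_i$, $\hat{m}_i$, $\sigma_{\hat{m}_i}$, $\hat{\varpi}_i$, $\sigma_{\hat{\varpi}_i}$; plate over $i = 1, \cdots, N$]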

Full CMD hierarchical model
‣ CMD = mixture model: $p(M, C) = \sum_b \alpha_b\, \mathcal{N}(\vec{\mu}_b, \Sigma_b)$
  Fixing means and widths on a grid, to convexify posterior.
  Fixing dust reddening corrections at parallax point estimates.
‣ MCMC with Gibbs sampling: bin amplitudes & allocations.
  Given bin allocations, draw amplitudes from Dirichlet.
  Given amplitudes, draw bins from multinomial with …
  Distances marginalized over numerically.
  True color + magnitude marginalized over analytically.
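The two Gibbs steps can be sketched on a 1D toy problem with components fixed on a grid. This is not the TGAS implementation: the data are one-dimensional, distances play no role, and the allocation probabilities (amplitude times noise-convolved component likelihood) are the standard choice, assumed here because the slide text is cut off at that point. The symmetric Dirichlet prior and all numbers are also assumptions.

```python
import math
import random

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def gibbs_amplitudes(ys, noise_stds, means, width, n_iter=300, prior=1.0, seed=0):
    """Gibbs sampler for mixture amplitudes with fixed means and width."""
    rng = random.Random(seed)
    n_bins = len(means)
    # Per-object, per-bin likelihood with the latent x_i marginalized analytically.
    like = [[normal_pdf(y, mu, width ** 2 + s ** 2) for mu in means]
            for y, s in zip(ys, noise_stds)]
    amps = [1.0 / n_bins] * n_bins
    samples = []
    for _ in range(n_iter):
        # Step 1: given amplitudes, draw a bin allocation for every object.
        counts = [0] * n_bins
        for row in like:
            weights = [a * l for a, l in zip(amps, row)]
            counts[rng.choices(range(n_bins), weights=weights)[0]] += 1
        # Step 2: given allocations, draw amplitudes from a Dirichlet,
        # built here from normalized gamma draws.
        gammas = [rng.gammavariate(prior + c, 1.0) for c in counts]
        total = sum(gammas)
        amps = [g / total for g in gammas]
        samples.append(amps)
    return samples

# Simulated data: three fixed components, heteroskedastic noise.
rng = random.Random(1)
means, width, true_amps = [-2.0, 0.0, 2.0], 0.5, [0.2, 0.5, 0.3]
ys, noise_stds = [], []
for _ in range(800):
    b = rng.choices(range(3), weights=true_amps)[0]
    s = rng.uniform(0.2, 0.6)
    ys.append(rng.gauss(rng.gauss(means[b], width), s))
    noise_stds.append(s)

samples = gibbs_amplitudes(ys, noise_stds, means, width)
kept = samples[100:]  # discard burn-in
post_mean = [sum(s[b] for s in kept) / len(kept) for b in range(3)]
print("posterior mean amplitudes:", [round(a, 3) for a in post_mean])
```

With means and widths fixed, both conditional draws are exact, which is what makes this scheme practical for a large number of bins.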

Results: error-deconvolved HRD

Hierarchical uncertainty shrinkage
[Figure: model (posterior mean) CMD, $M_V$ versus $B - V$, with objects a to p marked; per-object distance posteriors (distance in kpc), parallax only versus hierarchical model]
SNR, parallax only → hierarchical model:
a: 4.8 → 6.0; b: 3.5 → 5.2; c: 2.9 → 4.0; d: 4.5 → 6.6
e: 3.3 → 4.1; f: 4.1 → 5.1; g: 4.4 → 6.0; h: 2.1 → 3.1
i: 4.0 → 5.1; j: 3.7 → 5.6; k: 5.0 → 6.4; l: 2.1 → 3.4
m: 2.7 → 3.3; n: 4.3 → 5.5; o: 2.2 → 3.4; p: 3.7 → 4.7

Hierarchical uncertainty shrinkage
‣ Natural consequence of hierarchical models: the inferred population distributions act as priors on the internal variables.
[Figure: parallax only versus hierarchical model: mean distance [kpc]; log distance stddev [log kpc]; histogram of scaled distance residuals (N stars)]
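The simplest instance of this shrinkage is a single Gaussian population combined with a Gaussian measurement. The sketch below uses that textbook case, not the mixture model of the talk. The numbers are arbitrary.

```python
def shrink(y, noise_var, pop_mean, pop_var):
    """Posterior mean and variance of a latent x given one noisy measurement y.

    Assumes x ~ N(pop_mean, pop_var) and y ~ N(x, noise_var).
    """
    post_var = 1.0 / (1.0 / noise_var + 1.0 / pop_var)
    post_mean = post_var * (y / noise_var + pop_mean / pop_var)
    return post_mean, post_var

post_mean, post_var = shrink(y=2.0, noise_var=1.0, pop_mean=0.0, pop_var=1.0)
print(post_mean, post_var)
```

The posterior variance is always smaller than the measurement variance, and the estimate is pulled toward the population mean. The effect is strongest for the noisiest objects.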

Anderson et al.: empirical Bayes + deconvolution
What about using Gaia G magnitudes?

Evidence for double/triple sequences
Ongoing work led by A. Widmark, with D. Hogg

Gaia TGAS + 2MASS
Can data scatter be explained by unresolved binaries (and triples)?
Unresolved multiplet = adding their fluxes:
$M_{\rm sum} = -2.5 \log_{10} \sum_i 10^{-0.4 M_i}$
For two identical stars, unresolved: M => M - 0.75
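The magnitude of an unresolved multiplet follows from adding fluxes, with flux proportional to 10^(-0.4 M) by the standard definition of magnitudes. A minimal sketch (the function name is illustrative):

```python
import math

def combined_magnitude(mags):
    """Magnitude of an unresolved system whose components have magnitudes mags."""
    return -2.5 * math.log10(sum(10.0 ** (-0.4 * m) for m in mags))

single = 5.0
print(combined_magnitude([single, single]) - single)          # two identical stars
print(combined_magnitude([single, single, single]) - single)  # three identical stars
```

Two identical stars come out 2.5 log10(2) ≈ 0.75 mag brighter than one, the shift quoted on the slide. Three identical stars come out 2.5 log10(3) ≈ 1.19 mag brighter.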

Hierarchical model
Errors in parallaxes + magnitudes.
Density of single stars: Gaussian mixture on color-magnitude line. Parameters: widths and amplitudes.
Density of doubles and triples (assuming same distance):
  Slow: simulation from single model.
  Fast: analytic first-order approximation from combinations of Gaussians.
Infer fractional probs for binaries/triples.
Example: binary model for two single Gaussians, $M_{\rm sum} = -2.5 \log_{10} \sum_i 10^{-0.4 M_i}$
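The "slow" and "fast" routes can be compared on a toy case where both components of a binary are drawn from the same single-star Gaussian in absolute magnitude. The linearization below (a delta-method expansion around equal magnitudes) is one possible reading of "first-order approximation" and is an assumption of this sketch, as are the numbers.

```python
import math
import random

def combined_magnitude(mags):
    """Magnitude of an unresolved system, from the sum of component fluxes."""
    return -2.5 * math.log10(sum(10.0 ** (-0.4 * m) for m in mags))

def simulate_binaries(mean, std, n, seed=0):
    """Slow route: draw two single-star magnitudes and combine their fluxes."""
    rng = random.Random(seed)
    return [combined_magnitude([rng.gauss(mean, std), rng.gauss(mean, std)])
            for _ in range(n)]

def first_order_binary(mean, std):
    """Fast route: linearize M_sum around M_1 = M_2 = mean.

    Each component enters with weight 1/2, so the variance is std^2 / 2.
    """
    return mean - 2.5 * math.log10(2.0), std / math.sqrt(2.0)

def sample_mean_std(values):
    n = len(values)
    mean = sum(values) / n
    return mean, math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))

sim_mean, sim_std = sample_mean_std(simulate_binaries(4.0, 0.2, 20000))
lin_mean, lin_std = first_order_binary(4.0, 0.2)
print(f"simulation : mean {sim_mean:.3f}, std {sim_std:.3f}")
print(f"first order: mean {lin_mean:.3f}, std {lin_std:.3f}")
```

For narrow single-star Gaussians the two routes should agree closely. The linearization is expected to degrade as the widths grow or the components differ strongly in brightness.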

Results Data scatter well explained by unresolved binaries & triples.

Predictions for single stars
Predictions for individual properties, for DR2+!
Releasing catalog of photometric candidates. Could be converted to masses using models.
Future: connection to actual binary/trinary population models. Requires understanding Gaia selection function…

Red clump star calibration
Hawkins, Leistedt, Bovy & Hogg, MNRAS 2017, arXiv:1705.08988

Red clump stars
‣ Ubiquitous: low mass in core He-burning stage
‣ Striking feature in CMD
‣ Standard candles to probe distances, extinction, etc., in clusters/galaxies
‣ Problem: not perfect standard candle. Scatter (due to metallicity, ages, etc.) + outliers in existing catalogs.
‣ Solution: data-driven model

Modeling the photometric red clump
Use existing spectroscopic or asteroseismic RC catalogs.
Hierarchical probabilistic model: Gaussian for the RC + outliers, marginalizing over dust, parallaxes, observed magnitudes.
Model + MCMC with Stan: sample joint posterior.
[Figure: $M_{K_s}$ versus $G - K_s$ for stars with $\sigma_{\hat{\varpi}_i} / \hat{\varpi}_i < 0.30$; catalogs: APO1m, Bovy, APOKASC, Laney]
[Figure: graphical model with nodes including $M_{RC}$, $f_{out}$, $E(B-V)$, $R$, $A_i$, $r_i$, $L$, $M_i$, $\hat{m}_i$, $\hat{\varpi}_i$; plate over $i = 1, \cdots, N$]

Results
Most precise + robust calibrated RC absolute magnitudes:
  K band: −1.61 ± 0.01 mag
  G band: 0.44 ± 0.01 mag
  J band: −0.93 ± 0.01 mag
  H band: −1.46 ± 0.01 mag
  W1 band: −1.68 ± 0.02 mag
  W2 band: −1.69 ± 0.02 mag
  W3 band: −1.67 ± 0.02 mag
  W4 band: −1.76 ± 0.01 mag
Intrinsic dispersion ∼0.17 ± 0.03 mag
Distance precision ∼8% from photometric information only
Recovered Gaia parallax spatial systematic offset
Next steps: use multicolor information to model metallicity and dust
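The ∼8% distance precision follows from propagating the intrinsic dispersion through the distance modulus: σ_d / d = (ln 10 / 5) σ_M. The sketch below works through this arithmetic and turns the K-band calibration into a photometric distance. The example apparent magnitude and the zero-extinction default are hypothetical.

```python
import math

M_K_RC = -1.61     # calibrated K-band absolute magnitude of the red clump (mag)
SIGMA_RC = 0.17    # intrinsic dispersion (mag)

def red_clump_distance_pc(apparent_mag, extinction=0.0, abs_mag=M_K_RC):
    """Photometric distance in pc from the distance modulus m - A - M."""
    return 10.0 ** ((apparent_mag - extinction - abs_mag + 5.0) / 5.0)

def fractional_distance_error(mag_scatter=SIGMA_RC):
    """First-order propagation of a magnitude scatter to a relative distance error."""
    return math.log(10.0) / 5.0 * mag_scatter

print(f"distance for K = 8.39: {red_clump_distance_pc(8.39):.0f} pc")
print(f"fractional distance error: {fractional_distance_error():.3f}")
```

A dispersion of 0.17 mag maps to a relative distance error of about 0.078, consistent with the quoted ∼8%.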

Multi-color-magnitude diagrams

Multicolor CMD
Again, hierarchical Gaussian mixture model with parallax + magnitude errors.
This time, simple optimization, not interested in errors.

Multicolor CMD

Multicolor CMD
Next steps: 3D dust map modeling.

TensorFlow
‣ Google’s toolkit for linear algebra, essentially covering numpy + scipy functionalities
‣ Builds a graph of data/operations. Also builds a graph of gradients with automatic/symbolic differentiation.
‣ Best optimizers + gradient tools on the market
‣ Interfaces with deep learning & probabilistic inference libraries.
‣ Great for optimization and modeling. Advanced inference/sampling via external libraries such as Edward.

Gaia DR2

Gaia DR2 (04/2018)

Gaia DR2 (04/2018)
5-parameter astrometric solution + G/BP/RP mags for all sources = deep dynamic multi-color view of the Galaxy.
+ some radial velocities and stellar model fits.
Great for 6D models and 3D dust maps.

The (near) future: full Gaia model
Gaussian mixture model: infer the distributions from the data (here in 8D). Analytic or numerical marginalization of latent parameters.
This is a brief note describing a model of the positions, velocities, proper motions, and colors of stars. If the likelihood function of those is a multivariate Gaussian (which is the case for Gaia, with strong correlations between parallaxes and proper motions), one can adopt a Gaussian mixture model for their distributions (joint or split) and analytically marginalize over the true velocity and colors of each star. As a result, one only needs to sample the parameters of the mixture model, as well as the distance and extinction of each star. The effective likelihood function is a simple multivariate Gaussian, derived below. This opens the possibility to implement this model in fast inference/modelling languages like TensorFlow, Stan, or Edward.
A summary of the notation is provided in the table below. Apologies if there are typos in the text or equations!
  $\alpha = (\alpha_1, \cdots, \alpha_B)$: all parameters of the mixture model
  $\alpha_b = (f_b, \xi_b, \Sigma_b)$: parameters of the bth Gaussian of the mixture
  $i$: index of the ith star
  $n_i = (\alpha_i, \delta_i)$: true/observed angular position
  $r_i$: true distance
  $v_i = (v_{x,i}, v_{y,i}, v_{z,i})$: true 3D cartesian velocity
  $\hat{\mu}_i = (\mu_{\alpha,i}, \mu_{\delta,i})$: observed proper motion
  $\hat{\varpi}_i$: observed parallax
  $E_i \rightarrow E_{m_i}, E_{C_i}$: true magnitude/color extinction at distance $r_i$
  $C_i, \hat{C}_i$: true and observed color
  $M_i$: true absolute magnitude
  $\hat{m}_i$: observed apparent magnitude
1 Model
Our population/distribution model in 8-dimensional space (3D positions and velocities, plus 2D color–magnitude diagram) is a Gaussian mixture,
$[v\; n\; r\; C\; M]^T \mid \alpha \sim \sum_{b=1}^{B} f_b\, \mathcal{N}_{8D}(\xi_b;\, \Sigma_b)$. (1)
The priors will be specified later. Typically, one would adopt conjugate priors, which greatly simplify the inference, i.e. a Dirichlet prior on the amplitudes $\{f_b\}$, a multivariate Gaussian for each mean $\xi_b$, and a Wishart for the covariance $\Sigma_b$.
The full 5-dimensional likelihood, accounting for any covariance between the measurements, is
$[\hat{\mu}_{\alpha,i}\; \hat{\mu}_{\delta,i}\; \hat{\varpi}_i\; \hat{C}_i\; \hat{m}_i]^T \mid v_i, n_i, r_i, E_i, C_i, M_i \sim \mathcal{N}_{5D}(\text{model vector}_i;\ \text{covariance}_i)$ (2)
with the model vector …
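The analytic marginalization rests on a standard identity for linear-Gaussian models, sketched here in generic notation. The matrix $A$, offset $b_0$, latent vector $x$, observed vector $\hat{y}$, and noise covariance $S$ are illustrative symbols introduced for this sketch, not the note's own notation. $\xi_b$ and $\Sigma_b$ stand for the mean and covariance of one mixture component.

```latex
\int \mathcal{N}\!\left(\hat{y} \mid A x + b_0,\; S\right)\,
     \mathcal{N}\!\left(x \mid \xi_b,\; \Sigma_b\right)\, \mathrm{d}x
  \;=\; \mathcal{N}\!\left(\hat{y} \mid A\,\xi_b + b_0,\; S + A\,\Sigma_b\,A^{T}\right)
```

Applied to each mixture component and weighted by its amplitude, this would give an effective likelihood that is a sum of Gaussians in the observed quantities. It holds conditionally on any parameter that enters non-linearly, which is presumably why distance and extinction still need to be sampled.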

Summary
Gaia: exciting data set, but analysis is challenging.
Hierarchical data-driven models for fully exploiting all of the data without external models.
Gaia DR1:
  high-precision color-magnitude diagrams
  improved stellar distances
  binary/triple sequences
  red-clump calibration
Gaia DR2 (April 2018):
  3D reconstruction of stellar density, dust, and velocities.
