Data-driven models 
of the Milky Way 
in the Gaia era

Boris Leistedt

November 30, 2017

Transcript

  1. Data-driven models of the Milky Way in the Gaia era
     Boris Leistedt (@ixkael, www.ixkael.com), NASA Einstein Fellow, New York University
  2. Road map
     1. Context: the Milky Way and the Gaia mission
     2. Interlude: hierarchical probabilistic models
     3. Applications: Gaia DR1
        - High-precision color-magnitude diagrams with Gaia
        - Calibration of red clump star standard candles
        - Evidence for unresolved binary and ternary sequences
     4. The (near) future: Gaia DR2
  3. Happy collaborators
     Lauren Anderson (Flatiron), David Hogg (NYU/Flatiron), Keith Hawkins (Columbia), Jo Bovy (Toronto/Flatiron), Axel Widmark (Stockholm), Adrian Price-Whelan (Princeton)
  4. Gaia sprints (http://gaia.lol)
     Full week of sprinting/hacking on concrete, achievable projects, in a room full of experts.
     - October 2016 in NYC
     - July 2017 at MPIA Heidelberg
     - June 2018 in NYC
     Dozens of papers and new collaborations!
  5. The Gaia mission
     Successor to Hipparcos.
     Micro-arcsecond global astrometry for 1+ billion stars, complete to 20th mag: correlated positions, proper motions, parallaxes, apparent mags (3 broad photometric bands).
     Approx. 70 visits over a 5-year period.
     Radial velocities (NIR medium-res $\lambda/\Delta\lambda = 11$k integral-field spectrograph) down to $G_{\rm RVS} \approx 16$ mag.
     Powerful synergies with other surveys (2MASS, WISE, SDSS, etc.).
     www.cosmos.esa.int/web/gaia/science-performance
  6. The numbers
     ‣ Catalogue: ∼1 billion stars; $0.34\times10^{6}$ to V = 10 mag; $26\times10^{6}$ to V = 15 mag; $250\times10^{6}$ to V = 18 mag; $1000\times10^{6}$ to V = 20 mag; complete to about 20 mag
     ‣ Sky density: mean density ∼25 000 stars deg$^{-2}$; max density ∼$3\times10^{6}$ stars deg$^{-2}$
     ‣ Accuracies: median parallax errors: 7 μas at 10 mag; 20–25 μas at 15 mag; 200–300 μas at 20 mag
     ‣ Distance accuracies (from preliminary Galaxy model estimates): 3 million better than 1 per cent; 5 million better than 2 per cent; 10 million better than 5 per cent; 30 million better than 10 per cent
     ‣ Tangential velocity accuracies (from Galaxy models): 5 million better than 0.5 km s$^{-1}$; 10 million better than 1 km s$^{-1}$; 25 million better than 3 km s$^{-1}$; 40 million better than 5 km s$^{-1}$; 60 million better than 10 km s$^{-1}$
     ‣ Radial velocity accuracies: 1–10 km s$^{-1}$ to V = 16–17 mag, depending on spectral type
     ‣ Photometry: to V = 20 mag in broadband light, and in spectrally dispersed light, with some 20 independent spectral samples between 330 and 1000 nm
     Source: https://www.astro.umd.edu/~olling/Papers/GAIA3_IN_all_info_sheets.pdf
  7. Science goals 1
     ‣ The Galaxy: tests of hierarchical structure formation models; star formation history; chemical evolution; inner bulge/bar dynamics; disc/halo interactions; dynamical evolution; nature of the warp; star cluster disruption; dynamics of spiral structure; distribution of dust; distribution of dark matter; detection of tidally disrupted debris; Galaxy rotation curve; disc mass profile
     ‣ Star formation and evolution: in situ luminosity function; dynamics of star forming regions; luminosity function for pre-main sequence stars; rapid evolutionary phases; complete and detailed local census down to single brown dwarfs; identification/dating of oldest halo white dwarfs; age census; census of binaries and multiple stars
     ‣ Distance scale and reference frame: parallax calibration of all distance scale indicators; absolute luminosities of Cepheids; distance to the Magellanic Clouds; definition of the local, kinematically non-rotating metric
     Source: https://www.astro.umd.edu/~olling/Papers/GAIA3_IN_all_info_sheets.pdf
  8. Science goals 2
     ‣ Local Group and beyond: rotational parallaxes for Local Group galaxies; kinematical separation of stellar populations; galaxy orbits and cosmological history; zero proper motion quasar survey; cosmological acceleration of Solar System; photometry of galaxies; detection of supernovae
     ‣ Solar System: deep and uniform detection of minor planets; taxonomy and evolution; inner Trojans; Kuiper Belt Objects; disruption of Oort Cloud
     ‣ Extra-solar planetary systems: complete census of large planets to 200–500 pc; orbital characteristics of several thousand systems
     ‣ Fundamental physics: $\gamma$ to $\sim 5\times10^{-7}$; $\beta$ to $3\times10^{-4}$–$3\times10^{-5}$; solar $J_2$ to $10^{-7}$–$10^{-8}$; $\dot{G}/G$ to $10^{-12}$–$10^{-13}$ yr$^{-1}$; constraints on gravitational wave energy for $10^{-12} < f < 4\times10^{-9}$ Hz; constraints on $\Omega_M$ and $\Omega_\Lambda$ from quasar microlensing
     ‣ Specific objects: $10^{6}$–$10^{7}$ resolved galaxies; $10^{5}$ extragalactic supernovae; 500 000 quasars; $10^{5}$–$10^{6}$ (new) solar system objects; 50 000 brown dwarfs; 3000 extra-solar planets; 200 000 disc white dwarfs; 200 microlensed events; $10^{7}$ resolved binaries within 250 pc
  9. My goals: detailed 3D+ Milky Way models
     ‣ 3D stellar density and potential
     ‣ Dynamics: full phase-space
     ‣ 3D dust and extinction law
     ‣ Correlation between phase-space and stellar parameters
     ‣ Robust to stellar models => internal construction from Gaia data only (data-driven)
  10. Methodological challenges
     Correct and full exploitation of Gaia = difficult regime for data analysis and inference.
     ‣ Huge data set with heteroskedastic errors + selection effects (e.g., magnitudes, parallaxes, proper motions)
     ‣ Constraining power of the data exceeds quality of existing physical models (e.g., 3D density, etc.). Worse: using those models can bias the analysis.
     ‣ Let's develop flexible "data-driven" models (e.g., non-parametric) which will inform physical models.
  11. Gaia Data Release 1
     ‣ Positions and G magnitudes for all sources
     ‣ TGAS: astrometric solution (including parallaxes, proper motions, and G magnitude) for 2 million objects. Most have 2MASS and APASS magnitudes.
     https://www.cosmos.esa.int/web/gaia/dr1
  12. Astrometric solution
     [Diagram labels: right ascension, declination, parallax, proper motion; Tycho/Hipparcos, Gaia DR1, Gaia DR2+]
     => correlated magnitudes, parallaxes, and proper motions
  13. Stellar distances
     Broad, non-Gaussian distance pdf for parallax SNR < 10… which is most of Gaia TGAS! How to improve the distances?
     [Figure: distance pdfs for example stars with parallax SNR = 8.8, 15, and 1.5, given by the parallax likelihood $\mathcal{N}(d^{-1} - \hat\varpi;\ \sigma_\varpi^2)$]
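As a minimal sketch of why low-SNR parallaxes give broad, non-Gaussian distance pdfs, the likelihood $\mathcal{N}(d^{-1} - \hat\varpi;\ \sigma_\varpi^2)$ can be evaluated on a distance grid. The grid range, the uniform distance prior, the units (parallax in mas, distance in kpc), and the function names are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def distance_posterior(parallax_obs, parallax_err, d_grid):
    """Normalized p(d | parallax) on a uniform grid, assuming a flat prior in d."""
    # Gaussian likelihood in parallax space: N(1/d - parallax_obs; parallax_err^2).
    chi2 = ((1.0 / d_grid - parallax_obs) / parallax_err) ** 2
    pdf = np.exp(-0.5 * chi2)
    # Normalize with a Riemann sum on the uniform grid.
    return pdf / (pdf.sum() * (d_grid[1] - d_grid[0]))

def pdf_std(pdf, d_grid):
    """Standard deviation of a gridded pdf."""
    step = d_grid[1] - d_grid[0]
    mean = np.sum(d_grid * pdf) * step
    return np.sqrt(np.sum((d_grid - mean) ** 2 * pdf) * step)

d_grid = np.linspace(0.05, 5.0, 2000)                   # kpc (illustrative range)
high_snr = distance_posterior(1.0, 1.0 / 15.0, d_grid)  # 1 mas parallax, SNR = 15
low_snr = distance_posterior(1.0, 1.0 / 1.5, d_grid)    # 1 mas parallax, SNR = 1.5

print("width at SNR 15 :", pdf_std(high_snr, d_grid))
print("width at SNR 1.5:", pdf_std(low_snr, d_grid))
```

At SNR = 1.5 the pdf extends to the edge of the grid, so its width is set largely by the prior range; this is the regime where extra information (such as a color-magnitude model) matters.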
  14. Gaia DR1 color-magnitude diagrams: improved stellar distance estimates
     Leistedt & Hogg, ApJ 2017 (arXiv:1703.08112)
     Data: Gaia TGAS cross-matched with APASS.
     Method: full hierarchical inference via Gibbs sampling.
     Anderson, Hogg, Leistedt, Price-Whelan, Bovy, ApJ 2017 (arXiv:1706.05055)
     Data: Gaia TGAS cross-matched with 2MASS.
     Method: deconvolution and empirical Bayes.
  15. TGAS-APASS data
     [Figure: color-magnitude diagrams, $M_V$ vs $B - V$. Panels: data (point estimates); data (subsample, with errors); model (posterior mean); model (posterior stddev)]
     $M_V = m_V - 5 \log_{10}\left(\frac{d}{10\,\mathrm{pc}}\right)$
     There is distance information in magnitudes!
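The absolute-magnitude relation on this slide can be written as two small helper functions. This is only a sketch: the function names are hypothetical, and extinction is ignored.

```python
import numpy as np

def absolute_magnitude(apparent_mag, distance_pc):
    """M = m - 5 log10(d / 10 pc), ignoring extinction."""
    return apparent_mag - 5.0 * np.log10(distance_pc / 10.0)

def distance_from_magnitudes(apparent_mag, absolute_mag):
    """Invert the relation: d = 10^(1 + (m - M) / 5) pc."""
    return 10.0 ** (1.0 + (apparent_mag - absolute_mag) / 5.0)

# A star with m_V = 10 at 100 pc has M_V = 5, and the relation inverts cleanly.
M_V = absolute_magnitude(10.0, 100.0)
d_back = distance_from_magnitudes(10.0, M_V)
```

If the absolute magnitude of a star can be predicted from its color, the observed apparent magnitude therefore constrains its distance.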
  16. Heteroskedastic errors
     [Figure: panels with axes color SNR, magnitude SNR, and parallax SNR; color-magnitude diagram ($M_V$ vs $B - V$) colored by mean parallax SNR]
  17. Stellar models
     [Figure: three $M_V$ vs $B - V$ panels: few objects with errors; density based on noisy data; stellar models]
     Let's incorporate CMD information in distances.
  18. Bayes theorem
     P: parameter(s) of interest. D: data. M: model under consideration.
     $\underbrace{p(P|D, M)}_{\text{posterior}} = \underbrace{p(D|P, M)}_{\text{likelihood}} \times \underbrace{p(P|M)}_{\text{prior}} \ /\ \underbrace{p(D|M)}_{\text{evidence}}$
     Likelihood: probability of generating the data D with parameters P under model M. => Mechanism to forward-model data given the model or its parameters (without prior beliefs about their values).
     Priors: knowledge about parameters P under model M before looking at the data D. From theory, previous data, intuition, etc.
     Posterior: joint PDF on the N parameters of interest given the data D and under the model M, $p(\Theta_1 = \theta_1, \cdots, \Theta_N = \theta_N \mid D, M)$. Tedious to write/use for large and/or hierarchical models!
  19. Distance information
     ‣ Uniform distance priors: $p(d) = \mathrm{cst}$
     ‣ Parallax information alone: $\mathcal{N}(d^{-1} - \hat\varpi;\ \sigma_\varpi^2)$
     ‣ Posterior distribution per object, for fixed color-magnitude models (sum over stellar models $t$ of the magnitude, color, and parallax likelihoods):
       $p(d \mid \hat\varpi, \{f_t, M_t, C_t\}, \hat m, \hat C) \propto \sum_t f_t\, \mathcal{N}(\hat m - 5\log_{10} d - M_t;\ \sigma_m^2)\, \mathcal{N}(\hat C - C_t;\ \sigma_C^2)\, \mathcal{N}(d^{-1} - \hat\varpi;\ \sigma_\varpi^2)$
     Having a CMD model improves distance estimates!
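The per-object posterior above can be sketched on a distance grid as follows. The two stellar templates, the noise levels, and the unit conventions (distance in pc, parallax in mas, distance modulus $5\log_{10}(d/10\,\mathrm{pc})$) are invented for illustration only.

```python
import numpy as np

def gauss(x, sigma):
    """Zero-mean Gaussian density evaluated at residual x."""
    return np.exp(-0.5 * (x / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)

def normalize(pdf, grid):
    return pdf / (pdf.sum() * (grid[1] - grid[0]))

def distance_posterior_cmd(d_pc, plx_obs, plx_err, m_obs, m_err,
                           c_obs, c_err, f_t, M_t, C_t):
    """p(d | data) for a fixed mixture of templates (f_t, M_t, C_t), flat prior in d."""
    dist_mod = 5.0 * np.log10(d_pc[:, None] / 10.0)
    mag_term = gauss(m_obs - dist_mod - M_t[None, :], m_err)
    col_term = gauss(c_obs - C_t[None, :], c_err)
    mixture = np.sum(f_t[None, :] * mag_term * col_term, axis=1)  # sum over templates
    plx_term = gauss(1000.0 / d_pc - plx_obs, plx_err)            # parallax in mas
    return normalize(mixture * plx_term, d_pc)

d_pc = np.linspace(50.0, 3000.0, 6000)
f_t = np.array([0.5, 0.5])   # hypothetical template amplitudes
M_t = np.array([1.0, 5.0])   # hypothetical absolute magnitudes
C_t = np.array([1.0, 0.6])   # hypothetical colors

# A star at 500 pc matching the first template, with a parallax SNR of only 2.
m_obs = 1.0 + 5.0 * np.log10(500.0 / 10.0)
post_cmd = distance_posterior_cmd(d_pc, 2.0, 1.0, m_obs, 0.05, 1.0, 0.05, f_t, M_t, C_t)
post_plx = normalize(gauss(1000.0 / d_pc - 2.0, 1.0), d_pc)
```

With these made-up numbers the color selects the first template, and the magnitude term then pins the distance far more tightly than the parallax alone.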
  20. A first look at parallaxes
     [Figure: per-object panels (x-axis: parallax) comparing the parallax likelihood, the posterior with stellar models, and the posterior with noisy CMD density]
  21. The need for data-driven models
     Are stellar models accurate enough to deliver unbiased distances?
     Requires understanding selection effects, population models, etc.
     Can we construct a CMD model directly from the data to improve distance estimates?
     [Figure: color-magnitude diagrams, $M_V$ vs $B - V$. Panels: data (point estimates); data (subsample, with errors); model (posterior mean); model (posterior stddev)]
  22. A simple 1D analogy
     ‣ Gaussian distributions + (homoskedastic) Gaussian noise:
       $x_i \sim \mathcal{N}(m, V)$, $\quad y_i \sim \mathcal{N}(x_i, \sigma^2)$
     ‣ Errors known. Want to estimate $m$ and $V$. Can use estimators (no need to access/estimate the $x$'s):
       $\hat\mu = \frac{1}{N}\sum_i y_i$, $\quad \hat V + \sigma^2 = \frac{1}{N-1}\sum_i (y_i - \hat\mu)^2$
     ‣ Simple case of deconvolution. But cannot write estimators with multiple Gaussians and heteroskedastic noise!
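A quick numerical check of these estimators on simulated data. The true values, noise level, and sample size below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m_true, V_true, sigma = 2.0, 1.5, 0.8   # arbitrary illustrative values

# Latent true values x_i and noisy observations y_i with known noise sigma.
x = rng.normal(m_true, np.sqrt(V_true), size=200_000)
y = x + rng.normal(0.0, sigma, size=x.size)

# Estimators built from the y's only: the noise variance is simply subtracted.
mu_hat = y.mean()
V_hat = y.var(ddof=1) - sigma ** 2

print(f"mu_hat = {mu_hat:.3f}, V_hat = {V_hat:.3f}")
```

The sample variance of the noisy data estimates $V + \sigma^2$, so subtracting the known $\sigma^2$ deconvolves the intrinsic variance without ever estimating the individual $x_i$.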
  23. Probabilistic graphical models
     ‣ Circles: parameters/variables of interest, to be estimated
     ‣ Dots: fixed parameters/constants
     ‣ Arrows: direct dependency
     ‣ Plate: identical independent samples
     ‣ Explicit representation of a model: captures parameters and dependencies.
     ‣ By writing all conditional distributions, one can write the full posterior.
     See book + Coursera: Probabilistic Graphical Models, by D. Koller.
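  24. Hierarchical probabilistic models 101
     ‣ Mixture model for the density of the true $x$'s (which are latent parameters):
       $p(x_i \mid \vec\alpha, \vec\mu, \vec\sigma) = \sum_{b=1}^{B} \alpha_b\, \mathcal{N}(x_i \mid \mu_b, \sigma_b^2)$
     ‣ Heteroskedastic noise:
       $p(y_i \mid x_i, \sigma_i) = \mathcal{N}(y_i \mid x_i, \sigma_i^2)$
     ‣ Posterior distribution (= deconvolution!):
       $p(\vec\alpha, \vec\mu, \vec\sigma \mid \{y_i, \sigma_i\}) \propto p(\vec\alpha, \vec\mu, \vec\sigma) \prod_{i=1}^{N} \sum_{b=1}^{B} \alpha_b \int \mathrm{d}x_i\, \mathcal{N}(x_i \mid \mu_b, \sigma_b^2)\, \mathcal{N}(y_i \mid x_i, \sigma_i^2)$
     See http://ixkael.com for tutorials and code for Bayesian hierarchical models, uncertainty shrinkage, selection effects, etc.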
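For Gaussian components and Gaussian noise, the integral over each latent $x_i$ is analytic: $\int \mathrm{d}x\, \mathcal{N}(x \mid \mu, s^2)\, \mathcal{N}(y \mid x, \sigma^2) = \mathcal{N}(y \mid \mu, s^2 + \sigma^2)$. The sketch below evaluates the resulting marginal log-likelihood; the function and variable names are assumptions for illustration, not the author's code.

```python
import numpy as np

def normal_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def log_marginal_likelihood(y, noise_sigma, amps, means, widths):
    """Sum over objects of log sum_b amps_b N(y_i | means_b, widths_b^2 + noise_sigma_i^2)."""
    # Convolving a Gaussian component with Gaussian noise just adds the variances.
    var = widths[None, :] ** 2 + noise_sigma[:, None] ** 2
    per_object = np.sum(amps[None, :] * normal_pdf(y[:, None], means[None, :], var), axis=1)
    return np.sum(np.log(per_object))

# Two-component toy mixture and three objects with different noise levels.
amps = np.array([0.4, 0.6])
means = np.array([-1.0, 2.0])
widths = np.array([0.5, 1.2])
y = np.array([0.3, 2.5, -1.4])
noise_sigma = np.array([0.7, 0.2, 1.0])

logL = log_marginal_likelihood(y, noise_sigma, amps, means, widths)
```

Because the latent values are integrated out analytically, only the mixture parameters remain to be sampled or optimized, however many objects there are.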
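  25. TGAS PGM and model
     Absolute magnitude: $M_V = m_V - 5 \log_{10}\left(\frac{d}{10\,\mathrm{pc}}\right)$
     Parallax and magnitude likelihoods:
       $p(\hat\varpi \mid d, \sigma_\varpi) = \mathcal{N}(\hat\varpi - 1/d;\ \sigma_\varpi^2)$
       $p(\hat{\vec m} \mid d, \vec C, M, \Sigma_{\hat{\vec m}}) = \mathcal{N}\big(\hat{\vec m} - \vec m(d, \vec C, M);\ \Sigma_{\hat{\vec m}}\big)$
     [PGM: population-level nodes 3D dust, MW, and CMD; per-object nodes $r_i$, $C_i$, $A_i$, $M_i$, $\hat m_i$, $\hat\varpi_i$ and their uncertainties, inside a plate over $i = 1, \cdots, N$]
     Posterior distribution now tying all objects together, with CMD.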
  26. Full CMD hierarchical model
     ‣ CMD = mixture model: $p(M, C) = \sum_b \alpha_b\, \mathcal{N}(\vec\mu_b, \Sigma_b)$
       Fixing means and widths on a grid, to convexify the posterior.
       Fixing dust reddening corrections at parallax point estimates.
     ‣ MCMC with Gibbs sampling: bin amplitudes and allocations.
       Given bin allocations, draw amplitudes from a Dirichlet.
       Given amplitudes, draw bins from a multinomial with …
       Distances marginalized over numerically.
       True color + magnitude marginalized over analytically.
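The two Gibbs steps can be illustrated on a simplified 1D toy problem with fixed component means and widths. This is a sketch under stated assumptions, not the sampler used in the paper: the allocation probabilities are taken to be proportional to amplitude times the noise-convolved component likelihood, the Dirichlet prior is flat (concentration 1), and there are no distances or dust.

```python
import numpy as np

def gibbs_amplitudes(y, noise_sigma, means, widths, n_iter=300, seed=0):
    """Gibbs-sample mixture amplitudes for fixed components and heteroskedastic noise."""
    rng = np.random.default_rng(seed)
    n_bins = means.size
    # Component likelihoods with the latent true values marginalized analytically.
    var = widths[None, :] ** 2 + noise_sigma[:, None] ** 2
    like = np.exp(-0.5 * (y[:, None] - means[None, :]) ** 2 / var) / np.sqrt(2 * np.pi * var)
    amps = np.full(n_bins, 1.0 / n_bins)
    samples = np.empty((n_iter, n_bins))
    for it in range(n_iter):
        # Step 1: given amplitudes, draw one bin allocation per object (categorical draw).
        probs = amps[None, :] * like
        probs /= probs.sum(axis=1, keepdims=True)
        cdf = np.cumsum(probs, axis=1)
        alloc = (rng.random(y.size)[:, None] > cdf).sum(axis=1)
        alloc = np.minimum(alloc, n_bins - 1)   # guard against round-off in the cdf
        # Step 2: given allocations, draw amplitudes from the conjugate Dirichlet.
        counts = np.bincount(alloc, minlength=n_bins)
        amps = rng.dirichlet(1.0 + counts)
        samples[it] = amps
    return samples

# Simulated data: two fixed components with true amplitudes (0.3, 0.7).
rng = np.random.default_rng(1)
means, widths = np.array([-2.0, 2.0]), np.array([0.5, 0.5])
labels = (rng.random(2000) < 0.7).astype(int)
noise_sigma = rng.uniform(0.2, 0.6, size=labels.size)
y = rng.normal(means[labels], widths[labels]) + rng.normal(0.0, noise_sigma)

samples = gibbs_amplitudes(y, noise_sigma, means, widths)
post_mean = samples[100:].mean(axis=0)   # discard the first 100 draws as burn-in
```

Fixing the means and widths is what makes both conditional distributions available in closed form, so no tuning of proposal distributions is needed.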
  27. Hierarchical uncertainty shrinkage
     [Figure: model CMD (posterior mean, $M_V$ vs $B - V$) with sixteen labeled stars a–p, and the distance pdfs of those stars (distance in kpc) for parallax only vs. hierarchical model]
     SNR, parallax only → hierarchical model: a: 4.8 → 6.0; b: 3.5 → 5.2; c: 2.9 → 4.0; d: 4.5 → 6.6; e: 3.3 → 4.1; f: 4.1 → 5.1; g: 4.4 → 6.0; h: 2.1 → 3.1; i: 4.0 → 5.1; j: 3.7 → 5.6; k: 5.0 → 6.4; l: 2.1 → 3.4; m: 2.7 → 3.3; n: 4.3 → 5.5; o: 2.2 → 3.4; p: 3.7 → 4.7
  28. Hierarchical uncertainty shrinkage
     ‣ Natural consequence of hierarchical models: the inferred population distributions act as priors on the internal variables.
     [Figure: mean distance, parallax only vs. hierarchical model (kpc); log distance stddev, parallax only vs. hierarchical model (log kpc); histogram of scaled distance residuals ($N_{\rm stars}$) for parallax only and hierarchical model]
  29. Gaia TGAS + 2MASS
     Can the data scatter be explained by unresolved binaries (and triples)?
     Unresolved multiplet = adding their fluxes: $M_{\rm sum} = -2.5 \log_{10} \sum_i 10^{-0.4 M_i}$
     For two identical stars, unresolved: $M \Rightarrow M - 0.75$
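Adding fluxes in magnitude space is a one-liner. The sketch below uses the standard relation between flux and magnitude (flux proportional to $10^{-0.4 M}$) and reproduces the 0.75 mag brightening for two identical stars; the function name is hypothetical.

```python
import numpy as np

def combined_magnitude(mags):
    """Magnitude of an unresolved multiplet: sum the fluxes, convert back to magnitude."""
    mags = np.asarray(mags, dtype=float)
    return -2.5 * np.log10(np.sum(10.0 ** (-0.4 * mags)))

# Offsets relative to a single star of absolute magnitude 4.
binary_shift = combined_magnitude([4.0, 4.0]) - 4.0       # -2.5 log10(2)
triple_shift = combined_magnitude([4.0, 4.0, 4.0]) - 4.0  # -2.5 log10(3)
```

Equal-mass binaries and triples therefore form sequences offset above the single-star sequence by about 0.75 and 1.19 mag, respectively.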
  30. Hierarchical model
     Errors in parallaxes + magnitudes.
     Density of single stars: Gaussian mixture on the color-magnitude line. Parameters: widths and amplitudes.
     Density of doubles and triples (assuming same distance), from $M_{\rm sum} = -2.5 \log_{10} \sum_i 10^{-0.4 M_i}$:
     - Slow: simulation from the single-star model
     - Fast: analytic first-order approximation from combinations of Gaussians
     Infer fractional probabilities for binaries/triples.
     [Figure: example binary model for two single Gaussians]
  31. Predictions for single stars
     Predictions for individual properties, for DR2+!
     Releasing a catalog of photometric candidates. Could be converted to masses using models.
     Future: connection to actual binary/trinary population models. Requires understanding the Gaia selection function…
  32. Red clump stars
     ‣ Ubiquitous: low-mass stars in the core He-burning stage
     ‣ Striking feature in the CMD
     ‣ Standard candles to probe distances, extinction, etc., in clusters/galaxies
     ‣ Problem: not a perfect standard candle. Scatter (due to metallicity, ages, etc.) + outliers in existing catalogs.
     ‣ Solution: data-driven model
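  33. Modeling the photometric red clump
     Use existing spectroscopic or asteroseismic RC catalogs.
     Hierarchical probabilistic model: Gaussian for the RC + outliers, marginalizing over dust, parallaxes, observed magnitudes.
     [Figure: $M_{K_s}$ vs $G - K_s$ for stars with $\sigma_{\hat\varpi_i} / \hat\varpi_i < 0.30$ from the APO1m, Bovy, APOKASC, and Laney catalogs]
     [PGM: population-level nodes include $M_{\rm RC}$, $f_{\rm out}$, $R$, $L$, $E_{B-V}$; per-object nodes $r_i$, $A_i$, $M_i$, $\hat m_i$, $\hat\varpi_i$ and their uncertainties, inside a plate over $i = 1, \cdots, N$]
     Model + MCMC with Stan. Sample the joint posterior.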
  34. Results
     Most precise + robust calibrated RC absolute magnitudes:
     - K band: −1.61 ± 0.01 mag
     - G band: 0.44 ± 0.01 mag
     - J band: −0.93 ± 0.01 mag
     - H band: −1.46 ± 0.01 mag
     - W1 band: −1.68 ± 0.02 mag
     - W2 band: −1.69 ± 0.02 mag
     - W3 band: −1.67 ± 0.02 mag
     - W4 band: −1.76 ± 0.01 mag
     Intrinsic dispersion ∼0.17 ± 0.03 mag.
     Distance precision ∼8% from photometric information only.
     Recovered Gaia parallax spatial systematic offset.
     Next steps: use multicolor information to model metallicity and dust.
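The quoted ∼8% distance precision follows from propagating the intrinsic magnitude dispersion through the distance modulus: to first order, $\sigma_d / d = (\ln 10 / 5)\, \sigma_M$. A minimal sketch, using the K-band calibration from this slide; the function names and the optional extinction argument are illustrative assumptions.

```python
import numpy as np

M_RC_K = -1.61    # calibrated red clump absolute magnitude in K (from the slide)
SIGMA_RC = 0.17   # intrinsic dispersion in mag (from the slide)

def red_clump_distance_pc(apparent_k, extinction_k=0.0):
    """Photometric distance of a red clump star from its apparent K magnitude."""
    return 10.0 ** (1.0 + (apparent_k - extinction_k - M_RC_K) / 5.0)

def fractional_distance_error(sigma_mag):
    """First-order error propagation through d = 10^(1 + (m - M) / 5)."""
    return np.log(10.0) / 5.0 * sigma_mag

precision = fractional_distance_error(SIGMA_RC)   # about 0.078, i.e. roughly 8%
d_example = red_clump_distance_pc(M_RC_K + 10.0)  # distance modulus 10 mag
```

A distance modulus of 10 mag corresponds to 1 kpc, and the 0.17 mag dispersion alone limits each photometric distance to about 8%, independent of distance.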
  35. TensorFlow
     ‣ Google's toolkit for linear algebra, essentially covering numpy + scipy functionality
     ‣ Build a graph of data/operations. Also build a graph of gradients with automatic/symbolic differentiation.
     ‣ Best optimizers + gradient tools on the market
     ‣ Interfaces with deep learning and probabilistic inference libraries
     ‣ Great for optimization and modeling. Advanced inference/sampling via external libraries such as Edward.
  36. Gaia DR2 (04/2018)
     5-parameter astrometric solution + G/BP/RP mags for all sources = deep dynamic multi-color view of the Galaxy.
     + some radial velocities and stellar model fits.
     Great for 6D models and 3D dust maps.
  37. The (near) future: full Gaia model
     Gaussian mixture model: infer the distributions from the data (here in 8D). Analytic or numerical marginalization of latent parameters.
     [Slide shows an excerpt from a note, reproduced below.]
     This is a brief note describing a model of the positions, velocities, proper motions, and colors of stars. If the likelihood function of those is a multivariate Gaussian (which is the case for Gaia, with strong correlations between parallaxes and proper motions), one can adopt a Gaussian mixture model for their distributions (joint or split) and analytically marginalize over the true velocity and colors of each star. As a result, one only needs to sample the parameters of the mixture model, as well as the distance and extinction of each star. The effective likelihood function is a simple multivariate Gaussian, derived below. This opens the possibility to implement this model in fast inference/modelling languages like TensorFlow, Stan, or Edward.
     A summary of the notation is provided in the table below. Apologies if there are typos in the text or equations!
     - $\alpha = (\alpha_1, \cdots, \alpha_B)$: all parameters of the mixture model
     - $\alpha_b = (f_b, \xi_b, \Sigma_b)$: parameters of the $b$th Gaussian of the mixture
     - $i$: index of the $i$th star
     - $n_i = (\alpha_i, \delta_i)$: true/observed angular position
     - $r_i$: true distance
     - $v_i = (v_{x,i}, v_{y,i}, v_{z,i})$: true 3D cartesian velocity
     - $\hat\mu_i = (\mu_{\alpha,i}, \mu_{\delta,i})$: observed proper motion
     - $\hat\varpi_i$: observed parallax
     - $E_i \rightarrow E_{m_i}, E_{C_i}$: true magnitude/color extinction at distance $r_i$
     - $C_i, \hat C_i$: true and observed color
     - $M_i$: true absolute magnitude
     - $\hat m_i$: observed apparent magnitude
     1 Model
     Our population/distribution model in 8-dimensional space (3D positions and velocities, plus 2D color–magnitude diagram) is a Gaussian mixture,
       $[v\ n\ r\ C\ M]^T \mid \alpha \sim \sum_{b=1}^{B} f_b\, \mathcal{N}_{8\mathrm{D}}(\xi_b;\ \Sigma_b)$. (1)
     The priors will be specified later. Typically, one would adopt conjugate priors, which greatly simplify the inference, i.e. a Dirichlet prior on the amplitudes $\{f_b\}$, a multivariate Gaussian for each mean $\xi_b$, and a Wishart for each covariance $\Sigma_b$.
     The full 5-dimensional likelihood, accounting for any covariance between the measurements, is
       $[\hat\mu_{\alpha,i}\ \hat\mu_{\delta,i}\ \hat\varpi_i\ \hat C_i\ \hat m_i]^T \mid v_i, n_i, r_i, E_i, C_i, M_i \sim \mathcal{N}_{5\mathrm{D}}(\,\cdot\,;\ \cdot\,)$ (2)
     (the symbols for the per-star mean and covariance did not survive extraction), with the model vector …
  38. Summary
     Gaia: exciting data set, but analysis is challenging.
     Hierarchical data-driven models for fully exploiting all of the data without external models.
     Gaia DR1:
     - high-precision color-magnitude diagrams
     - improved stellar distances
     - binary/triple sequences
     - red-clump calibration
     Gaia DR2 (April 2018):
     - 3D reconstruction of stellar density, dust, and velocities.