Applying Probabilistic Inference to Astronomical Spectroscopy #SciPy2020

Michael Gully-Santiago, Research Fellow
The University of Texas at Austin, Department of Astronomy
SciPy2020
NOAO/AURA/NSF

The spectrum of a star

The spectrum of the Sun

The spectrum of a different star

Astronomical spectroscopy allows you to measure physical properties of stars.

Fundamental/Intrinsic Properties
1. Temperature (T_eff)
2. Surface gravity (log g)
3. Atomic/molecular composition ([Fe/H], [α/Fe], C/O)

Extrinsic/Kinematic Properties
1. Radial velocity (RV)
2. Projected rotation speed (v sin i)

Special Properties
1. Mass accretion (dm/dt)
2. Magnetic fields (B)
3. Dust extinction (A_V)
4. Starspots (f_spot)
5. Exoplanetary atmospheres (R_p/R_star (λ))
6. Laws of physics themselves (log gf, etc.)

Some of the best tools for astronomical spectral analysis are built in Python.

(Some) Python tools for astronomical spectral analysis: github.com/jonathansick/awesome-astronomy

TelFit github.com/kgullikson88/Telluric-Fitter (Gullikson et al. 2014; SciPy 2015 Talk!)
- Physics driven model
- Removes “telluric artifacts” from Earth’s atmosphere
- Python wrapper for Fortran-based LBLRTM

wobble github.com/megbedell/wobble (Bedell et al. 2019)
- Data driven model (many spectra of same star)
- Removes Earth’s absorption spectrum
- Yields Precision Radial Velocities
- Built with TensorFlow

psoap github.com/iancze/PSOAP (Czekala et al. 2017)
- Data driven model (many spectra of same star)
- Assumes Earth signals are already removed
- Yields orbit of Spectroscopic Binaries
- Built with SciPy/cython

specmatch-emp github.com/samuelyeewl/specmatch-emp (Yee et al. 2017)
- Nearest Neighbor / Template matching model
- Has a large library of observed template spectra
- Built with SciPy/AstroPy/Pandas
- Assumes all stars look like “normal” stars in your library

New frontiers in astrophysics yield astronomical spectra that look unlike anything we’ve seen before.
→ Templates are scarce/non-existent.
→ Ground-truth labelling is difficult/impossible.
We have to model our spectra based on astrophysical theory.

We want a Python function that takes in
1. A noisy astronomical spectrum
2. A tunable theoretical model for how that spectrum could have been generated
and outputs the-cloud-of-physical-properties-consistent-with-that-data
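A minimal sketch of what such a function could look like, assuming a toy two-parameter model (one Gaussian absorption line with a depth and a center) and a plain Metropolis sampler. The names `toy_model`, `log_posterior`, and `infer` are hypothetical and are not part of any library mentioned in this talk; real spectral models have many more parameters and need the machinery described later.

```python
import numpy as np

def toy_model(wavelength, depth, center, width=0.5):
    """Hypothetical tunable model: flat continuum with one Gaussian absorption line."""
    return 1.0 - depth * np.exp(-0.5 * ((wavelength - center) / width) ** 2)

def log_posterior(theta, wavelength, flux, sigma):
    """Gaussian likelihood with flat priors: depth in [0, 1], center inside the data range."""
    depth, center = theta
    if not (0.0 <= depth <= 1.0 and wavelength[0] <= center <= wavelength[-1]):
        return -np.inf
    resid = flux - toy_model(wavelength, depth, center)
    return -0.5 * np.sum((resid / sigma) ** 2)

def infer(wavelength, flux, sigma, theta0, n_steps=5000, step=0.01, seed=0):
    """Metropolis sampler: returns posterior samples, i.e. the 'probability cloud'."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    logp = log_posterior(theta, wavelength, flux, sigma)
    samples = np.empty((n_steps, theta.size))
    for i in range(n_steps):
        proposal = theta + step * rng.standard_normal(theta.size)
        logp_new = log_posterior(proposal, wavelength, flux, sigma)
        # Accept with probability min(1, p_new / p_old)
        if np.log(rng.uniform()) < logp_new - logp:
            theta, logp = proposal, logp_new
        samples[i] = theta
    return samples

# Simulate a noisy spectrum with known truth (depth 0.4, center 5.0), then recover it
rng = np.random.default_rng(42)
wavelength = np.linspace(0.0, 10.0, 200)
sigma = 0.02
flux = toy_model(wavelength, 0.4, 5.0) + sigma * rng.standard_normal(wavelength.size)

samples = infer(wavelength, flux, sigma, theta0=[0.3, 4.8])
posterior_mean = samples[1000:].mean(axis=0)  # discard the first 1000 steps as burn-in
```

The scatter of `samples[1000:]` around `posterior_mean` is the cloud of parameter values consistent with the data.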

Astronomical spectral inference (analogous to retrievals in the Earth science literature)
Figure labels: fixed, tunable, probability cloud. Czekala et al. 2015

Why is spectral inference hard?
1. Physics-based models are expensive to compute.
2. The models are imperfect.
3. The models have possibly many parameters.
4. Degeneracies among parameters give rise to similar spectra.
5. The data possess correlated noise (e.g. from Earth’s atmosphere).
6. The noise properties may not be perfectly known.

Due to the computational complexity, self-consistent synthetic spectral models are
pre-computed on coarsely sampled grids of physical properties. How do you turn a coarse grid into a smooth function? Husser et al. 2013

Starfish github.com/iancze/Starfish (Czekala et al. 2015)
- Physics driven model emulator
- Fits all parameters simultaneously
- Built with SciPy/cython/sklearn/multiprocessing

Starfish makes spectral inference possible.
Challenges:
1. Physics-based models are expensive to compute.
2. The models are imperfect.
3. The models have possibly many parameters.
4. Degeneracies among parameters give rise to similar spectra.
5. The data possess correlated noise (e.g. from Earth’s atmosphere).
6. The noise properties may not be perfectly known.
Techniques in Starfish: Spectral emulation, Gaussian Processes, MCMC & Gibbs Sampling, Local covariance kernels, Noise scale inference

Spectral emulation quantifies the discretization noise from interpolating coarse model grids. (Czekala et al. 2015)
- The interpolation occurs on the weights of PCA eigenspectra computed from the grid volume.
- These weights tend to be smooth in the model parameters, giving a better reconstruction than linear interpolation of pixels.
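The idea in these two bullets can be sketched on a toy one-parameter grid. Everything here is an assumption for illustration: `toy_spectrum` stands in for an expensive physics model, the weights are interpolated with a simple Gaussian-process regression whose squared-exponential kernel and length scale are placeholders, and none of the names come from Starfish itself.

```python
import numpy as np

def toy_spectrum(wavelength, temperature):
    """Stand-in for an expensive physics model: line depth and width vary with temperature."""
    depth = 0.2 + 0.5 * np.exp(-((temperature - 4000.0) / 1500.0) ** 2)
    width = 0.3 + temperature / 20000.0
    return 1.0 - depth * np.exp(-0.5 * ((wavelength - 5.0) / width) ** 2)

def sq_exp_kernel(a, b, length):
    """Squared-exponential kernel between two 1-D arrays of parameter values."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)

wavelength = np.linspace(0.0, 10.0, 300)
grid_temps = np.arange(3000.0, 6001.0, 500.0)  # coarse grid: 7 temperatures
grid_flux = np.array([toy_spectrum(wavelength, t) for t in grid_temps])

# PCA via SVD of the mean-subtracted grid
mean_flux = grid_flux.mean(axis=0)
u, s, vt = np.linalg.svd(grid_flux - mean_flux, full_matrices=False)
n_comp = 5
eigenspectra = vt[:n_comp]             # shape (n_comp, n_pix)
weights = u[:, :n_comp] * s[:n_comp]   # shape (n_grid, n_comp)

def emulate(temperature, length=1000.0, jitter=1e-8):
    """Interpolate each PCA weight in temperature with a GP mean, then rebuild the spectrum."""
    t_new = np.array([float(temperature)])
    K = sq_exp_kernel(grid_temps, grid_temps, length) + jitter * np.eye(grid_temps.size)
    k_star = sq_exp_kernel(t_new, grid_temps, length)      # shape (1, n_grid)
    w_pred = (k_star @ np.linalg.solve(K, weights))[0]      # shape (n_comp,)
    return mean_flux + w_pred @ eigenspectra

# Compare both approaches against the toy model at an off-grid temperature
t_test = 4250.0
truth = toy_spectrum(wavelength, t_test)
emulated = emulate(t_test)
linear = 0.5 * (grid_flux[2] + grid_flux[3])  # pixel-wise midpoint of the 4000 K and 4500 K spectra

err_emulator = np.max(np.abs(emulated - truth))
err_linear = np.max(np.abs(linear - truth))
```

Note that linearly interpolating all of the weights would be mathematically identical to linearly interpolating the pixels; the gain comes from using a smooth interpolator on a handful of weights.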

Spectral emulation quantifies the discretization noise from interpolating coarse model grids. (Czekala et al. 2015)
Emulation mitigates “piling up” at interpolated grid points (e.g. Cottaar et al. 2014).
Figure panels: Mean reconstructed model; Covariance matrix of each pixel

Gaussian Processes expect correlated residuals arising from a sea of slightly-off line strengths. (Czekala et al. 2015)
- Non-stationary kernel downweights routine outliers.
- Instrumental noise alone underestimates residuals.
- Net effect of Gaussian Process is to avoid overfitting noise spikes.
Figure panels: Starfish covariance matrix; “Chi-squared” diagonal matrix
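To make the contrast between the two matrices concrete, here is a small sketch that evaluates the same residuals under a diagonal “chi-squared” covariance and under a dense covariance with an added correlated term. The Matérn-3/2 kernel and its amplitude and length scale are assumptions chosen for illustration, and the sketch covers only a stationary kernel, not the non-stationary local kernels mentioned above.

```python
import numpy as np

def gaussian_log_likelihood(residual, covariance):
    """Multivariate normal log-likelihood via Cholesky factorization."""
    chol = np.linalg.cholesky(covariance)
    alpha = np.linalg.solve(chol, residual)            # L^-1 r, so alpha @ alpha = r^T C^-1 r
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    n = residual.size
    return -0.5 * (alpha @ alpha + log_det + n * np.log(2.0 * np.pi))

def matern32(wavelength, amplitude, length):
    """Matern-3/2 kernel (assumed here as one common choice for correlated residuals)."""
    r = np.abs(wavelength[:, None] - wavelength[None, :])
    x = np.sqrt(3.0) * r / length
    return amplitude**2 * (1.0 + x) * np.exp(-x)

rng = np.random.default_rng(1)
wavelength = np.linspace(0.0, 20.0, 400)
sigma = 0.01  # per-pixel instrumental noise

diag_cov = sigma**2 * np.eye(wavelength.size)
full_cov = diag_cov + matern32(wavelength, amplitude=0.03, length=0.5)

# Draw residuals that really are correlated, as if from many slightly-off lines
residual = np.linalg.cholesky(full_cov) @ rng.standard_normal(wavelength.size)

lnl_diag = gaussian_log_likelihood(residual, diag_cov)
lnl_full = gaussian_log_likelihood(residual, full_cov)
chi2_per_pixel = np.sum((residual / sigma) ** 2) / residual.size
```

With instrumental noise alone, `chi2_per_pixel` comes out well above 1, which is the sense in which the diagonal matrix underestimates the residuals.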

Sub-pixel resampling (Czekala et al. 2015)
- Convolves instrumental and astrophysical sources of line broadening
- Enables reasonably precise Radial Velocity applications.
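As a rough illustration of these two steps, the sketch below Doppler-shifts a model line by a radial velocity, resamples it onto the detector grid by linear interpolation, and applies a Gaussian instrumental broadening. This is a simplified stand-in under stated assumptions (non-relativistic shift, direct-space convolution, a made-up line near 15005 Å); it is not how Starfish implements the resampling, and the function names are hypothetical.

```python
import numpy as np

C_KMS = 299792.458  # speed of light in km/s

def doppler_shift(wavelength, flux, rv_kms):
    """Shift a rest-frame spectrum by a radial velocity (non-relativistic) and
    resample it back onto the original wavelength grid."""
    shifted_wavelength = wavelength * (1.0 + rv_kms / C_KMS)
    return np.interp(wavelength, shifted_wavelength, flux)

def gaussian_broaden(flux, sigma_pix):
    """Convolve with a unit-area Gaussian kernel whose width is given in pixels."""
    half = int(np.ceil(4 * sigma_pix))
    x = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (x / sigma_pix) ** 2)
    kernel /= kernel.sum()
    padded = np.pad(flux, half, mode="edge")   # avoid edge darkening
    return np.convolve(padded, kernel, mode="valid")

wavelength = np.linspace(15000.0, 15010.0, 2001)  # Angstrom, 0.005 A pixels
rest_flux = 1.0 - 0.5 * np.exp(-0.5 * ((wavelength - 15005.0) / 0.1) ** 2)

observed = gaussian_broaden(doppler_shift(wavelength, rest_flux, rv_kms=20.0), sigma_pix=10.0)

# The line centroid should land at the Doppler-shifted wavelength
depth = 1.0 - observed
centroid = np.sum(wavelength * depth) / np.sum(depth)
expected = 15005.0 * (1.0 + 20.0 / C_KMS)
```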

Three science cases enabled by Starfish
1. Starspot physical properties
2. Photospheres of embedded protostars
3. Critically evaluating substellar atmospheres

1. Measuring starspot temperature and coverage area on a young star
Giant starspots confound fundamental properties and are difficult to measure.
Figure caption: Sunspots are seen on the Sun.
Somers et al. 2015; Roettenbacher et al. 2016

We adapted Starfish to infer all the normal stellar parameters, plus:
- Temperature of the spot, T_spot
- Coverage fraction of spots, f_spot
14 total parameters fit with ensemble sampling with emcee, chunking the IGRINS spectrum into 42 segments matched to spectral order; 21 segments shown here.
→ github.com/BrownDwarf/welter
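One way to picture the two extra parameters is as an area-weighted mix of two emitting components. The sketch below assumes that mixing rule and uses Planck blackbodies in place of the synthetic spectra that Starfish emulates, so it shows only the structure of a two-temperature model, not the adapted Starfish code; `spotted_spectrum` is a hypothetical name.

```python
import numpy as np

H = 6.62607015e-34   # Planck constant, J s
C = 2.99792458e8     # speed of light, m/s
K_B = 1.380649e-23   # Boltzmann constant, J/K

def planck(wavelength_m, temperature):
    """Blackbody spectral radiance B_lambda(T) in W m^-3 sr^-1."""
    x = H * C / (wavelength_m * K_B * temperature)
    return 2.0 * H * C**2 / wavelength_m**5 / np.expm1(x)

def spotted_spectrum(wavelength_m, t_ambient, t_spot, f_spot):
    """Assumed mixing rule: area-weighted sum of ambient photosphere and spot emission."""
    return ((1.0 - f_spot) * planck(wavelength_m, t_ambient)
            + f_spot * planck(wavelength_m, t_spot))

# Near-infrared wavelengths, 1.5 to 2.4 microns
wavelength = np.linspace(1.5e-6, 2.4e-6, 50)

# Illustrative values in the range quoted for the spotted young star in this talk
mixed = spotted_spectrum(wavelength, t_ambient=4100.0, t_spot=2700.0, f_spot=0.8)

# Fraction of the total light that comes from the spots at each wavelength
spot_share = 0.8 * planck(wavelength, 2700.0) / mixed
```

Because the cooler component contributes a larger share at longer wavelengths, a broad infrared spectrum carries information about both the spot temperature and its coverage fraction.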

We find ~70-85% coverage fraction of starspots on this extremely spotted young star.
The spot temperature is ~2700 K, surrounded by a ~4100 K ambient photosphere. Gully-Santiago et al. 2017

2. Measuring physical properties of a Class 0 protostar.
Protostars are shrouded in dust and difficult to observe.
We used ~8 hours of Keck time on a single protostar to measure its spectrum.
Greene, Gully-Santiago, Barsony 2018 github.com/browndwarf/protostars

The spectrum is consistent with a large contracting protostar with a ~1200 K disk possessing 4x the emitting area of the protostar.
We added 4 new parameters to Starfish:
1. Disk temperature
2. Disk emitting area
3. Extinction A_K
4. Extinction power law
Informs strategies for JWST.
github.com/browndwarf/protostars

3. Fundamental properties and physical chemistry of ultracool substars (Gully-Santiago et al. in prep.)
We’ve extended Starfish to Brown Dwarfs using the Sonora-Bobcat synthetic model grid (Marley et al. in prep), cf. github.com/gully/jammer-Gl570D
A sea of molecules blankets the spectra of brown dwarfs, making them difficult to interpret.
Starfish enables retrieval-like analyses with physically self-consistent models.

Key limitations of Starfish and path forward
1. Significant barriers to entry have led to high interest but low adoption
2. Tuning the blocked Gibbs sampler is subtle and slow
3. Training the spectral emulator is computationally demanding and slow
4. Physical extensions reside in undocumented forks
5. Not set up for auto-differentiation
6. No GPU acceleration

starfish.readthedocs.io
Major overhaul in v0.3.0! By Miles Lucas, Graduate Student at UHawaii
New API design should encourage even more experimentation.

Status of the limitations listed above (slide annotations): Addressed in v0.3! Applying for NASA funding for support

Why are GPUs helpful?
- The main bottleneck in Starfish is solving the N~1000 Gaussian Process likelihood.
- We cannot use celerite* since the Starfish noise matrix is non-stationary.
With modern GPUs we can get to N~20,000 pixel spectra.
*Foreman-Mackey et al. 2017 github.com/dfm/celerite

Why is autodiff important?
- MCMC Sampling in high dimensions (10+ parameters) is difficult.
- Hamiltonian Monte Carlo (e.g. NUTS) overcomes this challenge by using exact gradients
- Autodiff dramatically simplifies writing physical extensions
Figure caption: Samples from a 250 dimensional correlated Multivariate Normal (Hoffman & Gelman 2011, arxiv.org/abs/1111.4246)
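For readers who have not seen Hamiltonian Monte Carlo written out, here is a bare-bones NumPy sketch on a 20-dimensional correlated normal. It is an illustration under several assumptions: the gradient is written by hand rather than obtained by autodiff, the step size is fixed, and the trajectory length is simply randomized, whereas NUTS adapts both automatically. The function names are hypothetical.

```python
import numpy as np

def make_target(dim, rho=0.8):
    """Correlated multivariate normal with unit variances and correlations rho^|i-j|."""
    idx = np.arange(dim)
    cov = rho ** np.abs(idx[:, None] - idx[None, :])
    precision = np.linalg.inv(cov)

    def log_prob(q):
        return -0.5 * q @ precision @ q

    def grad_log_prob(q):
        return -precision @ q   # exact gradient, hand-coded here instead of autodiff

    return cov, log_prob, grad_log_prob

def hmc(log_prob, grad_log_prob, q0, n_samples=2000, step_size=0.2, max_leapfrog=20, seed=0):
    rng = np.random.default_rng(seed)
    q = np.asarray(q0, dtype=float)
    samples = np.empty((n_samples, q.size))
    n_accept = 0
    for i in range(n_samples):
        p = rng.standard_normal(q.size)                 # fresh momentum each iteration
        n_steps = rng.integers(5, max_leapfrog + 1)     # random length avoids resonant modes
        q_new = q.copy()
        p_new = p + 0.5 * step_size * grad_log_prob(q_new)   # initial half step in momentum
        for step in range(n_steps):
            q_new = q_new + step_size * p_new
            # full momentum step between position updates, half step at the very end
            scale = 0.5 if step == n_steps - 1 else 1.0
            p_new = p_new + scale * step_size * grad_log_prob(q_new)
        # Metropolis correction on the change in total energy
        h_old = -log_prob(q) + 0.5 * p @ p
        h_new = -log_prob(q_new) + 0.5 * p_new @ p_new
        if np.log(rng.uniform()) < h_old - h_new:
            q = q_new
            n_accept += 1
        samples[i] = q
    return samples, n_accept / n_samples

cov, log_prob, grad_log_prob = make_target(dim=20)
samples, accept_rate = hmc(log_prob, grad_log_prob, q0=np.zeros(20))
```

The gradient steers each proposal along the correlated directions of the target, which is why the acceptance rate stays high even though every proposal moves all 20 coordinates at once.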

Why is autodiff important? (continued)
Emcee begins to fail for ten(s) of parameters: statmodeling.stat.columbia.edu/2017/03/15/ensemble-methods-doomed-fail-high-dimensions/

JAX and NumPyro make it easy to write models for Hamiltonian MC.
- Forward and backward mode autodiff
- CPU/GPU/TPU support out of the box
github.com/pyro-ppl/numpyro github.com/google/jax

github.com/gully/TgiF/

You can automatically get uncertainty contours by computing the
Hessian with exact autodiff in JAX.
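One common reading of this statement, assumed here rather than taken from the slide, is the Laplace approximation: near the best-fit parameters the log posterior is close to quadratic, so the inverse of the Hessian of the negative log posterior approximates the parameter covariance, and the contours are ellipses of constant quadratic form. The symbols below (best-fit parameters, data D, contour level Δ) are introduced only for this sketch.

```latex
H_{ij} = -\left.\frac{\partial^2 \ln p(\theta \mid D)}{\partial \theta_i \, \partial \theta_j}\right|_{\theta = \hat{\theta}},
\qquad
\Sigma \approx H^{-1},
\qquad
(\theta - \hat{\theta})^{\mathsf{T}} H \, (\theta - \hat{\theta}) = \Delta
```

For two parameters, Δ ≈ 2.30 encloses about 68.3% of the probability under this Gaussian approximation.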

github.com/BrownDwarf/fiatlux/
Proof-of-concept: NumPyro, JAX, HAPI (hitran.org/hapi/), and NVIDIA GPUs to constrain the Temperature-Pressure profile of Earth’s atmosphere.
It works! Ongoing demos: 20+ parameter models.

Key ideas
1. Excellent Python frameworks exist for Spectral Inference
2. Spectral emulation unlocks value from pre-computed synthetic grid models
3. Starfish has enabled new application domains (starspots, brown dwarf physical chemistry, protostars)
4. Future promise of JAX/NumPyro: autodiff & GPUs will allow us to ask new questions at the scientific frontier

Thank you:
➔ Jill Cowan, Enthought, and SciPy2020 organizers and sponsors
➔ Ian Czekala (UC Berkeley), Miles Lucas (U Hawaii), and Starfish contributors
➔ Greg Herczeg (KIAA-Beijing), Tom Greene & Mark Marley (NASA Ames), Caroline Morley (UT Austin) for funding Starfish development
➔ Austin Python Users Group, Beijing Python Meetup, SF Python Meetup
