
GPUs for astronomy data

gully
April 25, 2018


Essentially all astronomical data has correlated noise. New statistical approaches for overcoming correlated noise have recently become available and are now popular in the astronomy community. These approaches involve linear algebra manipulations amenable to GPU acceleration. Two barriers have prevented widespread adoption of GPUs for astronomy data: access to (NVIDIA) hardware, and the finite learning curve of programming (CUDA). Recent Python frameworks are lowering the barrier to programming GPUs.
In this presentation I demonstrate the performance of the PyTorch GPU programming framework on fitting a line to data with correlated noise. The task applies Gaussian Process regression with linear least squares. The hardware was an NVIDIA Tesla K40 on a Sandy Bridge node of the NASA Pleiades supercomputer at NASA Ames Research Center.


Transcript

  1. GPUs for astronomy data. Michael Gully-Santiago, PhD, Scientist, baeri.org, Kepler/K2 Guest Observer Office. April 25, 2018, NVIDIA, Santa Clara, CA
  2. What I want to say
     • (Essentially all) astronomical data has correlated noise
     • New statistical approaches for overcoming correlated noise have recently become available and are now popular in the astronomy community
     • These approaches involve linear algebra manipulations amenable to GPU acceleration
     • Two barriers have prevented widespread adoption: access to (NVIDIA) hardware, and the finite learning curve of programming (CUDA)
     • Recent Python frameworks are lowering the barrier to programming GPUs
  3. How to deal with correlated noise. [Figure: data with error bars and the model y = m x + b; a second panel shows the true covariance matrix of the observations.] Slides from Dan Foreman-Mackey: https://speakerdeck.com/dfm/an-astronomers-introduction-to-gaussian-processes-v2
  4. Linear least-squares, ignoring the covariance. Before: you get biased estimates of m, b when you ignore the covariance. [Figure: a line fit through the data with unrealistically tight uncertainties.]
     A = \begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{bmatrix}, \quad
     C = \begin{bmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{bmatrix}, \quad
     y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
     \begin{bmatrix} m \\ b \end{bmatrix} = S\, A^{T} C^{-1} y \quad \text{(maximum likelihood; in this case also the mean of the posterior)}, \qquad
     S = \left[ A^{T} C^{-1} A \right]^{-1} \quad \text{(posterior covariance)}
     Slides from Dan Foreman-Mackey: https://speakerdeck.com/dfm/an-astronomers-introduction-to-gaussian-processes-v2
  5. Linear least-squares, including the covariance. After: including the covariance yields realistic error bars. The design matrix A and data vector y are as before, but C is now the full (dense) covariance matrix rather than a diagonal:
     \begin{bmatrix} m \\ b \end{bmatrix} = S\, A^{T} C^{-1} y, \qquad S = \left[ A^{T} C^{-1} A \right]^{-1}
     [Figure: the fitted line with realistic uncertainties; a second panel shows the dense covariance matrix.]
     Slides from Dan Foreman-Mackey: https://speakerdeck.com/dfm/an-astronomers-introduction-to-gaussian-processes-v2
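The generalized least-squares solution above can be sketched in a few lines of numpy. The synthetic data, kernel, and noise parameters here are illustrative assumptions, not the values used in the demo:

```python
import numpy as np

# Synthetic data: a line y = m*x + b observed with correlated noise,
# with the covariance C assumed known (as in the slides).
rng = np.random.default_rng(42)
n = 50
x = np.linspace(-5, 5, n)
m_true, b_true = 0.5, 1.0

# Illustrative covariance: white-noise diagonal plus a smooth kernel term.
C = 0.09 * np.eye(n) + 0.1 * np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)
L = np.linalg.cholesky(C)
y = m_true * x + b_true + L @ rng.standard_normal(n)

# Design matrix A with columns [x_i, 1], then the GLS solution:
#   [m, b]^T = S A^T C^{-1} y,   S = [A^T C^{-1} A]^{-1}
A = np.vander(x, 2)                        # columns [x, 1]
Cinv_A = np.linalg.solve(C, A)             # C^{-1} A without forming C^{-1}
S = np.linalg.inv(A.T @ Cinv_A)            # posterior covariance of (m, b)
m_hat, b_hat = S @ (A.T @ np.linalg.solve(C, y))
print(m_hat, b_hat)
```

Solving `C x = b` with `np.linalg.solve` instead of computing `C^{-1}` explicitly is both faster and more numerically stable.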
  6. Inverting the C matrix is very computationally expensive. In practice, it can only be done on small (N ~ 3000) datasets. It scales poorly (naively O(N³)); Cholesky factorization helps. Applied mathematics research has developed approximate solvers: george and celerite have improved the scaling on CPUs. The problem is even more acute when the inversion occurs inside a repeated likelihood-function call for optimization or sampling. That's where GPUs come in. https://github.com/dfm/george https://github.com/dfm/celerite
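A minimal sketch of the Cholesky route: the Gaussian log-likelihood is evaluated from the factor C = L Lᵀ without ever forming C⁻¹ or a determinant of C directly. The check against the direct formula is illustrative:

```python
import numpy as np

def gp_loglike(y, C):
    """ln N(y | 0, C) via Cholesky factorization: numerically stable and
    avoids an explicit matrix inverse (still O(N^3), but a smaller constant)."""
    L = np.linalg.cholesky(C)                            # C = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # C^{-1} y via two triangular solves
    logdet = 2.0 * np.sum(np.log(np.diag(L)))            # ln det C from the factor
    n = y.size
    return -0.5 * (y @ alpha + logdet + n * np.log(2 * np.pi))

# Sanity check against the direct (inverse + determinant) formula.
rng = np.random.default_rng(0)
n = 20
M = rng.standard_normal((n, n))
C = M @ M.T + n * np.eye(n)     # a well-conditioned positive-definite matrix
y = rng.standard_normal(n)
direct = -0.5 * (y @ np.linalg.solve(C, y)
                 + np.log(np.linalg.det(C)) + n * np.log(2 * np.pi))
print(np.isclose(gp_loglike(y, C), direct))  # True
```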
  7. How to (easily) program NVIDIA GPUs with Python.

  8. PyTorch just got even better with v0.4.0: it includes a much-needed MultivariateNormal distribution, and it looks more like numpy every day!
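A small sketch of the distribution that v0.4.0 added, `torch.distributions.MultivariateNormal`; the squared-exponential kernel and sizes here are illustrative, not from the talk:

```python
import torch

# Draw one correlated-noise realization from N(0, C) using the
# MultivariateNormal distribution introduced in PyTorch 0.4.
n = 100
x = torch.linspace(-5, 5, n)
# Illustrative covariance: squared-exponential kernel plus white-noise jitter.
C = 0.1 * torch.exp(-0.5 * (x[:, None] - x[None, :]) ** 2) + 0.09 * torch.eye(n)
mvn = torch.distributions.MultivariateNormal(torch.zeros(n), covariance_matrix=C)
noise = mvn.sample()
print(noise.shape)  # torch.Size([100])
```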
  9. Demo of solving y = m x + b with a covariance matrix, three ways:
     1. CPU with numpy
     2. CPU with PyTorch
     3. GPU with PyTorch
     I used the NASA Advanced Supercomputing (NAS) High-End Computing Capability (HECC) Pleiades system at NASA Ames: one 16-core Sandy Bridge GPU-enhanced node with an NVIDIA Tesla K40 (GPU) and CUDA 8.0.
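The PyTorch version of the demo might look like the following device-agnostic sketch. It uses the current `torch.linalg` / `cholesky_solve` API rather than the 0.4-era calls, and the data, kernel, and sizes are illustrative assumptions:

```python
import torch

# Same GLS solve as the numpy version, written so that the identical code
# runs on CPU or, when available, an NVIDIA GPU.
torch.manual_seed(0)
device = "cuda" if torch.cuda.is_available() else "cpu"

n = 2000
x = torch.linspace(-5, 5, n, device=device)
# Illustrative covariance: squared-exponential kernel plus white-noise jitter.
C = (0.1 * torch.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)
     + 0.09 * torch.eye(n, device=device))
L = torch.linalg.cholesky(C)                     # factor once, reuse for every solve
y = 0.5 * x + 1.0 + L @ torch.randn(n, device=device)

A = torch.stack([x, torch.ones_like(x)], dim=1)  # design matrix, columns [x, 1]
Cinv_A = torch.cholesky_solve(A, L)              # C^{-1} A via the factor L
S = torch.linalg.inv(A.T @ Cinv_A)               # posterior covariance of (m, b)
mb = S @ (A.T @ torch.cholesky_solve(y[:, None], L))
print(mb.squeeze().tolist())                     # [m_hat, b_hat]
```

Moving the same code between cases 2 and 3 of the demo is then just a matter of which `device` string is in effect.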
  10. None
  11. [The linear least-squares equations from slides 4 and 5, repeated alongside the covariance-matrix figure: (m, b)ᵀ = S Aᵀ C⁻¹ y, with S = [Aᵀ C⁻¹ A]⁻¹.]
  12. GPUs outperform CPUs by 18x at N = 3200. Simulating the noise limits CPUs to N = 3200. Michael Gully-Santiago, using an NVIDIA K40 on one Sandy Bridge node of NASA Ames Pleiades.
  13. Simulating the noise requires drawing from one huge multivariate normal (a.k.a. Gaussian) distribution. GPUs outperform CPUs by 75x on this task for a C = 3200 × 3200 matrix. We can simulate a 23000 × 23000 matrix before running out of memory on the NVIDIA K40. https://github.com/gully/bombcat
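A hypothetical timing harness for this benchmark: drawing one sample from N(0, C) by Cholesky factorization, at the CPU-limited size from the slide. The kernel is illustrative, and no specific timings are implied:

```python
import time
import torch

def sample_mvn(n, device):
    """Draw one sample from N(0, C) for an n x n covariance via Cholesky."""
    x = torch.linspace(-5, 5, n, device=device)
    # Illustrative covariance: squared-exponential kernel plus jitter.
    C = (0.1 * torch.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)
         + 0.09 * torch.eye(n, device=device))
    L = torch.linalg.cholesky(C)          # the dominant O(n^3) cost
    return L @ torch.randn(n, device=device)

t0 = time.perf_counter()
draw = sample_mvn(3200, "cpu")
print(f"CPU n=3200: {time.perf_counter() - t0:.2f}s, sample shape {tuple(draw.shape)}")
# On a CUDA device, pass device="cuda" and call torch.cuda.synchronize()
# before reading the clock, since GPU kernels launch asynchronously.
```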
  14. Resources. Kaggle Kernels are hosted, social Jupyter notebooks. They offer NVIDIA K80 GPUs: https://www.kaggle.com/xgully/fit-a-line-to-data-with-gpus/
  15. Resources. Dan Foreman-Mackey (Flatiron Center for Computational Astrophysics) has pioneered many methods for modern astrophysical statistical inference.
     https://speakerdeck.com/dfm/pyastro16
     https://speakerdeck.com/dfm/an-astronomers-introduction-to-gaussian-processes-v2
     https://speakerdeck.com/dfm/data-analysis-with-mcmc
     http://dfm.io/
     https://github.com/dfm/tf-tutorial
  16. What I said
     • (Essentially all) astronomical data has correlated noise
     • New statistical approaches for overcoming correlated noise have recently become available and are now popular in the astronomy community
     • These approaches involve linear algebra manipulations amenable to GPU acceleration
     • Two barriers have prevented widespread adoption: access to (NVIDIA) hardware, and the finite learning curve of programming (CUDA)
     • Recent Python frameworks are lowering the barrier to programming GPUs
  17. extras

  18. Examples of correlated noise in astronomy
     • Kepler high-precision time-series brightness measurements of variable stars
     • Model-data comparison of stellar spectroscopy
     • Thermal-induced noise in optical calibration systems
     • Galaxy surface brightness distributions
     • Apparent bulk motion of gas blobs on roiling stellar surfaces
     • Placement of any points on a chart when systematic bias is present