Radio astronomy data reduction at PyData Amsterdam

Gijs Molenaar
October 12, 2016

Transcript

  1. ABOUT ME • Scientific Software Engineer • 50/50 Amsterdam / Cape Town • PhD student in South Africa • Focus on end user (astronomer) experience • Machine Learning • Brewer
  2. PYDATA • What to talk about? • How we get our data • How we use Python to crunch this data • Python projects I’m involved in
  3. THE PROBLEMS • We can’t directly observe the sky • Sampling in the Fourier domain generates massive amounts of data • Our observation is incomplete • Reconstruction is more art than science • Scientists need control over the full pipeline
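To make the "sampling in the Fourier domain" and "our observation is incomplete" points on slide 3 concrete, here is a minimal one-dimensional sketch using only the Python standard library. The toy sky, the set of measured frequencies and all names are assumptions made up for illustration; a real interferometer samples a two-dimensional plane and real pipelines do not use a naive transform like this one.

```python
import cmath


def dft(signal):
    """Naive discrete Fourier transform (fine for a tiny toy example)."""
    n = len(signal)
    return [
        sum(signal[x] * cmath.exp(-2j * cmath.pi * k * x / n) for x in range(n))
        for k in range(n)
    ]


def inverse_dft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * x / n) for k in range(n)) / n
        for x in range(n)
    ]


# Toy 1-D "sky" with two point sources (made-up values).
sky = [0.0] * 16
sky[3] = 1.0
sky[10] = 0.5

# What a perfect instrument would measure: every Fourier component.
visibilities = dft(sky)

# Hypothetical coverage: only some frequencies are observed. The set is
# symmetric (k and 16 - k) so the reconstructed image stays real-valued.
measured = {0, 1, 2, 5, 11, 14, 15}
sampled = [v if k in measured else 0j for k, v in enumerate(visibilities)]

# Transforming back gives the true sky only when nothing is missing.
restored = [value.real for value in inverse_dft(visibilities)]
dirty = [value.real for value in inverse_dft(sampled)]

for position, (true_flux, seen_flux) in enumerate(zip(sky, dirty)):
    print(f"{position:2d}  true={true_flux:4.1f}  observed={seen_flux:6.3f}")
```

With full coverage the inverse transform returns the original sky. With gaps, each source is smeared across the whole image, which is why a reconstruction step is needed and why it involves choices rather than a single right answer.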
  4. WHAT ELSE? • Image domain analysis • Pulsar observations • Microwave background radiation • Cosmology • Follow-up observations • Gravitational waves
  5. PROBLEMS IN RADIO ASTRONOMY • True big data • A lot of complex legacy code • Not a lot of money to hire experts • Focus on hardware, not software • A lot of duct tape programming
  6. ASTRONOMY <3 PYTHON • Most used programming language in astronomy • Together with IDL, C/C++, Fortran and Matlab • Rapid prototyping • Quick to learn • Numerical libraries • Jupyter notebooks
  7. PROBLEMS WITH PYTHON • Slow adoption of Python 3, still on Python 2.7 • Speed • Package management is so-so for compiled modules • We don’t really need Python 3, but it would be very, very nice • CPython, PyCUDA, TensorFlow • Anaconda? Debian packages
  8. TACKLING THE PROBLEMS IN RADIO ASTRONOMY • STEP 1 • Package up all existing software • http://kernsuite.info • Debian packages
  9. TACKLING THE PROBLEMS IN RADIO ASTRONOMY • STEP 2 • Making sure this software can run everywhere • Docker! • Run everywhere, even Mac and Windows
  10. WHY I LIKE DOCKER • Docker made containerisation awesome • Containerisation is awesome (but not new) • Useful for setting up services and hooking them together • Good support for Windows and OS X • Image distribution, one-command ‘installation’
  11. WHAT I DON’T LIKE ABOUT DOCKER • Allowing users to run containers == giving them root • Dockerfile is just awful • Caching of layers is more painful than useful • Mapping user IDs? • Networking performance overhead • GPU acceleration? nvidia-docker, eww • All have workaround hacks • HPC is not the focus for Docker
  12. THE ALTERNATIVE • Singularity • Containers for HPC • Solves most of the previously mentioned problems • Still young, less momentum • No root escalation, proper user ID mapping, no useless bells and whistles
  13. PROTOTYPE • Currently working on a prototype pipeline at George Washington University • 2,924 CPU cores • 64 NVIDIA K20 GPUs • Runs an older CentOS • Singularity used to containerise the KERN software suite • 40 TB LOFAR dataset • Job scheduling with SLURM
  14. TACKLING THE PROBLEMS IN RADIO ASTRONOMY • STEP 3 • Chaining the containers as jobs in a pipeline • Luigi • Nextflow • Kubernetes? • Spark?
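The idea on slide 14 of chaining containers as jobs can be sketched without committing to any of the listed tools. The example below is not Luigi or Nextflow; it only uses `graphlib` from the standard library (Python 3.9+) to put steps in dependency order and builds a `singularity exec` command line for each one without running it. The step names, the image name `kern.img` and the per-step scripts are all hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each step maps to the steps it depends on.
dependencies = {
    "calibrate": set(),
    "image": {"calibrate"},
    "find_sources": {"image"},
    "make_report": {"image", "find_sources"},
}


def container_command(step, image="kern.img"):
    """Build (but do not run) a command that executes one step in a container.

    The image name and the `<step>.py` script are made up for illustration.
    """
    return ["singularity", "exec", image, "python", f"{step}.py"]


def plan(dependencies):
    """Return (step, command) pairs in an order that respects dependencies."""
    order = TopologicalSorter(dependencies).static_order()
    return [(step, container_command(step)) for step in order]


jobs = plan(dependencies)
for step, command in jobs:
    print(step, "->", " ".join(command))
```

A workflow tool adds what this sketch leaves out: tracking which outputs already exist, retrying failed jobs, and handing each command to a scheduler such as SLURM instead of printing it.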
  15. TACKLING THE PROBLEMS IN RADIO ASTRONOMY • STEP 4 • Make a uniform interface for a container • Parameterise containers • KLIKO • Scientific Compute Container Format • Python (also 3) • https://github.com/gijzelaerr/kliko
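One way to picture the "uniform interface" and "parameterise containers" bullets on slide 15 is a declarative parameter description that is validated before the container starts. The sketch below is an assumption for illustration only: the `SPEC` layout, the function names and the `--name=value` argument style are invented here and are not KLIKO's actual schema or API.

```python
# Hypothetical parameter description shipped alongside a container.
SPEC = [
    {"name": "input", "type": str, "required": True},
    {"name": "cell_size", "type": float, "default": 1.5},
    {"name": "iterations", "type": int, "default": 1000},
]


def resolve(spec, supplied):
    """Check user-supplied values against the spec and fill in defaults."""
    unknown = set(supplied) - {param["name"] for param in spec}
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    resolved = {}
    for param in spec:
        name = param["name"]
        if name in supplied:
            # Convert to the declared type, e.g. "500" -> 500.
            resolved[name] = param["type"](supplied[name])
        elif param.get("required"):
            raise ValueError(f"missing required parameter: {name}")
        else:
            resolved[name] = param["default"]
    return resolved


def to_arguments(resolved):
    """Turn resolved parameters into command-line style arguments."""
    return [f"--{name}={value}" for name, value in resolved.items()]


params = resolve(SPEC, {"input": "observation.ms", "iterations": "500"})
arguments = to_arguments(params)
print(arguments)
```

Because every container describes its parameters the same way, a pipeline or a web form can be generated from the description instead of being written by hand for each tool.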
  16. THE FINAL STEP • STEP 5 • Analysing the data • Offer easy-to-use tools for the astronomers • OpenStack • Singularity • Jupyter • Whatever they want and need!
  17. ROUND UP • A lot of awesome things are going to happen • A lot of open questions • A lot of work to do • Python is currently the best high-level language for radio astronomy