Radio astronomy data reduction at PyData Amsterdam

Gijs Molenaar
October 12, 2016

Transcript

  1. RADIO ASTRONOMY DATA REDUCTION IN GIJS MOLENAAR http://pythonic.nl - @gijzelaerr - PYDATA AMSTERDAM - 12 OCTOBER 2016 - ING AMSTERDAM
  2. ABOUT ME • Scientific Software Engineer • 50/50 Amsterdam / Cape Town • PhD student in South Africa • Focus on end-user (astronomer) experience • Machine Learning • Brewer
  3. PYDATA • What to talk about? • How we get our data • How we use Python to crunch this data • Python projects I’m involved in
  4. [Image-only slide]
  5. [Image-only slide]
  6. RADIO ASTRONOMY

  7. THE PROBLEMS • We can’t directly observe the sky • Sampling in the Fourier domain generates massive amounts of data • Our observations are incomplete • Reconstruction is more art than science • Scientists need control over the full pipeline
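The "sampling in the Fourier domain" point can be illustrated with NumPy. Under the usual small-field approximation, an interferometer measures visibilities, i.e. samples of the 2-D Fourier transform of the sky brightness, at the (u, v) points its baselines happen to cover. The sketch below is a toy under stated assumptions: a made-up sky of three point sources and a random coverage mask (made Hermitian-symmetric, since a real sky gives V(-u, -v) = V*(u, v)), not real telescope coverage. It shows that full coverage recovers the sky exactly, while incomplete coverage only yields a "dirty image".

```python
import numpy as np


def make_sky(n=64):
    """Toy sky: a few point sources on an n x n grid (positions and fluxes are made up)."""
    sky = np.zeros((n, n))
    sky[20, 30] = 1.0
    sky[40, 12] = 0.5
    sky[10, 50] = 0.25
    return sky


def uv_mask(n, fraction, seed=0):
    """Random (u, v) coverage, symmetrised so that if (u, v) is sampled, so is (-u, -v)."""
    rng = np.random.default_rng(seed)
    mask = rng.random((n, n)) < fraction
    mirrored = np.roll(np.flip(mask), shift=1, axis=(0, 1))  # maps index k to (-k) mod n
    return mask | mirrored


def dirty_image(sky, mask):
    """Keep the sky's Fourier components only where mask is True, then transform back."""
    vis = np.fft.fft2(sky)             # "visibilities" on the full grid
    sampled = np.where(mask, vis, 0)   # unsampled cells are simply missing (zero)
    return np.fft.ifft2(sampled).real


if __name__ == "__main__":
    sky = make_sky()
    full = np.ones_like(sky, dtype=bool)
    sparse = uv_mask(sky.shape[0], fraction=0.1)
    print("sampled fraction:      ", sparse.mean())
    print("max error, full grid:  ", np.abs(dirty_image(sky, full) - sky).max())
    print("max error, sparse grid:", np.abs(dirty_image(sky, sparse) - sky).max())
```

Turning such a dirty image back into a plausible sky (deconvolution) is where the "more art than science" reconstruction comes in.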
  8. [Image-only slide]
  9. WHAT ELSE? • Image-domain analysis • Pulsar observations • Cosmic microwave background radiation • Cosmology • Follow-up observations • Gravitational waves
  10. [Image-only slide]
  11. [Image-only slide]
  12. THE FUTURE SKA PHASE-2

  13. FAR SIDE OF THE MOON

  14. PROBLEMS IN RADIO ASTRONOMY • True big data • A lot of complex legacy code • Not much money to hire experts • Focus on hardware, not software • A lot of duct-tape programming
  15. ASTRONOMY <3 PYTHON • Most-used programming language in astronomy • Together with IDL, C/C++, Fortran and MATLAB • Rapid prototyping • Quick to learn • Numerical libraries • Jupyter notebooks
  16. PROBLEMS WITH PYTHON • Slow adoption of Python 3, still on Python 2.7 • Speed • Package management is so-so for compiled modules • We don’t really need Python 3, but it would be very, very nice • CPython, PyCUDA, TensorFlow • Anaconda? Debian packages?
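A standard answer to the speed problem is to push inner loops into compiled numerical libraries such as NumPy. The sketch below computes point-source visibilities with the direct Fourier sum V(u, v) = Σ_s I_s exp(-2πi(u·l_s + v·m_s)), once as nested pure-Python loops and once vectorised with NumPy broadcasting. Baselines, source positions and fluxes are random placeholders, and no particular speed-up figure is claimed; run it to see the difference on your own machine.

```python
import cmath
import math
import time

import numpy as np


def visibilities_loop(u, v, flux, l, m):
    """Pure-Python direct Fourier sum: one complex exponential per (baseline, source) pair."""
    out = []
    for ui, vi in zip(u, v):
        total = 0j
        for s, li, mi in zip(flux, l, m):
            total += s * cmath.exp(-2j * math.pi * (ui * li + vi * mi))
        out.append(total)
    return out


def visibilities_numpy(u, v, flux, l, m):
    """Same sum, vectorised: build an (n_baselines, n_sources) phase matrix, reduce over sources."""
    u = np.asarray(u)[:, None]
    v = np.asarray(v)[:, None]
    phase = -2j * np.pi * (u * np.asarray(l) + v * np.asarray(m))
    return (np.asarray(flux) * np.exp(phase)).sum(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    u, v = rng.normal(0, 1000, (2, 5000))   # baseline coordinates in wavelengths (placeholder)
    l, m = rng.normal(0, 0.01, (2, 50))      # source direction cosines (placeholder)
    flux = rng.random(50)
    t0 = time.perf_counter()
    slow = visibilities_loop(u, v, flux, l, m)
    t1 = time.perf_counter()
    fast = visibilities_numpy(u, v, flux, l, m)
    t2 = time.perf_counter()
    print(f"loop: {t1 - t0:.3f} s, numpy: {t2 - t1:.3f} s, agree: {np.allclose(slow, fast)}")
```

The same idea scales further by moving the kernel to Cython or to a GPU library such as PyCUDA.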
  17. OPEN SOURCE • Scientists open-source most of their code • The production team is a bit more careful
  18. TACKLING THE PROBLEMS IN RADIO ASTRONOMY • STEP 1 • Package up all existing software • http://kernsuite.info • Debian packages
  19. TACKLING THE PROBLEMS IN RADIO ASTRONOMY • STEP 2 • Make sure this software can run everywhere • Docker! • Runs everywhere, even on Mac and Windows
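As an illustration of the "one command" workflow, the sketch below builds a `docker run` command line from Python and executes it with `subprocess`. The image name `kernsuite/base` and the `ls` command are placeholders chosen for this example, not taken from the talk. Passing `--user uid:gid` is one common workaround for the user-ID mapping issue listed on slide 23, so files written to the bind-mounted directory are not owned by root.

```python
import os
import subprocess
from typing import List, Sequence


def docker_command(image: str, command: Sequence[str], data_dir: str,
                   uid: int, gid: int) -> List[str]:
    """Build a `docker run` invocation that mounts data_dir at /data and runs as uid:gid."""
    return [
        "docker", "run",
        "--rm",                                      # delete the container when it exits
        "--user", f"{uid}:{gid}",                    # output files are owned by the caller, not root
        "-v", f"{os.path.abspath(data_dir)}:/data",  # bind-mount the host data directory
        "-w", "/data",                               # working directory inside the container
        image,
        *command,
    ]


if __name__ == "__main__":
    # Placeholder image and command; os.getuid()/os.getgid() exist on POSIX systems only.
    cmd = docker_command("kernsuite/base", ["ls", "-l", "/data"], ".", os.getuid(), os.getgid())
    print(" ".join(cmd))
    subprocess.run(cmd, check=True)
```

Building the argument list separately keeps it testable without a Docker daemon and avoids shell-quoting problems.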
  20. [Image-only slide]
  21. (for HPC)

  22. WHY I LIKE DOCKER • Docker made containerisation awesome • Containerisation is awesome (but not new) • Useful for setting up services and hooking them together • Good support for Windows and OS X • Image distribution, one-command ‘installation’
  23. WHAT I DON’T LIKE ABOUT DOCKER • Allowing someone to run containers == giving them root • The Dockerfile format is just awful • Layer caching is more painful than useful • Mapping user IDs? • Networking performance overhead • GPU acceleration? nvidia-docker, yuck • All have workaround hacks • HPC is not Docker’s focus
  24. THE ALTERNATIVE • Singularity • Containers for HPC • Solves most of the previously mentioned problems • Still young, less momentum • No root escalation, proper user ID mapping, no useless bells and whistles
  25. PROTOTYPE • Currently working on a prototype pipeline at George Washington University • 2,924 CPU cores • 64 NVIDIA K20 GPUs • Runs an older CentOS • Singularity used to containerise the KERN software suite • 40 TB LOFAR dataset • Job scheduling with SLURM
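A sketch of how one containerised step might be submitted on such a cluster: Python generates a SLURM batch script that runs a command through `singularity exec`, bind-mounting the data directory and, when GPUs are requested, adding `--nv` so the container can use the host's NVIDIA driver. The image path, directories, resource numbers and command are hypothetical; the talk does not show the prototype's actual job scripts.

```python
import shlex
from typing import Sequence


def sbatch_script(job_name: str, image: str, command: Sequence[str], data_dir: str,
                  cpus: int = 8, gpus: int = 0, hours: int = 4) -> str:
    """Return a SLURM batch script that runs `command` inside a Singularity image."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        "#SBATCH --nodes=1",
        f"#SBATCH --cpus-per-task={cpus}",
        f"#SBATCH --time={hours:02d}:00:00",
    ]
    exec_args = ["singularity", "exec"]
    if gpus:
        lines.append(f"#SBATCH --gres=gpu:{gpus}")  # request GPUs from the scheduler
        exec_args.append("--nv")                    # expose the host NVIDIA driver in the container
    exec_args += ["--bind", f"{data_dir}:/data", image, *command]
    lines += ["", " ".join(shlex.quote(a) for a in exec_args)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical image path, data directory and command; adjust for the real cluster.
    script = sbatch_script("lofar-chunk-001", "/images/kern.simg", ["ls", "-l", "/data"],
                           "/scratch/lofar/chunk-001", gpus=1)
    with open("job.sh", "w") as f:
        f.write(script)
    print(script)  # submit with: sbatch job.sh
```

Because Singularity runs the container as the submitting user, the script needs no root-related workarounds.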
  26. TACKLING THE PROBLEMS IN RADIO ASTRONOMY • STEP 3 • Chaining the containers as jobs in a pipeline • Luigi • Nextflow • Kubernetes? • Spark?
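Tools like Luigi model a pipeline as tasks that declare their dependencies and an output; a task counts as complete when its output exists, so a rerun resumes where it left off. The stdlib-only toy below mimics that model to show the idea; it is not Luigi's API, and the stage names `calibrate` and `image` are hypothetical. In a real pipeline each action would launch a container instead of writing a marker file.

```python
import os
import tempfile
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    """One pipeline stage (hypothetical): produces `output` and depends on other steps."""
    name: str
    output: str                      # file whose existence marks the step as complete
    action: Callable[[str], None]    # in a real pipeline: run a container that writes `output`
    requires: List["Step"] = field(default_factory=list)


def run(step: Step, log: List[str]) -> None:
    """Skip a step whose output exists; otherwise run its dependencies first, then the step."""
    if os.path.exists(step.output):
        log.append(f"skip {step.name}")
        return
    for dep in step.requires:
        run(dep, log)
    step.action(step.output)
    log.append(f"ran {step.name}")


def write_text(text: str) -> Callable[[str], None]:
    """Stand-in action that just writes a marker file."""
    def action(path: str) -> None:
        with open(path, "w") as f:
            f.write(text)
    return action


def build_pipeline(workdir: str) -> Step:
    calibrate = Step("calibrate", os.path.join(workdir, "calibrated.txt"), write_text("calibrated"))
    return Step("image", os.path.join(workdir, "image.txt"), write_text("image"), [calibrate])


def demo():
    """Run the pipeline twice in a scratch directory and return both logs."""
    with tempfile.TemporaryDirectory() as workdir:
        final = build_pipeline(workdir)
        first, second = [], []
        run(final, first)
        run(final, second)
    return first, second


if __name__ == "__main__":
    first, second = demo()
    print("first run: ", first)
    print("second run:", second)
```

On the first run both stages execute in dependency order; on the second run the final output already exists, so nothing is recomputed.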
  27. TACKLING THE PROBLEMS IN RADIO ASTRONOMY • STEP 4 • Make a uniform interface for a container • Parameterise containers • KLIKO • Scientific Compute Container Format • Python (also Python 3) • https://github.com/gijzelaerr/kliko
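"Parameterising containers" boils down to declaring which parameters a container accepts and validating user input before a run. A toy version in the spirit of Kliko might look like the sketch below, with a made-up schema for a hypothetical imaging container; it does not reproduce Kliko's actual file format or API.

```python
import json
from typing import Any, Dict

# Hypothetical parameter schema for an imaging container (names, types and defaults are made up).
SCHEMA: Dict[str, Dict[str, Any]] = {
    "npix": {"type": int, "default": 2048},
    "cellsize": {"type": float, "default": 1.5},  # arcsec
    "weight": {"type": str, "default": "briggs", "choices": ["natural", "uniform", "briggs"]},
}


def validate(params: Dict[str, Any], schema: Dict[str, Dict[str, Any]] = SCHEMA) -> Dict[str, Any]:
    """Reject unknown names, convert values to the declared types and fill in defaults."""
    unknown = set(params) - set(schema)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    result = {}
    for name, spec in schema.items():
        value = spec["type"](params.get(name, spec["default"]))
        if "choices" in spec and value not in spec["choices"]:
            raise ValueError(f"{name} must be one of {spec['choices']}, got {value!r}")
        result[name] = value
    return result


def write_params(params: Dict[str, Any], path: str) -> None:
    """Serialise validated parameters to a JSON file a container could read on startup."""
    with open(path, "w") as f:
        json.dump(validate(params), f, indent=2)


if __name__ == "__main__":
    print(validate({"npix": "1024", "weight": "uniform"}))
    # prints {'npix': 1024, 'cellsize': 1.5, 'weight': 'uniform'}
    write_params({"npix": 4096}, "parameters.json")  # hypothetical file name
```

Keeping the schema as data means a user interface or pipeline runner can generate forms and checks from the same declaration the container ships with.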
  28. [Image-only slide]
  29. [Image-only slide]
  30. BUT • Common Workflow Language (CWL) • Similar, but more developed • We will probably end up using this
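For comparison, a CWL `CommandLineTool` description plays a role similar to a uniform container interface: it names the command, its typed inputs and the container image to run it in. The sketch builds one as a Python dict and writes it as JSON, which CWL accepts alongside YAML. The tool name `my-imager`, its `-npix`/`-cellsize` options and the image name are hypothetical.

```python
import json
from typing import Dict


def cwl_tool(image: str, base_command: str, inputs: Dict[str, str]) -> dict:
    """Describe a containerised command-line tool as a CWL v1.0 CommandLineTool."""
    return {
        "cwlVersion": "v1.0",
        "class": "CommandLineTool",
        "baseCommand": base_command,
        # A hint: the runner uses this Docker image if it supports containers.
        "hints": {"DockerRequirement": {"dockerPull": image}},
        # Each input becomes a typed parameter passed on the command line as "-name value".
        "inputs": {
            name: {"type": cwl_type, "inputBinding": {"prefix": f"-{name}"}}
            for name, cwl_type in inputs.items()
        },
        "outputs": [],
    }


if __name__ == "__main__":
    doc = cwl_tool("kernsuite/base", "my-imager", {"npix": "int", "cellsize": "float"})
    with open("imager.cwl", "w") as f:
        json.dump(doc, f, indent=2)
    print(json.dumps(doc, indent=2))
```

A CWL runner such as cwltool can then execute the tool from this file plus a separate inputs file, which is roughly the role Kliko aims to fill.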
  31. THE FINAL STEP • STEP 5 • Analysing the data • Offer easy-to-use tools for the astronomers • OpenStack • Singularity • Jupyter • Whatever they want and need!
  32. [Image-only slide]
  33. ROUND-UP • A lot of awesome things are going to happen • A lot of open questions • A lot of work to do • Python is currently the best high-level language for radio astronomy
  34. https://github.com/griffinfoster/fundamentals_of_interferometry

  35. Question time • No alien questions, please • http://pythonic.nl • @gijzelaerr