
Easy to deploy and easy to modify data reduction pipelines using KERN and CWL

Gijs Molenaar
October 09, 2019


Radio telescopes are producing more data than ever before. The data rates of future telescopes, like the SKA, will be so great that it becomes infeasible to keep the end-user astronomer involved in the imaging and calibration steps.

However, by making the pipelines open access, modular, easy to deploy and easy to modify, we believe that involving the end-user astronomer in the calibration decision process might not be that far-fetched. Creating technical and logistical pipeline infrastructure is pivotal, and will eventually result in better data products and better science.

Our solution consists of multiple layers. The base layer is the packaging of all the relevant software and the regular release of these packages as a software distribution. This creates a solid basis for reproducible science. Secondly, to address the portability and deployment issue, we adopt container technology, with Singularity currently as the primary choice. Lastly, to recombine these packages into data reduction pipelines, we embrace the Common Workflow Language (CWL). CWL is an open and free standard implemented by numerous pipeline-running frameworks.

Decomposing the data reduction problem in this manner enables recombining and modifying pipelines in a high-level, abstract way, while also enabling implicit parallelisation due to the functional (programming) nature of the standard. Furthermore, it represents all available software with uniform interfaces, simplifying the storage, representation and visualisation of both parameters and (intermediate) results.
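To make the implicit parallelisation concrete: CWL's `scatter` construct lets a step declared over a list of inputs be fanned out by the workflow runner, without the pipeline author writing any parallel code. A hypothetical workflow fragment (the step and tool names are illustrative, not from an actual KERN pipeline):

```yaml
# Workflow fragment: calibrate each measurement set independently.
# The runner is free to execute the scattered steps in parallel.
cwlVersion: v1.0
class: Workflow
requirements:
  ScatterFeatureRequirement: {}
inputs:
  measurement_sets: Directory[]
steps:
  calibrate:
    run: calibrate.cwl          # hypothetical tool description
    scatter: ms                 # one invocation per measurement set
    in:
      ms: measurement_sets
    out: [calibrated]
outputs:
  calibrated_sets:
    type: File[]
    outputSource: calibrate/calibrated
```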

At ASTRON and SKA South Africa, successful experiments based on the described ensemble have been completed in recent years. These are slowly being transferred into production. Still, there is much more work to be done: the software stack needs improvements to make it future-ready, and the packaging requires continuous effort, which in turn needs financing.

This talk is intended for astronomers and sysadmins who struggle to maintain, or intend to set up, a multi-user, multi-machine cluster for medium- to large-scale data reduction.



Transcript

  1. Easy to deploy and easy to modify data reduction pipelines using KERN and CWL: packaging, containerization and pipelines. ADASS 2019, Groningen. Gijs Molenaar
  2. Who am I • pythonic.nl • 8 years • Research / Software Engineer • Machine learning • Contractor in science and industry • Part-time remote PhD student, South Africa
  3. What did I work on? • Large scale distributed pipeline deployment • Packaging / improving radio astronomy software • Transient Pipeline (TraP)
  4. The problem • SKA is coming • Data volumes too high to transport the data • Processing on the spot • Opinionated data processing
  5. But is it a problem? • Why not let scientists process data on the spot • Give them control over the data reduction • Let them deliver a pipeline
  6. Scientific software • Often hard to install • Locally and on a cluster • Issues like • Broken software • Compilation • Dependencies (versions) • Python 2 / 3
  7. We want to • Centralize the agony • Only compile once • Compatible tools • Make sure deployments are uniform between platforms
  8. Containers • Often lead to massive containers • What is inside? • How to combine containers?
  9. Package management • A boring old solution • But does the job well • Can be installed inside and outside a container
  10. KERN • Made for SKA South Africa • Radio astronomy software packages • Basics, imaging, pulsar • A superset of Ubuntu LTS • 75 packages and growing • New release every +/- 6 months
  11. More than packaging • A platform for radio astronomy software improvement • A community (reporting and fixing bugs) • Upstream bug fixes • Python-casacore binary wheel • Presto Python 3
  12. Debian • Collaboration with the Debian Astro blend • Many KERN packages are now in Debian • Ole Streicher is also here
  13. No KERN-6? • No funding! • If you use KERN and find it useful • Please consider helping me find funding :)
  14. CWL • Common Workflow Language • The HTML of pipelines • A method for describing your pipeline • And how all components interact
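To make the "HTML of pipelines" idea concrete, here is a minimal sketch of a CWL tool description. The wrapped tool (`wsclean`, a KERN-packaged imager), the container image, and the chosen inputs are illustrative assumptions, not part of the talk:

```yaml
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool
# Illustrative wrapper around the KERN-packaged wsclean imager.
baseCommand: wsclean
hints:
  DockerRequirement:
    dockerPull: kernsuite/base:5   # assumed image name
inputs:
  msin:
    type: Directory                # a measurement set
    inputBinding:
      position: 1
  niter:
    type: int
    default: 1000
    inputBinding:
      prefix: -niter
outputs:
  image:
    type: File
    outputBinding:
      glob: "*-image.fits"
```

Every tool wrapped this way exposes the same uniform interface of typed inputs and outputs, which is what lets a CWL runner compose, validate and parallelise pipelines built from them.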
  15. Want to know more? • Blog post with the full story • http://bit.do/radiopipelines
  16. In short • I believe astronomers can reduce their own data • Temporarily hand over control of the data processing hardware • Composable pipelines using packaging, containerization & CWL • Remaining challenge: organisational setup
  17. Organisational setup • Open source default pipeline • Have the option to bundle a pipeline with your observation proposal