The story of CWLifying Apercal

The modular and portable APERTIF real-time imaging pipeline

Gijs Molenaar

March 21, 2019

Transcript

  1. The story of CWLifying apercal The modular and portable APERTIF

    real-time imaging pipeline
  2. pythonic.nl • Contractor for 2 years • Previously at UvA

    and SKA South Africa (6 years) • Radio astronomy and other fields: open source software, project management, DevOps, data science
  3. What have I been working on?

  4. The original proposal • 8 Jan 2018 • Michael Crusoe

    • Apply for ASTERICS/OBELICS money to work on radio astronomy pipelines • Containerize and CWLify
  5. Great plan! • Just finished an EOSC pilot with ASTRON

    • Everybody was enthusiastic • Finally show usefulness of packaging, CWL and containerisation Prefactor pipeline
  6. EOSC4LOFAR pilot? • Take 3 astronomy pipelines • Prefactor, Spiel,

    Presto • Containerize and CWLize • Run on various platforms • Report about results
  7. Submitted • Applied in March 2018 • Planned to

    start in September 2018 • Sort of approved-ish in May • No signed contract yet • Work on KERN, apercal and CWL tooling
  8. Why packaging The problem: • Installing scientific software • Compile

    flags • Dependencies • Patches • Environment variables • Consistency & reproducibility • Centralize and minimise agony
  9. KERN • Debian packages (Ubuntu LTS) • Released every 6

    months • KERN-5, released January 15 2018 • 115 packages and growing
  10. Portability • Docker • Singularity • And if nothing works:

    uDocker (user space docker)
  11. Example Dockerfile FROM kernsuite/base:3 RUN docker-apt-install prefactor

  12. A standard for building pipelines

  13. CWL • HTML for pipelines • Define relationships between steps/

    programs • On task level • Not a parallelisation framework like Dask or OpenMP / DP3 Presto
  14. Task • Any piece of software that finishes in a

    ‘reasonable amount of time’ • Takes input • Has arguments • Produces output • We assume deterministic
  15. Why useful • A formal description of the tools •

    A formal description of how to combine them • Split of responsibilities
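To make "a formal description of the tools" concrete, here is a minimal sketch of what such a description can look like. It builds a CWL v1.0 `CommandLineTool` document as a Python dict and serialises it to JSON, which CWL runners accept because JSON is a subset of YAML. The wrapped program (`echo`), the input id `message`, the output id `printed`, and the helper `to_cwl_json` are assumptions chosen for illustration; none of them comes from apercal.

```python
import json

# Hypothetical tool description: "echo", "message" and "printed" are
# illustrative names, not taken from the apercal pipeline.
echo_tool = {
    "cwlVersion": "v1.0",
    "class": "CommandLineTool",
    "baseCommand": "echo",
    # What the task takes as input, and where it goes on the command line.
    "inputs": {
        "message": {
            "type": "string",
            "inputBinding": {"position": 1},
        }
    },
    # What the task produces: here, whatever it writes to standard output.
    "outputs": {"printed": {"type": "stdout"}},
    "stdout": "message.txt",
}


def to_cwl_json(tool):
    """Serialise a tool description to JSON text (hypothetical helper)."""
    return json.dumps(tool, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(to_cwl_json(echo_tool))
```

Saved to a file, the printed document could be handed to a CWL runner. The point of the sketch is only that inputs, arguments and outputs are declared as data, separate from the code that runs the task.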
  16. Workflow • Combine tasks into workflow • A DAG (Directed

    Acyclic Graph) • Connect input to output • Manage parameters • Indicate what can run in parallel (implicit)
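The "implicit" parallelism of a DAG can be illustrated with a short sketch. It uses Python's standard `graphlib` module (Python 3.9+), not CWL or a CWL runner. The step names (`preflag`, `calibrate_flux`, `calibrate_pol`, `image`) and the helper `parallel_batches` are hypothetical; the sketch only shows that the dependency graph alone determines which steps may run side by side.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each step maps to the set of steps whose output it needs.
workflow = {
    "preflag": set(),
    "calibrate_flux": {"preflag"},
    "calibrate_pol": {"preflag"},
    "image": {"calibrate_flux", "calibrate_pol"},
}


def parallel_batches(dag):
    """Group steps into batches; steps within one batch do not depend on each other."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every step whose inputs are all available
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished, unlocking dependent steps
    return batches


if __name__ == "__main__":
    for number, batch in enumerate(parallel_batches(workflow), start=1):
        print(number, batch)
```

Nobody marks the two calibration steps as parallel; that follows from both needing only the output of `preflag`. A workflow runner can use the same information to schedule independent steps at the same time.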
  17. Ideal scientific software workflow Prototyping & Exploratory coding (jupyter notebook)

    Extract reusable Components Make pretty library (project management) Package, Containerise and CWLize
  18. Meanwhile • September 2018 • Still no signed contract! •

    So we delayed by one month; I was busy anyway
  19. October 2018 • I had to get started, otherwise too

    little income • 2.5 days a week • Other days: work on PhD and SETI (Berkeley) • Meanwhile, the contact person at CNRS delays the process • Useless discussions about intellectual property concerns (open source)
  20. Half of November • Still no contract • Stalled all

    activities • Communication with CNRS is extremely difficult
  21. Would be fine • If SETI paid. Different legal story.

  22. So, I was done • No more radio astronomy for

    me. • Too much convincing • Too frustrating • Too difficult • No progress
  23. But that is not the end • ASTRON took over

    contract from CNRS! • So we could finish our work • Still love radio astronomy • I like to finish what I started • 1.5 days a week
  24. continue!

  25. CWL apercal

  26. What works • Containerized • Cross platform • One parameter

    file • (Probably) easy to reconfigure (one -> multiple polcal/fluxcal) • Easy to replace steps
  27. Why only first steps? • There is no finished apercal

    imaging pipeline! • So can’t CWLify • Code was/is hard / impossible to parallelise without complete rewrite
  28. Issues • Code is one giant conditional spaghetti • Mixed

    responsibilities, including parallelisation • Mixed production and development • Not testable • Mix of old and new frameworks (miriad, casa)
  29. None
  30. None
  31. What else • Casacore 3.0 has been released! • KERN-5

    has been released! • A basic CWL web interface has been made (Buis)
  32. Buis • Git based CWL workflow scheduler, progress monitor and

    result viewer • Containerised (Docker compose / Kubernetes) • Supports all features Toil supports
  33. Demo https://github.com/gijzelaerr/buis

  34. To Do • Progress visualisation • Result visualisation (RODRIGUES)

  35. What could be better • Better software project management •

    Not enough software engineers that solve fundamental problems • Release management • Split of development and deployment • More modular solutions • Enables a transition to new tools: Python 3, git, Kubernetes
  36. Ideal scientific software workflow Prototyping & Exploratory coding (jupyter notebook)

    Extract reusable Components Make pretty library (project management) Package, Containerise and CWLize
  37. Lessons learned (me) • Don't start working without a contract

    • Legal expenses insurance (rechtsbijstandverzekering) • Change is good • NEVER EVER WORK AGAIN WITH CNRS
  38. Questions?