The story of CWLifying Apercal

The modular and portable APERTIF real-time imaging pipeline

Gijs Molenaar

March 21, 2019

Transcript

  1. pythonic.nl • Contractor for 2 years • Previously at UvA and SKA South Africa (6 years) • In radio astronomy and other fields • Open source software • Project management • DevOps • Data science
  2. The original proposal • 8 Jan 2018 • Michael Crusoe • Apply for ASTERICS/OBELICS money to work on radio astronomy pipelines • Containerize and CWLify
  3. Great plan! • Just finished an EOSC pilot with ASTRON • Everybody was enthusiastic • Finally show the usefulness of packaging, CWL and containerisation • Prefactor pipeline
  4. EOSC4LOFAR pilot? • Take 3 astronomy pipelines • Prefactor, Spiel, Presto • Containerize and CWLize • Run on various platforms • Report on the results
  5. Submitted • Applied in March 2018 • Planned to start in September 2018 • Sort of approved-ish in May • No signed contract yet • Work on KERN, Apercal and CWL tooling
  6. Why packaging • The problem: • Installing scientific software • Compile flags • Dependencies • Patches • Environment variables • Consistency & reproducibility • Centralize and minimise agony
  7. KERN • Debian packages (Ubuntu LTS) • Released every 6 months • KERN-5, released January 15 2018 • 115 packages and growing
  8. CWL • HTML for pipelines • Define relationships between steps/programs • On task level • Not a parallelisation framework like Dask or OpenMP / DP3 Presto
  9. Task • Any piece of software that finishes in a ‘reasonable amount of time’ • Takes input • Has arguments • Produces output • Assumed to be deterministic
  10. Why useful • A formal description of the tools • A formal description of how to combine them • Split of responsibilities
  11. Workflow • Combine tasks into a workflow • A DAG (Directed Acyclic Graph) • Connect input to output • Manage parameters • Indicate what can run in parallel (implicit)
  12. Ideal scientific software workflow • Prototyping & exploratory coding (Jupyter notebook) • Extract reusable components • Make pretty library (project management) • Package, containerise and CWLize
  13. October 2018 • I had to get started, otherwise too little income • 2.5 days a week • Other days: work on PhD and SETI (Berkeley) • Meanwhile, contact person at CNRS delays the process • Useless discussions about intellectual property concerns (open source)
  14. Mid-November • Still no contract • Stalled all activities • Communication with CNRS is extremely difficult
  15. So, I was done • No more radio astronomy for me • Too much convincing • Too frustrating • Too difficult • No progress
  16. But that is not the end • ASTRON took over the contract from CNRS! • So we could finish our work • Still love radio astronomy • I like to finish what I started • 1.5 days a week
  17. What works • Containerized • Cross-platform • One parameter file • (Probably) easy to reconfigure (one -> multiple polcal/fluxcal) • Easy to replace steps
  18. Why only first steps? • There is no finished Apercal imaging pipeline! • So can’t CWLify • Code was/is hard or impossible to parallelise without a complete rewrite
  19. Issues • Code is one giant conditional spaghetti • Mixed responsibilities, including parallelisation • Mixed production and development • Not testable • Mix of old and new frameworks (MIRIAD, CASA)
  20. What else • Casacore 3.0 has been released! • KERN-5 has been released! • A basic CWL web interface has been made (Buis)
  21. Buis • Git-based CWL workflow scheduler, progress monitor and result viewer • Containerised (Docker Compose / Kubernetes) • Supports all features Toil supports
  22. What could be better • Better software project management • Not enough software engineers who solve fundamental problems • Release management • Split of development and deployment • More modular solutions • Enables a transition to new tools: Python 3, git, Kubernetes
  23. Ideal scientific software workflow • Prototyping & exploratory coding (Jupyter notebook) • Extract reusable components • Make pretty library (project management) • Package, containerise and CWLize
  24. Lessons learned (me) • Don't start working without a contract • Legal expenses insurance (rechtsbijstandverzekering) • Change is good • NEVER EVER WORK AGAIN WITH CNRS
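
The workflow idea from slides 9 and 11 (tasks connected into a DAG, with parallelism implied by the graph rather than spelled out) can be illustrated with a small Python sketch. This is not CWL and not how Apercal or any CWL runner is implemented. The step names are hypothetical, loosely borrowed from the polcal/fluxcal mention on slide 17, and the standard-library `graphlib` module (Python 3.9+) stands in for a workflow engine.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each step maps to the set of steps whose output it needs.
# These names are illustrative only, not the real Apercal steps.
workflow = {
    "prepare": set(),
    "fluxcal": {"prepare"},
    "polcal": {"prepare"},
    "image": {"fluxcal", "polcal"},
}


def parallel_batches(dag):
    """Group steps into batches; steps in the same batch do not depend on each other."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose inputs are available
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished so dependants become ready
    return batches


batches = parallel_batches(workflow)
for number, batch in enumerate(batches, start=1):
    print(number, batch)
```

Nothing in the `workflow` dictionary says "run fluxcal and polcal at the same time". That follows from the dependencies alone, which is the sense in which the parallelism is implicit.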