Data reduction pipelines using Docker for the MeerKAT telescope

Gijs Molenaar

July 21, 2016

Transcript

  1. DATA REDUCTION USING DOCKER FOR THE MEERKAT TELESCOPE

    Software Circus, Amsterdam, July 17 2016 • Gijs Molenaar • http://pythonic.nl
  2. WHO IS GIJS

    • SCIENTIFIC SOFTWARE ENGINEER
    • BACKGROUND IN AI
    • AMSTERDAM
    • CAPE TOWN
    • RADIO TELESCOPES
  3. MEERKAT TELESCOPE

    • 64 dishes
    • Raw data 4096 Gb/s
    • correlation
    • flagging
    • imaging
    • calibration
  4. THE (IMAGING) SOFTWARE

    • Custom-made software
    • Developed by expert scientists
    • Cutting edge
    • Bleeding edge
    • Fragile
    • ‘Expensive’ to modify
  5. KERN: RADIO ASTRONOMY SOFTWARE SUITE

    • Centralise the agony
    • Frustration for one person, the packager
    • A repository of commonly used radio astronomy software
    • Based on Ubuntu (16.04)
    • http://kernsuite.info
  6. KLIKO

    • Specification and library
    • Based on Docker
    • See a container as a compute unit
    • Consumes and produces file-based data, driven by parameters
    • Abstract interface
    • Easy to klikonize an existing container
    • http://kliko.readthedocs.org
  7. HOW TO DO KLIKO

    • Read input from /input
    • Write output to /output
    • /kliko.yml inside the container defines the parameters
    • The /kliko script is the entry point
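The convention on this slide can be sketched as a small entry-point script. This is a minimal illustration only, not the kliko library itself: the function name `process`, the `summary.json` output and the way parameters arrive as a plain dict are all assumptions made for the example. The demo at the end uses temporary directories as stand-ins for `/input` and `/output` so it can run outside a container.

```python
import json
import tempfile
from pathlib import Path


def process(input_dir, output_dir, params):
    """Read every file in input_dir and write a JSON summary to output_dir.

    Hypothetical stand-in for the work a /kliko entry point would do.
    """
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    summary = {
        "params": params,
        # Record the size in bytes of each input file.
        "files": {
            p.name: p.stat().st_size
            for p in sorted(input_dir.iterdir())
            if p.is_file()
        },
    }
    target = output_dir / "summary.json"
    target.write_text(json.dumps(summary, indent=2))
    return target


# Demo outside a container: temporary directories stand in for /input and /output.
with tempfile.TemporaryDirectory() as tmp:
    fake_input = Path(tmp) / "input"
    fake_input.mkdir()
    (fake_input / "a.txt").write_text("hello")
    written = process(fake_input, Path(tmp) / "output", {"int": 3})
    result = json.loads(written.read_text())

print(result)
```

Inside a container following the slide's layout, the same function would presumably be called as `process("/input", "/output", params)`, with `params` obtained however the kliko specification prescribes.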
  8. KLIKO RUNNER

    • Actor that runs the container
    • Read kliko.yml file
    • Generate parameters (user input)
    • Run container with parameters
    • Connect correct input and output
  9. $ kliko-run radioastro/klikotest --help

        usage: kliko-run [-h] [--target_folder TARGET_FOLDER] --choice {second,first}
                         --char CHAR [--float FLOAT] --file FILE --int INT
                         image_name

        positional arguments:
          image_name

        optional arguments:
          -h, --help            show this help message and exit
          --target_folder TARGET_FOLDER
          --choice {second,first}
                                choice field (default: second)
          --char CHAR           char field, maximum of 10 chars (default: empty)
          --float FLOAT         float field (default: 0.0)
          --file FILE           file field, this file will be put in /input in case of
                                split io, /work in case of join io
          --int INT             int field
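One way a runner could turn a parameter description into a command-line interface like the one on this slide is to map each field onto an `argparse` argument. The sketch below is an assumption-laden illustration, not kliko's actual implementation: the `FIELDS` list and its keys (`name`, `type`, `choices`, `required`, `initial`, `help`) are hypothetical and only mirror the fields visible in the help output; the real `kliko.yml` schema may differ.

```python
import argparse

# Hypothetical parameter description mirroring the help output on the slide.
# The real kliko.yml schema may use different keys.
FIELDS = [
    {"name": "choice", "type": "choice", "choices": ["second", "first"],
     "required": True, "help": "choice field"},
    {"name": "char", "type": "char", "required": True, "help": "char field"},
    {"name": "float", "type": "float", "required": False, "initial": 0.0,
     "help": "float field"},
    {"name": "file", "type": "file", "required": True, "help": "file field"},
    {"name": "int", "type": "int", "required": True, "help": "int field"},
]

# Map each (assumed) field type to the Python callable that parses it.
CASTS = {"choice": str, "char": str, "file": str, "float": float, "int": int}


def build_parser(fields, prog="kliko-run"):
    """Build an argparse parser with one option per described field."""
    parser = argparse.ArgumentParser(prog=prog)
    parser.add_argument("image_name")
    parser.add_argument("--target_folder")
    for field in fields:
        kwargs = {
            "type": CASTS[field["type"]],
            "required": field.get("required", False),
            "help": field.get("help"),
        }
        if "choices" in field:
            kwargs["choices"] = field["choices"]
        if "initial" in field:
            kwargs["default"] = field["initial"]
        parser.add_argument("--" + field["name"], **kwargs)
    return parser


args = build_parser(FIELDS).parse_args([
    "radioastro/klikotest",
    "--choice", "first",
    "--char", "abc",
    "--file", "data.ms",
    "--int", "3",
])
print(vars(args))
```

Generating the parser from data keeps the container author's description as the single source of truth for both validation and the `--help` text.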
  10. FUNCTIONAL KLIKO CONTAINERS

    • Chaining containers
    • No side effects
    • Functional programming
    • Caching results
    • Implicit parallelisation
    • Airflow? Luigi? Something else? Who?
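If a container has no side effects, its output depends only on the image, the parameters and the input files, so results can be cached under a key derived from those three things. The following is a minimal sketch of that idea, not something taken from kliko: the function name `cache_key` and the choice of SHA-256 are assumptions for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def cache_key(image, params, input_dir):
    """Derive a deterministic key from image name, parameters and input files."""
    digest = hashlib.sha256()
    digest.update(image.encode())
    # sort_keys makes the key independent of dict ordering.
    digest.update(json.dumps(params, sort_keys=True).encode())
    for path in sorted(Path(input_dir).rglob("*")):
        if path.is_file():
            # Hash both the relative file name and the file contents.
            digest.update(path.relative_to(input_dir).as_posix().encode())
            digest.update(path.read_bytes())
    return digest.hexdigest()


# Demo: a temporary directory stands in for the container's input folder.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "vis.txt").write_text("visibilities")
    key_a = cache_key("radioastro/klikotest", {"int": 3, "char": "x"}, tmp)
    key_b = cache_key("radioastro/klikotest", {"char": "x", "int": 3}, tmp)
    key_c = cache_key("radioastro/klikotest", {"int": 4, "char": "x"}, tmp)

print(key_a == key_b, key_a == key_c)
```

A runner could skip a step whenever an output folder for the same key already exists. Because an image tag can be re-pointed to a different build, hashing the image ID instead of the name would likely be more robust.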
  11. PROBLEMS WITH DOCKER

    • Effectively giving root access
    • GPU acceleration is crap
    • Cached filesystem layers are just annoying
    • Can’t combine containers
    • A lot of new stuff I don’t care about (Windows)