Jupyter Notebooks for Radio Astronomy Research - by Brad Frank

Pycon ZA
October 05, 2017

At the Inter-University Institute for Data Intensive Astronomy (IDIA, www.idia.ac.za), we are focusing on several important use-cases related to the delivery of science data products from large radio telescopes, such as MeerKAT. The requirements for the hardcore processing and analysis of raw radio data have to be balanced against our essential need to collaborate on our science projects.

We have thus adopted Jupyter Hub/Notebooks as the principal means of running radio astronomy workflows, pipelines and calibration scripts. We have found this to be an enormously useful and powerful medium for prototyping technical recipes and sharing lessons learned. It shortens the time taken to develop a complex astronomical workflow, and shifts the focus onto the data and the processes involved, within a single, comprehensive framework.

The use of Jupyter Notebooks is becoming quite popular in radio astronomy, and marks a distinct paradigm shift in the way that astronomers all over the world collaborate on large science projects.

In my talk I will focus on our use of Jupyter Hub/Notebooks within an astronomical context, and on some highlights from the development of our astronomical computing software stack in Python.

Transcript

  1. Image courtesy of www.ska.ac.za Jupyter Notebooks for Radio Astronomy* Dr.

    Bradley Frank† Timothy Carr, Stefan Coetzee, David Aikema, Rob Simmonds, Russ Taylor †ARC Project Scientist — IDIA Senior Researcher — SKA Lecturer, UCT Astronomy *on the cloud
  2. • Context — who are we, and what are we

    doing? • Computing & Astronomy. • The Future.
  3. What collecting area would you need to detect

    galaxies at the very beginning of the universe? 1km × 1km: The Square Kilometer Array.
  4. MeerKAT • South African SKA Pathfinder. • 64x13.5m dishes. •

    Distributed over 8km in the Karoo. • One of the biggest and most sensitive radio telescopes in the world.
  5. MeerKAT • N dishes, Nchan channels, Npol polarisations. • MeerKAT

    produces 65,000,000 samples every 0.5s • Full 8-hr track will produce 50TB data. • Typical project ~ 1000-hrs, ~50PB raw-data. $\frac{N(N-1)}{2} \, N_{\mathrm{chan}} \, N_{\mathrm{pol}}$
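The expression closing slide 5 appears to be the usual count of visibility samples per correlator dump: the number of dish pairs, N(N-1)/2, times the number of channels and polarisations. A minimal sketch of that arithmetic follows. The dish count of 64 comes from the slides; the channel and polarisation counts are illustrative assumptions, not values stated in the talk.

```python
def n_baselines(n_dishes: int) -> int:
    """Number of unique dish pairs (baselines): N(N-1)/2."""
    return n_dishes * (n_dishes - 1) // 2


def samples_per_dump(n_dishes: int, n_chan: int, n_pol: int) -> int:
    """Visibility samples per correlator dump: N(N-1)/2 * Nchan * Npol."""
    return n_baselines(n_dishes) * n_chan * n_pol


# 64 dishes is from the slides; the two values below are assumptions
# chosen only to make the example concrete.
N_DISHES = 64
N_CHAN_ASSUMED = 4096
N_POL_ASSUMED = 4

baselines = n_baselines(N_DISHES)
samples = samples_per_dump(N_DISHES, N_CHAN_ASSUMED, N_POL_ASSUMED)
print(f"{baselines} baselines, {samples:,} samples per dump")
```

The quadratic growth in the number of baselines is what makes the data rate climb so quickly as dishes are added.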
  6. Hydrogen gas (blue) in the galaxy NGC 6946, observed with the

    Westerbork Synthesis Radio Telescope. Image Domain Fourier Domain “Synthesized” Aperture Image
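The image-domain and Fourier-domain labels on slide 6 can be illustrated with a toy example. This is a sketch under simplifying assumptions, not the imaging method used for the NGC 6946 map: the sky is a small synthetic array, the "telescope" samples a random 30% of the Fourier cells, and no deconvolution is attempted. All numerical values are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "sky": two point sources on an empty 64x64 image (hypothetical values).
sky = np.zeros((64, 64))
sky[20, 30] = 1.0
sky[40, 25] = 0.5

# Fourier domain: an interferometer measures samples of the sky's 2-D transform.
visibilities = np.fft.fft2(sky)

# Incomplete sampling: keep a random 30% of the Fourier cells (assumed coverage).
sampled = rng.random(sky.shape) < 0.3
dirty_image = np.fft.ifft2(visibilities * sampled).real

# With every cell sampled, the inverse transform returns the original sky.
full_image = np.fft.ifft2(visibilities).real

peak = np.unravel_index(np.argmax(dirty_image), dirty_image.shape)
print("brightest pixel in dirty image:", tuple(int(i) for i in peak))
```

Even with most Fourier cells missing, the brightest source stays at its true position; what degrades is the background, which fills with sidelobes that real pipelines remove by deconvolution.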
  7. An International Project The world coming to Africa 45 22

    Countries Justin Jonas / Bernie Fanaroff / SKA-SA
  8. Data Management & Averaging Polarization & Cross Calibration Flagging SelfCal

    DDE Calibration / Peeling Continuum Subtraction Data Gateway Full Resolution Visdata, Caltables & Flagtables (HDF5) Averaged Visdata & Caltables (MS) Fast Cube Store IMAGE Long Term Visibility Store Fast Visibility Store Joint Deconvolution Catalogues Cubes Images Spectral Line Imager Cube Addition/Mosaicking Cube Source Finding Moment Analysis Deep/Wide Cube Image Continuum Subtraction Calibrated Spectral Line Visdata IDIA Tier-2 Facility CALIBRATE Return Archival or Legacy Data Product Quality Assurance and Visualization Karoo Facility Correlator Solver Flagging QA Imager Tier 1 Facility Cape Town MeerKAT Archive CHPC Visual Analytics Visualization Analysis ASTRON/IBM-Dome Data Centre
  9. MeerKAT Data Processing • Data centre needs to be powerful.

    • Keep up with the data. • Management/transportation is a huge challenge. • Framework needs to be flexible. • Easily integrate recipes, and share expertise. • One of the huge spin-off industries to arise out of the SKA.
  10. The ARC and IDIA • ARC: African Research Cloud. •

    Progenitor concept driven by UCT/NWU. • IDIA: Inter-University Institute for Data Intensive Astronomy. • Comprehensive Support for MeerKAT Science Cases. • Data Processing, Visualization, Multi-wavelength synergies, Simulation. • UCT, NWU, UP, UWC and SPU.
  11. IDIA • First roll-out of data centre is up: •

    40 compute nodes. • 2.6GHz Xeon processors. • 32 cores. • 256GB RAM. • 4 nodes with 2x NVidia P100 GPUs. • Combination of POSIX, Block and Object Store. • 0.5PB Initial storage (to be expanded in 2018) • 10Gb/s network access to MeerKAT Archive. • Located at UCT. • OpenStack, with Singularity containers and Jupyter Hub.
  12. Karoo Facility Correlator Solver Flagging QA Imager Tier 1 Facility

    Cape Town MeerKAT Archive CHPC Tier 2 Facility Cape Town IDIA Visibility Data Tables Visibility Data Tables Catalogues/Images Legacy Data Products Regional Science Data Centre External Data Centre
  13. ARCADE • African Research Cloud Astronomy Demonstrator ProjEct. • Aimed

    to develop Proofs of Concept for Astronomy use-cases. • Data Processing: Calibration, Imaging. • Training and Collaboration.
  14. ARCADE • African Research Cloud Astronomy Demonstrator ProjEct. • Aimed

    to develop Proofs of Concept for Astronomy use-cases. • Data Processing: Calibration, Imaging. • Training and Collaboration. Crunching. Making pretty pictures. Getting jerks to document their recipes.
  15. ARCADE Calibrate! Google http://sciportal.arc.ac.za Web Page Title Calibrate Disk Disk

    Disk CALIBRATE IMAGE VISUALIZE Disk TRANSFER QA Compute Compute Compute Compute Compute Compute Compute Object Store Posix Software CASAPY, MeqTrees, SKA-SDP Storage
  16. ARCADE • Jupyter Notebooks. • Open-source platform. • IPython/Multi-kernel. •

    Access to VMs via Jupyter-Hub (and SSH). • Ideal combination of documentation and code.
  17. Astronomical Techniques • 45 students, with varying levels of programming

    experience. • Most astronomical packages are Python or Python-wrapped. • Learning outcomes: • Basic Statistics. • Python Programming. • Radio Astronomy Image Analysis.
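As an illustration of how the three learning outcomes on slide 17 meet in a single notebook cell, the sketch below estimates the noise in an image and flags bright pixels. It is not taken from the course material: the image is synthetic, the noise level, source brightness and 5-sigma threshold are assumptions, and only NumPy is used so that the cell runs without any data files.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in for a radio image: Gaussian noise plus one bright source.
# Noise level and source brightness are made-up values for illustration.
NOISE_RMS = 1.0
image = rng.normal(0.0, NOISE_RMS, size=(128, 128))
image[60:64, 60:64] += 50.0


def robust_rms(data: np.ndarray) -> float:
    """Estimate the noise rms from the median absolute deviation (MAD)."""
    mad = np.median(np.abs(data - np.median(data)))
    return float(1.4826 * mad)  # scales the MAD to sigma for Gaussian noise


rms = robust_rms(image)
detected = image > 5.0 * rms  # pixels above a 5-sigma threshold
print(f"robust rms = {rms:.2f}, pixels above 5 sigma = {int(detected.sum())}")
```

The MAD is used instead of a plain standard deviation because the bright source would otherwise inflate the noise estimate.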
  18. The Challenges • Instructions for all the students. • Mitigating

    against the need for installing complex software. • Simple access to data. • Each student to submit an original report. • Technology accessible on campus; available on any device, e.g., laptop/desktop/tablet/smartphone.
  19. Two Options Logistics Technology UCT SCILAB >60 participants 1x3h session

    Windows Deep Freeze NASSP Lab 24 participants 2x90mins Ubuntu
  20. Two Options Logistics Technology Result UCT SCILAB >60 participants 1x3h

    session Windows Deep Freeze Unsuitable • Data • Software • Windows NASSP Lab 24 participants 2x90mins Ubuntu Slow Crashed XP
  21. Two Options Logistics Technology Result UCT SCILAB >60 participants 1x3h

    session Windows Deep Freeze Data Software Windows NASSP Lab 24 participants 2x90mins Ubuntu Slow Crashed XP
  22. ARCADE • Set up AST2003H VM Instance. • Enough RAM/CPU/Storage

    for our requirements. • Exposed to UCT IPs (accessible via SCILABs, Eduroam and VPN). • Docker container: • Jupyter Hub. • User Management. • Numpy, matplotlib, astropy. • Data: NVSS, SDSS and WSRT Images. • 45 students logged onto VM through their browser.
  23. ARCADE • Set up AST2003H VM Instance. • Enough RAM/CPU/Storage

    for our requirements. • Exposed to UCT IPs (accessible via SCILABs, Eduroam and VPN). • Docker container: • Jupyter Hub. • User Management. • Numpy, matplotlib, astropy. • Data: NVSS, SDSS and WSRT Images. • 45 students logged onto VM through their browser.
  24. ARCADE • Numerous benefits. • Comprehensive access and instruction for

    all students. • Persistent Python kernels for each student. • Can work at their leisure. • Submit reports with embedded code, plots and markdown text!
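The point about reports with embedded code, plots and markdown text rests on the notebook file format, which is plain JSON. The sketch below builds a minimal version-4 notebook by hand using only the standard library. The function name and the report title are hypothetical, and in practice the `nbformat` package would normally be used instead of writing the dictionary directly.

```python
import json


def make_report_notebook(title: str, code: str) -> dict:
    """Build a minimal nbformat-4 notebook: one markdown cell, one code cell."""
    return {
        "nbformat": 4,
        "nbformat_minor": 4,
        "metadata": {},
        "cells": [
            {"cell_type": "markdown", "metadata": {}, "source": f"# {title}"},
            {
                "cell_type": "code",
                "execution_count": None,  # not yet executed
                "metadata": {},
                "outputs": [],
                "source": code,
            },
        ],
    }


# Placeholder title and code, for illustration only.
notebook = make_report_notebook("AST2003H report", "print('hello, sky')")
text = json.dumps(notebook, indent=1)
cell_types = [cell["cell_type"] for cell in json.loads(text)["cells"]]
print(cell_types)
```

Because the text, the code and (once executed) the outputs live in one file, the submitted report doubles as a record of how each result was produced.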
  25. • Processing of SETI data: Citizen Science. • SETI@IBMCloud: •

    IBM Object Storage and Spark. • IPython Notebooks • https://github.com/ibm-cds-labs/seti_at_ibm • Use spectrograms to look for signals.
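To make "use spectrograms to look for signals" concrete, here is a small sketch that hides a weak narrowband tone in noise and recovers it from a spectrogram. It is not the SETI@IBMCloud code and uses no SETI data: the sample rate, tone frequency, amplitude and segment length are all assumed values.

```python
import numpy as np

rng = np.random.default_rng(1)

SAMPLE_RATE = 1024.0  # Hz (assumed)
TONE_FREQ = 200.0     # Hz, the injected narrowband "signal" (assumed)
N_SAMPLES = 8192
SEGMENT = 256         # samples per spectrum

t = np.arange(N_SAMPLES) / SAMPLE_RATE
series = rng.normal(0.0, 1.0, N_SAMPLES) + 0.5 * np.sin(2 * np.pi * TONE_FREQ * t)

# Spectrogram: split into segments, window each one, take the power spectrum.
segments = series.reshape(-1, SEGMENT) * np.hanning(SEGMENT)
spectrogram = np.abs(np.fft.rfft(segments, axis=1)) ** 2  # shape (time, freq)
freqs = np.fft.rfftfreq(SEGMENT, d=1.0 / SAMPLE_RATE)

# A persistent narrowband signal stands out after averaging over time.
mean_spectrum = spectrogram.mean(axis=0)
peak_freq = float(freqs[np.argmax(mean_spectrum)])
print(f"strongest channel at {peak_freq:.0f} Hz")
```

The tone is invisible in the raw time series but appears as a persistent line in one frequency channel of the spectrogram, which is the kind of feature a signal search looks for.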
  26. The Cloud + Jupyter • Unlimited potential to teach coding,

    mathematics and science. • Accessible anywhere (just need a smart device + internet connection). • Low-overhead for the long-tail researcher. • Can integrate a learning intervention quickly and seamlessly. • Not the best or the slickest interface, but certainly the most generic.
  27. Jupyter Notebooks • Quickly becoming the lingua franca of scientific

    knowledge sharing. • Recipe + data = Value Added Product, and canonical record of processes involved. • Easy to integrate new, bespoke kernels. • Python/SQL/Julia etc… • Container based!
  28. Cloud + Jupyter for Long Tail Researchers • No need

    to spend R100ks on closet servers. • … which would include sys-admin, power, UPS… • Just a few sys-admins, data centre and vanilla VM images to empower the researcher. • Jupyter — no need to SSH into the machine — you have a Notebook and a Terminal handy!