MAST archive and operations in the MMA era

In this presentation I discuss some of the ongoing work at MAST and STScI and speculate on how changes in the technology landscape of astronomy and astrophysics data management may be relevant to the multi-messenger astronomy (MMA) community.

Arfon Smith

April 25, 2019
Transcript

  1. MAST archive and operations in the MMA era
    Arfon Smith, Data Science Mission Office
    EXPANDING THE FRONTIERS OF SPACE ASTRONOMY
  2. STScI: Data Science Mission Office
    • Raise the MAST archive (and data management) to ‘Mission’ status
    • Responsible for the DMS portfolio for all missions (HST, JWST, Kepler/K2, TESS, WFIRST, etc.)
    • Exploring new technologies, services and infrastructure for data management
    • Developing community expertise in combining data science and astronomy
    https://archive.stsci.edu/reports/BigDataSDTReport_Final.pdf
  3. Three things happening at STScI/MAST that might be interesting…
    1. Science platforms as environments for archival data analysis and transient follow-up.
    2. Event-driven ‘serverless’ architectures for data processing.
    3. Community contribution at all levels of the data management infrastructure.
  4. Part 1: Science platforms as environments for archival data analysis and transient follow-up.
  5. MAST: Multi-mission archive

  6. MAST: Multi-mission archive

  7. Archives & Services: download data, compute locally
    Diagram components: Raw data, Calibration pipelines, Data, Services (CasJobs, VO services, JSON API), Software (TOPCAT); http://archive.stsci.edu, http://mast.stsci.edu
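The JSON API in this diagram is the kind of programmatic interface the rest of the talk builds on. Below is a minimal sketch of preparing a cone-search request against it, assuming the v0 `invoke` endpoint and the `Mast.Caom.Cone` service (with `ra`, `dec` and `radius` in degrees) as described in the MAST API documentation; treat those names as assumptions to check against the current docs. The coordinates are illustrative. The code only builds the request; nothing is sent.

```python
import json
from urllib.parse import quote
from urllib.request import Request

# Assumed MAST JSON API endpoint (v0 "invoke" service); verify against current docs.
MAST_INVOKE_URL = "https://mast.stsci.edu/api/v0/invoke"


def build_cone_request(ra_deg: float, dec_deg: float, radius_deg: float,
                       page_size: int = 2000) -> dict:
    """Describe a CAOM cone search around (ra_deg, dec_deg); values assumed in degrees."""
    return {
        "service": "Mast.Caom.Cone",
        "params": {"ra": ra_deg, "dec": dec_deg, "radius": radius_deg},
        "format": "json",
        "pagesize": page_size,
        "page": 1,
    }


def encode_request(request: dict) -> str:
    """Serialize the request as a form-encoded 'request=<json>' body."""
    return "request=" + quote(json.dumps(request))


def make_post(body: str) -> Request:
    """Prepare (but do not send) the HTTP POST; urllib.request.urlopen() would send it."""
    return Request(
        MAST_INVOKE_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded",
                 "Accept": "text/plain"},
        method="POST",
    )


if __name__ == "__main__":
    req = make_post(encode_request(build_cone_request(254.28746, -4.09933, 0.2)))
    print(req.get_method(), req.full_url)
```

Keeping request construction separate from sending makes the query easy to inspect or log before any network traffic happens.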
  8. MAST: Archival publication rates

  9. Science Platforms
    Diagram components: Data, Tools, Compute, APIs, Web Portals, Notebooks, Internet
    A Science Platform is an environment that combines data storage, computational capabilities, software tools and interfaces for users to interact with the underlying components.
  10. Key part of the LSST data management system

  11. Science Platforms, aka ‘server-side analytics’
    “Notebook-like interface integrated with astronomical data services” (Gregory Dubois-Felsmann, LSST DM)
  12. Composable machine images: FROM lsstsqre/pipeline
    BYOS: shareable, composable computational environments
  13. CADC, DES, ESAC, IPAC, JHU, LSST, NASA, NCSA, NDS, NED, NOAO, STScI
    https://github.com/spacetelescope/science-platforms-workshop
  14. Cloud-hosted data analysis environment

  15. Some reasons Science Platforms are exciting
    • Being developed by many major projects & archives (MAST, IPAC, LSST, ESAC, CADC, NOAO…), in a semi-coordinated fashion.
    • Provide access to large, high-value datasets and the ability to compute against them (server-side analytics).
    • Potentially provide access to substantial, scalable compute resources, including GPUs.
    • Leverage existing programmatic interfaces to astronomical archives (e.g. VO services and other APIs).
    • Convergence of technologies/conventions (notebook-driven analyses) for repeatable, reliable data exploration and analysis.
    • Potential environment for transient event analyses and broker development.
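One of the VO services a platform can lean on is the IVOA Simple Cone Search (SCS), an HTTP GET protocol that takes `RA`, `DEC` and `SR` (search radius) in decimal degrees. The sketch below builds such a URL in Python; the base URL in the usage note is a placeholder, not a real MAST endpoint, and `scs_query_url` is a hypothetical helper name.

```python
from typing import Optional
from urllib.parse import urlencode


def scs_query_url(base_url: str, ra_deg: float, dec_deg: float, sr_deg: float,
                  verb: Optional[int] = None) -> str:
    """Build a Simple Cone Search GET URL (RA, DEC, SR in decimal degrees)."""
    if not 0.0 <= ra_deg < 360.0:
        raise ValueError("RA must be in [0, 360) degrees")
    if not -90.0 <= dec_deg <= 90.0:
        raise ValueError("DEC must be in [-90, 90] degrees")
    if sr_deg < 0.0:
        raise ValueError("SR must be non-negative")
    params = {"RA": ra_deg, "DEC": dec_deg, "SR": sr_deg}
    if verb is not None:
        params["VERB"] = verb  # higher VERB asks the service for more columns
    # A service's base URL may already end in '?' or '&', or carry its own query string.
    if base_url.endswith(("?", "&")):
        sep = ""
    elif "?" in base_url:
        sep = "&"
    else:
        sep = "?"
    return base_url + sep + urlencode(params)
```

For example, `scs_query_url("https://example.org/scs", 10.0, -5.0, 0.1)` gives `https://example.org/scs?RA=10.0&DEC=-5.0&SR=0.1`. Because every compliant service accepts the same three parameters, notebook code written this way can point at different archives by changing only the base URL.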
  16. Notebook-driven analysis: Not just academia

  17. Broad, rich ecosystem
    https://speakerdeck.com/jakevdp/the-unexpected-effectiveness-of-python-in-science

  18. Part 2: Event-driven ‘serverless’ architectures for data processing.

  19. Serverless/Function as a Service (FaaS) computing
    • Write a function (e.g. in Python, C++, Julia, Haskell, Fortran…)
    • Upload the function to a cloud computing platform (AWS Lambda, Google Cloud Functions, Azure Functions, Apache OpenWhisk)
    • Define the resources required for the cloud function to execute (CPU/RAM)
    • Trigger the function based on event rules (e.g. an event posted to an API or appearing in an event stream)
    • SCIENCE!
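To make the "write a function" and "trigger on events" steps concrete, here is a minimal Python sketch of a function for AWS Lambda's Python runtime, triggered by S3 object-created notifications. The helper `process_object` and the bucket/key values are hypothetical; the `event` layout follows the S3 notification format (`Records[].s3.bucket.name`, `Records[].s3.object.key`). Memory and timeout limits are configured on the platform, not in the code.

```python
import json
from urllib.parse import unquote_plus


def process_object(bucket: str, key: str) -> dict:
    """Hypothetical 'business logic' step, e.g. calibrating one FITS file."""
    return {"bucket": bucket, "key": key, "status": "processed"}


def lambda_handler(event: dict, context) -> list:
    """Entry point: the platform calls this with the triggering event."""
    results = []
    # The notification carries a list of records, one per affected object.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded (spaces as '+') in S3 notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        results.append(process_object(bucket, key))
    # Asynchronous triggers such as S3 do not use the return value;
    # returning results here just makes local testing easy.
    return results


if __name__ == "__main__":
    fake_event = {"Records": [{"s3": {"bucket": {"name": "example-bucket"},
                                      "object": {"key": "raw/image+001.fits"}}}]}
    print(json.dumps(lambda_handler(fake_event, None)))
```

Because the handler is a plain function of its event, it can be exercised locally with a hand-built event, as in the `__main__` block, before it is uploaded.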
  20. Hubble public data in the (AWS) cloud
    ~140 TB of public HST data from ACS, COS, STIS, WFC3, WFPC2
  21. Hubble public data in the (AWS) cloud
    Robust, programmatic access to cloud-hosted data
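Programmatic access mostly comes down to mapping a dataset name to an object location. The sketch below assumes the public HST data sit in an S3 bucket named `stpubdata` under `hst/public/<first four characters>/<obs_id>/<obs_id>_<suffix>.fits`. That layout is an assumption to verify against current MAST documentation, and `iabc01xyq` is a made-up, illustrative 9-character dataset name.

```python
from typing import Tuple

# Assumed bucket and prefix for public HST data; verify before use.
HST_BUCKET = "stpubdata"
HST_PREFIX = "hst/public"


def hst_s3_uri(obs_id: str, suffix: str = "flt") -> str:
    """Build the assumed S3 URI for one HST product, e.g. <obs_id>_flt.fits."""
    obs_id = obs_id.lower()
    if len(obs_id) != 9 or not obs_id.isalnum():
        raise ValueError(f"expected a 9-character HST dataset name, got {obs_id!r}")
    return f"s3://{HST_BUCKET}/{HST_PREFIX}/{obs_id[:4]}/{obs_id}/{obs_id}_{suffix}.fits"


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key), the form boto3 calls take."""
    if not uri.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {uri!r}")
    bucket, _, key = uri[len("s3://"):].partition("/")
    return bucket, key
```

With boto3, the resulting pair can be passed to `s3.download_file(bucket, key, filename)`. If the bucket is configured as requester-pays, the call also needs `ExtraArgs={"RequestPayer": "requester"}`.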
  22. Hubble public data in the (AWS) cloud
    MAST Labs exploratory technical blog: mast-labs.stsci.io
  23. Next-generation serverless pipeline processing
    “…In this post we’re going to show you how to process 122,000 WFC3/IR images on AWS Lambda in about 2 minutes (and for about $2)”
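A rough back-of-the-envelope reading of the quoted figures, taking "about 2 minutes" as 120 s and "about $2" at face value:

```latex
\frac{122\,000\ \text{images}}{120\ \text{s}} \approx 1.0 \times 10^{3}\ \text{images s}^{-1},
\qquad
\frac{\$2}{122\,000\ \text{images}} \approx \$1.6 \times 10^{-5}\ \text{per image} \;(\approx \$16\ \text{per million images}).
```

These rates do not say how many functions ran at once. Throughput is roughly the number of concurrent invocations divided by the time each image takes, so the same total could come from many short invocations or fewer long ones.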
  24. NISAR: 85TB/day
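For scale, assuming decimal units (1 TB = 10^12 bytes) and a steady rate, 85 TB/day corresponds to:

```latex
\frac{85 \times 10^{12}\ \text{B}}{86\,400\ \text{s}} \approx 9.8 \times 10^{8}\ \text{B s}^{-1} \approx 1\ \text{GB s}^{-1},
\qquad
85\ \text{TB day}^{-1} \times 365\ \text{days} \approx 3.1 \times 10^{4}\ \text{TB} \approx 31\ \text{PB per year}.
```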

  25. Some reasons we’re excited about serverless computing
    • Allows engineers & astronomers to focus on the ‘business logic’ of their analysis rather than thinking about infrastructure.
    • Can be triggered from multiple settings (e.g. automated background tasks or inline analysis steps).
    • Event-driven & responsive, so it can be very cost-effective.
    • Potentially interesting for more data- and compute-intensive archive functionality.
    • SCALE: makes massively parallel computations easy*…
    * Easy to shoot yourself in the foot too.
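The footnote's warning is mostly about unbounded parallelism. Launching one invocation per image is easy, but so is exhausting a concurrency quota, overwhelming a downstream service, or running up a bill. Below is a local Python analog of fan-out with an explicit cap, using threads in place of cloud function invocations. `measure_image` is a hypothetical stand-in for real per-image work. Managed platforms expose comparable limits, for example AWS Lambda's reserved concurrency setting.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def bounded_fan_out(task: Callable[[T], R], items: Iterable[T],
                    max_concurrency: int = 100) -> List[R]:
    """Run task over items with at most max_concurrency calls in flight; keeps input order."""
    if max_concurrency < 1:
        raise ValueError("max_concurrency must be at least 1")
    # The pool size is the cap: no more than max_concurrency tasks run at once.
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        return list(pool.map(task, items))


def measure_image(key: str) -> int:
    """Hypothetical per-image task standing in for one function invocation."""
    return len(key)


if __name__ == "__main__":
    keys = [f"wfc3ir/image_{i:06d}.fits" for i in range(1_000)]
    sizes = bounded_fan_out(measure_image, keys, max_concurrency=16)
    print(len(sizes))
```

Making the cap a required, visible parameter rather than an implicit default keeps the cost and load of a run a deliberate choice.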
  26. Part 3: Community contribution at all levels of the data management infrastructure.

  27. Archive, Services, Software
    Diagram components: Raw data, Calibration pipelines, Data, Services (CasJobs, VO services, JSON API), Software (TOPCAT); http://archive.stsci.edu, http://mast.stsci.edu
  28. Status quo (until relatively recently)
    CasJobs, VO services, JSON API, http://archive.stsci.edu, http://mast.stsci.edu, TOPCAT: traditionally thought of as science center activities
  29. Community contributions at all levels
    Diagram components: Data, Software, Services; community alert brokers/agents; community-built services; ‘L3’ data products; community software (e.g. Astropy); community software + L3/L4/L5 pipelines; L3/L4/L5 data products (HLSPs)
    Reliance on community contributions at all levels
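Community alert brokers and agents appear among the contributions here. At its simplest, an agent is a filter that picks out the alerts a science team cares about from a stream. Below is a minimal Python sketch that assumes a hypothetical `Alert` record, not the schema of any real broker or alert stream. It selects alerts inside a sky cone that are brighter than a magnitude limit, using the haversine formula for angular separation.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Alert:
    """Hypothetical minimal alert record; real broker schemas carry far more fields."""
    alert_id: str
    ra_deg: float
    dec_deg: float
    mag: float


def angular_separation_deg(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle separation between two sky positions (haversine formula, degrees)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp so rounding cannot push the argument of asin above 1.
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, h))))


def select_alerts(alerts: Iterable[Alert], ra_deg: float, dec_deg: float,
                  radius_deg: float, mag_limit: float) -> Iterator[Alert]:
    """Yield alerts inside the cone that are at least as bright as mag_limit."""
    for alert in alerts:
        if alert.mag > mag_limit:  # fainter than the limit (larger magnitude)
            continue
        if angular_separation_deg(alert.ra_deg, alert.dec_deg, ra_deg, dec_deg) <= radius_deg:
            yield alert
```

Because `select_alerts` is a generator, it can sit directly on an incoming stream and pass matches downstream one at a time, which is the usual shape of a broker filter.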
  30. Community software

  31. Change in the way technology is created

  32. Community contributions / co-creation of technology
    • Open source is now the ‘new normal’ in many sectors (especially data science).
    • What might the different roles be for projects/facilities, science teams and individuals?
    • Communities often form around shared challenges and shared data products.
    • It is easier to recognize innovations created by others when working with similar data.
  33. Community software initiative
    • Core Infrastructure
      • Contributing to core, shared libraries (e.g. FITS, coordinate systems)
    • Community Outreach & Support
      • User Support
      • Documentation
    • Emerging efforts
      • LSST Photometry
      • JWST NIRSpec
      • MAST Astroquery
  34. Thanks! arfon@stsci.edu https://mast-labs.stsci.io