MAST archive and operations in the MMA era

Arfon Smith
April 25, 2019

In this presentation I discuss some of the ongoing work at MAST and STScI, and speculate on how these changes in the technology landscape of astronomy and astrophysics data management may be relevant to the multi-messenger astronomy community.


  1. MAST archive and operations in the MMA era Arfon Smith,

  2. STScI: Data Science Mission Office

    • Raise MAST archive (and data management) to ‘Mission’ status
    • Responsible for DMS portfolio for all missions (HST, JWST, Kepler/K2, TESS, WFIRST, etc.)
    • Exploring new technologies, services and infrastructure for data management
    • Developing community expertise in combining data science and astronomy
    https://archive.stsci.edu/reports/BigDataSDTReport_Final.pdf
  3. Three things happening at STScI/MAST that might be interesting…

    1. Science platforms as environments for archival data analysis and transient follow-up.
    2. Event-driven ‘serverless’ architectures for data processing.
    3. Community contribution at all levels of the data management infrastructure.
  4. Archives & Services: Download data, compute locally

    Diagram labels: CasJobs; VO services; JSON API; http://archive.stsci.edu; Data; Calibration pipelines; Raw data; Software; http://mast.stsci.edu; TOPCAT; Services
  5. Science Platforms

    Diagram labels: Data; Tools; Compute; APIs; Web Portals; Notebooks; Internet

    A Science Platform is an environment which combines data storage, computational capabilities, software tools and interfaces for users to interact with the underlying components.

    NOAO STScI https://github.com/spacetelescope/science-platforms-workshop
  7. Some reasons Science Platforms are exciting

    • Being developed by many major projects & archives (MAST, IPAC, LSST, ESAC, CADC, NOAO…), in a semi-coordinated fashion.
    • Provide access to large, high-value datasets and the ability to compute against them (server-side analytics).
    • Potentially provide access to substantial scalable compute resources, including GPUs.
    • Leverage existing programmatic interfaces to astronomical archives (e.g. VO services and other APIs).
    • Convergence of technologies/conventions (notebook-driven analyses) for repeatable, reliable data exploration and analysis.
    • Potential environment for transient event analyses and broker development.
  8. Serverless/Function as a Service (FaaS) computing

    • Write a function (e.g. in Python, C++, Julia, Haskell, Fortran…)
    • Upload the function to a cloud computing platform (AWS Lambda, Google Cloud Functions, Azure Functions, Apache OpenWhisk)
    • Define the resources required for the cloud function to execute (CPU/RAM)
    • Trigger the function based on event rules (e.g. event posted to API or appearing in event stream)
    • SCIENCE!
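The "write a function, trigger it on an event" steps above can be sketched in Python. This is a minimal illustration and not code from the talk: it assumes the function is deployed to AWS Lambda and triggered by an S3 "object created" notification, and `process_image`, the bucket name and the object key are all hypothetical placeholders.

```python
import json
from urllib.parse import unquote_plus


def process_image(bucket, key):
    """Hypothetical placeholder for the science step (e.g. handling one image)."""
    return {"bucket": bucket, "key": key, "status": "processed"}


def handler(event, context):
    """Entry point written in the style of an AWS Lambda Python handler.

    Assumes the trigger is an S3 "object created" notification, whose
    records carry the bucket name and a URL-encoded object key.
    """
    results = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])
        results.append(process_image(bucket, key))
    return {"statusCode": 200, "body": json.dumps(results)}


# Local dry run with a hand-built event; the bucket and key are made up.
example_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "hst/example+image.fits"},
            }
        }
    ]
}
response = handler(example_event, context=None)
```

The memory and timeout settings from the "define the resources" step live in the platform configuration rather than in the function body, which is why they do not appear here.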
  9. Hubble public data in the (AWS) cloud

    ~140TB public HST data from ACS, COS, STIS, WFC3, WFPC2
  10. Next-generation serverless pipeline processing

    “…In this post we’re going to show you how to process 122,000 WFC3/IR images on AWS Lambda in about 2 minutes (and for about $2)”
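To put the quoted figures in per-image terms, the arithmetic can be worked through as below. The inputs are the approximate numbers from the slide ("about 2 minutes", "about $2"), so the results are order-of-magnitude estimates only.

```python
# Approximate figures quoted on the slide.
n_images = 122_000
wall_seconds = 2 * 60      # "about 2 minutes"
total_cost_usd = 2.0       # "about $2"

# Aggregate throughput across all parallel function invocations.
images_per_second = n_images / wall_seconds

# Average cost attributed to each image.
cost_per_image_usd = total_cost_usd / n_images

print(f"{images_per_second:.0f} images/s, ${cost_per_image_usd:.7f} per image")
```

That works out to roughly a thousand images per second in aggregate, at a small fraction of a cent per image.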
  11. Some reasons we’re excited about serverless computing

    • Allows engineers & astronomers to focus on ‘business logic’ of their analysis rather than thinking about infrastructure.
    • Can be triggered from multiple settings (e.g. automated background tasks or inline analysis steps)
    • Event-driven & responsive - can be very cost-effective.
    • Potentially interesting for more data & compute intensive archive functionalities.
    • SCALE: Makes massively parallel computations easy*…
    * Easy to shoot yourself in the foot too
  12. Archive, Services, Software

    Diagram labels: CasJobs; VO services; JSON API; http://archive.stsci.edu; Data; Calibration pipelines; Raw data; Software; http://mast.stsci.edu; TOPCAT; Services
  13. Status quo (until relatively recently)

    Diagram labels: CasJobs; VO services; JSON API; http://archive.stsci.edu; http://mast.stsci.edu; TOPCAT
    Annotation: Traditionally thought of as science center activities
  14. Community contributions at all levels

    Diagram labels: Data; Software; Services; Community Alert brokers/agents; Community-built Services; ‘L3’ data products; Community software (e.g. Astropy); Community software + L3/L4/L5 pipelines; L3/L4/L5 data products (HLSPs)
    Reliance on community contributions at all levels
  15. Community contributions / co-creation of technology

    • Open source is now the ‘new normal’ in many sectors (especially data science)
    • What might the different roles be for projects/facilities, science teams, individuals?
    • Communities often form around shared challenges, shared data products
    • Easier to recognize innovations created by others when working with similar data
  16. Community software initiative

    • Core Infrastructure
      • Contributing to core, shared libraries (e.g. FITS, coordinate systems)
    • Community Outreach & Support
      • User Support
      • Documentation
    • Emerging efforts
      • LSST Photometry
      • JWST NIRSpec
      • MAST Astroquery