State of Pangeo - August 2019

My presentation from the August 2019 Pangeo Community Meeting


Ryan Abernathey

August 21, 2019


  1. State of Pangeo

    August 2019 Community Meeting
  2. “Pangeo is first and foremost a community promoting open, reproducible, and scalable science.

    “This community provides documentation, develops and maintains software, and deploys computing infrastructure to make scientific research and programming easier.” http://pangeo.io/
  3. Pangeo Report Card*: 2018-2019 School Year

    Community: A
    Infrastructure: A
    Software: B
    Documentation: C
    Education?
    *Just one opinion
  4. Pangeo Community

    We have a growing group of engaged participants in discussion.
    ✓ GitHub Discussions
    ✓ Weekly Checkin Meetings
    ✓ Medium Blog
    ✓ Working Groups
    ✓ US / UK / Europe
    ✓ Students / Postdocs / Faculty / Software Devs / Data Scientists
    ✓ Academia / National Labs / Industry / NGO
    ✓ Weather / Climate / Oceans / Geoscience / Neuroscience / Bioinformatics? / Astronomy?
  6. Community Milestones

    ✓ Unidata NetCDF roadmap calls for Zarr backend
    ✓ NCAR NCL end-of-life plan and the “pivot to python” cites Pangeo as a key technology for the future of data analysis
    ✓ NASA DAACs publicly exploring Pangeo-style approaches to data distribution
    ✓ CSIRO adoption
    ✓ ECMWF adoption
  8. Hype Cycle

    Where are we in the Pangeo hype cycle?
    How can we manage such rapid growth?
    How can we harness community enthusiasm most effectively?
  9. Pangeo Infrastructure

    Interactive, scalable, data-proximate computing environments for real scientific analysis
    Figure labels: NASA Pleiades, HPC, Cloud
  10. HPC Milestones

    ✓ Major progress on Dask HPC compatibility (dask-jobqueue)
    ✓ Jupyter portal on NCAR Cheyenne
    ✓ Growing experimentation with Zarr format at HPC centers
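
For readers who have not used dask-jobqueue, the HPC milestone above can be sketched roughly as follows. This is a minimal sketch, not the configuration used on any particular system: the helper names `pbs_cluster_options` and `start_cluster`, and every resource value (cores, memory, walltime, queue name), are hypothetical placeholders chosen for illustration. It assumes a PBS scheduler and that `dask-jobqueue` and `distributed` are installed.

```python
def pbs_cluster_options(cores=36, memory="100GB", walltime="01:00:00", queue="regular"):
    """Collect per-job resource requests (all values are illustrative placeholders)."""
    return {"cores": cores, "memory": memory, "walltime": walltime, "queue": queue}


def start_cluster(n_jobs=2, **overrides):
    """Submit Dask workers as PBS batch jobs and return a connected client."""
    # Imported lazily so the helper above can be used without dask-jobqueue installed.
    from dask.distributed import Client
    from dask_jobqueue import PBSCluster

    cluster = PBSCluster(**pbs_cluster_options(**overrides))
    cluster.scale(jobs=n_jobs)  # ask the scheduler for n_jobs worker jobs
    return Client(cluster)
```

On a login node or Jupyter session with access to the scheduler, `client = start_cluster(n_jobs=4, queue="...")` would request four worker jobs; the queue name depends on the site.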
  11. Cloud Milestones

    ✓ Pangeo binder has served 9,775 repos
    ✓ Pangeo Cloud Federation: automated management of many complex JupyterHub deployments across multiple clouds
      https://github.com/pangeo-data/pangeo-cloud-federation/
    ✓ Pangeo Cloud Datastore: nested catalog of Zarr datasets in cloud storage, with Intake and HTML interfaces
      https://pangeo-data.github.io/pangeo-datastore/
    ✓ NCAR Large Ensemble in AWS
    ✓ CMIP6 Data Catalog is growing
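
One reason Zarr suits cloud object storage is that each chunk of an array is stored under its own key, so a reader fetches only the chunks a selection touches. The sketch below illustrates that idea in plain Python; it is not Zarr's implementation. It assumes the Zarr v2 default layout, in which chunk keys are the chunk indices joined by dots (for example `1.0.0`). The function name `chunk_keys` and the array shape in the usage note are hypothetical.

```python
from itertools import product


def chunk_keys(selection, chunks):
    """List the chunk keys that overlap a rectangular selection.

    selection: one half-open (start, stop) index range per dimension.
    chunks: chunk length along each dimension.
    """
    per_dim = []
    for (start, stop), size in zip(selection, chunks):
        if stop <= start:
            return []  # empty selection touches no chunks
        first = start // size        # index of the first chunk touched
        last = (stop - 1) // size    # index of the last chunk touched
        per_dim.append(range(first, last + 1))
    # Assumed Zarr v2 default: chunk indices joined with "."
    return [".".join(str(i) for i in idx) for idx in product(*per_dim)]
```

For a hypothetical (time, lat, lon) array chunked as (30, 180, 360), `chunk_keys([(45, 75), (0, 180), (0, 360)], (30, 180, 360))` names two objects, so a 30-step time slice needs two requests rather than a download of the whole dataset.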
  12. ocean.pangeo.io

    Deployed since mid-May
  14. Software

    Intake
    Supporting community-driven open-source
  15. Software

    What about domain-specific software tools?
    Needs: Geographically-aware indexing; Regridding of regular and unstructured grids; Domain-specific functions; Vector calculus operations; Machine Learning interfaces
    Packages: xgcm, xesmf, regionmask, geoxarray?, xrft, climpred
  16. Software

    My recommendation: Write a white paper on best practices for domain-specific package development within the Pangeo ecosystem.
  17. Documentation

    We have made huge strides in the past two years. How can we help the rest of our community catch up?
  18. Documentation

    • About Pangeo
      ◦ Motivation
      ◦ Mission Statement
      ◦ Goals
      ◦ Get Involved
    • Frequently Asked Questions
    • Guide for Scientists
      ◦ Learn About Pangeo Software
      ◦ Explore the Use Cases
      ◦ Try Out a Pangeo Deployment
      ◦ Give Feedback
      ◦ Contribute a Use Case
      ◦ Contribute Data
      ◦ Become an Open Source Contributor
    • Packages
      ◦ Pangeo Core Packages
      ◦ Pangeo Affiliated Packages
      ◦ Guidelines for New Packages
    • Geoscience Use Cases
      ◦ Physical Oceanography
      ◦ Climate modeling
      ◦ Meteorology
    • Technical Architecture
      ◦ Where we began
      ◦ Interoperability in Pangeo
      ◦ Software
      ◦ Compute Platforms
      ◦ Storage Formats
    • Deployment Setup Guides
      ◦ Setting up Pangeo on HPC Systems
      ◦ Setting up Pangeo on Cloud Systems
    • Deployments
      ◦ Pangeo provided Cloud Deployments
      ◦ Other Cloud deployments
      ◦ High Performance Computing Deployments
    • Pangeo and Data
      ◦ Data on HPC
      ◦ Data in the Cloud
    • Pangeo Data Catalog
    • Collaborators
      ◦ Funding Agencies
      ◦ Institutions
      ◦ People
    • Pangeo Weekly Meeting Notes
      ◦ Meeting Calendar
      ◦ Meeting Notes
    • Pangeo Conferences
      ◦ Future Meetings
      ◦ Past Meetings
    • Contact Pangeo
  20. Documentation Refactor

    pangeo.io website
    • Share general information about the project.
    • Communicate with stakeholders and potential partners.

    Pangeo book
    • Share technical information.
    • Working title: Cloud Native Science
      I. User guide: packages, data formats, dask tricks, etc.
      II. Administrator guide: Hub deployment, dataset production, etc.
  21. Documentation Education

    • Education needs to be a central focus of our project going forward
    • Model after the NCL group: a traveling Pangeo roadshow of end-user workshops
    • Partner closely with Software Carpentry, university educators, etc.
  22. Summary

    My goals for the next year:
    • Continue successful outreach and engagement with data providers
    • Creatively brainstorm how to sustain our cloud-based environments
    • Work together on pangeo-interoperable domain-specific software
    • Improve our documentation