geoprocessing
Rich Signell, Research Oceanographer, U.S. Geological Survey
Ryan Abernathey (Columbia), Joe Hamman (NCAR), Matthew Rocklin (Anaconda->NVIDIA), Jacob Tomlinson (UK Met Office->NVIDIA), Scott Henderson (UW), and the rest of the Pangeo Community!
UNH CCOM/OE Seminar, 2020-10-16
with NetCDF/HDF on cloud storage
• Simple format, clear specification
• Each chunk is stored as a separate binary object
• Lightweight global and variable metadata stored as JSON
• Groups, filters, compression using Blosc
• Free, open-source software
• Read/write in Python using Xarray
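To make the "one object per chunk, metadata as JSON" idea concrete, here is a minimal toy sketch using only the Python standard library. It is not the Zarr library or its actual on-disk implementation: the helper names `write_chunked` and `read_chunk` are hypothetical, a plain `dict` stands in for a cloud object store, and `zlib` is swapped in for Blosc. The key names (`.zarray`, `0.0`, `0.1`, ...) are only loosely modeled on Zarr v2 naming.

```python
import json
import struct
import zlib


def write_chunked(store, values, shape, chunks):
    """Store a row-major 2-D array of floats as one compressed object per chunk."""
    nrows, ncols = shape
    crow, ccol = chunks
    # Lightweight array metadata is kept as a small JSON object next to the chunks.
    store[".zarray"] = json.dumps(
        {"shape": [nrows, ncols], "chunks": [crow, ccol],
         "dtype": "<f8", "compressor": "zlib"}
    ).encode()
    for i in range(0, nrows, crow):
        for j in range(0, ncols, ccol):
            # Gather the values that fall inside this chunk, row by row.
            block = [values[r * ncols + c]
                     for r in range(i, min(i + crow, nrows))
                     for c in range(j, min(j + ccol, ncols))]
            key = f"{i // crow}.{j // ccol}"  # chunk index, e.g. "0.1"
            store[key] = zlib.compress(struct.pack(f"<{len(block)}d", *block))


def read_chunk(store, key):
    """Fetch and decompress a single chunk without touching any other object."""
    raw = zlib.decompress(store[key])
    return list(struct.unpack(f"<{len(raw) // 8}d", raw))


store = {}                                # stand-in for an object-store bucket
values = [float(v) for v in range(24)]    # a 4 x 6 array, flattened row-major
write_chunked(store, values, shape=(4, 6), chunks=(2, 3))

meta = json.loads(store[".zarray"])
chunk_keys = sorted(k for k in store if k != ".zarray")
first_chunk = read_chunk(store, "0.0")
```

Because each chunk is its own object, a reader can fetch just the chunks it needs, and many workers can read different chunks in parallel; that is the property that suits this layout to cloud object storage.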
is 15TB
NWM is part of the Big Data Project, with data being pushed to the Cloud:
Forecast data: s3:noaa-nwm-pds
25-year reanalysis: s3:nwm-archive
$25K research credits from Amazon to explore using Pangeo for National Water Model data
Container Service for Kubernetes (Amazon EKS)
• Three classes of k8s node pools
  • Core pool: JupyterHub, web proxy (small)
  • Jupyter pool: autoscaling pool for single-user sessions
  • Dask pool: autoscaling pool for Dask workers on preemptible (e.g., spot) instances
• Pangeo installed with Helm chart
• Custom Docker environments built with repo2docker at https://github.com/pangeo-data/pangeo-docker-images
• Deployed using https://github.com/pangeo-data/pangeo-cloud-federation
• Full deploy instructions at pangeo.io
computing models, research credits, waiving egress charges for research
• New skills required: AWS workshops, hackathons, institutional road shows
• Data formats and data standardization: benchmarking, blogging
try out the demos in the gallery
• Read the awesome articles at medium.com/pangeo
• Chat with the team on gitter.im/pangeo-data
• Install the Pangeo environment on your local computer or HPC
• Run a Pangeo JupyterHub on AWS
• Rechunk your data with rechunker