Zarr OGC 2020

Zarr Community Standard proposal presentation given at the 2020 OGC Closing Plenary Session

Ryan Abernathey

June 22, 2020

Transcript

  1. What is Zarr? An open-source format for the storage of chunked, compressed, N-dimensional arrays.
     Zarr has been proposed as an OGC Community Standard.
  2. What is Zarr? An open-source format for the storage of chunked, compressed, N-dimensional arrays.
     [Figure labels: temperature, pressure, latitude, longitude. Image credit: Xarray developers]
  3. What is Zarr? An open-source format for the storage of chunked, compressed, N-dimensional arrays.
     [Figure panels: Oceanography, Microscopy]
     https://twitter.com/trevmanz/status/1265377097981329423
     https://twitter.com/LLC4320Bot/status/1274862822778982402
  4. What is Zarr? An open-source format for the storage of chunked, compressed, N-dimensional arrays.
     zarr.readthedocs.io
     • Zarr began in 2016 as a storage library for the scientific Python ecosystem.
     • Integrates closely with NumPy, Xarray, and Dask.
     zarr-developers/zarr-python
  5. What is Zarr? An open-source format for the storage of chunked, compressed, N-dimensional arrays.
     zarr-developers/zarr-python, constantinpape/z5, bcdev/jzarr, meggart/Zarr.jl, gzuidhof/zarr.js/
     The open spec (V2) has led to implementations in several languages.
  6. What is Zarr? An open-source format for the storage of chunked, compressed, N-dimensional arrays.
     [Layout diagram labels: Zarr Group (.zgroup, .zattrs); Zarr Array (.zarray, .zattrs); foo; chunk keys 0.0, 0.1, 2.0, 1.0, 1.1, 2.1; legend: Metadata (json), Chunk data (binary)]

     $ cat .zarray
     {
       "dtype": "<f8",
       "fill_value": "NaN",
       "chunks": [ 5, 720, 1440 ],
       "compressor": {
         "blocksize": 0,
         "clevel": 3,
         "cname": "zstd",
         "id": "blosc",
         "shuffle": 2
       },
       "filters": null,
       "order": "C",
       "shape": [ 8901, 720, 1440 ],
       "zarr_format": 2
     }
  7. What is Zarr? An open-source format for the storage of chunked, compressed, N-dimensional arrays.
     [Layout diagram labels: Zarr Group (.zgroup, .zattrs); Zarr Array (.zarray, .zattrs); foo; chunk keys 0.0, 0.1, 2.0, 1.0, 1.1, 2.1]
     Zarr can be stored in any key-value store.
     • Directory Store
     • ZipFile Store
     • Cloud object storage (e.g. S3, GCS)
     • Database (e.g. Redis, MongoDB)
     • Get creative: Zarr is designed to be hackable!
  8. CMIP6 Google Cloud Public Dataset: Zarr in Action
     • Coupled Model Intercomparison Project: latest climate model projections from modeling centers around the world
     • Pangeo project worked with Google Cloud to mirror data from ESGF
     • Zarr was chosen as the format because of its interoperability with cloud object storage
     • 600 TB and growing
     https://cloud.google.com/blog/products/data-analytics/new-climate-model-data-now-google-public-datasets
  9. Zarr and OGC
     • We are seeking OGC member approval of the Zarr V2 spec as an OGC Community Standard:
       “Community Standard serves to bring de facto standards from the larger geospatial community to be a stable reference point that can [be] normatively referenced by governments and other organizations.”
     • Zarr is generic (comparable to HDF5) but can serve as a base format for other OGC standards, e.g. Web Coverage Service.
     • Unidata will soon release a version of NetCDF with support for Zarr as an underlying storage container.
     • The Zarr project received an EOSS grant from the Chan Zuckerberg Initiative to support development of the V3 spec.
     Get involved @ zarr-developers/zarr-specs/
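
The `.zarray` metadata shown on slide 6 is enough to work out how an array is split into chunks and which store key each chunk gets. The following is a minimal sketch using only the Python standard library, not the zarr-python API. The helper names (`chunk_grid`, `chunk_key`, `chunk_for_element`) are hypothetical, and the "." key separator is assumed to be the Zarr V2 default.

```python
import json
import math

# The .zarray metadata from slide 6 (quote characters repaired).
ZARRAY = json.loads("""
{
  "dtype": "<f8",
  "fill_value": "NaN",
  "chunks": [5, 720, 1440],
  "compressor": {"blocksize": 0, "clevel": 3, "cname": "zstd",
                 "id": "blosc", "shuffle": 2},
  "filters": null,
  "order": "C",
  "shape": [8901, 720, 1440],
  "zarr_format": 2
}
""")


def chunk_grid(meta):
    """Number of chunks along each axis (the last one may extend past the array edge)."""
    return [math.ceil(s / c) for s, c in zip(meta["shape"], meta["chunks"])]


def chunk_key(index, separator="."):
    """Store key for a chunk index, e.g. (1, 0) -> "1.0" (assumed V2 default separator)."""
    return separator.join(str(i) for i in index)


def chunk_for_element(meta, element):
    """Index of the chunk that holds a given array element."""
    return tuple(e // c for e, c in zip(element, meta["chunks"]))


grid = chunk_grid(ZARRAY)
# "<f8" is little-endian float64; reading the item size from the last
# characters only works for simple dtype strings like this one.
itemsize = int(ZARRAY["dtype"][2:])
uncompressed_chunk_bytes = math.prod(ZARRAY["chunks"]) * itemsize

print("chunk grid:", grid)
print("uncompressed bytes per chunk:", uncompressed_chunk_bytes)
print("key for element (8900, 0, 0):",
      chunk_key(chunk_for_element(ZARRAY, (8900, 0, 0))))
```

With this metadata each chunk spans 5 steps along the first axis and the full extent of the other two, so reading one slice along the first axis touches exactly one chunk.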
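
Slide 7's point that Zarr can live in any key-value store can be illustrated by writing a tiny array by hand. This is a sketch under stated assumptions, not the zarr-python implementation: a plain `dict` stands in for the store, the functions `create_root_group`, `write_array`, and `read_array` are hypothetical, the array is 1-D float64, and the standard-library `zlib` module stands in for the Blosc compressor shown on slide 6.

```python
import json
import struct
import zlib

FILL_VALUE = 0.0  # used to pad the final chunk to full size


def create_root_group(store):
    """Mark the store root as a Zarr V2 group."""
    store[".zgroup"] = json.dumps({"zarr_format": 2}).encode("utf-8")


def write_array(store, name, values, chunk_len):
    """Write a 1-D float64 array under `name` using the V2 key layout."""
    meta = {
        "zarr_format": 2,
        "shape": [len(values)],
        "chunks": [chunk_len],
        "dtype": "<f8",
        "compressor": {"id": "zlib", "level": 1},
        "fill_value": FILL_VALUE,
        "filters": None,
        "order": "C",
    }
    store[f"{name}/.zarray"] = json.dumps(meta).encode("utf-8")
    for start in range(0, len(values), chunk_len):
        chunk = list(values[start:start + chunk_len])
        chunk += [FILL_VALUE] * (chunk_len - len(chunk))  # pad the edge chunk
        raw = struct.pack(f"<{chunk_len}d", *chunk)
        store[f"{name}/{start // chunk_len}"] = zlib.compress(raw, 1)


def read_array(store, name):
    """Read the array back by walking its chunk keys in order."""
    meta = json.loads(store[f"{name}/.zarray"])
    length, chunk_len = meta["shape"][0], meta["chunks"][0]
    n_chunks = -(-length // chunk_len)  # ceiling division
    out = []
    for i in range(n_chunks):
        raw = zlib.decompress(store[f"{name}/{i}"])
        out.extend(struct.unpack(f"<{chunk_len}d", raw))
    return out[:length]  # drop the padding from the last chunk


store = {}  # any mapping from string keys to bytes would do
create_root_group(store)
write_array(store, "foo", [1.0, 2.0, 3.0, 4.0, 5.0], chunk_len=2)
print(sorted(store))
print(read_array(store, "foo"))
```

Because the store only needs "get bytes for key" and "set bytes for key", the same layout maps directly onto a directory tree, a zip file, or object storage keys.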