Zarr OGC 2020

Zarr Community Standard proposal presentation given at the 2020 OGC Closing Plenary Session

Ryan Abernathey

June 22, 2020

Transcript

  1. OGC Closing Plenary | 2020-06-22 | Ryan Abernathey | Zarr

  2. What is Zarr? An open-source format for the storage of chunked, compressed, N-dimensional arrays.
     Zarr has been proposed as an OGC Community Standard.
  3. What is Zarr?
     [Figure: N-dimensional arrays labeled temperature and pressure, with latitude and longitude coordinates. Image credit: Xarray developers]
  4. What is Zarr?
     • Oceanography: https://twitter.com/LLC4320Bot/status/1274862822778982402
     • Microscopy: https://twitter.com/trevmanz/status/1265377097981329423
  5. What is Zarr?
     • Zarr began in 2016 as a storage library for the scientific Python ecosystem.
     • Integrates closely with NumPy, Xarray, and Dask.
     zarr.readthedocs.io | zarr-developers/zarr-python
  6. What is Zarr?
     The open spec (V2) has led to implementations in several languages:
     • zarr-developers/zarr-python
     • constantinpape/z5
     • bcdev/jzarr
     • meggart/Zarr.jl
     • gzuidhof/zarr.js/
  7. What is Zarr?
     Zarr Group: .zgroup, .zattrs
     Zarr Array: foo (.zarray, .zattrs; chunks 0.0, 0.1, 1.0, 1.1, 2.0, 2.1)
     Metadata (JSON):
       $ cat .zarray
       {
         "dtype": "<f8",
         "fill_value": "NaN",
         "chunks": [ 5, 720, 1440 ],
         "compressor": { "blocksize": 0, "clevel": 3, "cname": "zstd", "id": "blosc", "shuffle": 2 },
         "filters": null,
         "order": "C",
         "shape": [ 8901, 720, 1440 ],
         "zarr_format": 2
       }
     Chunk data (binary)
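The `.zarray` document on this slide is enough to work out how the array is split into chunks. The following is a minimal sketch using only the Python standard library, not the zarr-python API; the helper name `chunk_layout` is hypothetical, and the item size is read from the dtype string under the assumption that it is a simple typestr such as `<f8`.

```python
import json
import math

# Metadata copied from the .zarray document shown on the slide.
ZARRAY = """
{
  "dtype": "<f8",
  "fill_value": "NaN",
  "chunks": [5, 720, 1440],
  "compressor": {"blocksize": 0, "clevel": 3, "cname": "zstd",
                 "id": "blosc", "shuffle": 2},
  "filters": null,
  "order": "C",
  "shape": [8901, 720, 1440],
  "zarr_format": 2
}
"""


def chunk_layout(zarray_text):
    """Hypothetical helper: derive the chunk grid from .zarray metadata."""
    meta = json.loads(zarray_text)
    shape, chunks = meta["shape"], meta["chunks"]
    # Assumption: a simple dtype string (byte order, kind, item size), e.g. "<f8".
    itemsize = int(meta["dtype"][2:])
    # Number of chunks along each dimension; the last chunk may be partial.
    grid = [math.ceil(s / c) for s, c in zip(shape, chunks)]
    return {
        "grid": grid,
        "n_chunks": math.prod(grid),
        # Size of one chunk before compression, in bytes.
        "chunk_nbytes": math.prod(chunks) * itemsize,
    }


layout = chunk_layout(ZARRAY)
print(layout)
```

For the metadata above, the array is chunked only along its first dimension, so each chunk holds 5 full 720 x 1440 slices before the Blosc compressor is applied.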
  8. What is Zarr?
     Zarr Group: .zgroup, .zattrs
     Zarr Array: foo (.zarray, .zattrs; chunks 0.0, 0.1, 1.0, 1.1, 2.0, 2.1)
     Zarr can be stored in any key-value store.
     • Directory Store
     • ZipFile Store
     • Cloud object storage (e.g. S3, GCS)
     • Database (e.g. Redis, MongoDB)
     • Get creative: Zarr is designed to be hackable!
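To illustrate the key-value idea, here is a minimal sketch that uses a plain Python `dict` as the store and lays out keys the way the slide's diagram does (`.zgroup`, `foo/.zarray`, `foo/0.0`, ...). It is standard-library only and is not the zarr-python implementation. The helpers `write_array` and `read_value` are hypothetical, the zlib compressor is swapped in for the Blosc one shown earlier, and the sketch assumes the array shape divides evenly by the chunk shape.

```python
import json
import struct
import zlib


def write_array(store, name, rows, chunk_shape):
    """Hypothetical helper: write a 2-D list of floats using V2-style keys."""
    n_rows, n_cols = len(rows), len(rows[0])
    c_rows, c_cols = chunk_shape
    meta = {
        "zarr_format": 2,
        "shape": [n_rows, n_cols],
        "chunks": [c_rows, c_cols],
        "dtype": "<f8",
        "compressor": {"id": "zlib", "level": 1},
        "fill_value": 0.0,
        "filters": None,
        "order": "C",
    }
    store[f"{name}/.zarray"] = json.dumps(meta).encode()
    # Assumption: shape is an exact multiple of chunk_shape (no edge padding).
    for ci in range(0, n_rows, c_rows):
        for cj in range(0, n_cols, c_cols):
            # Flatten the chunk in C (row-major) order.
            values = [rows[i][j]
                      for i in range(ci, ci + c_rows)
                      for j in range(cj, cj + c_cols)]
            raw = struct.pack(f"<{len(values)}d", *values)
            key = f"{name}/{ci // c_rows}.{cj // c_cols}"
            store[key] = zlib.compress(raw, 1)


def read_value(store, name, i, j):
    """Hypothetical helper: fetch one element by reading only its chunk."""
    meta = json.loads(store[f"{name}/.zarray"])
    c_rows, c_cols = meta["chunks"]
    raw = zlib.decompress(store[f"{name}/{i // c_rows}.{j // c_cols}"])
    values = struct.unpack(f"<{c_rows * c_cols}d", raw)
    return values[(i % c_rows) * c_cols + (j % c_cols)]


store = {}  # any mapping of string keys to bytes would do
store[".zgroup"] = json.dumps({"zarr_format": 2}).encode()
rows = [[float(10 * i + j) for j in range(6)] for i in range(6)]
write_array(store, "foo", rows, (2, 3))

print(sorted(store))
print(read_value(store, "foo", 3, 4))
```

Because the helpers only index `store` by key, the same logic would apply to a directory, a zip file, or an object-store bucket wrapped in a mapping interface.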
  9. Zarr in Action: CMIP6 Google Cloud Public Dataset
     • Climate Model Intercomparison Project: latest climate model projections from modeling centers around the world
     • Pangeo project worked with Google Cloud to mirror data from ESGF
     • Zarr was chosen as the format because of its interoperability with cloud object storage
     • 600 TB and growing
     https://cloud.google.com/blog/products/data-analytics/new-climate-model-data-now-google-public-datasets
  10. Zarr and OGC
     • We are seeking OGC member approval of the Zarr V2 spec as an OGC Community Standard:
       “Community Standard serves to bring de facto standards from the larger geospatial community to be a stable reference point that can [be] normatively referenced by governments and other organizations.”
     • Zarr is generic (comparable to HDF5) but can serve as a base format for other OGC standards, e.g. Web Coverage Service.
     • Unidata will soon release a version of NetCDF with support for Zarr as an underlying storage container.
     • The Zarr project received an EOSS grant from the Chan Zuckerberg Initiative to support development of the V3 spec.
     Get involved @ zarr-developers/zarr-specs/