
Zarr OGC 2020

Zarr Community Standard proposal presentation given at the 2020 OGC Closing Plenary Session

Ryan Abernathey

June 22, 2020

Transcript

  1. OGC Closing Plenary | 2020-06-22

    Ryan Abernathey
    Zarr


  2. What is Zarr?
    An open-source format for the storage of chunked, compressed, N-dimensional arrays.
    Zarr has been proposed as an OGC Community Standard.


  3. What is Zarr?
    An open-source format for the storage of chunked, compressed, N-dimensional arrays.
    [Figure labels: temperature, pressure, latitude, longitude. Image credit: Xarray developers]


  4. What is Zarr?
    An open-source format for the storage of chunked, compressed, N-dimensional arrays.
    Example domains: oceanography, microscopy
    https://twitter.com/trevmanz/status/1265377097981329423
    https://twitter.com/LLC4320Bot/status/1274862822778982402


  5. What is Zarr?
    An open-source format for the storage of chunked, compressed, N-dimensional arrays.
    zarr.readthedocs.io
    • Zarr began in 2016 as a storage library for the scientific Python ecosystem.

    • Integrates closely with NumPy, Xarray, and Dask.
    zarr-developers/zarr-python


  6. What is Zarr?
    An open-source format for the storage of chunked, compressed, N-dimensional arrays.
    The open spec (V2) has led to implementations in several languages:
    • zarr-developers/zarr-python (Python)
    • constantinpape/z5 (C++)
    • bcdev/jzarr (Java)
    • meggart/Zarr.jl (Julia)
    • gzuidhof/zarr.js/ (JavaScript)


  7. What is Zarr?
    An open-source format for the storage of chunked, compressed, N-dimensional arrays.
    Zarr Group:
      .zgroup .zattrs                (metadata, JSON)
      Zarr Array: foo
        .zarray .zattrs              (metadata, JSON)
        0.0 0.1 1.0 1.1 2.0 2.1      (chunk data, binary)

    $ cat .zarray
    {
        "dtype": "",
        "fill_value": "NaN",
        "chunks": [
            5,
            720,
            1440
        ],
        "compressor": {
            "blocksize": 0,
            "clevel": 3,
            "cname": "zstd",
            "id": "blosc",
            "shuffle": 2
        },
        "filters": null,
        "order": "C",
        "shape": [
            8901,
            720,
            1440
        ],
        "zarr_format": 2
    }
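
The relationship between the "shape" and "chunks" entries in the .zarray metadata above can be worked through in a few lines. The following is a minimal sketch using only the Python standard library, not the zarr-python API; the helper names (chunk_grid, chunk_key, chunks_for_slice) are hypothetical and exist only for this illustration. It assumes the default "." separator seen in the chunk names on the slide.

```python
import json

# Only the keys needed here, copied from the .zarray metadata shown above.
ZARRAY = json.loads("""
{
  "chunks": [5, 720, 1440],
  "shape": [8901, 720, 1440],
  "zarr_format": 2
}
""")


def chunk_grid(meta):
    """Number of chunks along each dimension (the last chunk may be partial)."""
    # -(-s // c) is integer ceiling division.
    return [-(-s // c) for s, c in zip(meta["shape"], meta["chunks"])]


def chunk_key(index, separator="."):
    """Name of a chunk from its grid index, e.g. (2, 1) -> "2.1"."""
    return separator.join(str(i) for i in index)


def chunks_for_slice(meta, dim, start, stop):
    """Chunk indices along one dimension touched by the half-open range [start, stop)."""
    size = meta["chunks"][dim]
    return range(start // size, (stop - 1) // size + 1)


grid = chunk_grid(ZARRAY)
print(grid)                                      # [1781, 1, 1]
print(chunk_key((1780, 0, 0)))                   # 1780.0.0
print(list(chunks_for_slice(ZARRAY, 0, 8, 12)))  # [1, 2]
```

With these numbers the array is split only along its first dimension: 8901 entries in chunks of 5 give 1781 chunks, the last of which covers a single entry. Reading entries 8 through 11 along that dimension touches just two chunks, which is what makes partial reads from remote storage cheap.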


  8. What is Zarr?
    An open-source format for the storage of chunked, compressed, N-dimensional arrays.
    Zarr Group:
      .zgroup .zattrs
      Zarr Array: foo
        .zarray .zattrs
        0.0 0.1 1.0 1.1 2.0 2.1
    Zarr can be stored in any key-value store.
    • Directory Store

    • ZipFile Store

    • Cloud object storage (e.g. S3, GCS)

    • Database (e.g. Redis, MongoDB)

    • Get creative—Zarr is designed to be hackable!
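
To make the key-value idea concrete, here is a hand-rolled sketch in which a plain Python dict plays the role of the store. It is intended to mirror the V2 layout from the slides (a .zgroup key, a foo/.zarray key, and one key per chunk) but it is not the zarr-python API and has not been checked against it here. The 6 x 4 int32 array, the zlib compressor choice, and the read_chunk helper are assumptions made for this illustration, and .zattrs is left out for brevity.

```python
import json
import struct
import zlib

# A plain dict stands in for the key-value store (string keys, bytes values).
store = {}

# Hypothetical 6 x 4 array of little-endian int32 split into 2 x 2 chunks,
# which gives the 3 x 2 chunk grid (0.0 ... 2.1) drawn on the slide.
shape, chunks = (6, 4), (2, 2)
data = [[row * shape[1] + col for col in range(shape[1])]
        for row in range(shape[0])]

# Metadata is stored as JSON documents under well-known keys.
store[".zgroup"] = json.dumps({"zarr_format": 2}).encode()
store["foo/.zarray"] = json.dumps({
    "zarr_format": 2,
    "shape": list(shape),
    "chunks": list(chunks),
    "dtype": "<i4",
    "compressor": {"id": "zlib", "level": 1},
    "fill_value": 0,
    "filters": None,
    "order": "C",
}).encode()

# Each chunk is written as compressed, C-ordered bytes under "foo/<i>.<j>".
for i in range(shape[0] // chunks[0]):
    for j in range(shape[1] // chunks[1]):
        values = [data[i * chunks[0] + r][j * chunks[1] + c]
                  for r in range(chunks[0]) for c in range(chunks[1])]
        raw = struct.pack(f"<{len(values)}i", *values)
        store[f"foo/{i}.{j}"] = zlib.compress(raw, 1)


def read_chunk(store, name, i, j):
    """Decompress one chunk and return its values as a flat tuple (C order)."""
    meta = json.loads(store[f"{name}/.zarray"])
    count = meta["chunks"][0] * meta["chunks"][1]
    raw = zlib.decompress(store[f"{name}/{i}.{j}"])
    return struct.unpack(f"<{count}i", raw)


print(sorted(store))
print(read_chunk(store, "foo", 2, 1))  # (18, 19, 22, 23)
```

Because the store only needs get and set by key, the same layout maps directly onto a directory tree, a zip archive, or object-storage keys; swapping the dict for another mapping would leave the rest of the sketch unchanged.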


  9. CMIP6 Google Cloud Public Dataset
    Zarr in Action
    • Coupled Model Intercomparison Project: latest climate model projections
    from modeling centers around the world

    • The Pangeo project worked with Google Cloud to mirror data from ESGF

    • Zarr was chosen as the format because of its interoperability with cloud
    object storage

    • 600 TB and growing
    https://cloud.google.com/blog/products/data-analytics/new-climate-model-data-now-google-public-datasets


  10. Zarr and OGC
    • We are seeking OGC member approval of the Zarr V2 spec as an OGC Community
    Standard:


    “Community Standard serves to bring de facto standards from the larger
    geospatial community to be a stable reference point that can be normatively
    referenced by governments and other organizations.”

    • Zarr is generic (comparable to HDF5) but can serve as a base format for other
    OGC standards, e.g. Web Coverage Service.

    • Unidata will soon release a version of NetCDF with support for Zarr as an
    underlying storage container.

    • The Zarr project received an EOSS grant from the Chan Zuckerberg Initiative to
    support development of the V3 spec. Get involved @ zarr-developers/zarr-specs/
