Xarray flexible indexes

Xarray User Forum / Dask Summit 2021

Benoît Bovy

May 26, 2021

Transcript

  1. Other (domain-specific) cases

    • Geospatial data: Coordinate Reference System (CRS). pd.Index does not make any assumption about CRS.
    • Staggered grids (cell centers vs. cell edges)
    • Any other case? Let’s hear from you!
    (src: http://thevisualroom.com) (src: QGIS documentation)
  2. “Explicit” indexes

    Make indexes 1st-class citizens of the Xarray data model. Indexes: lat (Float64Index), lon (Float64Index), time (DatetimeIndex)
  3. “Flexible” indexes

    Provide xarray.Index API (data selection, alignment… more?) + extension mechanism (e.g., entrypoints). Indexes: x, y (KDTreeIndex). An index may be built from several coordinates (possibly also from multi-dimension coordinates and/or coordinates with different dimensions).
  4. this won’t be needed anymore with Xarray flexible indexes

    Select nearest-neighbors (1D query point dataset)
  5. Experimental Dask support (chunked point coordinates)

    1st stage: “map” index lookup (diagram labels: query points chunks, index points chunks, index<1>.query())
  6. Experimental Dask support (chunked point coordinates)

    2nd stage: “reduce” brute-force lookup (diagram labels: query points chunks, index points chunks, dask.array.argmin)
  7. Experimental Dask support (chunked point coordinates)

    It works well, sometimes… but often fails (miserably). It is challenging!
    - Chunk size matters
    - Dask inter-worker communication
    - Indexes are often complex objects
    - C/C++ native & dynamic data structures
    - Memory footprint?
    - Serialization?
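
The two-stage lookup described in slides 5 and 6 can be sketched roughly as follows. This is a minimal illustration in plain NumPy and not the implementation shown in the talk: `ChunkIndex` and `nearest` are hypothetical names, the chunk sizes are arbitrary, and the per-chunk index uses brute-force distances as a stand-in for a real tree index with a `.query()` method. Dask is deliberately left out; the Python loops only mimic what would be one task per pair of chunks.

```python
import numpy as np


class ChunkIndex:
    """Hypothetical stand-in for a spatial index built on one chunk of points."""

    def __init__(self, points, offset):
        self.points = np.asarray(points, dtype=float)
        # Position of this chunk's first point in the full, unchunked array.
        self.offset = offset

    def query(self, queries):
        # Brute-force distances, shape (n_queries, n_chunk_points).
        diff = queries[:, None, :] - self.points[None, :, :]
        dist = np.sqrt((diff ** 2).sum(axis=-1))
        local = dist.argmin(axis=1)
        rows = np.arange(len(queries))
        # Return the best distance and the *global* position of that point.
        return dist[rows, local], local + self.offset


def nearest(index_points, query_points, index_chunk=4, query_chunk=3):
    # Build one index per chunk of index points.
    indexes = [
        ChunkIndex(index_points[i:i + index_chunk], offset=i)
        for i in range(0, len(index_points), index_chunk)
    ]
    out = []
    for q in range(0, len(query_points), query_chunk):
        qchunk = query_points[q:q + query_chunk]
        # Stage 1 ("map"): query every chunk index with this chunk of queries.
        results = [idx.query(qchunk) for idx in indexes]
        dists = np.stack([d for d, _ in results])      # (n_index_chunks, n_queries)
        positions = np.stack([p for _, p in results])  # same shape
        # Stage 2 ("reduce"): argmin across index chunks picks the overall winner.
        best = dists.argmin(axis=0)
        out.append(positions[best, np.arange(len(qchunk))])
    return np.concatenate(out)


rng = np.random.default_rng(0)
index_points = rng.random((10, 2))
query_points = rng.random((7, 2))
result = nearest(index_points, query_points)
```

Every query chunk is compared against every index chunk, so the work in stage 1 grows with the product of the two chunk counts. That is one way to read the "chunk size matters" point in slide 7.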