Rich Signell (USGS)
Anderson Banihirwe (NCAR)
Ryan Abernathey (Columbia)
Joe Hamman (NCAR)
Matthew Rocklin (Anaconda->NVIDIA->Coiled Computing)
Niall Robinson (UK Met Office Informatics Lab)
Jacob Tomlinson (UK Met Office->NVIDIA)
Scott Henderson (UW)
and the rest of the Pangeo Community!

Dask Developer Workshop - Feb 27, 2020
• Xarray variables can include dask arrays
  ◦ map_blocks allows xarray objects to be the primary dask collections
• High-level metadata-aware interfaces to dask:
  ◦ xr.apply_ufunc()
  ◦ xr.map_blocks()
• File I/O: Dask allows xarray to support parallel read and write functionality via open_mfdataset(), to_netcdf(), open_zarr(), and to_zarr().
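A minimal sketch of the two metadata-aware interfaces named above, run on a small synthetic array. The variable name `temp`, the dimension names, the chunk sizes, and the helper `anomaly` are illustrative assumptions, not taken from the slides.

```python
import numpy as np
import xarray as xr

# Synthetic 4 x 6 array (assumed data); chunking along "time" makes it dask-backed.
da = xr.DataArray(
    np.arange(24, dtype="float64").reshape(4, 6),
    dims=("time", "x"),
    coords={"time": np.arange(4), "x": np.arange(6)},
    name="temp",
).chunk({"time": 2})


def anomaly(block):
    # Hypothetical helper: receives one xarray block, with dims and coords intact.
    return block - block.mean("x")


# xr.map_blocks applies a function that takes and returns xarray objects per chunk.
lazy_anom = xr.map_blocks(anomaly, da, template=da)

# xr.apply_ufunc wraps a NumPy-level function and keeps the xarray metadata.
lazy_sq = xr.apply_ufunc(
    np.square, da, dask="parallelized", output_dtypes=[da.dtype]
)

# Nothing has been computed so far; .compute() runs the dask graphs.
anom = lazy_anom.compute()
sq = lazy_sq.compute()
```

Both results stay lazy until `.compute()` is called. The mean is taken over `x`, which is not split across chunks here, so each block can be processed independently.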
with dask (workaround using xr.to_zarr(append=True))
• Memory backpressure issue (D. E. Shaw’s graph manipulation tools!)
• Dask-cloudprovider very attractive to orgs like USGS: FargateCluster “rate exceeded” issue
• Community understanding of chunking impact on use
• Dask performance challenges, e.g. pangeo/#194, dask/#3595
  ◦ More work on graph optimization, high-level graphs, task fusion, etc.
• Dask deployment: more work on enabling heterogeneous worker pools, harmonization among systems, etc.
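To make the chunking point above concrete, here is a small sketch comparing two chunkings of the same array. The array shape and chunk sizes are arbitrary assumptions chosen for illustration; real datasets need sizes tuned to the access pattern and worker memory.

```python
import dask.array as dsa

shape = (1000, 1000)

# Same data, two chunkings (sizes are assumptions for illustration).
fine = dsa.ones(shape, chunks=(100, 100))    # many small chunks
coarse = dsa.ones(shape, chunks=(500, 500))  # few large chunks

# Chunk count drives scheduler overhead; chunk size drives per-task memory.
fine_chunks = fine.npartitions
coarse_chunks = coarse.npartitions
fine_chunk_bytes = 100 * 100 * fine.dtype.itemsize
coarse_chunk_bytes = 500 * 500 * coarse.dtype.itemsize

# The result is identical either way; only the task layout differs.
fine_total = float(fine.sum().compute())
coarse_total = float(coarse.sum().compute())

# An existing array can be re-laid-out with rechunk().
rechunked = fine.rechunk((500, 500))
```

Smaller chunks mean more tasks for the scheduler to track, while larger chunks mean more memory held per task. Neither layout is better in general, so the trade-off has to be weighed per workload.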