xGCM: Staggered grids, topologies, and ufuncs in python

(This talk was given at the AMS conference in Denver 2023)

Staggered grids such as Arakawa grids are ubiquitous in climate models. Analysis and post-processing tools using finite-volume operators must respect staggering to get correct results. This presents a challenge for users of General Circulation Model (GCM) data, whose analysis routines must follow the idiosyncrasies of particular GCM grids.

The xGCM package [1] is designed to solve this problem by extending xarray’s data model with information about the grid. It encodes the positions of variables along each axis of the grid, so that operations can respect differences in staggering between variables. It also encodes the topology of the grid, recording how various models divide the spherical Earth into different regions.
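As a rough illustration of why positions matter, here is a minimal NumPy sketch of position-aware operations on a single periodic axis. It is not xGCM's API or implementation: the function names are made up for this example, and it assumes the convention that the "left" face with index i sits between cell centres i-1 and i.

```python
import numpy as np

def diff_center_to_left(f_center):
    """Difference of cell-centre values; the result lives on the left faces."""
    # Left face i sits between centre i-1 and centre i (periodic axis).
    return f_center - np.roll(f_center, 1)

def interp_center_to_left(f_center):
    """Linear interpolation of cell-centre values onto the left faces."""
    return 0.5 * (f_center + np.roll(f_center, 1))

def diff_left_to_center(f_left):
    """Difference of face values; the result lives on the cell centres."""
    # Centre i sits between left face i and left face i+1 (periodic axis).
    return np.roll(f_left, -1) - f_left

tracer = np.array([0.0, 1.0, 4.0, 9.0])   # values at cell centres
print(diff_center_to_left(tracer))         # [-9.  1.  3.  5.]
print(interp_center_to_left(tracer))       # [4.5 0.5 2.5 6.5]
```

The two `diff` functions shift in opposite directions, so applying the wrong one to a variable silently pairs up the wrong neighbours. Recording each variable's position lets a library pick the right stencil automatically.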

xGCM has recently been upgraded by introducing the concept of “Grid Ufuncs”, which are analogous to numpy ufuncs but grid-aware. Users can define their own grid ufuncs, then apply them to their data. We hope that this extensible model can be built upon by scientists who post-process output from all types of climate models.
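The idea of attaching a position signature to a function can be sketched in a few lines. The decorator below, `toy_grid_ufunc`, is a hypothetical stand-in written for this illustration. It is not xGCM's own decorator, and it tracks positions with plain tuples rather than xarray objects.

```python
import re
import numpy as np

def parse_signature(signature):
    """Split e.g. "(X:center)->(X:left)" into input and output (axis, position) pairs."""
    inputs, outputs = signature.replace(" ", "").split("->")
    pattern = r"\((\w+):(\w+)\)"
    return re.findall(pattern, inputs), re.findall(pattern, outputs)

def toy_grid_ufunc(signature):
    """Hypothetical decorator: check input positions, tag the output position."""
    expected_in, expected_out = parse_signature(signature)

    def decorator(func):
        def wrapper(*tagged_args):
            # Each argument is a (values, (axis, position)) pair.
            if len(tagged_args) != len(expected_in):
                raise ValueError("wrong number of arguments for signature")
            for (_, position), expected in zip(tagged_args, expected_in):
                if position != expected:
                    raise ValueError(f"expected {expected}, got {position}")
            result = func(*(values for values, _ in tagged_args))
            return result, expected_out[0]
        return wrapper
    return decorator

@toy_grid_ufunc("(X:center)->(X:left)")
def diff_center_to_left(a):
    # Periodic backward difference; the result lives on the left cell faces.
    return a - np.roll(a, 1)

temperature = (np.array([0.0, 1.0, 4.0, 9.0]), ("X", "center"))
values, position = diff_center_to_left(temperature)
print(values, position)
```

Because the signature belongs to the function rather than to the data, a variable at the wrong position is rejected before any arithmetic runs. Here, passing a variable tagged `("X", "left")` raises a `ValueError`.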

We present an overview of xGCM and its capabilities, before showing an example of using it on a multi-TB scale oceanographic dataset.

[1] https://github.com/xgcm/xgcm

Tom Nicholas

January 11, 2023

Transcript

  1. xGCM: Staggered grids, topologies, and grid ufuncs in python
     Thomas Nicholas*, Julius Busecke, Ryan Abernathey
     *[email protected] *github.com/TomNicholas
  2. Who am I?
     • Oceanographer (ex-plasma physicist)
     • Xarray core dev & Pangeo user
     • Both these fields have a variety of big turbulence models producing gridded data
  3. Talk overview
     • 3 problems with model grids
       ◦ How xGCM concepts help
     • Grid ufuncs for arbitrary operations
     • Will it scale?
  4. Grid problem #1: Staggered grids
     • Fluid variables live on “Arakawa Grids”
     • Variables’ positions are offset
     • Finite-volume calculations must account for this to get correct results
  5. Staggered grids in xGCM
     • xGCM handles staggered variables
     • Extends xarray’s data model with Axes and Grid objects
     • Variables may live on different positions along xgcm.Axes
     • Axes stored in Grid object
     github.com/xgcm/xgcm
  6. Features: calculus operations
     • Basic calculus operations: .diff, .interp, etc.
     • Uses correct numerical scheme for grid position!
     • Consumes and produces xarray objects
  7. New idea: “grid ufuncs”
     • Wrap numpy ufuncs to be grid-aware
     • Positions specified through “signature”
       ◦ “(X:left)->(X:center)”
     • Signature is property of computational function
       ◦ i.e. language-agnostic idea
  8. @as_grid_ufunc decorator
     • Allows custom ufuncs
       ◦ User-specific algorithms (e.g. from climate model)
       ◦ Can auto-dispatch to correct ufunc for data
     • Can specify grid positions via annotated type hints
     • Could chain with other decorators, e.g. numba.jit
  9. Will it scale? Originally no…
     • xGCM’s finite-volume functions, e.g. diff, interp
     • Require padding to apply boundary conditions
     • Previously used custom reduction code
     • Chaining diff, interp etc. led to explosion of dask tasks
  10. Dask-optimised xGCM via xarray.apply_ufunc
      • Apply all grid ufuncs through xarray.apply_ufunc
        ◦ Common code path for all functions
      • Only pad once
        ◦ Avoids task explosion
      • Creates minimal dask graph
      • Ex. reduction: Almost blockwise (+ a rechunk-merge operation after padding)
  11. Now does it scale?
      • Tried it on huge global ocean simulation dataset
      • Single variable is 8.76 TB
        ▪ 117390 Zarr chunks
      • But saw dreaded orange bars!
      • Nice task graph, but dask using way too much RAM!
  12. (No text on this slide)
  13. Dask needed fixing!
      • Exposed a bug in dask.distributed
      • Gabe Joseph of Coiled fixed it!
      • See pangeo blog post
  14. Summary
      • GCM grids are a pain
      • But xGCM can help!
      • Grid ufuncs are a cool idea
      • We improved dask along the way!
      github.com/xgcm/xgcm
      P.S. I am looking for my next big project 😁
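
The “only pad once” point from the dask-optimisation slide can be illustrated with plain NumPy. This is a sketch of the arithmetic only, assuming a periodic axis and two hypothetical kernels. It does not use dask or xGCM, so it shows why a single wide pad gives the same answer as padding before every step, not the effect on the task graph.

```python
import numpy as np

def diff_kernel(a):
    # Backward difference; the output is one element shorter than the input.
    return a[1:] - a[:-1]

def interp_kernel(a):
    # Two-point average; the output is one element shorter than the input.
    return 0.5 * (a[1:] + a[:-1])

def pad_each_step(f):
    """Pad by one element before each kernel (two separate padding operations)."""
    step1 = diff_kernel(np.pad(f, (1, 0), mode="wrap"))
    return interp_kernel(np.pad(step1, (1, 0), mode="wrap"))

def pad_once(f):
    """Pad by the total width up front, then chain the kernels without re-padding."""
    padded = np.pad(f, (2, 0), mode="wrap")
    return interp_kernel(diff_kernel(padded))

f = np.array([0.0, 1.0, 4.0, 9.0])
print(pad_each_step(f))   # [-2. -4.  2.  4.]
print(pad_once(f))        # [-2. -4.  2.  4.]
```

Each kernel consumes one element of halo, so a chain of two kernels needs a total pad width of two. With chunked arrays every padding step needs data from neighbouring chunks, so doing it once up front keeps the rest of the chain local to each chunk.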