
xGCM: Staggered grids, topologies, and ufuncs in python

Staggered grids such as Arakawa grids are ubiquitous in climate models. Analysis and post-processing tools using finite-volume operators must respect staggering to get correct results. This presents a challenge for users of General Circulation Model (GCM) data, whose analysis routines must follow the idiosyncrasies of particular GCM grids.

The xGCM package [1] is designed to solve this problem, by extending xarray’s data model with information about the grid. It encodes the positions of variables along each axis of the grid, so that operations can respect differences in staggering between variables. It also encodes the topology of the grid, understanding how the spherical Earth is divided into different regions in various models.
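
The simplest topology to picture is a periodic axis, such as longitude on a global grid, where the last cell neighbours the first. The sketch below is plain Python rather than the xGCM API, and the function names and sample values are hypothetical. It shows how the same difference operation gives a different answer in the first cell depending on whether the axis is treated as periodic.

```python
def diff_periodic(values):
    """Difference each cell with its left neighbour, wrapping around the axis."""
    n = len(values)
    # For i == 0, values[i - 1] reads values[-1], the last cell: the wrap-around.
    return [values[i] - values[i - 1] for i in range(n)]


def diff_bounded(values, fill=0.0):
    """Same difference on a non-periodic axis, padding the left edge with `fill`."""
    padded = [fill] + list(values)
    return [padded[i + 1] - padded[i] for i in range(len(values))]


# Hypothetical data: a field sampled at four longitudes around a full circle.
field = [1.0, 2.0, 4.0, 7.0]
periodic = diff_periodic(field)
bounded = diff_bounded(field)

print(periodic)
print(bounded)
```

Only the first element differs between the two results, which is exactly where knowledge of the topology matters.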

xGCM has recently been upgraded with the concept of “Grid Ufuncs”, which are analogous to numpy ufuncs but grid-aware. Users can define their own grid ufuncs, then apply them to their data. We hope that this extensible model can be built upon by scientists who post-process all types of climate models.

We present an overview of xGCM and its capabilities, before showing an example of using it on a multi-TB scale oceanographic dataset.

[1] https://github.com/xgcm/xgcm

Tom Nicholas

January 11, 2023

Transcript

  1. xGCM: Staggered grids, topologies, and grid ufuncs in python
    Thomas Nicholas*, Julius Busecke, Ryan Abernathey
    *[email protected]
    *github.com/TomNicholas

  2. Who am I?
    ● Oceanographer (ex-plasma physicist)
    ● Xarray core dev & Pangeo user
    ● Both these fields have a variety of big turbulence models producing gridded data

  3. Talk overview
    ● 3 problems with model grids
    ○ How xGCM concepts help
    ● Grid ufuncs for arbitrary operations
    ● Will it scale?

  4. Grid problem #1: Staggered grids
    ● Fluid variables live on “Arakawa Grids”
    ● Variables’ positions are offset
    ● Finite-volume calculations must account for this to get correct results
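
To make the "correct results" point concrete, here is a minimal sketch in plain Python (not xGCM code). The grid spacing, grid size and test function are all assumptions chosen for illustration. Differencing a field stored at cell centres gives a derivative that belongs on the faces between them; treating that result as if it still lived at the centres introduces an error of order the grid spacing.

```python
dx = 0.5
n = 6

# Cell centres at 0.25, 0.75, ...; interior faces sit halfway between neighbours.
x_center = [(i + 0.5) * dx for i in range(n)]
x_face = [(i + 1) * dx for i in range(n - 1)]

# f(x) = x**2 sampled at the cell centres.
f_center = [x * x for x in x_center]

# Differencing neighbouring centres estimates df/dx at the face between them.
dfdx = [(f_center[i + 1] - f_center[i]) / dx for i in range(n - 1)]

# Compare against the exact derivative 2x at the faces and at the centres.
exact_at_face = [2 * x for x in x_face]
exact_at_center = [2 * x for x in x_center[: n - 1]]

err_face = max(abs(a - b) for a, b in zip(dfdx, exact_at_face))
err_center = max(abs(a - b) for a, b in zip(dfdx, exact_at_center))

print("error if result is placed on faces:  ", err_face)
print("error if result is placed on centres:", err_center)
```

For this quadratic test function the face-located result is exact, while the centre-located interpretation is wrong by one grid spacing.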

  5. Staggered grids in xGCM
    ● xGCM handles staggered variables
    ● Extends xarray’s data model with Axes and Grid objects
    ● Variables may live on different positions along xgcm.Axes
    ● Axes stored in Grid object
    github.com/xgcm/xgcm
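
The idea of axes that know their positions, collected in a grid object, can be sketched with two small dataclasses. This is a toy model, not xGCM's implementation: `ToyAxis`, `ToyGrid` and the dimension names `x_c`, `x_g`, `y_c`, `y_g` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAxis:
    """One grid axis: maps each position name to the dimension that holds it."""
    name: str
    positions: dict  # e.g. {"center": "x_c", "left": "x_g"}

    def position_of(self, dims):
        """Return the position whose dimension appears in `dims`, or None."""
        for position, dim in self.positions.items():
            if dim in dims:
                return position
        return None


@dataclass
class ToyGrid:
    """Container holding every axis of the model grid."""
    axes: dict = field(default_factory=dict)

    def positions_of(self, dims):
        """Work out where a variable lives on each axis from its dimensions."""
        found = {}
        for name, axis in self.axes.items():
            position = axis.position_of(dims)
            if position is not None:
                found[name] = position
        return found


grid = ToyGrid(axes={
    "X": ToyAxis("X", {"center": "x_c", "left": "x_g"}),
    "Y": ToyAxis("Y", {"center": "y_c", "left": "y_g"}),
})

# A tracer at cell centres and a velocity staggered to the left along X.
tracer_positions = grid.positions_of(("y_c", "x_c"))
u_positions = grid.positions_of(("y_c", "x_g"))

print(tracer_positions)
print(u_positions)
```

Because the position is inferred from the variable's dimensions, the user never has to state it by hand for each operation.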

  6. Grid problem #2: Topologies

  7. Topologies in xGCM

  8. Grid problem #3: Vertical coordinates

  9. Features: calculus operations
    ● Basic calculus operations: .diff, .interp etc.
    ● Uses correct numerical scheme for grid position!
    ● Consumes and produces xarray objects
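
What "correct numerical scheme for grid position" means can be shown with a position-aware difference on a periodic axis. This is an illustrative sketch on plain lists, not the xGCM `.diff` implementation, and the `diff` function here is hypothetical. The stencil shifts in opposite directions depending on where the input lives, and the output carries its new position with it.

```python
def diff(values, position):
    """Difference along a periodic axis, returning (result, new_position)."""
    n = len(values)
    if position == "center":
        # Left face i lies between centres i-1 and i.
        result = [values[i] - values[i - 1] for i in range(n)]
        return result, "left"
    if position == "left":
        # Centre i lies between left faces i and i+1.
        result = [values[(i + 1) % n] - values[i] for i in range(n)]
        return result, "center"
    raise ValueError(f"unknown position: {position!r}")


# Hypothetical tracer values at cell centres.
tracer = [1.0, 2.0, 4.0, 7.0]

first, first_pos = diff(tracer, "center")
second, second_pos = diff(first, first_pos)

print(first, first_pos)
print(second, second_pos)
```

Applying the operation twice returns to the cell centres and reproduces the usual three-point second difference, which would not happen if both steps used the same stencil.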

  10. New idea: “grid ufuncs”
    ● Wrap numpy ufuncs to be grid-aware
    ● Positions specified through “signature”
    ○ “(X:left)->(X:center)”
    ● Signature is property of computational function
    ○ i.e. language-agnostic idea
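
Since the signature is just a string, it can be read by any language. Below is a minimal parser sketch for the form shown on the slide. `parse_signature` is a hypothetical helper, not part of xGCM, and it assumes each parenthesised argument names exactly one axis and one position.

```python
import re

# Matches one "(AXIS:position)" term, e.g. "(X:left)".
_TERM = re.compile(r"\(\s*(\w+)\s*:\s*(\w+)\s*\)")


def parse_signature(signature):
    """Split '(X:left)->(X:center)' into input and output (axis, position) pairs."""
    lhs, sep, rhs = signature.partition("->")
    if not sep:
        raise ValueError("signature must contain '->'")
    return _TERM.findall(lhs), _TERM.findall(rhs)


inputs, outputs = parse_signature("(X:left)->(X:center)")
print(inputs)
print(outputs)
```

The parsed pairs say that the function consumes data on the left position of the X axis and produces data at cell centres.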

  11. @as_grid_ufunc decorator
    ● Allows custom ufuncs
    ○ User-specific algorithms (e.g. from climate model)
    ○ Can auto-dispatch to correct ufunc for data
    ● Can specify grid positions via annotated type hints
    ● Could chain with other decorators, e.g. numba.jit
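
The decorator pattern itself can be sketched in a few lines. The code below is not xGCM's `as_grid_ufunc`: `grid_ufunc_sketch` is a hypothetical stand-in that works on plain lists along a single periodic axis, and only shows how a decorator can attach position information to a user-written function and check it at call time.

```python
import functools


def grid_ufunc_sketch(in_position, out_position):
    """Hypothetical decorator: tag a function with the positions it maps between."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(values, position):
            # Refuse data that lives at the wrong position for this function.
            if position != in_position:
                raise ValueError(
                    f"{func.__name__} expects data at {in_position!r}, got {position!r}"
                )
            return func(values), out_position

        # Record the signature string so other tools can inspect it.
        wrapper.signature = f"(X:{in_position})->(X:{out_position})"
        return wrapper
    return decorator


@grid_ufunc_sketch("left", "center")
def interp_to_center(values):
    """Average neighbouring left faces onto cell centres (periodic axis)."""
    n = len(values)
    return [0.5 * (values[i] + values[(i + 1) % n]) for i in range(n)]


result, position = interp_to_center([0.0, 2.0, 4.0, 6.0], "left")
print(interp_to_center.signature)
print(result, position)
```

The user writes only the numerical kernel; the wrapper handles the bookkeeping about where the data lives.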

  12. Will it scale? Originally no…
    ● xGCM’s finite-volume functions, e.g. diff, interp
    ● Require padding to apply boundary conditions
    ● Previously used custom reduction code
    ● Chaining diff, interp etc. led to explosion of dask tasks

  13. Dask-optimised xGCM via xarray.apply_ufunc
    ● Apply all grid ufuncs through xarray.apply_ufunc
    ○ Common code path for all functions
    ● Only pad once
    ○ Avoids task explosion
    ● Creates minimal dask graph
    ● Ex. reduction: Almost blockwise (+ a rechunk-merge operation after padding)
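
Why a single pad is enough can be seen on a small example. The sketch below uses plain Python lists and a periodic boundary. It is a conceptual illustration, not xGCM's or dask's internals, and all function names are hypothetical. Two chained stencil operations that each need one ghost cell can instead share one pad of width two.

```python
pad_log = []  # records the width of every pad that is performed


def pad_left_periodic(values, width):
    """Prepend `width` ghost cells copied from the far end of the axis."""
    pad_log.append(width)
    return values[-width:] + values


def diff_valid(values):
    """Difference of neighbours; output is one element shorter than input."""
    return [values[i + 1] - values[i] for i in range(len(values) - 1)]


def interp_valid(values):
    """Average of neighbours; output is one element shorter than input."""
    return [0.5 * (values[i] + values[i + 1]) for i in range(len(values) - 1)]


def chain_pad_each_step(values):
    d = diff_valid(pad_left_periodic(values, 1))
    return interp_valid(pad_left_periodic(d, 1))


def chain_pad_once(values):
    padded = pad_left_periodic(values, 2)
    return interp_valid(diff_valid(padded))


field = [1.0, 2.0, 4.0, 7.0]

pad_log.clear()
each = chain_pad_each_step(field)
pads_each = len(pad_log)

pad_log.clear()
once = chain_pad_once(field)
pads_once = len(pad_log)

print(each, pads_each)
print(once, pads_once)
```

Both routes give the same numbers, but the second touches the boundary only once, so in a chunked setting there is only one step that needs data from neighbouring chunks.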

  14. Now does it scale?
    ● Tried it on huge global ocean simulation dataset
    ● Single variable is 8.76 TB
    ■ 117390 Zarr chunks
    ● But saw dreaded orange bars!
    ● Nice task graph, but dask using way too much RAM!
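
The two numbers on this slide give a feel for the chunk size. The arithmetic below assumes decimal terabytes (10**12 bytes) and uniformly sized chunks, neither of which the slide states.

```python
total_bytes = 8.76e12   # 8.76 TB, assuming 1 TB = 10**12 bytes
n_chunks = 117_390      # Zarr chunks in the single variable

# Average chunk size, assuming all chunks are the same size.
bytes_per_chunk = total_bytes / n_chunks
mb_per_chunk = bytes_per_chunk / 1e6

print(f"average chunk size: {mb_per_chunk:.1f} MB")
```

Under those assumptions each chunk is roughly 75 MB.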

  16. Dask needed fixing!
    ● Exposed a bug in dask.distributed
    ● Gabe Joseph of Coiled fixed it!
    ● See pangeo blog post

  17. Summary
    ● GCM grids are a pain
    ● But xGCM can help!
    ● Grid ufuncs are a cool idea
    ● We improved dask along the way!
    github.com/xgcm/xgcm
    P.S. I am looking for my next big project 😁