xGCM: Staggered grids, topologies, and grid ufuncs in Python
Thomas Nicholas*, Julius Busecke, Ryan Abernathey
*thomas.nicholas@columbia.edu
*github.com/TomNicholas
Slide 2
Who am I?
● Oceanographer (ex-plasma physicist)
● Xarray core dev & Pangeo user
● Both these fields have a variety of big turbulence models producing gridded data
Slide 3
Talk overview
● 3 problems with model grids
○ How xGCM concepts help
● Grid ufuncs for arbitrary operations
● Will it scale?
Slide 4
Grid problem #1: Staggered grids
● Fluid variables live on “Arakawa Grids”
● Variables’ positions are offset
● Finite-volume calculations must account for this to get correct results
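To illustrate why the offset matters, here is a minimal plain-NumPy sketch (not xGCM code) on a hypothetical 1D periodic grid. A field sampled on the left cell faces is differenced, and the result is compared with the analytic derivative at both positions. The names `x_left`, `x_center` and `diff_left_to_center` are invented for this example.

```python
import numpy as np

n = 8
dx = 2 * np.pi / n
x_left = np.arange(n) * dx      # left cell faces
x_center = x_left + dx / 2      # cell centers, offset by half a cell


def diff_left_to_center(f_left, dx):
    """Difference of a periodic field on left faces; the result lives on cell centers."""
    return (np.roll(f_left, -1) - f_left) / dx


f_left = np.sin(x_left)
dfdx = diff_left_to_center(f_left, dx)

# Compare against the analytic derivative at the correct and the wrong position
err_center = np.max(np.abs(dfdx - np.cos(x_center)))
err_left = np.max(np.abs(dfdx - np.cos(x_left)))
print(f"error if placed on centers: {err_center:.3f}")
print(f"error if placed on left faces: {err_left:.3f}")
```

Treating the result as if it sat on the original positions gives a much larger error, which is the bookkeeping mistake that staggered-grid tooling is meant to prevent.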
Slide 5
Staggered grids in xGCM
● xGCM handles staggered variables
● Extends xarray’s data model with Axes and Grid objects
● Variables may live on different positions along xgcm.Axes
● Axes stored in Grid object
github.com/xgcm/xgcm
Slide 6
Grid problem #2: Topologies
Slide 7
Topologies in xGCM
Slide 8
Grid problem #3: Vertical coordinates
Slide 9
Features: calculus operations
● Basic calculus operations: .diff, .interp etc.
● Uses the correct numerical scheme for each grid position!
● Consumes and produces xarray objects
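As a rough picture of what "the correct scheme for the grid position" can mean, the sketch below picks the interpolation stencil from the position the data currently lives on. It is plain NumPy on a periodic axis, not xGCM's implementation, and `interp_periodic` is a hypothetical helper.

```python
import numpy as np


def interp_periodic(f, position):
    """Linearly interpolate to the other position on a periodic axis.

    Returns the interpolated values and the position they now live on.
    """
    if position == "center":
        # left face i sits between centers i-1 and i
        return 0.5 * (np.roll(f, 1) + f), "left"
    if position == "left":
        # center i sits between left faces i and i+1
        return 0.5 * (f + np.roll(f, -1)), "center"
    raise ValueError(f"unknown position: {position!r}")


f_center = np.array([0.0, 2.0, 4.0, 6.0])
f_left, pos = interp_periodic(f_center, "center")
print(f_left, pos)
```

Returning the new position alongside the data is one simple way to keep track of where a variable lives after each operation.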
Slide 10
New idea: “grid ufuncs”
● Wrap numpy ufuncs to be grid-aware
● Positions specified through a “signature”
○ “(X:left)->(X:center)”
● Signature is a property of the computational function
○ i.e. a language-agnostic idea
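The signature string is easy to treat as data. The following is a simplified, hypothetical parser written for illustration; xGCM's own parsing may differ in what it accepts and how it reports errors.

```python
import re


def parse_signature(signature):
    """Split a string like "(X:left)->(X:center)" into input and output position mappings."""
    inputs, outputs = signature.replace(" ", "").split("->")

    def parse_side(side):
        args = []
        # Each parenthesised group describes one array argument
        for group in re.findall(r"\(([^)]*)\)", side):
            positions = {}
            for item in filter(None, group.split(",")):
                axis, position = item.split(":")
                positions[axis] = position
            args.append(positions)
        return args

    return parse_side(inputs), parse_side(outputs)


ins, outs = parse_signature("(X:left)->(X:center)")
print(ins, outs)
```

Because the signature is just a string describing the function, the same description could in principle be attached to a kernel written in any language.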
Slide 11
@as_grid_ufunc decorator
● Allows custom ufuncs
○ User-specific algorithms (e.g. from a climate model)
○ Can auto-dispatch to the correct ufunc for the data
● Can specify grid positions via annotated type hints
● Could chain with other decorators, e.g. numba.jit
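The decorator pattern itself can be sketched without xGCM. The stand-in below, `grid_ufunc_sketch`, is hypothetical: it only records the signature on a plain NumPy function, whereas the real decorator also handles padding, dispatch and xarray objects.

```python
import functools

import numpy as np


def grid_ufunc_sketch(signature):
    """Hypothetical stand-in for a grid-aware decorator: it only records the signature."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return func(*args, **kwargs)

        wrapper.signature = signature
        return wrapper

    return decorator


@grid_ufunc_sketch("(X:left)->(X:center)")
def diff_left_to_center(a):
    """Periodic forward difference along the last axis."""
    return np.roll(a, -1, axis=-1) - a


result = diff_left_to_center(np.array([1.0, 4.0, 9.0, 16.0]))
print(diff_left_to_center.signature, result)
```

Since the wrapper is an ordinary function, another decorator could be stacked underneath it, which is the kind of chaining the slide alludes to.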
Slide 12
Will it scale? Originally no…
● xGCM’s finite-volume functions, e.g. diff, interp
● Require padding to apply boundary conditions
● Previously used custom reduction code
● Chaining diff, interp etc. led to an explosion of dask tasks
Slide 13
Dask-optimised xGCM via xarray.apply_ufunc
● Apply all grid ufuncs through xarray.apply_ufunc
○ Common code path for all functions
● Only pad once
○ Avoids task explosion
● Creates minimal dask graph
● Ex. reduction: Almost blockwise (+ a rechunk-merge operation after padding)
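The "only pad once" idea can be pictured in plain NumPy, leaving dask and xarray.apply_ufunc aside. In this sketch, which is an illustration rather than xGCM's code, a periodic boundary condition is applied with a single padding step and a chained difference-then-interpolate is computed from that one padded array. The helper names are hypothetical.

```python
import numpy as np


def pad_periodic(f, width=1):
    """Apply a periodic boundary condition by padding both ends once."""
    return np.pad(f, width, mode="wrap")


def diff_then_interp(f_center):
    """Difference center->left, then interpolate left->center, from one padded array."""
    p = pad_periodic(f_center, 1)             # single padding step
    d_left = p[1:] - p[:-1]                   # differences on the n + 1 faces
    return 0.5 * (d_left[:-1] + d_left[1:])   # back on the n cell centers


f = np.array([0.0, 1.0, 4.0, 9.0])
out = diff_then_interp(f)
print(out)
```

Padding separately inside each chained operation would give the same numbers, but in a chunked setting every extra padding step adds tasks to the graph.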
Slide 14
Now does it scale?
● Tried it on a huge global ocean simulation dataset
● A single variable is 8.76 TB
■ 117390 Zarr chunks
● But saw the dreaded orange bars!
● Nice task graph, but dask was using way too much RAM!
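For a sense of scale, the two numbers on this slide imply an average chunk size. The arithmetic below assumes decimal terabytes and that the 8.76 TB figure and the chunk count describe the same variable; neither assumption is stated on the slide.

```python
total_bytes = 8.76e12   # 8.76 TB, assuming decimal terabytes
n_chunks = 117_390      # Zarr chunks, as quoted on the slide

mean_chunk_mb = total_bytes / n_chunks / 1e6
print(f"{mean_chunk_mb:.1f} MB per chunk on average")
```

That is on the order of 75 MB per chunk under these assumptions.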
Slide 16
Dask needed fixing!
● Exposed a bug in dask.distributed
● Gabe Joseph of Coiled fixed it!
● See the Pangeo blog post
Slide 17
Summary
● GCM grids are a pain
● But xGCM can help!
● Grid ufuncs are a cool idea
● We improved dask along the way!
github.com/xgcm/xgcm
P.S. I am looking for my next big project 😁