Slide 1

Slide 1 text

xGCM: Staggered grids, topologies, and grid ufuncs in Python
Thomas Nicholas*, Julius Busecke, Ryan Abernathey
*[email protected]
*github.com/TomNicholas

Slide 2

Slide 2 text

Who am I?
● Oceanographer (ex-plasma physicist)
● Xarray core dev & Pangeo user
● Both fields (oceanography and plasma physics) have a variety of big turbulence models producing gridded data

Slide 3

Slide 3 text

Talk overview
● 3 problems with model grids
  ○ How xGCM concepts help
● Grid ufuncs for arbitrary operations
● Will it scale?

Slide 4

Slide 4 text

Grid problem #1: Staggered grids
● Fluid variables live on “Arakawa Grids”
● Variables’ positions are offset
● Finite-volume calculations must account for this to get correct results
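The point about offsets can be illustrated with a small NumPy sketch. It does not use xGCM; the 1D periodic grid, the names x_left and x_center, and the sine-wave tracer are all assumptions made up for this example. It shows that a difference of cell-centre values is a good gradient estimate at the cell faces, and a poor one if it is wrongly treated as living at the cell centres.

```python
import numpy as np

# Hypothetical 1D periodic grid: n cells of width dx (not from the talk).
n, dx = 8, 0.5
length = n * dx
x_left = np.arange(n) * dx        # cell faces ("left" position)
x_center = x_left + 0.5 * dx      # cell centres ("center" position)

# A tracer sampled at the cell centres.
k = 2 * np.pi / length
tracer = np.sin(k * x_center)

# Differencing neighbouring centre values gives a gradient located on the
# face between them, i.e. at x_left. np.roll supplies the periodic
# neighbour needed for the first face.
grad_on_left = (tracer - np.roll(tracer, 1)) / dx


def exact_gradient(x):
    """Analytic derivative of the tracer, used only for comparison."""
    return k * np.cos(k * x)


# Error when the result is placed at the correct (left) position ...
err_left = np.max(np.abs(grad_on_left - exact_gradient(x_left)))
# ... and when the offset is ignored and it is placed at the centres.
err_center = np.max(np.abs(grad_on_left - exact_gradient(x_center)))
print(err_left, err_center)
```

Ignoring the half-cell offset gives a much larger error here, which is the bookkeeping that the staggered-grid concepts are meant to take care of.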

Slide 5

Slide 5 text

Staggered grids in xGCM
● xGCM handles staggered variables
● Extends xarray’s data model with Axes and Grid objects
● Variables may live on different positions along xgcm.Axes
● Axes stored in Grid object
github.com/xgcm/xgcm

Slide 6

Slide 6 text

Grid problem #2: Topologies

Slide 7

Slide 7 text

Topologies in xGCM

Slide 8

Slide 8 text

Grid problem #3: Vertical coordinates

Slide 9

Slide 9 text

Features: calculus operations
● Basic calculus operations: .diff, .interp, etc.
● Uses correct numerical scheme for grid position!
● Consumes and produces xarray objects

Slide 10

Slide 10 text

New idea: “grid ufuncs”
● Wrap numpy ufuncs to be grid-aware
● Positions specified through “signature”
  ○ “(X:left)->(X:center)”
● Signature is property of computational function
  ○ i.e. language-agnostic idea
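To make the signature idea concrete, here is a minimal sketch of how a string such as “(X:left)->(X:center)” could be split into per-argument axis positions. The function parse_grid_signature is hypothetical and written only for this illustration; it is not xGCM's parser. The comma-separated form for several arguments or several axes is an assumption modelled on NumPy's generalized-ufunc signatures.

```python
import re


def parse_grid_signature(signature):
    """Hypothetical helper: split a grid signature into axis positions.

    Returns (inputs, outputs), each a list with one {axis: position}
    dict per argument.
    """
    input_side, output_side = signature.split("->")
    group_pattern = re.compile(r"\(([^()]*)\)")

    def parse_side(side):
        arguments = []
        for group in group_pattern.findall(side):
            positions = {}
            # Each item looks like "X:left"; skip empty items.
            for item in filter(None, (s.strip() for s in group.split(","))):
                axis, position = item.split(":")
                positions[axis.strip()] = position.strip()
            arguments.append(positions)
        return arguments

    return parse_side(input_side), parse_side(output_side)


# The signature shown on the slide: one input on the left position of X,
# one output on the center position of X.
in_positions, out_positions = parse_grid_signature("(X:left)->(X:center)")
print(in_positions, out_positions)
```

Because the signature is plain text describing the function, nothing in it depends on Python, which is the sense in which the idea is language-agnostic.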

Slide 11

Slide 11 text

@as_grid_ufunc decorator
● Allows custom ufuncs
  ○ User-specific algorithms (e.g. from climate model)
  ○ Can auto-dispatch to correct ufunc for data
● Can specify grid positions via annotated type hints
● Could chain with other decorators, e.g. numba.jit
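The decorator pattern can be sketched with a toy version built on NumPy alone. The name toy_grid_ufunc, its pad_left parameter, and the (values, position) calling convention are all invented for this illustration and are not xGCM's @as_grid_ufunc API. The sketch only shows the general shape: the decorator carries the signature, checks the input position, applies periodic padding, and reports the output position.

```python
import functools

import numpy as np


def toy_grid_ufunc(signature, pad_left=0):
    """Hypothetical stand-in for a grid-aware decorator (single axis only)."""
    input_side, output_side = signature.split("->")
    expected_position = input_side.strip("()").split(":")[1]
    output_position = output_side.strip("()").split(":")[1]

    def decorator(func):
        @functools.wraps(func)
        def wrapper(values, position):
            if position != expected_position:
                raise ValueError(
                    f"expected data on {expected_position!r}, got {position!r}"
                )
            # Periodic boundary condition: wrap values around on the left.
            padded = np.pad(values, (pad_left, 0), mode="wrap")
            return func(padded), output_position

        wrapper.signature = signature
        return wrapper

    return decorator


@toy_grid_ufunc("(X:center)->(X:left)", pad_left=1)
def diff_center_to_left(a):
    # Plain NumPy stencil; it knows nothing about grids or boundaries.
    return a[1:] - a[:-1]


values = np.array([1.0, 4.0, 9.0, 16.0])
result, out_position = diff_center_to_left(values, "center")
print(result, out_position)
```

The wrapped function stays a plain array function, so in principle another decorator could be applied underneath this one.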

Slide 12

Slide 12 text

Will it scale? Originally no…
● xGCM’s finite-volume functions, e.g. diff, interp
● Require padding to apply boundary conditions
● Previously used custom reduction code
● Chaining diff, interp etc. led to explosion of dask tasks

Slide 13

Slide 13 text

Dask-optimised xGCM via xarray.apply_ufunc
● Apply all grid ufuncs through xarray.apply_ufunc
  ○ Common code path for all functions
● Only pad once
  ○ Avoids task explosion
● Creates minimal dask graph
● Ex. reduction: Almost blockwise (+ a rechunk-merge operation after padding)
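One possible reading of “only pad once” can be sketched in plain NumPy, with no dask involved. The helper names below are hypothetical and this is not a description of xGCM's internals. For periodic boundaries, padding the input a single time by the combined halo width of two chained stencils gives the same answer as padding before each stencil separately.

```python
import numpy as np


def diff_left(a):
    """Centre -> left difference; consumes one point on the left."""
    return a[1:] - a[:-1]


def interp_right(a):
    """Left -> centre average; consumes one point on the right."""
    return 0.5 * (a[1:] + a[:-1])


def pad_periodic(a, left, right):
    """Periodic boundary condition via wrap-around padding."""
    return np.pad(a, (left, right), mode="wrap")


data = np.arange(6.0) ** 2

# Pad before every operation: two separate padding steps.
step1 = diff_left(pad_periodic(data, 1, 0))
padded_each_time = interp_right(pad_periodic(step1, 0, 1))

# Pad once by the combined halo, then apply both stencils.
padded_once = interp_right(diff_left(pad_periodic(data, 1, 1)))

print(np.allclose(padded_each_time, padded_once))
```

With lazy chunked arrays each padding step would add its own operations to the task graph, so fewer padding steps is one plausible way to keep the graph small.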

Slide 14

Slide 14 text

Now does it scale?
● Tried it on huge global ocean simulation dataset
● Single variable is 8.76 TB
    ■ 117390 Zarr chunks
● But saw dreaded orange bars!
● Nice task graph, but dask using way too much RAM!
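For a sense of scale, the two numbers above imply an average chunk size. The arithmetic below is a back-of-envelope sketch that assumes decimal terabytes, uncompressed sizes, and evenly sized chunks. None of those assumptions are stated on the slide.

```python
# Figures from the slide.
total_terabytes = 8.76
n_chunks = 117_390

# Assumption: 1 TB = 1e12 bytes and 1 MB = 1e6 bytes (decimal units).
total_bytes = total_terabytes * 1e12
mean_chunk_megabytes = total_bytes / n_chunks / 1e6

print(f"mean chunk size: {mean_chunk_megabytes:.1f} MB")
```

Under those assumptions each chunk is a few tens of megabytes, so the memory pressure described above comes from how many chunks are held at once and not from any single chunk.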

Slide 15

Slide 15 text


Slide 16

Slide 16 text

Dask needed fixing!
● Exposed a bug in dask.distributed
● Gabe Joseph of Coiled fixed it!
● See Pangeo blog post

Slide 17

Slide 17 text

Summary
● GCM grids are a pain
● But xGCM can help!
● Grid ufuncs are a cool idea
● We improved dask along the way!
github.com/xgcm/xgcm
P.S. I am looking for my next big project 😁