Slide 1

So what is in the CMIP archive?
- 88 models (Model A, Model B, Model C, …): different code, grids, physical/biological/chemical parameterizations
- Several members per model (Member i, Member j, Member k, …): slightly different initial conditions, physical parameters … (30+ members for some models)

Slide 2

Typical CMIP6 science workflow: custom analysis applied to each model and member
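A minimal sketch of this loop, assuming each model/member has already been loaded into a plain Python dict keyed by `(model, member)`. The dict is a stand-in for real xarray datasets, and the model names, member IDs, values, and the `global_mean` analysis are hypothetical placeholders. The point is only the shape of the workflow: one analysis function mapped over every entry.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for the archive: one time series per (model, member).
# In practice each value would be an xarray Dataset loaded from the archive.
Archive = Dict[Tuple[str, str], List[float]]

archive: Archive = {
    ("model_a", "member_i"): [14.1, 14.3, 14.2],
    ("model_a", "member_j"): [14.0, 14.4, 14.5],
    ("model_b", "member_i"): [13.8, 13.9, 14.1],
}


def global_mean(series: List[float]) -> float:
    """Hypothetical custom analysis: reduce one member's data to a single number."""
    return mean(series)


def apply_to_all(
    data: Archive, analysis: Callable[[List[float]], float]
) -> Dict[Tuple[str, str], float]:
    """Apply the same analysis independently to every model and member."""
    return {key: analysis(values) for key, values in data.items()}


results = apply_to_all(archive, global_mean)
for (model, member), value in sorted(results.items()):
    print(f"{model} {member}: {value:.2f}")
```

Keeping the analysis a function of a single dataset means every entry can be processed independently, which is what makes this pattern easy to parallelize or rerun when new members appear.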

Slide 3

No content

Slide 4

But CMIP6 is not quite clean enough
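One way the data can be "not quite clean" is that different models use different names for the same dimension, so code written for one model breaks on the next. A minimal sketch of homogenizing those names with a hand-written lookup table; the raw names and the target convention (`x`, `y`, `lev`) are illustrative assumptions, not the package's actual mapping.

```python
from typing import Dict, List

# Hypothetical lookup table: raw dimension names as they might appear in
# different models, mapped onto one common convention. Illustrative only.
RENAME_MAP: Dict[str, str] = {
    "i": "x",
    "nlon": "x",
    "j": "y",
    "nlat": "y",
    "olevel": "lev",
    "deptht": "lev",
}


def homogenize_dims(dims: List[str], rename_map: Dict[str, str] = RENAME_MAP) -> List[str]:
    """Translate dimension names to the common convention.

    Names that are not in the lookup table are passed through unchanged.
    """
    return [rename_map.get(name, name) for name in dims]


# Two hypothetical models describing the same ocean grid with different names.
model_a_dims = ["time", "olevel", "j", "i"]
model_b_dims = ["time", "lev", "nlat", "nlon"]

print(homogenize_dims(model_a_dims))
print(homogenize_dims(model_b_dims))
```

With xarray, a table like this could be passed to `Dataset.rename` after filtering it down to the names actually present in a given dataset, since `rename` rejects names that do not exist.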

Slide 5

😖

Slide 6

😖 😎

Slide 7

Why cmip6_preprocessing?
- Based on the xarray data model: labelled arrays and datasets (‘pandas for n-dimensional arrays’)
- Integrates with the existing Pangeo stack where possible (regionmask, xgcm, xesmf; pint coming soon)
- Lightweight and dask-friendly
- Works great in the cloud, but also locally, on HPC... wherever you have xarray, really.

Slide 8

Components
- `preprocessing`: dataset cleaning/homogenization
- `postprocessing`: combine the model data into xarray datasets, e.g. with several members or with historical and forced experiments combined
- `regionmask`: create basin masks for arbitrary model output
- `drift_removal`: compute trends for preindustrial control runs and remove that drift from other output
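The drift-removal idea can be sketched as fitting a linear trend to the preindustrial control run and subtracting that trend from another run. This is a minimal sketch, not the `drift_removal` module's implementation: it assumes annual values, a strictly linear drift, and a forced run that branches from the control at a known reference time on a shared time axis. The helper names and all numbers are hypothetical.

```python
from typing import List, Tuple


def linear_fit(times: List[float], values: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of values = slope * t + intercept."""
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    cov = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    var = sum((t - t_mean) ** 2 for t in times)
    slope = cov / var
    intercept = v_mean - slope * t_mean
    return slope, intercept


def remove_drift(
    times: List[float], values: List[float], slope: float, t_ref: float
) -> List[float]:
    """Subtract a linear drift that is zero at t_ref.

    Only the slope is removed, so the run keeps its value at the reference time.
    """
    return [v - slope * (t - t_ref) for t, v in zip(times, values)]


# Hypothetical annual-mean control run drifting by +0.1 per year.
years = [0.0, 1.0, 2.0, 3.0, 4.0]
control = [10.0, 10.1, 10.2, 10.3, 10.4]

# Hypothetical forced run branched from the control at year 0:
# the same drift plus a forced signal of +0.5 per year.
forced = [10.0, 10.6, 11.2, 11.8, 12.4]

slope, _ = linear_fit(years, control)
dedrifted = remove_drift(years, forced, slope, t_ref=years[0])
print(f"control drift: {slope:.3f} per year")
print([round(v, 3) for v in dedrifted])
```

Anchoring the drift at the branch time (rather than also subtracting the fitted intercept) is one design choice among several; it leaves the forced run's starting state untouched and removes only the spurious trend.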

Slide 9

Demo

Slide 10

I ❤ feedback, questions, and contributions.
Docs | GitHub | @JuliusBusecke | jbusecke | [email protected]