Slide 1

Slide 1 text

Dask and the ocean death zones
Lessons from a real-life earth science workflow with a ‘fullish’ pangeo stack
Julius Busecke, Columbia University

Slide 2

Slide 2 text

Oxygen Minimum Zones (OMZs) in the global ocean

Slide 3

Slide 3 text

GROWING OXYGEN MINIMUM ZONES IMPACT BOTH LOCAL ECOSYSTEMS AND THE GLOBAL CLIMATE

Slide 4

Slide 4 text

The OMZ expands into the thermocline
Preprint link: Busecke et al., submitted to AGU Advances
[Figure panels: Thermocline, Intermediate Waters, Deep]

Slide 6

Slide 6 text

Density Framework
• Dynamically consistent vertical coordinates are key!
[Figure panels: Depth Coordinates (axis: Depth), Density Coordinates (axis: Potential Density)]

Slide 7

Slide 7 text

Density Framework
• Dynamically consistent vertical coordinates are key!
• xgcm handles this task very efficiently. 🙏 numba + dask + xarray
[Figure panels: Depth Coordinates (axis: Depth), Density Coordinates (axis: Potential Density)]
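To make the idea of a density framework concrete, here is a minimal sketch of what moving one water column from depth levels to density levels could look like. It is a plain NumPy stand-in, not xgcm's implementation: xgcm offers `Grid.transform` for doing this on full xarray datasets, while the function name `to_density_levels` and the sample values below are hypothetical and chosen only for illustration. The sketch assumes density increases strictly with depth.

```python
import numpy as np


def to_density_levels(tracer, sigma, target_sigma):
    """Linearly interpolate one water column from depth levels onto density levels.

    tracer, sigma: 1-D arrays defined on the same depth levels.
    target_sigma:  1-D array of density levels to interpolate onto.
    Target levels outside the density range of the column become NaN.
    """
    tracer = np.asarray(tracer, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    # Assumption: a stably stratified column (density strictly increasing with depth).
    if np.any(np.diff(sigma) <= 0):
        raise ValueError("density must increase strictly with depth")
    return np.interp(target_sigma, sigma, tracer, left=np.nan, right=np.nan)


# Hypothetical column: potential density and oxygen on four depth levels.
sigma = np.array([24.0, 25.0, 26.0, 27.0])
oxygen = np.array([200.0, 150.0, 50.0, 100.0])
target = np.array([24.5, 26.5, 28.0])

o2_on_sigma = to_density_levels(oxygen, sigma, target)
```

The last target level (28.0) is denser than anything in the column, so it is returned as NaN rather than extrapolated.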

Slide 8

Slide 8 text

Wait, this was not complicated!

Slide 9

Slide 9 text

The full stack workflow
Transform into density coordinates

Slide 10

Slide 10 text

The full stack workflow
Model output
Clean up naming/metadata etc.
Remove Control Run Drift
Combine variables
Interpolate Grids
Calculate additional variables (potential density, apparent oxygen utilization, etc.)
Remove mixed layer + bottom values
Add/recalculate grid metrics
Transform into density coordinates

Slide 11

Slide 11 text

The full stack workflow
Model output
Clean up naming/metadata etc.
Remove Control Run Drift
Add/recalculate grid metrics
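The slides do not say how the control run drift is removed. As one possible illustration, the sketch below assumes the drift is a linear trend: it fits a straight line to a control run time series and subtracts that trend from the forced run. The function name `remove_linear_drift` and the numbers are hypothetical.

```python
import numpy as np


def remove_linear_drift(forced, control, time):
    """Subtract the linear trend of the control run from the forced run.

    Assumption: model drift is well described by a straight line in time.
    The trend is removed relative to the first time step, so the initial
    state of the forced run is left unchanged.
    """
    slope, _intercept = np.polyfit(time, control, deg=1)
    return forced - slope * (time - time[0])


# Hypothetical time series: the control run contains nothing but drift.
time = np.arange(5.0)
control = 10.0 + 0.5 * time
signal = np.array([0.0, 1.0, 0.0, -1.0, 0.0])
forced = 10.0 + 0.5 * time + signal

detrended = remove_linear_drift(forced, control, time)
```

After the subtraction, only the initial value plus the forced signal remains in `detrended`.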

Slide 14

Slide 14 text

The full stack workflow
Model output
Clean up naming/metadata etc.
Remove Control Run Drift
Combine variables
Interpolate Grids
Calculate additional variables (potential density, apparent oxygen utilization, etc.)
Remove mixed layer + bottom values
Add/recalculate grid metrics
Transform into density coordinates

Slide 15

Slide 15 text

The full stack workflow
Combine variables
Calculate additional variables (potential density, apparent oxygen utilization, etc.)
Remove mixed layer + bottom values
Interpolate Grids
[Figure credit: Raphael Dussin]

Slide 16

Slide 16 text

The full stack workflow
Combine variables
Calculate additional variables (potential density, apparent oxygen utilization, etc.)
Remove mixed layer + bottom values
Interpolate Grids
[Figure panels: Mixed Layer Masked, Full data]
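The "remove mixed layer" step can be pictured as a simple mask. The sketch below is an assumption about how such a mask might be built, not the code used in the talk: every grid cell at or above the local mixed layer depth is set to NaN. The function name `mask_mixed_layer`, the array layout (depth, x), and the values are hypothetical.

```python
import numpy as np


def mask_mixed_layer(field, depth, mld):
    """Set values at or above the mixed layer depth to NaN.

    field: 2-D array with dimensions (depth, x).
    depth: 1-D array of cell depths (positive downward), length = field.shape[0].
    mld:   1-D array of mixed layer depths, length = field.shape[1].
    """
    # Broadcast depth against mld to get a (depth, x) boolean mask.
    in_mixed_layer = depth[:, np.newaxis] <= mld[np.newaxis, :]
    return np.where(in_mixed_layer, np.nan, field)


# Hypothetical section: three depth levels, two horizontal points.
depth = np.array([5.0, 50.0, 150.0])
mld = np.array([30.0, 100.0])
oxygen = np.ones((3, 2))

masked = mask_mixed_layer(oxygen, depth, mld)
```

With xarray the same broadcasting happens by dimension name, so the explicit `np.newaxis` indexing would not be needed there.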

Slide 17

Slide 17 text

The full stack workflow
Combine variables
Calculate additional variables (potential density, apparent oxygen utilization, etc.)
Remove mixed layer + bottom values
Interpolate Grids

Slide 18

Slide 18 text

The full stack workflow
Model output
Clean up naming/metadata etc.
Remove Control Run Drift
Combine variables
Interpolate Grids
Calculate additional variables (potential density, apparent oxygen utilization, etc.)
Remove mixed layer + bottom values
Add/recalculate grid metrics
Transform into density coordinates

Slide 19

Slide 19 text

.compute()

Slide 22

Slide 22 text

.compute()
In the end I had to write out small time steps in a for loop 😭. Not pretty, but it worked 😬
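The for-loop workaround can be sketched as follows. This is a NumPy-only illustration of the pattern, not the original code: the time axis is cut into small batches, each batch is processed and written to its own file, and the pieces are read back together afterwards. The helper `time_slices`, the batch size, and the `* 2.0` stand-in for the real pipeline are all hypothetical. With xarray, the analogous step would select each batch with `ds.isel(time=sl)` and write it out before moving on to the next one.

```python
import tempfile
from pathlib import Path

import numpy as np


def time_slices(n_time, batch_size):
    """Yield consecutive slices that cover range(n_time) in small batches."""
    for start in range(0, n_time, batch_size):
        yield slice(start, min(start + batch_size, n_time))


# Hypothetical data with dimensions (time, x).
data = np.arange(10 * 4, dtype=float).reshape(10, 4)

outdir = Path(tempfile.mkdtemp())
written = []
for i, sl in enumerate(time_slices(data.shape[0], batch_size=4)):
    # Stand-in for the expensive pipeline, applied to one small batch only.
    batch = data[sl] * 2.0
    path = outdir / f"batch_{i:03d}.npy"
    np.save(path, batch)
    written.append(path)

# Reassemble the full result from the pieces on disk.
restored = np.concatenate([np.load(p) for p in written], axis=0)
```

The design trades elegance for robustness: each iteration only ever holds one small batch in memory, so a failure affects a single batch rather than the whole computation.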

Slide 23

Slide 23 text

What have I learned?

Slide 24

Slide 24 text

You are all awesome!

Slide 25

Slide 25 text

HPC Cloud Laptop

Slide 26

Slide 26 text

😍 HPC Cloud Laptop

Slide 27

Slide 27 text

Don’t combine datasets if you don’t have to!

Slide 30

Slide 30 text

Persistent Pain Points
Maybe not “embarrassingly” parallel, but it “should still work”.
Not able to pin down a single step in the pipeline that provokes failure. It’s usually just the “whole thing” that fails.
Using dask vs. understanding dask. Can the transition be easier for xarray daskers?

Slide 31

Slide 31 text

Going Forward

Slide 32

Slide 32 text

Going Forward
Let’s get more science workflows in the cloud!

Slide 33

Slide 33 text

Going Forward
Let’s get more science workflows in the cloud!
• Use cases for complex dask workflows

Slide 34

Slide 34 text

Going Forward
Let’s get more science workflows in the cloud!
• Use cases for complex dask workflows
• Awesome for reproducible science.

Slide 35

Slide 35 text

Going Forward
Let’s get more science workflows in the cloud!
• Use cases for complex dask workflows
• Awesome for reproducible science.
• Need: Derived data products in the cloud.

Slide 36

Slide 36 text

Interested in more of the science? Check out our preprint!
Interested in working with CMIP6 data? PANGEO-CMIP6
Questions? Comments? Feedback? Let me know!
@JuliusBusecke
jbusecke
[email protected]