
"Dask and the ocean death zones" - Lessons from a real life earth science workflow with a ‘fullish’ pangeo stack

Presentation from the Dask Distributed Summit 2021 (https://summit.dask.org/schedule/presentation/1/pangeo/)

Julius Busecke

May 19, 2021

Transcript

  1. Julius Busecke, Columbia University. Dask and the ocean death zones: Lessons from a real life earth science workflow with a ‘fullish’ pangeo stack
  2. Oxygen Minimum Zones (OMZs) in the global ocean

  3. GROWING OXYGEN MINIMUM ZONES IMPACT BOTH LOCAL ECOSYSTEMS AND THE GLOBAL CLIMATE
  4. The OMZ expands into the thermocline. Preprint Link. Busecke et al., submitted to AGU Advances. [Figure labels: Thermocline, Intermediate Waters, Deep]
  5. The OMZ expands into the thermocline. Preprint Link. Busecke et al., submitted to AGU Advances. [Figure labels: Thermocline, Intermediate Waters, Deep]
  6. Density Framework • Dynamically consistent vertical coordinates are key! [Figure labels: Depth, Depth Coordinates, Density Coordinates, Potential Density]
  7. Density Framework • Dynamically consistent vertical coordinates are key! • xgcm handles this task very efficiently. 🙏 numba + dask + xarray [Figure labels: Depth, Depth Coordinates, Density Coordinates, Potential Density]
  8. Wait, this was not complicated!

  9. The full stack workflow: Transform into density coordinates

  10. The full stack workflow: Model output; Clean up naming/metadata etc; Remove Control Run Drift; Combine variables; Interpolate Grids; Calculate additional variables (potential density, apparent oxygen utilization, etc.); Remove mixed layer + bottom values; Add/recalculate grid metrics; Transform into density coordinates
  11. The full stack workflow: Model output; Clean up naming/metadata etc; Remove Control Run Drift; Add/recalculate grid metrics
  12. The full stack workflow: Model output; Clean up naming/metadata etc; Remove Control Run Drift; Add/recalculate grid metrics
  13. The full stack workflow: Model output; Clean up naming/metadata etc; Remove Control Run Drift; Add/recalculate grid metrics
  14. The full stack workflow: Model output; Clean up naming/metadata etc; Remove Control Run Drift; Combine variables; Interpolate Grids; Calculate additional variables (potential density, apparent oxygen utilization, etc.); Remove mixed layer + bottom values; Add/recalculate grid metrics; Transform into density coordinates
  15. The full stack workflow: Combine variables; Calculate additional variables (potential density, apparent oxygen utilization, etc.); Remove mixed layer + bottom values; Interpolate Grids. Credit: Raphael Dussin
  16. The full stack workflow: Combine variables; Calculate additional variables (potential density, apparent oxygen utilization, etc.); Remove mixed layer + bottom values; Interpolate Grids. [Figure labels: Mixed Layer Masked, Full data]
  17. The full stack workflow: Combine variables; Calculate additional variables (potential density, apparent oxygen utilization, etc.); Remove mixed layer + bottom values; Interpolate Grids
  18. The full stack workflow: Model output; Clean up naming/metadata etc; Remove Control Run Drift; Combine variables; Interpolate Grids; Calculate additional variables (potential density, apparent oxygen utilization, etc.); Remove mixed layer + bottom values; Add/recalculate grid metrics; Transform into density coordinates
  19. .compute()

  20. .compute()

  21. .compute()

  22. .compute() In the end I had to write out small time steps in a for loop 😭. Not pretty but it worked 😬
  23. What have I learned?

  24. You are all awesome!

  25. HPC Cloud Laptop

  26. 😍 HPC Cloud Laptop

  27. Don’t combine datasets if you don’t have to!

  28. None
  29. None
  30. Persistent Pain Points: Maybe not “embarrassingly” parallel, but it “should still work”. Not able to pin down a single step in the pipeline that provokes failure. It’s usually just the “whole thing” that fails. Using dask vs. understanding dask. Can the transition be easier for xarray daskers?
  31. Going Forward

  32. Going Forward: Let’s get more science workflows in the cloud!
  33. Going Forward: Let’s get more science workflows in the cloud! • Use-cases for complex dask workflows
  34. Going Forward: Let’s get more science workflows in the cloud! • Use-cases for complex dask workflows • Awesome for reproducible science.
  35. Going Forward: Let’s get more science workflows in the cloud! • Use-cases for complex dask workflows • Awesome for reproducible science. • Need: Derived data products in the cloud.
  36. Interested in more of the science? Check out our preprint! Interested in working with CMIP6 data? PANGEO-CMIP6. Questions? Comments? Feedback? Let me know! @JuliusBusecke, jbusecke, julius@ldeo.columbia.edu
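
Slide 7 credits xgcm with the transformation from depth to density coordinates. The slides do not show the call itself, so the following is only a minimal sketch of the underlying idea for a single water column, using plain NumPy linear interpolation instead of xgcm. The function name `to_density_coords` and all numbers are hypothetical, and the sketch assumes density increases monotonically with depth.

```python
import numpy as np


def to_density_coords(tracer, sigma, target_sigma):
    """Linearly interpolate one water column onto target density levels.

    `tracer` and `sigma` are 1-D arrays on the same depth levels, and
    `sigma` is assumed to increase with depth. Target levels outside the
    density range of the column come back as NaN.
    """
    return np.interp(target_sigma, sigma, tracer, left=np.nan, right=np.nan)


# Hypothetical column: values are invented purely for illustration.
sigma = np.array([24.0, 25.0, 26.0, 27.0])          # potential density on depth levels
oxygen = np.array([200.0, 150.0, 50.0, 100.0])      # oxygen on the same depth levels
target_sigma = np.array([23.5, 24.5, 26.5, 27.5])   # levels of the new vertical axis

oxygen_on_sigma = to_density_coords(oxygen, sigma, target_sigma)
print(oxygen_on_sigma)
```

The two outer target levels lie outside the column's density range and are masked, while the two inner ones are interpolated between their neighbouring depth levels. A real model field would need this applied to every column, which is the part the slide attributes to xgcm with numba, dask and xarray.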
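
Slide 22 describes the workaround that finally got the computation through: writing out small groups of time steps in a for loop instead of computing everything at once. The talk does not include that code, so this is a hedged sketch of the pattern with plain NumPy and `.npy` files standing in for the real data and storage format. `expensive_step`, `write_in_time_chunks` and the file naming are all hypothetical.

```python
import tempfile
from pathlib import Path

import numpy as np


def expensive_step(block):
    """Stand-in for the real pipeline: remove each time step's spatial mean."""
    return block - block.mean(axis=(1, 2), keepdims=True)


def write_in_time_chunks(data, out_dir, steps_per_chunk):
    """Process and save a few time steps at a time; return the written paths."""
    out_dir = Path(out_dir)
    paths = []
    for start in range(0, data.shape[0], steps_per_chunk):
        # Only this slice of the time axis is processed in this iteration.
        block = data[start:start + steps_per_chunk]
        path = out_dir / f"chunk_{start:04d}.npy"
        np.save(path, expensive_step(block))
        paths.append(path)
    return paths


# Invented (time, y, x) data, purely for illustration.
data = np.arange(10 * 2 * 3, dtype=float).reshape(10, 2, 3)

with tempfile.TemporaryDirectory() as tmp:
    paths = write_in_time_chunks(data, tmp, steps_per_chunk=4)
    n_files = len(paths)
    # Reading the pieces back in order recovers the full result.
    restored = np.concatenate([np.load(p) for p in paths], axis=0)
```

Each iteration only has to hold one small slice and its result, so a failure affects a single chunk and the loop can be restarted from there. The cost is the parallelism across time steps that a single `.compute()` was meant to provide, which matches the slide's "not pretty but it worked".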