"Dask and the ocean death zones" - Lessons from a real life earth science workflow with a ‘fullish’ pangeo stack

Presentation from the Dask Distributed Summit 2021 (https://summit.dask.org/schedule/presentation/1/pangeo/)

Julius Busecke

May 19, 2021

Transcript

  1. Julius Busecke, Columbia University

    Dask and the ocean death zones: Lessons from a real-life earth science workflow with a ‘fullish’ pangeo stack
  2. The OMZ (oxygen minimum zone) expands into the thermocline. Preprint link: Busecke et al., submitted to AGU Advances.

    [Figure panels: Thermocline, Intermediate Waters, Deep]
  4. Density Framework • Dynamically consistent vertical coordinates are key!

    [Figure: depth coordinates vs. density coordinates; axes: Depth, Potential Density]
  5. Density Framework • Dynamically consistent vertical coordinates are key! • xgcm handles this task very efficiently. 🙏 numba + dask + xarray

    [Figure: depth coordinates vs. density coordinates; axes: Depth, Potential Density]
  6. The full stack workflow

    Model output • Clean up naming/metadata etc. • Remove control run drift • Combine variables • Interpolate grids • Calculate additional variables (potential density, apparent oxygen utilization, etc.) • Remove mixed layer + bottom values • Add/recalculate grid metrics • Transform into density coordinates
  7. The full stack workflow

    Model output • Clean up naming/metadata etc. • Remove control run drift • Add/recalculate grid metrics
  11. The full stack workflow

    Combine variables • Calculate additional variables (potential density, apparent oxygen utilization, etc.) • Remove mixed layer + bottom values • Interpolate grids
    (Figure credit: Raphael Dussin)
  12. The full stack workflow

    Combine variables • Calculate additional variables (potential density, apparent oxygen utilization, etc.) • Remove mixed layer + bottom values • Interpolate grids
    [Figure panels: Mixed Layer Masked, Full data]
  13. The full stack workflow

    Combine variables • Calculate additional variables (potential density, apparent oxygen utilization, etc.) • Remove mixed layer + bottom values • Interpolate grids
  15. .compute()

    In the end I had to write out small time steps in a for loop 😭. Not pretty, but it worked 😬
  16. Persistent Pain Points

    • Maybe not “embarrassingly” parallel, but it “should still work”.
    • Not able to pin down a single step in the pipeline that provokes failure. It's usually just the “whole thing” that fails.
    • Using dask vs. understanding dask: can the transition be easier for xarray daskers?
  17. Going Forward: Let's get more science workflows in the cloud!

    • Use-cases for complex dask workflows
  18. Going Forward: Let's get more science workflows in the cloud!

    • Use-cases for complex dask workflows
    • Awesome for reproducible science.
  19. Going Forward: Let's get more science workflows in the cloud!

    • Use-cases for complex dask workflows
    • Awesome for reproducible science.
    • Need: Derived data products in the cloud.
  20. Interested in more of the science? Check out our preprint!

    Interested in working with CMIP6 data? PANGEO-CMIP6. Questions? Comments? Feedback? Let me know! @JuliusBusecke · jbusecke
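Slide 7 lists removing the control run drift as one of the first steps of the workflow. A common way to do this is to fit a trend to the control run and subtract it from the experiment. The NumPy sketch below assumes the drift is linear and that both time axes are already aligned; `remove_control_drift` and its `branch_time` parameter (the control-run time at which the experiment branches off) are hypothetical names, and this is not necessarily the exact procedure used in the talk.

```python
import numpy as np

def remove_control_drift(exp_values, exp_time, ctrl_values, ctrl_time, branch_time):
    """Subtract a linear control-run drift from an experiment time series.

    Assumes exp_time and ctrl_time share the same (aligned) time axis and
    that the drift in the control run is well described by a straight line.
    """
    # Least-squares linear fit to the control run: value ~ slope * t + intercept
    slope, _intercept = np.polyfit(np.asarray(ctrl_time, dtype=float),
                                   np.asarray(ctrl_values, dtype=float), deg=1)
    # Drift accumulated since the branch point (zero at branch_time)
    drift = slope * (np.asarray(exp_time, dtype=float) - branch_time)
    return np.asarray(exp_values, dtype=float) - drift

ctrl_time = np.arange(10.0)
ctrl_values = 10.0 + 0.5 * ctrl_time  # hypothetical control run drifting upward
dedrifted = remove_control_drift([10.0, 11.0, 12.0], [0.0, 1.0, 2.0],
                                 ctrl_values, ctrl_time, branch_time=0.0)
# dedrifted is approximately [10.0, 10.5, 11.0]
```

Only the change accumulated since the branch point is removed, so the experiment keeps its own state at `branch_time` instead of being shifted by the control run's mean.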
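Slide 11 includes an "Interpolate grids" step. One frequent reason for such a step is that ocean models store variables on a staggered grid (for example velocities on cell faces and tracers at cell centers), so they have to be brought to a common position before they can be combined. xgcm provides `Grid.interp` for this; the NumPy sketch below uses a hypothetical `faces_to_centers` helper and shows only the simplest case, averaging neighboring face values onto the cell centers along one axis.

```python
import numpy as np

def faces_to_centers(face_values, axis=-1):
    """Average neighboring cell-face values onto the cell centers between them.

    Along `axis`, an array with N+1 face values becomes N center values.
    """
    face_values = np.asarray(face_values, dtype=float)
    n = face_values.shape[axis]
    # Faces on the "left" and "right" side of every cell along `axis`
    left = np.take(face_values, np.arange(0, n - 1), axis=axis)
    right = np.take(face_values, np.arange(1, n), axis=axis)
    return 0.5 * (left + right)

u_faces = [0.0, 2.0, 4.0, 6.0]  # hypothetical velocities on 4 faces
u_centers = faces_to_centers(u_faces)
# u_centers is [1.0, 3.0, 5.0]
```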
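Apparent oxygen utilization, one of the additional variables named on slides 6 and 11, is the difference between the oxygen concentration a water parcel would have at saturation (given its temperature and salinity) and the oxygen it actually contains. The saturation value comes from an empirical solubility formula; to keep this sketch self-contained it is passed in as a precomputed input, and the function name `apparent_oxygen_utilization` is hypothetical.

```python
import numpy as np

def apparent_oxygen_utilization(o2, o2_sat):
    """AOU = O2 at saturation minus actual O2 (same units, e.g. mol/m^3).

    Assuming the water was saturated when it left the surface, positive
    values indicate oxygen consumed (e.g. by respiration) since then.
    """
    return np.asarray(o2_sat, dtype=float) - np.asarray(o2, dtype=float)

o2 = [0.25, 0.10, 0.02]      # hypothetical simulated O2 (mol/m^3)
o2_sat = [0.26, 0.27, 0.28]  # hypothetical saturation values for the same cells
aou = apparent_oxygen_utilization(o2, o2_sat)
# aou is approximately [0.01, 0.17, 0.26]
```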
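Slide 12 contrasts the full data with a "Mixed Layer Masked" version. The NumPy sketch below masks one water column, assuming the mixed layer depth is known for that column and interpreting "bottom values" as the deepest `n_bottom` ocean cells; the function name and the `n_bottom` parameter are hypothetical, not taken from the talk.

```python
import numpy as np

def mask_mixed_layer_and_bottom(values, depth, mld, n_bottom=1):
    """Set mixed-layer and near-bottom cells of one water column to NaN.

    values   -- tracer on depth levels, shallow to deep; NaN marks land
    depth    -- depth of each level (m, positive down)
    mld      -- mixed layer depth for this column (m)
    n_bottom -- number of deepest ocean cells to discard
    """
    values = np.asarray(values, dtype=float)
    depth = np.asarray(depth, dtype=float)
    masked = values.copy()  # leave the input untouched
    # Indices of ocean (non-NaN) cells; the last ones sit on the sea floor
    wet = np.flatnonzero(~np.isnan(values))
    if n_bottom > 0 and wet.size > 0:
        masked[wet[-n_bottom:]] = np.nan
    # Everything shallower than the mixed layer depth is removed as well
    masked[depth < mld] = np.nan
    return masked

profile = [210.0, 205.0, 120.0, 80.0, np.nan]  # hypothetical O2, last level is land
levels = [5.0, 15.0, 50.0, 100.0, 200.0]
masked = mask_mixed_layer_and_bottom(profile, levels, mld=20.0)
# masked is [nan, nan, 120.0, nan, nan]
```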
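Slides 4 and 5 stress transforming from depth into density coordinates, a task handled by xgcm (in recent versions through `Grid.transform`, which per the slide builds on numba, dask and xarray). The sketch below is a minimal NumPy version of linear remapping for a single water column, not xgcm's implementation. It assumes potential density increases monotonically with depth (stable stratification); `to_density_coords` and the example profile are hypothetical.

```python
import numpy as np

def to_density_coords(values, sigma, target_sigma):
    """Linearly remap a tracer profile from depth levels onto density levels.

    values       -- tracer (e.g. oxygen) on depth levels, shallow to deep
    sigma        -- potential density on the same levels (assumed to
                    increase monotonically with depth)
    target_sigma -- density levels to interpolate onto
    """
    values = np.asarray(values, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    # Drop land/NaN cells so np.interp only sees valid ocean points
    valid = ~(np.isnan(values) | np.isnan(sigma))
    if not valid.any():
        return np.full(np.shape(target_sigma), np.nan)
    # np.interp needs increasing x-coordinates; outside the column's density
    # range we return NaN instead of extrapolating
    return np.interp(target_sigma, sigma[valid], values[valid],
                     left=np.nan, right=np.nan)

sigma = [24.0, 25.0, 26.0, 27.0]      # potential density minus 1000 (kg/m^3)
oxygen = [200.0, 150.0, 100.0, 50.0]  # hypothetical O2 profile
remapped = to_density_coords(oxygen, sigma, [24.5, 26.5, 28.0])
# remapped is [175.0, 75.0, nan]; 28.0 lies outside this column's density range
```

Returning NaN outside the observed density range avoids inventing water masses that do not exist in a given column.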
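Slide 15 describes the final workaround: instead of one large `.compute()`, writing out small time steps in a loop. The sketch below shows only the looping logic in plain Python; `write_in_time_chunks` and the `write_chunk` callback are hypothetical stand-ins for whatever computes and stores one slice. With xarray, such a callback might, for example, call `ds.isel(time=sl).to_zarr(store, mode="w")` for the first chunk and `to_zarr(store, append_dim="time")` afterwards; that wiring is an assumption, not the code from the talk.

```python
from typing import Callable, List

def write_in_time_chunks(n_time: int, chunk_size: int,
                         write_chunk: Callable[[slice, bool], None]) -> List[slice]:
    """Call write_chunk once per block of at most chunk_size time steps.

    write_chunk receives the time slice and a flag that is True only for the
    first block (e.g. to create the output store before appending to it).
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    slices = []
    for start in range(0, n_time, chunk_size):
        sl = slice(start, min(start + chunk_size, n_time))
        write_chunk(sl, start == 0)  # compute + store only this small piece
        slices.append(sl)
    return slices

written = []
write_in_time_chunks(10, 4, lambda sl, first: written.append((sl.start, sl.stop, first)))
# written is [(0, 4, True), (4, 8, False), (8, 10, False)]
```

Keeping each piece small bounds the memory and task-graph size of every individual compute, at the cost of losing parallelism across chunks.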