Earthcube 2020 - CMIP6 without the interpolation: Grid-native analysis with Pangeo in the cloud

The Pangeo project recently made large parts of the CMIP6 data archive available in the cloud. This enables, for the first time, centralized, reproducible science with state-of-the-art climate simulations without requiring users to own large storage systems or a supercomputer. The data itself, however, still presents significant challenges for analysis, one of which is applying operations across many models. Two of the major hurdles are differing naming conventions and metadata between modeling centers, and complex grid layouts, particularly in the ocean components of climate models. Common workflows in the past often involved interpolating (remapping) the desired variables and writing new files, which creates an organizational burden and increases storage requirements. We will demonstrate two Pangeo tools that enable seamless calculation of common operations, such as vector calculus operators (grad, curl, div) and weighted averages/integrals, across a wide range of CMIP6 models directly on the data stored in the cloud. cmip6_preprocessing provides numerous tools to unify naming conventions and to parse grid information and metrics (like cell area). xgcm uses this information to enable finite-volume analysis on the native model grids. Combining both tools enables fast analysis while ensuring a reproducible and accurate workflow.
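The point about metrics such as cell area can be made concrete with a minimal NumPy sketch of an area-weighted average and an area integral. It assumes a 2D field and the matching cell-area array (for CMIP6 ocean grids this is typically the `areacello` variable) are already loaded as arrays, with NaN marking land cells; the function names are hypothetical and not part of either tool. On a curvilinear ocean grid the cells differ in area, so an unweighted mean over-counts regions with many small cells.

```python
import numpy as np


def area_weighted_mean(field, area):
    """Area-weighted mean of a 2D field; NaN (land) cells are ignored.

    Land cells are excluded from both the weighted sum and the total area,
    so they do not dilute the result.
    """
    field = np.asarray(field, dtype=float)
    area = np.asarray(area, dtype=float)
    valid = ~np.isnan(field)
    return np.sum(field[valid] * area[valid]) / np.sum(area[valid])


def area_integral(field, area):
    """Area integral of a 2D field (sum of value times cell area), skipping NaN."""
    return np.nansum(np.asarray(field, dtype=float) * np.asarray(area, dtype=float))


# Toy 2x2 patch: one land cell (NaN) and cells of unequal area.
sst = np.array([[10.0, 20.0],
                [np.nan, 30.0]])
cell_area = np.array([[1.0, 1.0],
                      [5.0, 2.0]])

print(area_weighted_mean(sst, cell_area))  # (10 + 20 + 60) / 4 = 22.5
print(area_integral(sst, cell_area))       # 10 + 20 + 60 = 90.0
print(np.nanmean(sst))                     # unweighted mean: 20.0
```

xgcm's metric-aware reductions (such as its `average` and `integrate` methods) perform the equivalent weighting once the cell areas are registered as grid metrics, which is what the parsed metrics from cmip6_preprocessing feed into.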

Julius Busecke

June 18, 2020

Transcript

  1. SLOW AND STORAGE INTENSIVE TYPICAL WORKFLOW

    Discovery, Download Data ⏰, Homogenize Data ⏰, Regrid Data ⏰, Apply Analysis
  2. SLOW AND STORAGE INTENSIVE TYPICAL WORKFLOW

    Discovery, Download Data ⏰, Homogenize Data ⏰, Regrid Data ⏰, Apply Analysis; Pangeo’s CMIP6 Google Cloud Public Dataset (~600TB)
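  4. CURVILINEAR GRIDS (DELANDMETER AND VAN SEBILLE, 2019)

    [Figure: (a) a curvilinear grid cell in physical coordinates (x, y) with corners (X0, Y0) to (X3, Y3), velocity points u0, u1, v0, v1, tracer point T0 and labels F0 to F3; (b) the same cell mapped onto the unit square in computational coordinates (ξ, η) with corners (0, 0), (1, 0), (1, 1), (0, 1). Credit: ENES.ORG]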
  5. COMPUTING A DERIVATIVE (XGCM)

    [Figure: staggered grid with u, v and T points, cell indices i−1, i, i+1 along axis X and j−1 to j+2 along axis Y (after Delandmeter and van Sebille, 2019)]
  10. COMPUTING A DERIVATIVE (XGCM)

    [Staggered-grid figure, as on slide 5] Not too complicated but tedious and error prone
  12. COMPUTING A DERIVATIVE (XGCM)

    Starting from an xarray dataset [staggered-grid figure, as on slide 5]
  13. COMPUTING A DERIVATIVE (XGCM)

    Starting from an xarray dataset; create a `grid` object [staggered-grid figure, as on slide 5]
  14. COMPUTING A DERIVATIVE (XGCM)

    [Staggered-grid figure, as on slide 5 (Delandmeter and van Sebille, 2019)]
  15. COMPUTING A DERIVATIVE (XGCM)

    [Staggered-grid figure, as on slide 5] Currently supported operations: difference, interpolation, cumulative sum, min/max, average, integral, cumulative integral, derivative
  16. SPENDING TIME ON DISCOVERY, NOT PREPROCESSING

    Discovery, Download Data ⏰, Homogenize Data ⏰, Regrid Data ⏰, Apply Analysis; Pangeo’s CMIP6 Google Cloud Public Dataset
  18. SPENDING TIME ON DISCOVERY, NOT PREPROCESSING

    Discovery, Download Data ⏰, Homogenize Data ⏰, Regrid Data ⏰, Apply Analysis; Pangeo’s CMIP6 Google Cloud Public Dataset; cmip6_preprocessing
  19. SPENDING TIME ON DISCOVERY, NOT PREPROCESSING

    Discovery, Download Data ⏰, Homogenize Data ⏰, Regrid Data ⏰, Apply Analysis; Pangeo’s CMIP6 Google Cloud Public Dataset; cmip6_preprocessing ❤
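The derivative slides describe computing a derivative by hand on a staggered grid as "not too complicated but tedious and error prone". The following NumPy sketch shows that bookkeeping for the simplest case. It assumes tracers at cell centres, the result on the left cell faces (the position xgcm calls `left`), and a known spacing between neighbouring centres; the helper names are hypothetical and are not xgcm's API.

```python
import numpy as np


def diff_to_left(t, axis, periodic=True):
    """Difference a cell-centre field onto the left cell faces along `axis`.

    Face i sits between centres i-1 and i, so the result at face i is
    t[i] - t[i-1]. With periodic=True the first face wraps around to the
    last centre; otherwise it has no left neighbour and is set to NaN.
    """
    t = np.asarray(t, dtype=float)
    d = t - np.roll(t, 1, axis=axis)  # roll by one cell: t[i-1] lands at index i
    if not periodic:
        edge = [slice(None)] * d.ndim
        edge[axis] = 0
        d[tuple(edge)] = np.nan
    return d


def derivative_to_left(t, dx, axis, periodic=True):
    """Finite-difference d t / dx on the left faces.

    `dx` is the distance between neighbouring centres, located at the faces
    (a scalar or an array broadcastable to `t`).
    """
    return diff_to_left(t, axis, periodic) / np.asarray(dx, dtype=float)


# 1D example with non-uniform spacing and a non-periodic boundary.
t = np.array([0.0, 1.0, 4.0, 9.0])
dx = np.array([1.0, 1.0, 2.0, 2.0])
print(derivative_to_left(t, dx, axis=0, periodic=False))  # nan, 1.0, 1.5, 2.5

# 2D (y, x) field: each axis, grid position and boundary needs its own shift.
field = np.arange(12.0).reshape(3, 4)
ddx = derivative_to_left(field, 2.0, axis=1, periodic=True)   # periodic in x
ddy = derivative_to_left(field, 1.0, axis=0, periodic=False)  # closed in y
```

Every new combination of axis, grid position and boundary condition needs its own variant of this index shift. In xgcm the corresponding operations are methods on the `grid` object (for example `grid.diff` and `grid.derivative`), which read the axes, relative positions and metrics from the dataset instead of hard-coding them.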
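Slides 18 and 19 place cmip6_preprocessing at the "Homogenize Data" step. As a minimal sketch of the idea only, assuming each model's dimension names are available as a list of strings, renaming onto one convention can be table-driven. The mapping and model names below are hypothetical examples, not the package's actual table or API.

```python
# Hypothetical mapping from model-specific dimension names onto one common
# convention (illustrative only; not the real cmip6_preprocessing table).
DIM_RENAME = {
    "i": "x", "nlon": "x",
    "j": "y", "nlat": "y",
    "olevel": "lev",
}


def unify_dims(dims, rename=DIM_RENAME):
    """Map a model's dimension names onto the common convention.

    Names missing from `rename` are kept unchanged. A ValueError is raised
    if two source names would collapse onto the same target name.
    """
    out = [rename.get(name, name) for name in dims]
    if len(set(out)) != len(out):
        raise ValueError(f"ambiguous renaming: {dims} -> {out}")
    return out


# Two hypothetical models storing the same ocean variable under different names.
models = {
    "model_a": ["time", "lev", "j", "i"],
    "model_b": ["time", "olevel", "nlat", "nlon"],
}
for name, dims in models.items():
    print(name, unify_dims(dims))  # both become ['time', 'lev', 'y', 'x']
```

With xarray, a mapping restricted to the names actually present in a dataset could be passed to `Dataset.rename`; the check for collisions guards against two dimensions silently merging under one name.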