Slide 1

Slide 1 text

JULIUS BUSECKE* SEP 24 CMIP6 IN THE CLOUD: A PROTOTYPE TO BREAK BARRIERS AND ACCELERATE COLLABORATION *THANKS TO MY COLLABORATORS, ESPECIALLY CHARLES STERN!

Slide 2

Slide 2 text

WHO AM I?
M²LInES
jbusecke juliusbusecke.com @JuliusBusecke @[email protected] @codeandcurrents.bsky.social
🌊 Climate Scientist
- Ocean transport of Heat, Carbon, Oxygen
- Impact of small scale processes on global climate variability
🤓 Developer/Data Nerd
- Pangeo CMIP6 Cloud Data
- xMIP/xGCM
🤝 Open Science Advocate
- Manager for Data and Computation - NSF-LEAP
- Lead of Open Research - m2lines

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

https://edition.cnn.com/ https://www.wsj.com

Slide 5

Slide 5 text

HOW DO WE KNOW WHAT WILL HAPPEN? By running large numerical simulations of the planet!

Slide 6

Slide 6 text

"THE INTERGOVERNMENTAL PANEL ON CLIMATE CHANGE (IPCC) IS THE UNITED NATIONS BODY FOR ASSESSING THE SCIENCE RELATED TO CLIMATE CHANGE." WWW.IPCC.CH/

Slide 8

Slide 8 text

Comparing Simulations to observations

Slide 9

Slide 9 text

Future Predictions

Slide 10

Slide 10 text

Emission Scenarios

Slide 11

Slide 11 text

Model Spread (Uncertainty)

Slide 12

Slide 12 text

COUPLED MODEL INTERCOMPARISON PROJECT
The objective of CMIP is to better understand past, present and future climate changes arising from natural, unforced variability or in response to changes in radiative forcing in a multi-model context. [...]
- 1000s of individual simulations
- 20 PB of data
- More use-cases and opportunities to explore

Slide 13

Slide 13 text

CMIP DATA
https://wcrp-cmip.org/cmip-data-access/#access-routes
- Many 100,000s of individual datasets
- Each dataset is identified by a unique id, consisting of 'facets'
- Facets are part of the CMIP controlled vocabulary (https://wcrp-cmip.github.io/CMIP6_CVs/)
Example id: CMIP6.ScenarioMIP.NOAA-GFDL.GFDL-CM4.ssp585.r1i1p1f1.Omon.thetao.gn
Template: mip_era.activity_id.institution_id.source_id.experiment_id.member_id.table_id.variable_id.grid_label.version
Facet labels: CMIP Cycle, MIP activity, Modelling Center, Model Code, Experiment/forcing scenarios, Ensemble member, Output Variable, Model Grid
Annotations: different simulations; components of a single simulation; example MIP table used
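The facet structure of a dataset id can be made concrete with a short sketch. The snippet below is an illustration only, not code from the talk: it splits the example id from this slide on dots and pairs each part with the facet names from the template. The function name `parse_dataset_id` is hypothetical, and treating the trailing `version` facet as optional is an assumption based on the example id stopping at the grid label.

```python
# Facet names in the order given by the template on this slide.
FACET_NAMES = [
    "mip_era", "activity_id", "institution_id", "source_id",
    "experiment_id", "member_id", "table_id", "variable_id",
    "grid_label", "version",
]


def parse_dataset_id(dataset_id):
    """Split a dot-separated dataset id into a {facet: value} dict.

    Assumption: the trailing 'version' facet is optional, because the
    example id on the slide ends at grid_label.
    """
    parts = dataset_id.split(".")
    n_required = len(FACET_NAMES) - 1
    if len(parts) not in (n_required, len(FACET_NAMES)):
        raise ValueError(
            f"expected {n_required} or {len(FACET_NAMES)} facets, got {len(parts)}"
        )
    # zip stops at the shorter sequence, so a missing version is simply absent.
    return dict(zip(FACET_NAMES, parts))


example = "CMIP6.ScenarioMIP.NOAA-GFDL.GFDL-CM4.ssp585.r1i1p1f1.Omon.thetao.gn"
facets = parse_dataset_id(example)
print(facets["source_id"], facets["variable_id"])
```

Keeping facets in a dictionary makes it straightforward to group ids, for instance by `source_id` to collect the components of a single simulation.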

Slide 14

Slide 14 text

CMIP DATA
- Variable names are standardized using the CMOR (https://cmor.llnl.gov/) library.
https://wcrp-cmip.org/map/#map_of_modelling_centres_and_esgf_nodes

Slide 17

Slide 17 text

CMIP DATA
Hingray and Saïd 2014

Slide 18

Slide 18 text

CMIP DATA
https://expearth.uib.no/?page_id=28

Slide 19

Slide 19 text

ESGF
- Data is distributed via the Earth System Grid Federation (ESGF)
- Federation of public sector data centers hosting CMIP data (and more) on their servers
- Licensed, public, free
- Data nodes serve netCDF files, primarily for download

Slide 20

Slide 20 text

CMIP DATA: CHALLENGES
- Reality: Large institutions create mirrors of parts of the archive, restricted to employees (data fortresses)
- Large overhead: requires expertise, time, and funds
- Individual access/data cleaning approaches might be incompatible, hindering reusability/reproducibility
- Effectively limits conducting climate science to large legacy orgs
Diagram labels: ESGF; Custom Code (x3); University, Lab, Industry ❌ ✋🚫

Slide 21

Slide 21 text

ANALYSIS-READY CLOUD-OPTIMIZED (ARCO) DATA

Slide 22

Slide 22 text

ANALYSIS-READY CLOUD-OPTIMIZED (ARCO) DATA
Analysis-Ready:
• Think in “Datasets/Datacubes”, not “files” and “folders”
• Rich Metadata
Cloud Optimized:
Figure annotations: Chunked appropriately for analysis; Rich metadata; Everything in one dataset object
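Why "chunked appropriately for analysis" matters can be seen with a little arithmetic. The sketch below is illustrative only: the grid size, the two chunking schemes, and the helper `chunks_touched` are all hypothetical, chosen to show how the number of chunks a request must read depends on whether the chunk layout matches the access pattern.

```python
import math


def chunks_touched(selection, chunk_shape):
    """Count chunks overlapping a selection that starts at index 0.

    Both arguments are (time, lat, lon) extents in grid points.
    """
    return math.prod(math.ceil(s / c) for s, c in zip(selection, chunk_shape))


# Hypothetical monthly dataset: 1980 time steps on a 180 x 360 grid.
time_chunked = (12, 180, 360)    # one chunk per year of full global maps
space_chunked = (1980, 10, 10)   # full time series in small spatial tiles

point_series = (1980, 1, 1)      # time series at a single grid point
single_map = (1, 180, 360)       # one global map at a single time step

for name, chunks in [("time-chunked", time_chunked), ("space-chunked", space_chunked)]:
    print(name, chunks_touched(point_series, chunks), chunks_touched(single_map, chunks))
```

Under these assumed shapes, the time-chunked layout reads 165 chunks for the point time series but only 1 for the map, while the space-chunked layout reads 1 chunk for the time series and 648 for the map. No single layout is best for every analysis, which is why the chunking choice is part of making data cloud-optimized.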

Slide 23

Slide 23 text

ANALYSIS-READY CLOUD-OPTIMIZED (ARCO) DATA
Cloud Optimized:
• Efficient cloud native access
• Integration with data science and ML ecosystem

Slide 24

Slide 24 text

CMIP6 CLOUD DATA
A single data repository in the cloud serves all use cases
Storage provided by Google as Public Dataset
Diagram labels: ESGF; Ingestion Pipeline; Everybody rolls their own; Custom Code (x3); University, Lab, Industry ❌ ✋🚫
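One way a single shared repository can serve many users is through a flat catalog that maps facets to store locations. The sketch below is a hypothetical illustration, not the project's actual catalog or API: the column layout follows the facet names from the CMIP DATA slides, while the `zstore` column, the store paths, and the `search` helper are assumptions made for the example.

```python
import csv
import io

# Hypothetical catalog excerpt. The store paths are placeholders, not real URLs.
CATALOG_CSV = """activity_id,institution_id,source_id,experiment_id,member_id,table_id,variable_id,grid_label,zstore
ScenarioMIP,NOAA-GFDL,GFDL-CM4,ssp585,r1i1p1f1,Omon,thetao,gn,gs://example-bucket/store-a
ScenarioMIP,NOAA-GFDL,GFDL-CM4,ssp585,r1i1p1f1,Omon,so,gn,gs://example-bucket/store-b
CMIP,NOAA-GFDL,GFDL-CM4,historical,r1i1p1f1,Omon,thetao,gn,gs://example-bucket/store-c
"""


def search(rows, **facets):
    """Return the catalog rows whose facets match every keyword given."""
    return [r for r in rows if all(r[k] == v for k, v in facets.items())]


rows = list(csv.DictReader(io.StringIO(CATALOG_CSV)))
hits = search(rows, variable_id="thetao", experiment_id="ssp585")
for hit in hits:
    print(hit["source_id"], hit["zstore"])
```

Because every user queries the same catalog with the same facet names, a search written by one group can be rerun unchanged by another, in contrast to the per-institution custom code pictured in the diagram.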

Slide 25

Slide 25 text

CMIP6 CLOUD DATA
A single data repository in the cloud serves all use cases
- Collaborative and agile Research
- Inclusive Education on real climate data
- Fast Iteration - Lower Barrier of Entry
- Portable Methods and Results not just for Academia

Slide 30

Slide 30 text

DEMO

Slide 33

Slide 33 text

MORE INFO? I ❤ QUESTIONS
jbusecke juliusbusecke.com @JuliusBusecke @[email protected] @codeandcurrents.bsky.social
https://github.com/leap-stc/cmip6-leap-feedstock
https://pangeo-data.github.io/pangeo-cmip6-cloud/