Slide 1

Slide 1 text

PANGEO-ESGF CMIP6 ZARR DATA 2.0: Streaming access to CMIP6 data in the cloud that rocks!
JULIUS BUSECKE | AGU 2024 | DEC 13 2024

Slide 2

Slide 2 text

WHAT IS CMIP?
"THE INTERGOVERNMENTAL PANEL ON CLIMATE CHANGE (IPCC) IS THE UNITED NATIONS BODY FOR ASSESSING THE SCIENCE RELATED TO CLIMATE CHANGE." (WWW.IPCC.CH/)
[Figure panels: Comparing simulations to observations | Future predictions | Emission scenarios | Model spread]
62 ABSTRACTS AT AGU MENTION CMIP EXPLICITLY, BUT PROBABLY MANY MORE USE THE DATA!

Slide 3

Slide 3 text

WHAT IS CMIP? - ORGANIZATION AND VOCABULARY
- Many hundreds of thousands of individual datasets
- Each dataset is identified by a unique id, consisting of 'facets'
- Facets are part of the CMIP controlled vocabulary (https://wcrp-cmip.github.io/CMIP6_CVs/)
- Unfortunately, you still need to learn the vocabulary for now
Example id: CMIP6.ScenarioMIP.NOAA-GFDL.GFDL-CM4.ssp585.r1i1p1f1.Omon.thetao.gn
Facet template: mip_era.activity_id.institution_id.source_id.experiment_id.member_id.table_id.variable_id.grid_label.version
[Diagram labels, in facet order: CMIP cycle | MIP activity | modelling center | model code | experiment/forcing scenario | ensemble member | MIP table used | output variable | model grid. The facets are grouped into those that distinguish different simulations and those that select components of a single simulation.]
Access routes: https://wcrp-cmip.org/cmip-data-access/#access-routes
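
Because every id is just the facet values joined by dots, it can be split back into named facets mechanically. The Python sketch below assumes that no facet value itself contains a dot and treats the trailing `version` facet as optional (the example id on this slide omits it). `parse_instance_id` and `FACETS` are illustrative names, not part of any CMIP or Pangeo library.

```python
# Facet names in instance_id order, taken from the template on this slide.
FACETS = (
    "mip_era", "activity_id", "institution_id", "source_id", "experiment_id",
    "member_id", "table_id", "variable_id", "grid_label", "version",
)


def parse_instance_id(instance_id: str) -> dict:
    """Split a dot-separated CMIP6 instance_id into {facet_name: value}.

    Assumption: facet values never contain '.', and the final 'version'
    facet may be omitted, so both 9- and 10-part ids are accepted.
    """
    parts = instance_id.strip().split(".")
    if len(parts) not in (len(FACETS) - 1, len(FACETS)) or not all(parts):
        raise ValueError(f"not a CMIP6 instance_id: {instance_id!r}")
    # zip() stops at the shorter sequence, so an unversioned id has no 'version' key.
    return dict(zip(FACETS, parts))


if __name__ == "__main__":
    example = "CMIP6.ScenarioMIP.NOAA-GFDL.GFDL-CM4.ssp585.r1i1p1f1.Omon.thetao.gn"
    for name, value in parse_instance_id(example).items():
        print(f"{name:>14}: {value}")
```

Once ids are dictionaries, grouping or filtering many of them by a single facet (for example all `source_id` values for one `experiment_id` and `variable_id`) becomes a key lookup rather than ad hoc string matching.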

Slide 4

Slide 4 text

WHAT IS CMIP? - TONS OF DATA!
- Reality: Large institutions create mirrors of parts of the archive, restricted to employees (data fortresses)
- Large overhead: requires expertise, time, and funds
- Individual access/data-cleaning approaches might be incompatible, hindering reusability/reproducibility
- Effectively limits conducting climate science to large legacy organizations
[Diagram: ESGF feeding separate custom code at a university, a lab, and industry ❌ ✋🚫]

Slide 5

Slide 5 text

WHAT DID WE DO? https://pangeo-data.github.io/pangeo-cmip6-cloud/

Slide 6

Slide 6 text

WHAT DID WE DO? https://pangeo-data.github.io/pangeo-cmip6-cloud/ Convert to Zarr on Cloud Storage | Host NetCDF files on cloud storage

Slide 7

Slide 7 text

WHAT DID WE DO? https://pangeo-data.github.io/pangeo-cmip6-cloud/ Convert to Zarr on Cloud Storage

Slide 8

Slide 8 text

WHAT DID WE DO? https://pangeo-data.github.io/pangeo-cmip6-cloud/ Convert to Zarr on Cloud Storage v1: manual ingestion

Slide 9

Slide 9 text

WHAT DID WE DO? https://pangeo-data.github.io/pangeo-cmip6-cloud/ Convert to Zarr on Cloud Storage v1: manual ingestion v2: automated pangeo-forge/beam pipelines
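
Converting to Zarr on cloud storage means each dataset becomes a directory-like prefix in a bucket from which clients can stream individual chunks. One simple way to lay such a store out is one path level per facet, so an id maps directly to a URL. The sketch below assumes exactly that layout under a bucket called `gs://cmip6`; both the bucket name and the path scheme are assumptions to check against the catalog at https://pangeo-data.github.io/pangeo-cmip6-cloud/, and `store_path` is a hypothetical helper.

```python
# Assumed bucket; check the published catalog for the real location of each store.
DEFAULT_BUCKET = "gs://cmip6"


def store_path(instance_id: str, bucket: str = DEFAULT_BUCKET) -> str:
    """Map a versioned CMIP6 instance_id to a Zarr store prefix.

    Assumption: one directory level per facet, in instance_id order,
    starting with mip_era (e.g. 'CMIP6/ScenarioMIP/...').
    """
    parts = instance_id.strip().split(".")
    if len(parts) != 10 or not all(parts):
        raise ValueError(f"expected a versioned 10-facet instance_id, got {instance_id!r}")
    return f"{bucket.rstrip('/')}/{'/'.join(parts)}/"


if __name__ == "__main__":
    # The version facet below is a placeholder, not a verified dataset version.
    iid = "CMIP6.ScenarioMIP.NOAA-GFDL.GFDL-CM4.ssp585.r1i1p1f1.Omon.thetao.gn.v20180701"
    print(store_path(iid))
```

A prefix like this would then usually be opened lazily, for example with `xarray.open_zarr` (reading `gs://` URLs requires `gcsfs` to be installed), so only the chunks a computation actually touches are downloaded.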

Slide 11

Slide 11 text

CMIP6 CLOUD DATA
A single data repository in the cloud serves all use cases.
[Diagram: ESGF → Ingestion Pipeline → a single cloud repository, contrasted with "Everybody rolls their own" (separate custom code at a university, a lab, and industry ❌ ✋🚫)]
Storage provided by Google as a Public Dataset.

Slide 12

Slide 12 text

CMIP6 CLOUD DATA: A single data repository in the cloud serves all use cases
- Collaborative and agile research
- Inclusive education on real climate data
- Fast iteration, lower barrier of entry
- Portable methods and results, not just for academia

Slide 13

Slide 13 text

CMIP6 CLOUD DATA: A single data repository in the cloud serves all use cases
- Collaborative and agile research
- Inclusive education on real climate data
- Portable methods and results, not just for academia
- Fast iteration, lower barrier of entry

Slide 14

Slide 14 text

CMIP6 CLOUD DATA: A single data repository in the cloud serves all use cases
- Collaborative and agile research
- Portable methods and results, not just for academia
- Inclusive education on real climate data
- Fast iteration, lower barrier of entry

Slide 15

Slide 15 text

CMIP6 CLOUD DATA: A single data repository in the cloud serves all use cases
- Portable methods and results, not just for academia
- Collaborative and agile research
- Inclusive education on real climate data
- Fast iteration, lower barrier of entry

Slide 16

Slide 16 text

CMIP6 CLOUD DATA: A single data repository in the cloud serves all use cases
- Collaborative and agile research
- Inclusive education on real climate data
- Fast iteration, lower barrier of entry
- Portable methods and results, not just for academia

Slide 17

Slide 17 text

HOW CAN I REQUEST NEW DATA? Go to https://github.com/leap-stc/cmip6-leap-feedstock, open the Issues page, and click "New Issue".

Slide 18

Slide 18 text

HOW CAN I REQUEST NEW DATA? Choose "New File Request" and submit a list of the unique ids you want ingested!
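
Since a request is just a list of unique ids, it can help to clean the list locally before pasting it into the issue. The sketch below assumes one id per line is an acceptable format (the issue template itself is not reproduced here), and `prepare_request` is a hypothetical helper: it trims whitespace, drops blank lines and duplicates while keeping the original order, and rejects entries that do not look like CMIP6 ids.

```python
def prepare_request(raw_ids):
    """Return cleaned, de-duplicated CMIP6 instance_ids in first-seen order.

    Assumption: a valid id starts with 'CMIP6' and has 9 or 10 non-empty,
    dot-separated facets (the trailing version facet is optional).
    """
    cleaned = []
    seen = set()
    for raw in raw_ids:
        iid = raw.strip()
        if not iid or iid in seen:
            continue  # skip blank lines and repeated ids
        parts = iid.split(".")
        if parts[0] != "CMIP6" or len(parts) not in (9, 10) or not all(parts):
            raise ValueError(f"does not look like a CMIP6 instance_id: {iid!r}")
        seen.add(iid)
        cleaned.append(iid)
    return cleaned


if __name__ == "__main__":
    pasted = """
    CMIP6.ScenarioMIP.NOAA-GFDL.GFDL-CM4.ssp585.r1i1p1f1.Omon.thetao.gn
    CMIP6.ScenarioMIP.NOAA-GFDL.GFDL-CM4.ssp585.r1i1p1f1.Omon.thetao.gn

    CMIP6.CMIP.NOAA-GFDL.GFDL-CM4.historical.r1i1p1f1.Omon.thetao.gn
    """
    # One cleaned id per line, ready to paste into the request.
    print("\n".join(prepare_request(pasted.splitlines())))
```

Failing loudly on a malformed id, rather than silently dropping it, keeps typos in a facet from turning into a request for data nobody asked for.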

Slide 19

Slide 19 text

WHAT IS NEXT?
- Ongoing work to bring some of the improved user experience to the new generation of ESGF and to CMIP7
- Using virtualization to avoid doubling the demand for storage
- Do we still need to convert data to native Zarr or create performance-optimized caches?

Slide 20

Slide 20 text

TL;DR
- We have a ton of data already in the cloud. Explore it today if you like! And let us know about the awesome stuff you do with it!
- We ❤ to upload new data. Submit a request if your favorite data is missing.
- And most importantly ...

Slide 21

Slide 21 text

WE NEED A FUTURE WHERE WORKING WITH CMIP7 DATA FEELS LIKE ... THIS 🤘 ... AND NOT LIKE THIS ... for EVERYONE on this planet!
(Illustrations: https://github.com/zarr-developers/zarr-illustrations-falk-2022)

Slide 23

Slide 23 text

I ❤ QUESTIONS + FEEDBACK
jbusecke | juliusbusecke.com | @JuliusBusecke | @CodeAndCurrents@hachyderm.io | @codeandcurrents.bsky.social | 🙂↔💀 Screw you, Elon!
DEMO: IPCC PLOT FROM SCRATCH IN MINUTES
All the links 👉