Slide 1

Slide 1 text

Intro to working with CMIP data the Pangeo way Stanford Ocean Group Demo | Julius Busecke | May 10 2024 Creative, fast, and fun climate science for everyone M²LInES

Slide 2

Slide 2 text

Who am I?
Physical Oceanographer
Senior Staff Associate - Columbia University
Manager of Data and Computing - LEAP NSF-STC
Lead of Open Science - M2LInES
Core Developer of xGCM, xMIP
Pangeo Fan, User, and Member
Open Source/Open Science Advocate
Maintainer of the Pangeo CMIP6 zarr stores

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

Observations

Slide 5

Slide 5 text

Observations Models Models

Slide 6

Slide 6 text

Credit: NASA's Goddard Space Flight Center

Slide 7

Slide 7 text

Credit: NASA's Goddard Space Flight Center

Slide 8

Slide 8 text

Credit: NASA's Goddard Space Flight Center https://earthdata.nasa.gov/eosdis/cloud-evolution SWOT NISAR

Slide 9

Slide 9 text

Increasing Resolution and Complexity

Slide 10

Slide 10 text

Climate Science: A global distributed effort

Slide 11

Slide 11 text

I want to create a plot like this!

Slide 12

Slide 12 text

Download Clean and Combine Crunch the data Interpret Results

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

Download Clean and Combine Crunch the data Interpret Results ⏳💸🚫

Slide 15

Slide 15 text

Download Clean and Combine Crunch the data Interpret Results ❌

Slide 16

Slide 16 text

The Traditional Approach Bring the data to the compute Raw Data 💽 🧑‍💻👩‍💻🧑‍💻 🖥 🚀 💽 💽 💽 Large Raw Data 🐌 Cloud Office / Home / Wherever you work! Server/Laptop

Slide 17

Slide 17 text

Our Approach Bring the compute to the data Raw Data 💽 🧑‍💻👩‍💻🧑‍💻 🖥 💽 💽 💽 Small Data (Figures, Commands) 🐌 Cloud Office / Home / Wherever you work! 🖥 🖥 🖥 🚀 🚀 🚀 🚀

Slide 18

Slide 18 text

Our Approach Bring the compute to the data Raw Data 💽 🧑‍💻👩‍💻🧑‍💻 🖥 💽 💽 💽 Cloud Office / Home / Wherever you work! 🖥 🖥 🖥 🚀 🚀 🚀 🚀 🧑‍💻👩‍💻🧑‍💻 💬💡🥼🤗 Another Lab across the globe

Slide 19

Slide 19 text

CMIP6 Cloud Dataset
• Pangeo partnered with ESGF and Google Cloud to provide a new public dataset
• > 1 PB and counting
• Data stored in Zarr format
• Google provides free hosting in GCS
https://pangeo-data.github.io/pangeo-cmip6-cloud/
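
A rough sketch of how one might pick datasets out of a collection like this, assuming the holdings are described by a CSV catalog with one row per Zarr store. The column names follow CMIP6 facet conventions, but the rows, model names, and store paths below are made-up placeholders, and `find_stores` is a hypothetical helper, not part of any Pangeo library.

```python
import csv
import io

# Hypothetical stand-in for a catalog CSV: one row per Zarr store.
# The rows and store paths are placeholders, not real catalog entries.
CATALOG_CSV = """source_id,experiment_id,table_id,variable_id,zstore
MODEL-A,historical,Omon,tos,gs://example-bucket/model-a/historical/tos/
MODEL-A,ssp585,Omon,tos,gs://example-bucket/model-a/ssp585/tos/
MODEL-B,historical,Omon,tos,gs://example-bucket/model-b/historical/tos/
MODEL-B,historical,Amon,tas,gs://example-bucket/model-b/historical/tas/
"""


def find_stores(catalog_text, **facets):
    """Return the store paths of all rows that match every given facet."""
    rows = csv.DictReader(io.StringIO(catalog_text))
    return [
        row["zstore"]
        for row in rows
        if all(row[key] == value for key, value in facets.items())
    ]


stores = find_stores(CATALOG_CSV, experiment_id="historical", variable_id="tos")
for store in stores:
    print(store)
```

With xarray and a cloud filesystem backend installed, each returned path could then be opened lazily, for example with xarray's `open_zarr`. That step needs network access and is left out of the sketch.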

Slide 20

Slide 20 text

Because more and more people need climate data Collaboration on open data: Everybody wins! 👨‍💼 👨‍🔬 🧑‍💼 🧑‍💻 Science Paper 👩‍🔬🎉🎓 Derived Proprietary Data Product 🧑‍💼💵🎊 Fixes naming Corrects units Tunes Compression Uploads new data Academia Private Sector / Public Sector 👷 👩‍🏫 🏄 Everyone gets to explore the clean dataset! Public Stakeholder

Slide 21

Slide 21 text

Get involved! https://github.com/leap-stc/cmip6-leap-feedstock Request Datasets that are not yet in the cloud Report issues with existing datasets.

Slide 22

Slide 22 text

Efficient and fast access to arrays in object store Easy parallelization -> fast iteration on analysis Flexible scientific data structure Crowd Sourced Cleaning + Combining Your favorite code to analyze/interpret 🎁
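
To make the "easy parallelization" point concrete, here is a minimal sketch of the general idea: split an array into chunks, reduce each chunk independently, then combine the small partial results. It uses a standard-library thread pool as a stand-in for whatever parallel engine is actually used in practice; the slide does not name one, and the function names and numbers are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_sum_and_count(chunk):
    """Reduce one chunk to a partial result that is cheap to combine."""
    return sum(chunk), len(chunk)


def chunked_mean(values, chunk_size, max_workers=4):
    """Mean of a non-empty list, computed chunk by chunk in a thread pool."""
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(chunk_sum_and_count, chunks))
    total = sum(part_sum for part_sum, _ in partials)
    count = sum(part_count for _, part_count in partials)
    return total / count


# Made-up numbers standing in for one long array stored in chunks.
data = [float(i) for i in range(1000)]
print(chunked_mean(data, chunk_size=100))
```

The design point is that each worker only ever touches one chunk, so the full array never has to fit in memory in one place.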

Slide 23

Slide 23 text

Nobody likes cleaning!

Slide 24

Slide 24 text

Nobody likes cleaning! Different dimension names in the CMIP data. Not quite analysis-ready. But easy and fast to fix!!!
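
A minimal sketch of why this kind of fix is quick: once the alternative names are collected in one table, harmonizing dimensions is a simple lookup. The alias table below is an illustrative assumption, not a complete or authoritative list, and `harmonize_dims` is a hypothetical helper rather than the API of any particular library.

```python
# Illustrative alias table: canonical name -> names some models might use.
# This is an assumption for the sketch, not a complete list.
ALIASES = {
    "x": ["i", "ni", "xh", "nlon"],
    "y": ["j", "nj", "yh", "nlat"],
    "lev": ["olevel", "deptht", "depth"],
}

# Invert the table into a lookup from any alias to its canonical name.
RENAME = {alias: canonical for canonical, names in ALIASES.items() for alias in names}


def harmonize_dims(dims):
    """Map each dimension name to its canonical form; unknown names pass through."""
    return tuple(RENAME.get(name, name) for name in dims)


# Hypothetical dimension tuples as two different models might report them.
model_a = ("time", "olevel", "nj", "ni")
model_b = ("time", "lev", "y", "x")

print(harmonize_dims(model_a))
print(harmonize_dims(model_b))
```

With xarray, the same mapping could be passed to `Dataset.rename` after restricting it to the names actually present in the dataset.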

Slide 25

Slide 25 text

Nobody likes cleaning! But it's always faster if you split the work! https://github.com/jbusecke/xMIP Check it out and report any new issues with the data!

Slide 26

Slide 26 text

Reproducible IPCC Science in Minutes IPCC Chapter 9 2-10 minutes on LEAP-Pangeo JupyterHub Code Repository https://github.com/jbusecke/presentation_wcrp_open_science_conference Scipy Talk https://www.youtube.com/watch?v=7niNfs3ZpfQ

Slide 27

Slide 27 text

It's not just me! Lower barrier of entry + Quick exploration of ideas = New CMIPers! Teaching with the real data Student ➡ Scientist 🚀 Not just for academics. Public/private sector uses the cloud data!

Slide 28

Slide 28 text

Agile Science - Speed counts! Idea 💡 Result ✅

Slide 29

Slide 29 text

Agile Science - Speed counts! Idea 💡 Result ✅ Tech/Infrastructure limited Understanding limited

Slide 30

Slide 30 text

Agile Science - Speed counts! Idea 💡 Result ✅ Tech/Infrastructure limited Understanding limited How to speed up: - Open/Fast Access to data - Community OSS tools - Infrastructure Support from Private Sector?

Slide 31

Slide 31 text

Agile Science - Speed counts! Idea 💡 Result ✅ Tech/Infrastructure limited Understanding limited How to speed up: - Open/Fast Access to data - Community OSS tools - Infrastructure Support from Private Sector? How to speed up: - Collaboration - More Stakeholders - Reproducibility

Slide 32

Slide 32 text

I ❤ Feedback, questions, contributions. @JuliusBusecke jbusecke [email protected] juliusbusecke.com @JuliusBusecke @[email protected] @codeandcurrents.bsky.social