CMIP6 Demo Deck

Julius Busecke

May 10, 2024

Transcript

  1. Intro to working with CMIP data the Pangeo way

    Stanford Ocean Group Demo | Julius Busecke | May 10 2024
    Creative, fast, and fun climate science for everyone
    M²LInES
  2. Who am I?

    Physical Oceanographer
    Senior Staff Associate - Columbia University
    Manager of Data and Computing - LEAP NSF-STC
    Lead of Open Science - M²LInES
    Core Developer of xGCM, xMIP
    Pangeo Fan, User, and Member
    Open Source/Open Science Advocate
    Maintainer of the Pangeo CMIP6 zarr stores
  3. The Traditional Approach

    Bring the data to the compute
    Diagram labels: Raw Data 💽 (Cloud); Large Raw Data 🐌; Server/Laptop 🖥 (Office / Home / Wherever you work!)
  4. Our Approach

    Bring the compute to the data
    Diagram labels: Raw Data 💽 and compute 🖥 🚀 (Cloud); Small Data (Figures, Commands); Office / Home / Wherever you work!
  5. Our Approach

    Bring the compute to the data
    Diagram labels: Raw Data 💽 and compute 🖥 🚀 (Cloud); Office / Home / Wherever you work!; Another Lab across the globe 💬💡🥼🤗
  6. CMIP6 Cloud Dataset

    • Pangeo partnered with ESGF and Google Cloud to provide a new public dataset
    • > 1 PB and counting
    • Data stored in Zarr format
    • Google provides free hosting in GCS
    https://pangeo-data.github.io/pangeo-cmip6-cloud/
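
    Datasets in a cloud store like this are typically located through a catalog with one row per Zarr store. The sketch below filters such a catalog by CMIP6 facets using only the standard library. The column names follow common CMIP6 facet naming, but the rows, the store paths, and the `find_stores` helper are made-up placeholders for illustration, not entries from the real catalog.

```python
import csv
import io

# Hypothetical catalog excerpt: placeholder rows, NOT real store paths.
SAMPLE_CATALOG = """source_id,experiment_id,variable_id,table_id,zstore
MODEL-A,historical,tos,Omon,gs://example-bucket/MODEL-A/historical/tos
MODEL-A,ssp585,tos,Omon,gs://example-bucket/MODEL-A/ssp585/tos
MODEL-B,historical,tos,Omon,gs://example-bucket/MODEL-B/historical/tos
MODEL-B,historical,thetao,Omon,gs://example-bucket/MODEL-B/historical/thetao
"""


def find_stores(catalog_text, **facets):
    """Return the store paths of all catalog rows that match every given facet."""
    reader = csv.DictReader(io.StringIO(catalog_text))
    return [
        row["zstore"]
        for row in reader
        if all(row[key] == value for key, value in facets.items())
    ]


# Select all historical sea surface temperature ("tos") stores.
stores = find_stores(SAMPLE_CATALOG, experiment_id="historical", variable_id="tos")
for store in stores:
    print(store)
```

    Each matching path could then be opened lazily with a Zarr-aware library such as xarray, so that only the chunks actually needed are read.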
  7. Because more and more people need climate data

    Collaboration on open data: Everybody wins!
    Contributors: Academia 👨🔬; Private Sector / Public Sector 👨💼; Public Stakeholder 👷 👩🏫 🏄
    Contributions: Fixes naming; Corrects units; Tunes compression; Uploads new data
    Outcomes: Science Paper 👩🔬🎉🎓; Derived Proprietary Data Product 🧑💼💵🎊
    Everyone gets to explore the clean dataset!
  8. Efficient and fast access to arrays in object store

    Easy parallelization -> fast iteration on analysis
    Flexible scientific data structure
    Crowd Sourced Cleaning + Combining
    Your favorite code to analyze/interpret 🎁
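
    Chunked storage is what makes the parallelization easy: each chunk can be read and reduced independently, and only small partial results need to be combined. The following is a toy sketch of that idea, with a thread pool and an in-memory list standing in for chunks in object storage. The chunk size and helper names are illustrative assumptions; a real workflow would normally delegate this to a parallel array library.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(values, chunk_size):
    """Split a sequence into consecutive chunks of at most chunk_size items."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def partial_sum(chunk):
    """Reduce one chunk to a small (sum, count) pair."""
    return sum(chunk), len(chunk)


def chunked_mean(values, chunk_size=4, max_workers=4):
    """Compute the mean by reducing chunks in parallel and combining the partials."""
    chunks = split_into_chunks(values, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


data = list(range(1, 11))  # stand-in for a large array
print(chunked_mean(data))
```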
  9. Nobody likes cleaning!

    Different dimension names in the CMIP data.
    Not quite analysis-ready
    But easy and fast to fix!!!
  10. Nobody likes cleaning!

    But it's always faster if you split the work!
    https://github.com/jbusecke/xMIP
    Check it out and report any new issues with the data!
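
    Harmonizing dimension names that differ between models can be sketched as a simple alias lookup. This is a plain-Python illustration and not xMIP's implementation; the alias table and the `harmonize_dims` helper are hypothetical, chosen only to show the idea.

```python
# Hypothetical alias table: canonical name -> names used by different models.
DIM_ALIASES = {
    "x": ["x", "i", "ni", "xh", "nlon"],
    "y": ["y", "j", "nj", "yh", "nlat"],
    "lev": ["lev", "deptht", "olevel", "zlev"],
    "time": ["time"],
}

# Reverse lookup: any known alias -> its canonical name.
ALIAS_TO_CANONICAL = {
    alias: canonical
    for canonical, aliases in DIM_ALIASES.items()
    for alias in aliases
}


def harmonize_dims(dims):
    """Return a copy of dims (name -> size) with canonical dimension names.

    Names that are not in the alias table are passed through unchanged.
    """
    renamed = {}
    for name, size in dims.items():
        canonical = ALIAS_TO_CANONICAL.get(name, name)
        if canonical in renamed:
            raise ValueError(f"two dimensions map to {canonical!r}")
        renamed[canonical] = size
    return renamed


# Two made-up models describing the same kind of grid with different names.
model_a = {"time": 12, "lev": 50, "j": 180, "i": 360}
model_b = {"time": 12, "olevel": 75, "nlat": 332, "nlon": 362}

print(harmonize_dims(model_a))
print(harmonize_dims(model_b))
```

    Once every dataset uses the same names, the same analysis code can run across models without per-model special cases.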
  11. Reproducible IPCC Science in Minutes

    IPCC Chapter 9
    2-10 minutes on LEAP-Pangeo JupyterHub
    Code repository: https://github.com/jbusecke/presentation_wcrp_open_science_conference
    SciPy talk: https://www.youtube.com/watch?v=7niNfs3ZpfQ
  12. It's not just me!

    Lower barrier of entry + Quick exploration of ideas = New CMIPers!
    Teaching with the real data
    Student ➡ Scientist 🚀
    Not just for academics. Public/private sector uses the cloud data!
  13. Agile Science - Speed counts!

    Idea 💡 ➡ Result ✅
    Tech/Infrastructure limited
    Understanding limited
    How to speed up:
    - Open/Fast Access to data
    - Community OSS tools
    - Infrastructure Support from Private Sector?
  14. Agile Science - Speed counts!

    Idea 💡 ➡ Result ✅
    Tech/Infrastructure limited
    Understanding limited
    How to speed up:
    - Open/Fast Access to data
    - Community OSS tools
    - Infrastructure Support from Private Sector?
    How to speed up:
    - Collaboration
    - More Stakeholders
    - Reproducibility