Pangeo for Plasma

Tom Nicholas
January 12, 2023

Presentation at the BOUT++ workshop 2023 at LLNL, on why the fusion plasma physics analysis community should learn lessons from the success of the Pangeo model in the geosciences community.

Transcript

  1. Pangeo for Plasma
    Lessons for plasma software from the climate data analytics community
    Thomas Nicholas (Columbia University / Lamont-Doherty Earth Observatory) [email protected]
  2. Who am I?

  3. Who am I?
    PhD with Ben Dudson, Fulvio Militello, BOUT++
  4. Who am I?
    PhD with Ben Dudson, Fulvio Militello, BOUT++
    RSE with Ryan Abernathey, various projects
  5. What do I do now?

  6. What I hope to convince you of
    • Our computational infrastructure needs to change a lot
    • Can use solutions from climate science community
    • Modular approach makes everyone's work easier
    • Opportunities exist for plasma coders...
  7. The White House announces The Federal Year of Open Science
    A multi-agency initiative across the federal government to spark change and inspire open science engagement through events and activities that will advance adoption of open science.
    Website: https://open.science.gov/
    WH: https://www.whitehouse.gov/ostp/news-updates/
    Nature: https://doi.org/10.1038/d41586-023-00019-y
    NASA ✦ NSF ✦ NOAA ✦ DOA ✦ DOC ✦ DOE ✦ GSA ✦ NEH ✦ NIH ✦ NIST ✦ USDA ✦ USGS
    Along with other organizations, including CENDI group, voluntary collaboration among Federal managers, and HELIOS, a coalition of 80+ universities
  8. Climate Science == Plasma Physics
    • Multidimensional (often fluid turbulent)
    • Large (bigger than local RAM)
    • On regular but warped grids
    • Often pulled from central servers
    • From multiple sources but with common structure (e.g. experimental and simulation data for same device).
  10. Typical scientific workflow

  11. Typical scientific workflow
    step 3: debug
    Because you likely rolled-your-own code…
  12. Problem 1: Code not reused

  13. Problem 1: Code not reused
    Modern data science libraries 🚀
    Me as PhD student, circa 2017
    MATLAB
  14. Problem 2: Data accessibility

  18. Problem 3: Scale
    “Brb, let me just go download the data to my laptop…”
  23. Problem 3: Scale

  24. Geoscientists’ solution:

  25. Geoscientists’ solution: ✨ ✨

  26. Solution 1: Modular, open ecosystem

  27. Solution 1: Modular, open ecosystem
    Domain-agnostic libraries
    General-purpose tools
    Domain-specific packages
    Science projects
  28. Solution 1: Modular, open ecosystem

  29. Solution 2: Cloud Computing

  34. Solution 3: Parallel computing frameworks

  36. How might this work for plasma?

  37. How might this work for plasma?
    Domain-agnostic libraries
    General-purpose tools
    Fusion plasma projects
  38. How might this work for plasma?
    Domain-agnostic libraries
    General-purpose tools
    tokamak-specific packages
    Fusion plasma projects
  39. How might this work for plasma? -PLASMA

  40. How might this work for plasma?
    Shared plasma metadata conventions
    Tokamak plotting package
    Common analysis tools (e.g. field-line tracing)
    Code-specific compatibility wrappers
    Standard data model
    Blog post: https://hackmd.io/@TomNicholas/rkyERwcoO#
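
Slide 40 lists shared metadata conventions and code-specific compatibility wrappers without showing what one might look like. The following is a minimal sketch of the idea using only the Python standard library. Every name in it is hypothetical and not taken from the talk: the `code_a` and `code_b` sources, their raw variable names, and the shared names and units are all invented for illustration. The point is only that each code needs one small name map, after which downstream tools see the same names and units whatever the source.

```python
from dataclasses import dataclass, field

# Hypothetical shared convention: standard variable names and their units.
SHARED_UNITS = {"electron_density": "m^-3", "electron_temperature": "eV"}

# Hypothetical per-code name maps, one per simulation or diagnostic code.
NAME_MAPS = {
    "code_a": {"ne": "electron_density", "Te": "electron_temperature"},
    "code_b": {"dens_e": "electron_density", "temp_e": "electron_temperature"},
}


@dataclass
class Variable:
    """A variable's values plus the metadata attached by the wrapper."""
    values: list
    attrs: dict = field(default_factory=dict)


def to_shared_convention(raw: dict, source: str) -> dict:
    """Rename code-specific variables and attach shared metadata."""
    name_map = NAME_MAPS[source]
    out = {}
    for raw_name, values in raw.items():
        if raw_name not in name_map:
            # Outside the convention: dropped here; a real wrapper might keep it.
            continue
        std_name = name_map[raw_name]
        out[std_name] = Variable(
            list(values),
            {"units": SHARED_UNITS[std_name], "source": source},
        )
    return out


a = to_shared_convention({"ne": [1e19, 2e19], "Te": [10.0, 20.0]}, "code_a")
b = to_shared_convention({"dens_e": [3e19], "extra": [0]}, "code_b")

print(sorted(a))
print(b["electron_density"].attrs)
```

In an ecosystem built on a standard data model, the plain dictionaries here would presumably be replaced by that model's own container, so that plotting and analysis packages can rely on the standardized names.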
  41. Other bonuses of joining this ecosystem
    - Parallel and out-of-core analysis
    - Labelled dimensions
    - Unit-aware arithmetic
    - Easier reproducibility
    - Plotting flexibility
    - Machine Learning integration
    SIX
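
To make the "labelled dimensions" item on slide 41 concrete, here is a small sketch using xarray, one of the core libraries of the Pangeo ecosystem. The data are synthetic, and the variable name `n`, the dimension names `t`, `x`, `z` and the units string are assumptions chosen for illustration rather than anything from the talk. The sketch covers labelled dimensions only. Out-of-core and unit-aware operation need additional libraries and are not shown.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for simulation output: a density-like field on a
# (t, x, z) grid. Names and units are illustrative assumptions.
rng = np.random.default_rng(seed=0)
density = xr.DataArray(
    rng.random((4, 8, 16)),
    dims=("t", "x", "z"),
    coords={
        "t": np.linspace(0.0, 3.0, 4),
        "x": np.linspace(0.0, 1.0, 8),
    },
    name="n",
    attrs={"units": "m^-3"},
)

# Reduce over a named dimension instead of remembering that time is axis 0.
time_mean = density.mean(dim="t")

# Select by coordinate value (nearest match) and then by integer position.
profile = density.sel(t=2.0, method="nearest").isel(z=0)

print(time_mean.dims)
print(profile.shape)
```

Because operations refer to dimensions by name, the same analysis code keeps working if the underlying array is stored with its axes in a different order.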
  42. Summary
    • Geoscience has same problems as plasma physics 🌍🤝🌞
    • Being solved using:
      ◦ Modular community software ecosystem 🔧
      ◦ Cloud computing ⛅
      ◦ Parallel execution frameworks 🚀
    • It's working for them - it could work for us! 🔬