Pangeo for Plasma

Tom Nicholas
January 12, 2023

Presentation at the BOUT++ workshop 2023 at LLNL, on why the fusion plasma physics analysis community should learn lessons from the success of the Pangeo model in the geosciences community.

Transcript

  1. Pangeo for Plasma
    Thomas Nicholas
    (Columbia University / Lamont-Doherty Earth Observatory)
    Lessons for plasma software from the climate data
    analytics community

  2. Who am I?

  3. Who am I?
    PhD with Ben Dudson, Fulvio Militello, BOUT++

  4. Who am I?
    PhD with Ben Dudson, Fulvio Militello, BOUT++
    RSE with Ryan Abernathey, various projects

  5. What do I do now?

  6. What I hope to convince you of
    ● Our computational infrastructure needs to change a lot
    ● Can use solutions from climate science community
    ● Modular approach makes everyone's work easier
    ● Opportunities exist for plasma coders...

  7. The White House announces the Federal Year of Open Science
    A multi-agency initiative across the federal government to spark change and inspire open science engagement through events and activities that will advance adoption of open science.
    NASA ✦ NSF ✦ NOAA ✦ DOA ✦ DOC ✦ DOE ✦ GSA ✦ NEH ✦ NIH ✦ NIST ✦ USDA ✦ USGS
    Along with other organizations, including CENDI group, voluntary collaboration among Federal managers, and HELIOS, a coalition of 80+ universities
    Website: https://open.science.gov/
    WH: https://www.whitehouse.gov/ostp/news-updates/
    Nature: https://doi.org/10.1038/d41586-023-00019-y

  8. Climate Science == Plasma Physics
    ● Multidimensional (often fluid turbulence)
    ● Large (bigger than local RAM)
    ● On regular but warped grids
    ● Often pulled from central servers
    ● From multiple sources but with common structure (e.g. experimental and simulation data for the same device)

  10. Typical scientific workflow

  11. Typical scientific workflow
    step 3: debug
    Because you likely rolled your own code…

  12. Problem 1: Code not reused

  13. Problem 1: Code not reused
    Modern data science libraries 🚀
    Me as PhD student,
    circa 2017
    MATLAB

  14. Problem 2: Data accessibility

  18. Problem 3: Scale
    “Brb, let me just go download the data to my laptop…”

  23. Problem 3: Scale

  24. Geoscientists’ solution:

  25. Geoscientists’ solution:
    ✨ ✨

  26. Solution 1: Modular, open ecosystem

  27. Solution 1: Modular, open ecosystem
    Domain-agnostic libraries
    General-purpose tools
    Domain-specific packages
    Science projects

  28. Solution 1: Modular, open ecosystem

  29. Solution 2: Cloud Computing

  34. Solution 3: Parallel computing frameworks

  36. How might this work for plasma?

  37. How might this work for plasma?
    Domain-agnostic libraries
    General-purpose tools
    Fusion plasma projects

  38. How might this work for plasma?
    Domain-agnostic libraries
    General-purpose tools
    tokamak-specific packages
    Fusion plasma projects

  39. How might this work for plasma?
    -PLASMA

  40. How might this work for plasma?
    Shared plasma metadata conventions
    Tokamak plotting package
    Common analysis tools (e.g. field-line tracing)
    Code-specific compatibility wrappers
    Standard data model
    Blog post: https://hackmd.io/@TomNicholas/rkyERwcoO#
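
One way to picture how "code-specific compatibility wrappers" could sit on top of "shared plasma metadata conventions" is a thin adapter that renames each simulation code's output variables to a common vocabulary. The sketch below is purely illustrative and uses only the Python standard library: the standard names (`electron_density`, `electron_temperature`), the codes `code_a` and `code_b`, their native variable names, and the `to_standard` helper are all hypothetical, not taken from the talk or from any existing package.

```python
from dataclasses import dataclass, field

# Hypothetical shared convention: standard variable name -> expected units.
STANDARD_UNITS = {
    "electron_density": "m^-3",
    "electron_temperature": "eV",
}

# Hypothetical per-code lookup tables: native variable name -> standard name.
NAME_MAPS = {
    "code_a": {"ne": "electron_density", "Te": "electron_temperature"},
    "code_b": {"dens_e": "electron_density", "temp_e": "electron_temperature"},
}


@dataclass
class Variable:
    """Stand-in for a labelled array: values plus dimension names and metadata."""
    dims: tuple
    values: list
    attrs: dict = field(default_factory=dict)


def to_standard(code, raw):
    """Rename one code's variables to the shared convention and attach units."""
    name_map = NAME_MAPS[code]
    standardised = {}
    for native_name, var in raw.items():
        if native_name not in name_map:
            continue  # skip variables the convention does not cover
        std_name = name_map[native_name]
        attrs = {**var.attrs, "units": STANDARD_UNITS[std_name], "source": code}
        standardised[std_name] = Variable(var.dims, var.values, attrs)
    return standardised


# Two codes with different native names end up exposing the same interface.
out_a = to_standard("code_a", {"ne": Variable(("x",), [1e19, 2e19])})
out_b = to_standard("code_b", {"dens_e": Variable(("x",), [3e19, 4e19])})
print(sorted(out_a) == sorted(out_b))
print(out_a["electron_density"].attrs)
```

With a layer like this, downstream plotting and analysis tools only need to know the shared names, so they could be written once and reused across codes.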

  41. Other bonuses of joining this ecosystem
    - Parallel and out-of-core analysis
    - Labelled dimensions
    - Unit-aware arithmetic
    - Easier reproducibility
    - Plotting flexibility
    - Machine Learning integration
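
To make "out-of-core analysis" concrete, here is a minimal sketch of the underlying idea, assuming the data can be read one chunk at a time: a reduction is computed from per-chunk partial results, so the full array never has to fit in local RAM. Libraries such as Dask automate this pattern and can spread chunks across workers; the version below uses only the Python standard library, and `iter_chunks` and `chunked_mean` are hypothetical helper names with synthetic data standing in for a real file.

```python
import math


def iter_chunks(n_points, chunk_size):
    """Yield one chunk of synthetic data at a time (stand-in for reading a file)."""
    for start in range(0, n_points, chunk_size):
        stop = min(start + chunk_size, n_points)
        yield [math.sin(0.01 * i) for i in range(start, stop)]


def chunked_mean(chunks):
    """Combine per-chunk partial sums so only one chunk is held in memory."""
    total, count = 0.0, 0
    for chunk in chunks:
        total += math.fsum(chunk)
        count += len(chunk)
    return total / count


n_points = 100_000
out_of_core = chunked_mean(iter_chunks(n_points, chunk_size=10_000))

# Reference value computed in a single pass, for comparison only.
in_memory = math.fsum(math.sin(0.01 * i) for i in range(n_points)) / n_points
print(math.isclose(out_of_core, in_memory, rel_tol=1e-9))
```

Because each chunk's partial sum is independent of the others, the same structure extends naturally to parallel execution, which is why the two bonuses tend to come together.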

  42. Summary
    ● Geoscience has the same problems as plasma physics 🌍🤝🌞
    ● Being solved using:
    ○ Modular community software ecosystem 🔧
    ○ Cloud computing ⛅
    ○ Parallel execution frameworks 🚀
    ● It's working for them - it could work for us! 🔬
