Scientific Open Source Software: meat and bits but not papers. Is it real work?

Fernando Perez
October 08, 2019

A discussion on the role of open source software in science, its sustainability and current outlook.

Co-authored with Lindsey Heagy (https://lindseyjh.ca).

Video of the presentation available at: https://cdac.uchicago.edu/insights/fernando-perez-scientific-open-source-software

Transcript

  1. 4.
  2. 8.

    JupyterLab: a grand unified theory of Jupyter

    Huge Team Effort! C. Colbert, S. Corlay, A. Darian, B. Granger, J. Grout, P. Ivanov, I. Rose, S. Silvester, C. Willing, J. Zosa-Forde …
  3. 10.

    Core ideas of the web: HTTP & HTML

    HTML: format to represent content (HyperText Markup Language).
    HTTP: protocol to connect clients and servers (HyperText Transfer Protocol).
    Image credit: eviltester.com
  4. 15.

    A language agnostic protocol

    ~100 different kernels: https://github.com/jupyter/jupyter/wiki/Jupyter-kernels
  5. 16.
  6. 22.

    With these tools, we provide:

    ❖ Broad disciplinary reach and impact of statistical thinking.
    ❖ Drastically lowered barriers to student access - intellectual and economic.
    ❖ Lowered barriers for faculty* to engage with statistical and computational ideas.
    (*) typically from non-computational/statistical domains
    Organizational and intellectual leadership: Cathryn Carson, Ani Adhikari, John DeNero, … (many more)
  7. 24.

    Reproducible Research

    “An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.”
    Buckheit and Donoho, WaveLab and Reproducible Research, 1995
  8. 25.
  9. 29.
  10. 31.

    Geosciences: research & education

    Lindsey Heagy, Berkeley. 2019 GWH Career Achievement Award for outstanding junior scientist.
    SimPEG: https://simpeg.xyz http://geosci.xyz
  11. 32.

    Pangeo: open geosciences (and more!)

    Harnessing the power of cloud computing to study the whole earth interactively. https://pangeo.io
    Ryan Abernathey, Joe Hamman
  12. 33.

    Pangeo: open geosciences (and more!)

    Harnessing the power of cloud computing to study the whole earth interactively. https://pangeo.io
    Ryan Abernathey, Joe Hamman
  13. 34.

    Pangeo: open geosciences (and more!)

    Harnessing the power of cloud computing to study the whole earth interactively. https://pangeo.io
    Ryan Abernathey, Joe Hamman
  14. 35.

    Jupyter meets the Earth: newly funded NSF grant - $2M/3y

    Research use-cases:
    • CMIP6 Climate data analysis
    • Large scale hydrological modelling
    • Geophysical simulations and inversions
    Tech developments:
    • Data discovery through JupyterLab
    • Interactivity: Widgets & Dashboards
    • JupyterHub: Using and managing shared computational infrastructure
    Fernando Perez, Joe Hamman, Laurel Larsen, Kevin Paul, Lindsey Heagy, Chris Holdgraf, Yuvi Panda
  15. 36.
  16. 42.
  17. 43.

    Scientific Open Source: Despite (direct) federal $$ support

    ❖ “Indirectly”, lots of $ have supported Scientific OSS projects/tools.
    ❖ Under the cover of domain-focused work.
  18. 44.

    Traditional software infrastructure funding

    “Yes, it’s true, the budget is gone again… But you can’t deny that now, we get here in an instant!”
    Quino (Argentinian cartoonist)
  19. 45.

    Contrasts in culture and incentives (Open Source vs. Academia)

    • Credit: Distributed vs. PI & hierarchy
    • Output/artifacts: Continuous & Project-specific vs. Discrete papers
    • Collaborators: Fluid (professionals, volunteers, …) vs. Structured, funding-dependent
    • Governance/decision making: Open, community based vs. Top-down, PI
    • Authorship: Fluid, roles can evolve, no clear “first/senior” author vs. Need to say more?
    • Peer review: Continuous, open, pervasive, friendly vs. The opposite
    • Value metric: Utility, need, impact vs. “Novel and transformative”
  20. 49.

    “The Stack”: a complete ecosystem

    Domain-agnostic backbone/trunk
    • Not “real CS”
    • Not “real research”
    • Nobody’s problem
    • Yet critical to everybody else
  21. 52.

    Skills in education: The Carpentries (Tracy Teal, Executive Director)

    Career paths: The Society of Research Software Engineering “was founded on the belief that a world which relies on software must recognise the people who develop it.” https://society-rse.org
  22. 64.

    Catastrophic Success: an economic problem (2015 data) https://arxiv.org/abs/1507.03989

    ❖ MathWorks: 4,000+ employees
    ❖ Wolfram: 800 employees
    ❖ IDL/Harris: 17,000 employees
  23. 65.

    Investing to hedge strategic risks

    ❖ It takes investment to have a seat at the table.
    ❖ Scientists (and their funders) want a voice?
    ❖ The code is already out - whose voices will shape it?
  24. 66.

    Bang for the buck?

    ❖ Federal 2018 R&D budget: $176.8B (AAAS analysis)
    ❖ What fraction of R&D today depends critically on computing? 10%? 30%? 50%?
    ❖ $200M is ~0.1% of that.
    ❖ $200M annually (well spent) would have major impact.
  25. 67.

    “Well spent”: That should be easy…

    ❖ Some features of successful, resilient projects
    ❖ Broad community engagement
    ❖ Actively managed pipeline for new contributions
    ❖ Capacity for short and long-term planning
    ❖ Writing code only small part of the job
    ❖ Treat OSS projects like real, complex organizations
  26. 68.

    It’s in the air…

    “many projects of immense infrastructural importance are simultaneously fundamental to multiple business models and also chronically underfunded”
  27. 71.

    ❖ Economic incentives and sustainability
    ❖ Governance models
    ❖ Roles and professional career paths
    ❖ Multi-stakeholder organizational structures

    OSS is a lot more than software
  28. 72.

    Thank you (Bay Area team)

    Current (Berkeley, LBNL, Bloomberg): Stacey Dorton, Lindsey Heagy, Chris Holdgraf, Yuvi Panda, Ryan Lovett, Shreyas Cholia, Shane Canon, Rollin Thomas, Jason Grout
    Former Berkeley: Min Ragan-Kelley, Paul Ivanov, Thomas Kluyver, M Pacer, Matthias Bussonnier, Jessica Hamrick, Ian Rose, Jamie Whitacre.
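
The arithmetic on slide 66 ("Bang for the buck?") can be checked with a short sketch. The two dollar figures come from the slide. The 10%, 30% and 50% values are the slide's own open question about how much R&D depends critically on computing, used here as hypothetical scenarios rather than measured data. The variable and function names are illustrative only.

```python
# Figures quoted on slide 66 (AAAS analysis of the 2018 federal R&D budget).
FEDERAL_RD_BUDGET_USD = 176.8e9
PROPOSED_OSS_FUNDING_USD = 200e6


def share_percent(part, whole):
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


# Share of the entire federal R&D budget.
overall_share = share_percent(PROPOSED_OSS_FUNDING_USD, FEDERAL_RD_BUDGET_USD)

# Hypothetical scenarios: the fraction of R&D that depends critically on
# computing. These are the slide's question marks, not data.
computing_fractions = [0.10, 0.30, 0.50]
shares_of_computing = {
    fraction: share_percent(
        PROPOSED_OSS_FUNDING_USD, fraction * FEDERAL_RD_BUDGET_USD
    )
    for fraction in computing_fractions
}

print(f"$200M is {overall_share:.2f}% of the total R&D budget")
for fraction, share in shares_of_computing.items():
    print(
        f"If {fraction:.0%} of R&D depends on computing, "
        f"$200M is {share:.2f}% of that portion"
    )
```

Under these assumptions the overall share comes out slightly above 0.1%, consistent with the slide's "~0.1%", and it stays a small fraction of the computing-dependent portion in each scenario.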