Open source, academic science and the public mission of research: reflections from the field

A presentation delivered as part of the CISE Distinguished Lecture Series, at the National Science Foundation headquarters.

Co-author: Lindsey Heagy (https://lindseyjh.ca).

A discussion of Project Jupyter and the role of open source tools in science, along with a reflection on how to tackle the funding and structural challenges of giving these tools a sustainable future.

Video is available here (you need to register for the livestream, even though it's in the past, to access the video player):
http://www.tvworldwide.com/events/nsf/190815

Fernando Perez

August 15, 2019

Transcript

  1. 1.

    Fernando Pérez, Lindsey Heagy

    Open source, academic science and the public mission of research: reflections from the field
  2. 5.

    JupyterLab: a grand unified theory of Jupyter

    Huge Team Effort! C. Colbert, S. Corlay, A. Darian, B. Granger, J. Grout, P. Ivanov, I. Rose, S. Silvester, C. Willing, J. Zosa-Forde …
  3. 7.

    Core ideas of the web: HTTP & HTML

    HTML (HyperText Markup Language): format to represent content.
    HTTP (HyperText Transfer Protocol): protocol to connect clients and servers.

    Image credit: eviltester.com
  4. 11.

    Jupyter Protocol: web-age capture of the process of interactive computing

    Any MIME-type output: ❖ text ❖ svg, png, jpeg
  5. 12.

    Jupyter Protocol: web-age capture of the process of interactive computing

    Any MIME-type output: ❖ text ❖ svg, png, jpeg ❖ latex, pdf
  6. 13.

    Jupyter Protocol: web-age capture of the process of interactive computing

    Any MIME-type output: ❖ text ❖ svg, png, jpeg ❖ latex, pdf ❖ html, javascript
  7. 14.

    Jupyter Protocol: web-age capture of the process of interactive computing

    Any MIME-type output: ❖ text ❖ svg, png, jpeg ❖ latex, pdf ❖ html, javascript ❖ interactive widgets
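The idea of "any MIME-type output" on the Jupyter Protocol slides can be sketched in a few lines. This is a minimal illustration, not code from the talk: the `Temperature` class and its values are hypothetical, while `_repr_mimebundle_` is the hook that IPython's display machinery looks for on an object. The object offers several representations keyed by MIME type, and each frontend renders the richest one it understands.

```python
class Temperature:
    """Hypothetical example object that can describe itself in several formats."""

    def __init__(self, celsius):
        self.celsius = celsius

    def _repr_mimebundle_(self, include=None, exclude=None):
        # IPython calls this hook (when present) and expects a dict mapping
        # MIME types to data. A terminal would use text/plain, a notebook
        # would typically prefer text/html or text/latex.
        return {
            "text/plain": f"{self.celsius} °C",
            "text/html": f"<b>{self.celsius}</b> &deg;C",
            "text/latex": f"${self.celsius}\\,^{{\\circ}}\\mathrm{{C}}$",
        }


# Outside a notebook we can still inspect the bundle directly.
bundle = Temperature(21.5)._repr_mimebundle_()
for mime_type, data in bundle.items():
    print(f"{mime_type}: {data}")
```

In a notebook, evaluating `Temperature(21.5)` as the last expression of a cell would display one of these representations; which one is chosen depends on the frontend.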
  8. 18.

    A language-agnostic protocol

    ~100 different kernels: https://github.com/jupyter/jupyter/wiki/Jupyter-kernels
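To make "language agnostic" concrete, here is a rough sketch of the dictionary form of an `execute_request` message as described in the Jupyter messaging specification. The helper name `make_execute_request`, the `demo-user` username, and the code strings are assumptions for illustration. The sketch leaves out the transport layer (ZeroMQ framing and message signing) entirely. The point is that the message shape is the same whatever language the kernel runs: only the `code` string differs.

```python
import json
import uuid
from datetime import datetime, timezone


def make_execute_request(code, session_id, username="demo-user"):
    """Build the dictionary form of an `execute_request` message (illustrative helper)."""
    return {
        "header": {
            "msg_id": str(uuid.uuid4()),
            "session": session_id,
            "username": username,
            "date": datetime.now(timezone.utc).isoformat(),
            "msg_type": "execute_request",
            "version": "5.3",
        },
        "parent_header": {},
        "metadata": {},
        "content": {
            "code": code,  # opaque to the protocol; only the kernel interprets it
            "silent": False,
            "store_history": True,
            "user_expressions": {},
            "allow_stdin": False,
            "stop_on_error": True,
        },
    }


session = str(uuid.uuid4())
python_msg = make_execute_request("print(1 + 1)", session)
julia_msg = make_execute_request("println(1 + 1)", session)

# Same structure for both kernels; only the code payload changes.
print(json.dumps(python_msg, indent=2))
```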
  9. 19.
  10. 25.

    With these tools, we provide:

    ❖ Broad disciplinary reach and impact of statistical thinking.
    ❖ Drastically lowered barriers to student access - intellectual and economic.
    ❖ Lowered barriers for faculty* to engage with statistical and computational ideas.
    (*) typically from non-computational/statistical domains.

    Organizational and intellectual leadership: Cathryn Carson, Ani Adhikari, John DeNero, … (many more)
  11. 27.

    Reproducible Research

    “An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.”

    Buckheit and Donoho, WaveLab and Reproducible Research, 1995
  12. 29.
  13. 31.
  14. 34.
  15. 36.

    LSST: one of the largest line items in NSF budget

    https://docushare.lsst.org/docushare/dsweb/Get/LSE-319
  16. 38.

    HEP & reproducibility

    An ML-based likelihood-free inference tool for particle physics: A new Python-based implementation of a statistical tool used for Higgs discovery.

    Kyle Cranmer, NYU
  18. 40.

    Geosciences: research & education

    Lindsey Heagy, Berkeley

    2019 GWH Career Achievement Award for outstanding junior scientist

    SimPEG: https://simpeg.xyz
    http://geosci.xyz
  19. 42.

    Pangeo: open geosciences (and more!)

    Harnessing the power of cloud computing to study the whole earth interactively. https://pangeo.io

    Ryan Abernathey
  22. 47.
  23. 50.
  24. 51.
  25. 52.

    Scientific Open Source: Despite (direct) federal $$ support

    ❖ Note: “indirectly”, lots of $ have supported Scientific OSS projects/tools.
    ❖ Under the cover of domain-focused work.
    ❖ Recently recommended for funding “Jupyter meets the Earth” (Jupyter + Pangeo team) NSF grant (Earth Cube/Shree Mishra)
    ❖ FP, Laurel Larsen, Lindsey Heagy (Berkeley), Joe Hamman (NCAR)
    ❖ Thank you!!!
  26. 53.

    Traditional software infrastructure funding

    “Yes, it’s true, the budget is gone again… But you can’t deny that now, we get here in an instant!”

    Quino (Argentinian cartoonist)
  27. 54.

    Contrasts in culture and incentives

    | | Open Source | Academia |
    |---|---|---|
    | Credit | Distributed | PI & hierarchy |
    | Output/artifacts | Continuous & Project-specific | Discrete papers |
    | Collaborators | Fluid: professionals, volunteers, … | Structured, funding-dependent |
    | Governance/decision making | Open, community based | Top-down, PI |
    | Authorship | Fluid, roles can evolve, no clear “first/senior” author | Need to say more? |
    | Peer review | Continuous, open, pervasive, friendly | The opposite |
    | Value metric | Utility, need, impact | “Novel and transformative” |
  28. 58.

    “The Stack”: a complete ecosystem

    Domain-agnostic backbone/trunk
    • Not “real CS”
    • Not “real research”
    • Nobody’s problem
    • Yet critical to everybody else
  29. 63.

    Skills in education

    The Carpentries: Tracy Teal, Executive Director

    Career paths

    The Society of Research Software Engineering: “The Society of Research Software Engineering was founded on the belief that a world which relies on software must recognise the people who develop it.” https://society-rse.org
  30. 69.

    JOSS experience is positive, and yet…

    Yet! Not indexed by Google Scholar: https://github.com/openjournals/joss/issues/130
  31. 76.

    Bang for the buck?

    ❖ Federal 2018 R&D budget: $176.8B (AAAS analysis)
    ❖ What fraction of R&D today depends critically on computing? 10%? 30%? 50%?
    ❖ $200M is ~0.1% of that.
    ❖ $200M annually (well spent) would have major impact.
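The arithmetic behind the "Bang for the buck?" slide is easy to check. The sketch below uses only the two figures quoted on the slide ($176.8B and $200M). The second loop is an illustrative extension, not a claim from the talk: it assumes the computing-dependent share of R&D is exactly 10%, 30%, or 50% of the budget and shows how large $200M would be relative to that slice.

```python
# Figures quoted on the slide.
federal_rd_budget = 176.8e9    # Federal 2018 R&D budget, in dollars
proposed_investment = 200e6    # proposed annual investment, in dollars

# Share of the total R&D budget (the slide rounds this to ~0.1%).
share_of_total = proposed_investment / federal_rd_budget
print(f"Share of total R&D budget: {share_of_total:.2%}")

# Illustrative assumption: treat each candidate fraction as the portion of
# R&D that depends critically on computing, and compare $200M to it.
shares_by_fraction = {}
for fraction in (0.10, 0.30, 0.50):
    dependent_budget = federal_rd_budget * fraction
    shares_by_fraction[fraction] = proposed_investment / dependent_budget
    print(
        f"If {fraction:.0%} of R&D depends on computing: "
        f"$200M is {shares_by_fraction[fraction]:.2%} of that slice"
    )
```

Even under the most conservative of these assumed fractions, the proposed investment stays at roughly one percent of the spending that relies on the software.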
  32. 77.

    “Well spent”

    That should be easy…

    ❖ Some features of successful, resilient projects
    ❖ Broad community engagement
    ❖ Actively managed pipeline for new contributions
    ❖ Capacity for short and long-term planning
    ❖ Writing code only small part of the job
    ❖ Treat OSS projects like organizations
  33. 78.

    Strategic vision: requires professionalization

    ❖ Full-time work
    ❖ R&D, operations, community, fundraising
    ❖ Professionalization is inclusive:
    ❖ reliance on volunteers excludes those who can’t afford to volunteer.