Open source, academic science and the public mission of research: reflections from the field

A presentation delivered as part of the CISE Distinguished Lecture Series, at the National Science Foundation headquarters.

Co-author: Lindsey Heagy (https://lindseyjh.ca).

A discussion of Project Jupyter and the role of open source tools in science, along with a reflection on how to tackle the funding and structural challenges of giving these tools a sustainable future.

Video is available here (you need to register for the livestream, even though it's in the past, to access the video player):
http://www.tvworldwide.com/events/nsf/190815

Fernando Perez

August 15, 2019

Transcript

  1. Fernando Pérez Lindsey Heagy Open source, academic science and the

    public mission of research: reflections from the field
  2. “The purpose of computing is insight, not numbers” – Hamming, '62

  3. What is ❖ Software ❖ Standards and Protocols ❖ Community

  4. Software

  5. JupyterLab: a grand unified theory of Jupyter Huge Team Effort!

    C. Colbert, S. Corlay, A. Darian, B. Granger, J. Grout, P. Ivanov, I. Rose, S. Silvester, C. Willing, J. Zosa-Forde …
  6. Standards and Protocols

  7. Core ideas of the web: HTTP & HTML

    HTML (HyperText Markup Language): format to represent content. HTTP (HyperText Transfer Protocol): protocol to connect clients and servers. Image credit: eviltester.com
  8. Core ideas of Jupyter: Document Format and Interactive Computing Protocol

    Document format example: https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers [Diagram: Client and Kernel connected by SUB/PUB and DEAL/ROUT sockets, ØMQ + JSON]
  9. Jupyter Protocol web-age capture of the process of interactive computing

    any mime-type output
  10. Jupyter Protocol web-age capture of the process of interactive computing

    any mime-type output ❖ text
  11. Jupyter Protocol web-age capture of the process of interactive computing

    any mime-type output ❖ text ❖ svg, png, jpeg
  12. Jupyter Protocol web-age capture of the process of interactive computing

    any mime-type output ❖ text ❖ svg, png, jpeg ❖ latex, pdf
  13. Jupyter Protocol web-age capture of the process of interactive computing

    any mime-type output ❖ text ❖ svg, png, jpeg ❖ latex, pdf ❖ html, javascript
  14. Jupyter Protocol web-age capture of the process of interactive computing

    any mime-type output ❖ text ❖ svg, png, jpeg ❖ latex, pdf ❖ html, javascript ❖ interactive widgets
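
The idea on the slides above, one result offered in several MIME types over a JSON-based protocol, can be sketched in plain Python. This is a simplified illustration, not the real wire format: actual Jupyter messages travel as signed multipart ØMQ frames, and the `make_message` helper, the `demo` username and the sample values are assumptions made up for this sketch.

```python
import json
import uuid
from datetime import datetime, timezone


def make_message(msg_type, content, session):
    """Build a dict shaped like a Jupyter protocol message (illustrative only)."""
    return {
        "header": {
            "msg_id": uuid.uuid4().hex,
            "session": session,
            "username": "demo",  # hypothetical user name
            "date": datetime.now(timezone.utc).isoformat(),
            "msg_type": msg_type,
            "version": "5.3",
        },
        "parent_header": {},
        "metadata": {},
        "content": content,
    }


session = uuid.uuid4().hex

# Client -> kernel: ask the kernel to run some code.
request = make_message("execute_request", {"code": "1 + 1", "silent": False}, session)

# Kernel -> client: one result, offered in several MIME types at once.
# The frontend picks the richest representation it knows how to render.
reply = make_message(
    "display_data",
    {
        "data": {
            "text/plain": "2",
            "text/html": "<b>2</b>",
            "text/latex": "$2$",
        },
        "metadata": {},
    },
    session,
)
# Replies point back at the request that caused them.
reply["parent_header"] = request["header"]

wire = json.dumps(reply)  # the message content is plain JSON
```

Because the payload is only JSON keyed by MIME type, any language that can produce such a message can act as a kernel, which is what makes the protocol language agnostic.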
  15. A language agnostic protocol

  16. A language agnostic protocol

  17. A language agnostic protocol

  18. A language agnostic protocol: ~100 different kernels

    https://github.com/jupyter/jupyter/wiki/Jupyter-kernels
  19. Community

  20. IPython: an afternoon hack, 2001

  21. Plus ~1500 more open source contributors! A true team

    effort
  22. Formalized governance Formal fiscal sponsorship Brian Granger Cal Poly, Amazon

    Me :)
  23. Educational impact: a view from Berkeley

  24. Fall 2018 Data 100: ~800 students Data 8: ~1,300 students

  25. With these tools, we provide: ❖ Broad disciplinary reach and

    impact of statistical thinking. ❖ Drastically lowered barriers to student access, both intellectual and economic. ❖ Lowered barriers for faculty (*) to engage with statistical and computational ideas. (*) Typically from non-computational/statistical domains. Organizational and intellectual leadership: Cathryn Carson, Ani Adhikari, John DeNero, … (many more)
  26. Not just teaching toys: real research tools

  27. Reproducible Research An article about computational science in a scientific

    publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures. Buckheit and Donoho, WaveLab and Reproducible Research, 1995
  28. 24 years later… https://twitter.com/KMS_Meltzy/status/1161008572634927104

  29. mybinder.org: shareable reproducibility github.com/freeman-lab Explicit Dependencies Origins: Jeremy

    Freeman’s lab at Janelia Farm. That “incentives” business…
  30. LIGO: September 14, 2015

  31. None
  32. The song of the universe http://bit.ly/black-holes-woop

  33. The song of the universe http://bit.ly/black-holes-woop

  34. None
  35. April 18/19, 2019: Shep Doeleman & Katie Bouman

  36. LSST: one of the largest line items in NSF budget

    https://docushare.lsst.org/docushare/dsweb/Get/LSE-319
  37. High Energy Physics

  38. HEP & reproducibility An ML-based likelihood-free inference tool for particle

    physics: A new Python-based implementation of a statistical tool used for Higgs discovery. Kyle Cranmer, NYU
  39. HEP & reproducibility An ML-based likelihood-free inference tool for particle

    physics: A new Python-based implementation of a statistical tool used for Higgs discovery. Kyle Cranmer, NYU
  40. Geosciences: research & education Lindsey Heagy, Berkeley 2019 GWH Career

    Achievement Award for outstanding junior scientist SimPEG: https://simpeg.xyz http://geosci.xyz
  41. interactivity

  42. Pangeo: open geosciences (and more!) Harnessing the power of cloud

    computing to study the whole earth interactively. https://pangeo.io Ryan Abernathey
  43. Pangeo: open geosciences (and more!) Harnessing the power of cloud

    computing to study the whole earth interactively. https://pangeo.io Ryan Abernathey
  44. Pangeo: open geosciences (and more!) Harnessing the power of cloud

    computing to study the whole earth interactively. https://pangeo.io Ryan Abernathey
  45. The “pangeo pattern” https://twitter.com/SurfTasmania/status/1126264352435097601 ❖ Reuse, integrate, document ❖ Engage

    communities on their terms ❖ Contribute back upstream
  46. The Pangeo Principles Robinson, Hamman & Abernathey: https://arxiv.org/abs/1908.03356

  47. None
  48. Jupyter - funding and resources

  49. So you want to build Data Science tools in academia…

  50. None
  51. None
  52. Scientific Open Source: Despite (direct) federal $$ support ❖ Note:

    “indirectly”, lots of $ have supported Scientific OSS projects/tools. ❖ Under the cover of domain-focused work. ❖ Recently recommended for funding: “Jupyter meets the Earth” (Jupyter + Pangeo team) NSF grant (EarthCube/Shree Mishra) ❖ FP, Laurel Larsen, Lindsey Heagy (Berkeley), Joe Hamman (NCAR) ❖ Thank you!!!
  53. Traditional software infrastructure funding Yes, it’s true, the budget is

    gone again… But you can’t deny that now, we get here in an instant! Quino (Argentinian cartoonist)
  54. Contrasts in culture and incentives

    Aspect | Open Source | Academia
    Credit | Distributed | PI & hierarchy
    Output/artifacts | Continuous & Project-specific | Discrete papers
    Collaborators | Fluid: professionals, volunteers, … | Structured, funding-dependent
    Governance/decision making | Open, community based | Top-down, PI
    Authorship | Fluid, roles can evolve, no clear “first/senior” author | Need to say more?
    Peer review | Continuous, open, pervasive, friendly | The opposite
    Value metric | Utility, need, impact | “Novel and transformative”
  55. “The Stack”: a complete ecosystem

  56. “The Stack”: a complete ecosystem

  57. “The Stack”: a complete ecosystem Domain-agnostic backbone/trunk

  58. “The Stack”: a complete ecosystem Domain-agnostic backbone/trunk • Not “real

    CS” • Not “real research” • Nobody’s problem • Yet critical to everybody else
  59. What do we do now?

  60. Software in science: I know the NSF cares!! https://diana-hep.org https://iris-hep.org

    http://urssi.us
  61. Organizations that fill current gaps

  62. Skills in education The Carpentries Tracy Teal Executive Director

  63. Skills in education The Carpentries Tracy Teal Executive Director Career paths

    The Society of Research Software Engineering was founded on the belief that a world which relies on software must recognise the people who develop it. https://society-rse.org
  64. Open communities & industry Leah Silen Executive Director Andy Terrel

    President
  65. leadership, management, organization building

  66. JOSS/JOSE: academic publication credit Arfon Smith STScI, Baltimore Lorena Barba

    GWU
  67. JOSS review checklist: a social hack

  68. JOSS experience is positive, and yet…

  69. JOSS experience is positive, and yet… Yet! Not indexed by

    Google Scholar https://github.com/openjournals/joss/issues/130
  70. HackWeeks: training and hacking meet domain research Training? From undergrads

    to senior PIs In the same room!!
  71. HackWeeks: training and hacking meet domain research Training? From undergrads

    to senior PIs In the same room!!
  72. HackWeeks: training and hacking meet domain research Training? From undergrads

    to senior PIs In the same room!!
  73. HackWeeks: training and hacking meet domain research Training? From undergrads

    to senior PIs In the same room!!
  74. HackWeeks: training and hacking meet domain research Training? From undergrads

    to senior PIs In the same room!!
  75. HackWeeks - a reproducible model https://doi.org/10.1073/pnas.1717196115 Daniela Huppenkothen http://huppenkothen.org/

  76. Bang for the buck? ❖ Federal 2018 R&D budget: $176.8B

    (AAAS analysis) ❖ What fraction of R&D today depends critically on computing? 10%? 30%? 50%? ❖ $200M is ~0.1% of that. ❖ $200M annually (well spent) would have major impact.
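
The arithmetic behind the "~0.1%" figure on the slide above can be checked in a few lines. This is only a sketch using the two dollar amounts quoted on the slide. The second calculation, which compares $200M against the computing-dependent share of R&D, is an illustrative extension using the slide's candidate fractions (10%, 30%, 50%), not a claim made in the talk.

```python
federal_rd_2018 = 176.8e9  # USD, figure quoted on the slide (AAAS analysis)
proposed_annual = 200e6    # USD, amount proposed on the slide

# Share of the whole federal R&D budget.
share = proposed_annual / federal_rd_2018
print(f"Share of total R&D budget: {share:.2%}")

# Share of the portion of R&D that depends critically on computing,
# for each candidate fraction the slide asks about.
shares_of_computing = {}
for fraction in (0.10, 0.30, 0.50):
    shares_of_computing[fraction] = proposed_annual / (fraction * federal_rd_2018)
    print(f"If {fraction:.0%} of R&D depends on computing: "
          f"{shares_of_computing[fraction]:.2%} of that portion")
```

Even under the most conservative assumption (10% of R&D depends on computing), $200M works out to only about 1% of the spending that relies on these tools.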
  77. “Well spent” That should be easy… ❖ Some features of

    successful, resilient projects ❖ Broad community engagement ❖ Actively managed pipeline for new contributions ❖ Capacity for short and long-term planning ❖ Writing code only small part of the job ❖ Treat OSS projects like organizations
  78. Strategic vision: requires professionalization ❖ Full-time work ❖ R&D, operations,

    community, fundraising ❖ Professionalization is inclusive: ❖ reliance on volunteers excludes those who can’t afford to volunteer.
  79. Multi-stakeholder governance @nayafia

  80. You don’t get fruit without a tree…