Open Source Software in Science: Beyond the Code

A shorter version, updated with some new ideas, of my recent talk on the organizational aspects of scientific open source software development, presented at the Open Science Day at EPFL (Lausanne, Switzerland).

Co-authored with Lindsey Heagy (https://lindseyjh.ca).

Video of the presentation: https://youtu.be/sQBLDURu8-4

Fernando Perez

October 17, 2019

Transcript

  1. Fernando Pérez, Lindsey Heagy. Open Source Software in Science: Beyond the Code
  2. OSS: more than software. Services and content; Software; Standards and Protocols; Community
  3. Content/Services

  4. A language agnostic protocol

  5. A language agnostic protocol

  6. A language agnostic protocol

  7. A language agnostic protocol. ~100 different kernels: https://github.com/jupyter/jupyter/wiki/Jupyter-kernels
  8. Community: formalized governance. Formal fiscal sponsorship. Brian Granger (Cal Poly, Amazon). Me :)
  9. More than software, woven into science. Services and content: impact. Software. Standards and Protocols: ecosystem. Community: innovation & resiliency. People, Ideas, Tools, Stories
  10. OSS supports CORE Science*

  11. OSS supports CORE Science*: Collaborative, Open, Reproducible, Extensible. * With a nod to the FAIR principles of open data. Lindsey Heagy
  12. Collaborative?

  13. Multiple stakeholders, team effort ❖ Academic scientists ❖ Educators ❖ Industry ❖ Government ❖ Media/journalism ❖ 1500+ community volunteers!
  14. Jupyter meets the Earth: newly funded NSF grant - $2M/3y. • Climate data analysis • Hydrology • Geophysics • Data discovery • Interactivity • Cloud/HPC infrastructure. Fernando Perez, Joe Hamman, Laurel Larsen, Kevin Paul, Lindsey Heagy, Chris Holdgraf, Yuvi Panda. Research use-cases; Tech Developments
  15. Open?

  16. Dimensions of Openness ❖ Open source code ❖ Open (FAIR) data ❖ Open access publications & artifacts ❖ Open standards: interoperability (even with proprietary tools) ❖ Open community: all welcome (and mean it!) ❖ …
  17. Reproducible? The foundation of collaboration!

  18. mybinder.org: shareable reproducibility. github.com/freeman-lab. Explicit Dependencies. Origins: Jeremy Freeman’s lab at Janelia Farm. That “incentives” business… Key contributor: Tim Head (@betatim)
  19. Black holes! LIGO, Sept 14, 2015 http://bit.ly/black-holes-woop

  20. Black holes! LIGO, Sept 14, 2015 http://bit.ly/black-holes-woop

  21. Extensible?

  22. JupyterLab: a grand unified theory of Jupyter. Huge Team Effort! C. Colbert, S. Corlay, A. Darian, B. Granger, J. Grout, P. Ivanov, I. Rose, S. Silvester, C. Willing, J. Zosa-Forde …
  23. JupyterLab is extensible: FlyBrainLab, An Interactive Computing Platform for the Fly Brain. BIONET Group, Columbia University, http://www.bionet.ee.columbia.edu. Aurel A. Lazar (PI), Tingkai Liu, Mehmet K. Turkcan, Chung-Heng Yeh, Yiyin Zhou. http://fruitflybrain.org
  24. (no text extracted)
  25. (no text extracted)
  26. Teaching with Programmable Notebooks. Launched in September, NOTO (http://noto.epfl.ch), EPFL's new JupyterLab platform for education, allows teachers and students to create and share programmable notebooks. https://actu.epfl.ch/news/teaching-with-online-programmable-notebooks
  27. National infrastructure, from K-12 to HPC. J. Colliander, I. Allison, B. Carra
  28. (no text extracted)
  29. Impact: Research and Education

  30. Data 8: Foundations of Data Science. Cathryn Carson, Ani Adhikari, John DeNero + Data 100, Prob 140, Data 102, … + a large team!
  31. April 18/19, 2019: Shep Doeleman & Katie Bouman

  32. So you want to build Data Science tools in academia…

  33. Jupyter - funding and resources

  34. (no text extracted)
  35. Contrasts in culture and incentives (Open Source vs. Academia)

    Credit. Open Source: Distributed; Academia: PI & hierarchy
    Output/artifacts. Open Source: Continuous & Project-specific; Academia: Discrete papers
    Collaborators. Open Source: Fluid: professionals, volunteers, …; Academia: Structured, funding-dependent
    Governance/decision making. Open Source: Open, community based; Academia: Top-down, PI
    Authorship. Open Source: Fluid, roles can evolve, no clear “first/senior” author; Academia: Need to say more?
    Peer review. Open Source: Continuous, open, pervasive, friendly; Academia: The opposite
    Value metric. Open Source: Utility, need, impact; Academia: “Novel and transformative”
  36. Catastrophic Success: an economic problem (2015 data) https://arxiv.org/abs/1507.03989

  37. Catastrophic Success: an economic problem (2015 data) https://arxiv.org/abs/1507.03989 ❖ MathWorks: 4,000+ employees ❖ Wolfram: 800 employees ❖ IDL/Harris: 17,000 employees
  38. Thank you (Bay Area team). Current (Berkeley, LBNL, Bloomberg): Stacey Dorton, Lindsey Heagy, Chris Holdgraf, Yuvi Panda, Ryan Lovett, Shreyas Cholia, Shane Canon, Rollin Thomas, Jason Grout. Former Berkeley: Min Ragan-Kelley, Paul Ivanov, Thomas Kluyver, M Pacer, Matthias Bussonnier, Jessica Hamrick, Ian Rose, Jamie Whitacre.
  39. Scientific OSS at scale: complex challenges ❖ Economic incentives & sustainability ❖ Governance models ❖ Roles and professional career paths ❖ Multi-stakeholder organizations
  40. Scientific OSS at scale: complex challenges ❖ Economic incentives & sustainability ❖ Governance models ❖ Roles and professional career paths ❖ Multi-stakeholder organizations. No scientist is trained for any of this!! Thank You!
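
Slides 4 to 7 describe a language agnostic protocol that lets roughly 100 different kernels talk to the same front ends. As a rough illustration of why that works, the sketch below builds a message shaped like a Jupyter `execute_request` using only plain dictionaries and JSON, which any language can produce and parse. It is not taken from the talk: the helper names `make_execute_request` and `serialize_parts`, the `demo-user` username, and the `5.3` version string are assumptions for illustration, and the sketch leaves out the ZeroMQ transport and the HMAC signature that a real kernel connection uses.

```python
import json
import uuid
from datetime import datetime, timezone


def make_execute_request(code, session_id, username="demo-user"):
    """Build a dict shaped like a Jupyter `execute_request` message.

    Hypothetical helper for illustration; real clients normally rely on
    a library such as jupyter_client to build and sign messages.
    """
    header = {
        "msg_id": uuid.uuid4().hex,       # unique id for this message
        "session": session_id,            # id shared by one client session
        "username": username,             # assumed placeholder value
        "date": datetime.now(timezone.utc).isoformat(),
        "msg_type": "execute_request",
        "version": "5.3",                 # assumed protocol version
    }
    content = {
        "code": code,                     # source text; its language is the kernel's business
        "silent": False,
        "store_history": True,
        "user_expressions": {},
        "allow_stdin": False,
        "stop_on_error": True,
    }
    return {
        "header": header,
        "parent_header": {},              # empty: this message starts a new exchange
        "metadata": {},
        "content": content,
    }


def serialize_parts(message):
    """JSON-encode the four dict parts in header, parent, metadata, content order."""
    order = ("header", "parent_header", "metadata", "content")
    return [json.dumps(message[key]).encode("utf-8") for key in order]
```

Nothing in the message depends on the kernel's language apart from the `code` string itself, which is what allows one notebook front end to drive Python, Julia, R and many other kernels.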