Open Source Software in Science: Beyond the Code

Fernando Perez
October 17, 2019

A shorter version, updated with some new ideas, of my recent talk on the organizational aspects of scientific open source software development, presented at the Open Science Day at EPFL (Lausanne, Switzerland).

Co-authored with Lindsey Heagy (https://lindseyjh.ca).

Video of the presentation: https://youtu.be/sQBLDURu8-4

Transcript

  1. Fernando Pérez
    Lindsey Heagy
    Open Source Software in Science:
    Beyond the Code

  2. OSS: more than software
    Services and content
    Software
    Standards and Protocols
    Community

  3. Content/Services

  4. A language agnostic protocol

  5. A language agnostic protocol
    Julia (logo)

  6. A language agnostic protocol
    Julia (logo)

  7. A language agnostic protocol
    Julia (logo)
    ~100 different kernels: https://github.com/jupyter/jupyter/wiki/Jupyter-kernels
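
A minimal sketch of what "language agnostic" can mean in practice, assuming the message layout described in the public Jupyter messaging documentation: a request to run code is a plain JSON document, so the same envelope works whichever kernel receives it. The helper name `make_execute_request`, the `reader` username, and the three code strings are illustrative and are not from the slides. The real wire format also adds ZeroMQ framing and an HMAC signature, which are omitted here.

```python
import json
import uuid
from datetime import datetime, timezone


def make_execute_request(code, session_id, username="reader"):
    """Build a simplified Jupyter-style `execute_request` message (hypothetical helper)."""
    header = {
        "msg_id": uuid.uuid4().hex,
        "session": session_id,
        "username": username,
        "date": datetime.now(timezone.utc).isoformat(),
        "msg_type": "execute_request",
        "version": "5.3",
    }
    return {
        "header": header,
        "parent_header": {},
        "metadata": {},
        "content": {
            "code": code,
            "silent": False,
            "store_history": True,
            "user_expressions": {},
            "allow_stdin": False,
            "stop_on_error": True,
        },
    }


# The envelope is identical for every kernel; only the `code` string differs.
# The three strings below are Python, Julia, and R snippets respectively.
session = uuid.uuid4().hex
for source in ("print(1 + 1)", "println(1 + 1)", "cat(1 + 1)"):
    message = make_execute_request(source, session)
    wire_text = json.dumps(message)
    print(message["header"]["msg_type"], len(wire_text), "bytes of JSON")
```

Because nothing in the envelope depends on the language of `code`, a front end written once can talk to any kernel that understands these messages.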

  8. Community: formalized governance
    Formal fiscal sponsorship
    Brian Granger
    Cal Poly, Amazon
    Me :)

  9. More than software, woven into science
    Services and content: impact
    Software
    Standards and Protocols: ecosystem
    Community: innovation & resiliency
    People
    Ideas
    Tools
    Stories

  10. OSS supports CORE Science*

  11. OSS supports CORE Science*
    Collaborative
    Open
    Reproducible
    Extensible
    * With a nod to the FAIR principles of open data
    Lindsey Heagy

  12. Collaborative?

  13. Multiple stakeholders, team effort
    ❖ Academic scientists
    ❖ Educators
    ❖ Industry
    ❖ Government
    ❖ Media/journalism
    ❖ 1500+ community volunteers!

  14. Jupyter meets the Earth: newly funded NSF grant - $2M/3y
    ● Climate data analysis
    ● Hydrology
    ● Geophysics
    ● Data discovery
    ● Interactivity
    ● Cloud/HPC infrastructure
    Fernando Perez, Joe Hamman, Laurel Larsen, Kevin Paul, Lindsey Heagy, Chris Holdgraf, Yuvi Panda
    Research use-cases | Tech Developments

  15. Dimensions of Openness
    ❖ Open source code
    ❖ Open (FAIR) data
    ❖ Open access publications & artifacts
    ❖ Open standards: interoperability (even with proprietary tools)
    ❖ Open community: all welcome (and mean it!)
    ❖ …

  16. Reproducible?
    The foundation of collaboration!

  17. mybinder.org: shareable reproducibility
    github.com/freeman-lab
    Explicit Dependencies
    Origins: Jeremy Freeman’s lab at Janelia Farm.
    That “incentives” business…
    Key contributor: Tim Head (@betatim)
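
As a rough illustration of the "explicit dependencies" idea, here is a sketch that checks a local repository for a few dependency files commonly used with Binder and builds a launch link in the `https://mybinder.org/v2/gh/<owner>/<repo>/<ref>` form. The function names, the short file list, and the `example-owner/example-repo` repository are assumptions for illustration and are not taken from the talk.

```python
from pathlib import Path
from urllib.parse import quote

# A few dependency files that Binder builds commonly rely on (not exhaustive).
DEPENDENCY_FILES = ("environment.yml", "requirements.txt", "Dockerfile")


def declared_dependencies(repo_dir):
    """Return the recognised dependency files present at the top of repo_dir."""
    root = Path(repo_dir)
    return [name for name in DEPENDENCY_FILES if (root / name).is_file()]


def binder_launch_url(owner, repo, ref="HEAD", host="https://mybinder.org"):
    """Build a launch link for a GitHub repository (hypothetical helper)."""
    # The ref is fully percent-encoded so branch names containing "/" stay one path segment.
    return f"{host}/v2/gh/{quote(owner)}/{quote(repo)}/{quote(ref, safe='')}"


# Hypothetical repository, used only to show the shape of the link.
print(binder_launch_url("example-owner", "example-repo", "main"))
print(declared_dependencies("."))
```

A repository that lists its dependencies in one of these files can be rebuilt by someone else from the link alone, which is what makes the result shareable.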

  18. Black holes! LIGO, Sept 14, 2015
    http://bit.ly/black-holes-woop

  19. Black holes! LIGO, Sept 14, 2015
    http://bit.ly/black-holes-woop

  20. JupyterLab: a grand unified theory of Jupyter
    Huge Team Effort!
    C. Colbert, S. Corlay, A. Darian, B. Granger, J. Grout, P. Ivanov, I. Rose, S. Silvester, C. Willing, J. Zosa-Forde …

  21. JupyterLab is extensible: FlyBrainLab
    An Interactive Computing Platform for the Fly Brain
    BIONET Group, Columbia University
    http://www.bionet.ee.columbia.edu
    Aurel A. Lazar (PI)
    Tingkai Liu
    Mehmet K. Turkcan
    Chung-Heng Yeh
    Yiyin Zhou
    http://fruitflybrain.org

  22. Teaching with Programmable Notebooks
    Launched in September, NOTO (http://noto.epfl.ch), EPFL's new JupyterLab platform for education, allows teachers and students to create and share programmable notebooks.
    https://actu.epfl.ch/news/teaching-with-online-programmable-notebooks

  23. National infrastructure, from K-12 to HPC
    J. Colliander, I. Allison, B. Carra

  24. Impact: Research and Education

  25. Data 8: Foundations of Data Science
    Cathryn Carson, Ani Adhikari, John DeNero
    + Data 100, Prob 140, Data 102
    + a large team!

  26. April 18/19, 2019: Shep Doeleman & Katie Bouman

  27. So you want to build Data Science tools in academia…

  28. Jupyter - funding and resources

  29. Contrasts in culture and incentives
    Aspect | Open Source | Academia
    Credit | Distributed | PI & hierarchy
    Output/artifacts | Continuous & Project-specific | Discrete papers
    Collaborators | Fluid: professionals, volunteers, … | Structured, funding-dependent
    Governance/decision making | Open, community based | Top-down, PI
    Authorship | Fluid, roles can evolve, no clear “first/senior” author | Need to say more?
    Peer review | Continuous, open, pervasive, friendly | The opposite
    Value metric | Utility, need, impact | “Novel and transformative”

  30. Catastrophic Success: an economic problem
    (2015 data) https://arxiv.org/abs/1507.03989

  31. Catastrophic Success: an economic problem
    (2015 data) https://arxiv.org/abs/1507.03989
    ❖ MathWorks: 4,000+ employees
    ❖ Wolfram: 800 employees
    ❖ IDL/Harris: 17,000 employees

  32. Thank you (Bay Area team)
    Current (Berkeley, LBNL, Bloomberg): Stacey Dorton, Lindsey Heagy, Chris Holdgraf, Yuvi Panda, Ryan Lovett, Shreyas Cholia, Shane Canon, Rollin Thomas, Jason Grout
    Former Berkeley: Min Ragan-Kelley, Paul Ivanov, Thomas Kluyver, M Pacer, Matthias Bussonnier, Jessica Hamrick, Ian Rose, Jamie Whitacre.

  33. Scientific OSS at scale: complex challenges
    ❖ Economic incentives & sustainability
    ❖ Governance models
    ❖ Roles and professional career paths
    ❖ Multi-stakeholder organizations

  34. Scientific OSS at scale: complex challenges
    ❖ Economic incentives & sustainability
    ❖ Governance models
    ❖ Roles and professional career paths
    ❖ Multi-stakeholder organizations
    No scientist is trained for any of this!!
    Thank You!
