Open source, academic science and the public mission of research: reflections from the field

A presentation delivered as part of the CISE Distinguished Lecture Series, at the National Science Foundation headquarters.

Co-author: Lindsey Heagy (https://lindseyjh.ca).

A discussion of Project Jupyter and the role of open source tools in science, along with a reflection on how to tackle the funding and structural challenges of giving these tools a sustainable future.

Video is available here (you need to register for the livestream, even though it's in the past, to access the video player):
http://www.tvworldwide.com/events/nsf/190815

Fernando Perez

August 15, 2019

Transcript

  1. Fernando Pérez
    Lindsey Heagy
    Open source, academic science and the public
    mission of research: reflections from the field

  2. “The purpose of computing is insight, not numbers”
    – Hamming, '62

  3. What is Jupyter?
    ❖ Software
    ❖ Standards and Protocols
    ❖ Community

  4. JupyterLab: a grand unified theory of Jupyter
    Huge Team Effort!
    C. Colbert, S. Corlay, A. Darian, B. Granger, J. Grout, P.
    Ivanov, I. Rose, S. Silvester, C. Willing, J. Zosa-Forde …

  5. Standards and Protocols

  6. Core ideas of the web: HTTP & HTML
    HTML (HyperText Markup Language): format to represent content
    HTTP (HyperText Transfer Protocol): protocol to connect clients and servers
    Image credit: eviltester.com

  7. Core ideas of Jupyter
    Document Format
    https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers
    Interactive Computing Protocol
    [Diagram: clients (SUB and DEALER sockets) connected to a kernel (PUB and ROUTER sockets)]
    ØMQ + JSON
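
To make the two ideas on this slide a little more concrete, here is a minimal Python sketch using only the standard library. It builds a tiny notebook document (the document format) and an `execute_request` message (the protocol) as plain JSON-compatible dictionaries. The field names follow the public nbformat v4 and Jupyter messaging specifications as understood here; the helper names `make_notebook` and `make_execute_request`, the session and username values, and the version string are illustrative assumptions. A real client would send the message to a kernel over ØMQ rather than just serializing it.

```python
import json
import uuid
from datetime import datetime, timezone


def make_notebook(code):
    """Return a minimal nbformat-4-style notebook holding one code cell."""
    return {
        "nbformat": 4,
        "nbformat_minor": 2,
        "metadata": {},
        "cells": [
            {
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "source": code,
                "outputs": [],
            }
        ],
    }


def make_execute_request(code, session="demo-session", username="demo-user"):
    """Return a dict shaped like a Jupyter `execute_request` message."""
    return {
        "header": {
            "msg_id": uuid.uuid4().hex,
            "session": session,      # hypothetical value
            "username": username,    # hypothetical value
            "date": datetime.now(timezone.utc).isoformat(),
            "msg_type": "execute_request",
            "version": "5.3",        # assumed protocol version
        },
        "parent_header": {},
        "metadata": {},
        "content": {"code": code, "silent": False},
    }


notebook = make_notebook("print(1 + 1)")
message = make_execute_request(notebook["cells"][0]["source"])

# Both structures are plain JSON, so they serialize without custom encoders.
notebook_json = json.dumps(notebook)
message_json = json.dumps(message)
```

The point of the sketch is that neither structure depends on Python: any language that can read and write JSON can produce or consume them.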

  8. Jupyter Protocol
    web-age capture of the process of interactive computing
    any mime-type output

  9. Jupyter Protocol
    web-age capture of the process of interactive computing
    any mime-type output
    ❖ text

  10. Jupyter Protocol
    web-age capture of the process of interactive computing
    any mime-type output
    ❖ text
    ❖ svg, png, jpeg

  11. Jupyter Protocol
    web-age capture of the process of interactive computing
    any mime-type output
    ❖ text
    ❖ svg, png, jpeg
    ❖ latex, pdf

  12. Jupyter Protocol
    web-age capture of the process of interactive computing
    any mime-type output
    ❖ text
    ❖ svg, png, jpeg
    ❖ latex, pdf
    ❖ html, javascript

  13. Jupyter Protocol
    web-age capture of the process of interactive computing
    any mime-type output
    ❖ text
    ❖ svg, png, jpeg
    ❖ latex, pdf
    ❖ html, javascript
    ❖ interactive widgets
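
The "any mime-type output" idea can be sketched as follows: a kernel publishes one result in several formats at once, keyed by MIME type, and each frontend picks the richest one it can render. This is a minimal illustration in plain Python, not the actual frontend code; the preference order and the function name `pick_representation` are assumptions made for this sketch.

```python
# Preference order is an assumption of this sketch: richest format first.
PREFERRED_MIME_TYPES = [
    "text/html",
    "image/svg+xml",
    "image/png",
    "image/jpeg",
    "text/latex",
    "text/plain",
]


def pick_representation(bundle, supported):
    """Return (mime_type, data) for the richest type the frontend supports."""
    for mime_type in PREFERRED_MIME_TYPES:
        if mime_type in bundle and mime_type in supported:
            return mime_type, bundle[mime_type]
    raise ValueError("no supported representation in bundle")


# One result, published in several formats at once (a MIME bundle).
bundle = {
    "text/plain": "x**2",
    "text/latex": "$x^{2}$",
    "text/html": "<i>x</i><sup>2</sup>",
}

# A terminal client only understands plain text; a browser understands more.
terminal_choice = pick_representation(bundle, supported={"text/plain"})
browser_choice = pick_representation(
    bundle, supported={"text/plain", "text/latex", "text/html"}
)
```

Because the kernel does not need to know which client is listening, the same computation can show up as plain text in a terminal and as rendered HTML in a browser.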

  14. A language agnostic protocol

  17. A language agnostic protocol
    ~100 different kernels: https://github.com/jupyter/jupyter/wiki/Jupyter-kernels
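
One way to see why the protocol is language agnostic: a frontend typically learns how to start a kernel from a small `kernel.json` description, so supporting a new language means providing a program that speaks the protocol plus such a file. The sketch below assumes the commonly documented kernelspec fields (`argv`, `display_name`, `language`) and the `{connection_file}` placeholder; the function name `launch_command` and the file path are hypothetical.

```python
import json

# Shape of a kernelspec `kernel.json` file; the values are illustrative.
KERNEL_JSON = """
{
  "argv": ["python", "-m", "ipykernel_launcher", "-f", "{connection_file}"],
  "display_name": "Python 3",
  "language": "python"
}
"""


def launch_command(kernel_json_text, connection_file):
    """Expand the {connection_file} placeholder into a concrete argv list."""
    spec = json.loads(kernel_json_text)
    return [
        arg.replace("{connection_file}", connection_file)
        for arg in spec["argv"]
    ]


# Hypothetical connection file path, used only for illustration.
command = launch_command(KERNEL_JSON, "/tmp/kernel-demo.json")
```

Swapping the `argv` entry for an R, Julia, or other interpreter is all the frontend needs in order to launch a different language.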

  18. IPython: an afternoon hack, 2001

  19. Plus ~1,500 more open source contributors!
    A true team effort

  20. Formalized governance
    Formal fiscal sponsorship
    Brian Granger
    Cal Poly, Amazon
    Me :)

  21. Educational impact:
    a view from Berkeley

  22. Fall 2018
    Data 100:
    ~800 students
    Data 8:
    ~1,300
    students

  23. With these tools, we provide:
    ❖ Broad disciplinary reach and impact of statistical thinking.
    ❖ Drastically lowered barriers to student access, both intellectual and economic.
    ❖ Lowered barriers for faculty* to engage with statistical and computational ideas.
    (*) Typically from non-computational/statistical domains.
    Organizational and intellectual leadership:
    Cathryn Carson, Ani Adhikari, John DeNero, … (many more)

  24. Not just teaching toys:
    real research tools

  25. Reproducible Research
    An article about computational science in a scientific
    publication is not the scholarship itself, it is merely
    advertising of the scholarship. The actual scholarship is
    the complete software development environment and the
    complete set of instructions which generated the figures.
    Buckheit and Donoho, WaveLab and Reproducible Research, 1995

  26. 24 years later…
    https://twitter.com/KMS_Meltzy/status/1161008572634927104

  27. mybinder.org: shareable reproducibility
    github.com/freeman-lab
    Explicit Dependencies
    Origins: Jeremy Freeman’s lab at Janelia Farm.
    That “incentives” business…

  28. LIGO: September 14, 2015

  29. The song of the universe
    http://bit.ly/black-holes-woop

  31. April 18/19, 2019: Shep Doeleman & Katie Bouman

  32. LSST: one of the largest line items in the NSF budget
    https://docushare.lsst.org/docushare/dsweb/Get/LSE-319

  33. High Energy Physics

  34. HEP & reproducibility
    An ML-based likelihood-free inference tool for particle physics:
    A new Python-based implementation of a statistical tool used for Higgs discovery.
    Kyle Cranmer, NYU

  36. Geosciences: research & education
    Lindsey Heagy, Berkeley
    2019 GWH Career Achievement
    Award for outstanding junior scientist
    SimPEG: https://simpeg.xyz http://geosci.xyz

  37. interactivity

  38. Pangeo: open geosciences (and more!)
    Harnessing the power of cloud
    computing to study the whole
    earth interactively.
    https://pangeo.io
    Ryan Abernathey

  41. The “pangeo pattern”
    https://twitter.com/SurfTasmania/status/1126264352435097601
    ❖ Reuse, integrate, document
    ❖ Engage communities on
    their terms
    ❖ Contribute back upstream

  42. The Pangeo Principles
    Robinson, Hamman & Abernathey: https://arxiv.org/abs/1908.03356

  43. Jupyter - funding and resources

  44. So you want to build Data Science tools
    in academia…

  45. Scientific Open Source: Despite (direct) federal $$ support
    ❖ Note: “indirectly”, lots of $ have supported Scientific OSS projects/tools.
    ❖ Under the cover of domain-focused work.
    ❖ Recently recommended for funding: “Jupyter meets the Earth” (Jupyter + Pangeo team) NSF grant (EarthCube/Shree Mishra)
    ❖ FP, Laurel Larsen, Lindsey Heagy (Berkeley), Joe Hamman (NCAR)
    ❖ Thank you!!!

  46. Traditional software infrastructure funding
    “Yes, it’s true, the budget is gone again… But you can’t deny that now, we get here in an instant!”
    – Quino (Argentinian cartoonist)

  47. Contrasts in culture and incentives
    Aspect | Open Source | Academia
    Credit | Distributed | PI & hierarchy
    Output/artifacts | Continuous & Project-specific | Discrete papers
    Collaborators | Fluid: professionals, volunteers, … | Structured, funding-dependent
    Governance/decision making | Open, community based | Top-down, PI
    Authorship | Fluid, roles can evolve, no clear “first/senior” author | Need to say more?
    Peer review | Continuous, open, pervasive, friendly | The opposite
    Value metric | Utility, need, impact | “Novel and transformative”

  48. “The Stack”: a complete ecosystem

  50. “The Stack”: a complete ecosystem
    Domain-agnostic backbone/trunk

  51. “The Stack”: a complete ecosystem
    Domain-agnostic backbone/trunk
    • Not “real CS”
    • Not “real research”
    • Nobody’s problem
    • Yet critical to everybody else

  52. What do we do now?

  53. Software in science: I know the NSF cares!!
    https://diana-hep.org
    https://iris-hep.org
    http://urssi.us

  54. Organizations that fill current gaps

  55. Skills in education
    The Carpentries
    Tracy Teal
    Executive Director

  56. Skills in education
    The Carpentries
    Tracy Teal
    Executive Director
    Career paths
    The Society of Research Software Engineering
    “The Society of Research Software Engineering was founded on the belief that a world which relies on software must recognise the people who develop it.”
    https://society-rse.org

  57. Open communities & industry
    Leah Silen
    Executive Director
    Andy Terrel
    President

  58. leadership, management, organization building

  59. JOSS/JOSE: academic publication credit
    Arfon Smith
    STScI, Baltimore
    Lorena Barba
    GWU

  60. JOSS review checklist: a social hack

  61. JOSS experience is positive, and yet…

  62. JOSS experience is positive, and yet…
    Yet! Not indexed by Google Scholar https://github.com/openjournals/joss/issues/130

  63. HackWeeks: training and hacking meet domain research
    Training?

    From undergrads to senior PIs
    In the same room!!

  68. HackWeeks - a reproducible model
    https://doi.org/10.1073/pnas.1717196115
    Daniela Huppenkothen
    http://huppenkothen.org/

  69. Bang for the buck?
    ❖ Federal 2018 R&D budget: $176.8B (AAAS analysis)
    ❖ What fraction of R&D today depends critically on computing? 10%? 30%? 50%?
    ❖ $200M is ~0.1% of the federal R&D budget.
    ❖ $200M annually (well spent) would have major
    impact.
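
The arithmetic behind these bullets can be checked with a few lines of Python. The two dollar figures come from the slide; the 10%, 30% and 50% fractions are the slide's own open question, used here only as scenarios. The variable names are chosen for this sketch.

```python
FEDERAL_RD_BUDGET_2018 = 176.8e9   # dollars, as quoted on the slide
PROPOSED_ANNUAL_FUNDING = 200e6    # dollars, as quoted on the slide

# Share of the whole federal R&D budget.
share_of_total = PROPOSED_ANNUAL_FUNDING / FEDERAL_RD_BUDGET_2018

# Share of the computing-dependent portion, for each guessed fraction.
share_of_computing = {
    fraction: PROPOSED_ANNUAL_FUNDING / (fraction * FEDERAL_RD_BUDGET_2018)
    for fraction in (0.10, 0.30, 0.50)
}

print(f"Share of total R&D budget: {share_of_total:.2%}")
for fraction, share in share_of_computing.items():
    print(f"If {fraction:.0%} of R&D depends on computing: {share:.2%}")
```

Under these numbers, $200M is about 0.11% of the total, and roughly 1.1%, 0.4% and 0.2% of the computing-dependent portion in the three scenarios.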

  70. “Well spent” That should be easy…
    ❖ Some features of successful, resilient projects
    ❖ Broad community engagement
    ❖ Actively managed pipeline for new contributions
    ❖ Capacity for short and long-term planning
    ❖ Writing code is only a small part of the job
    ❖ Treat OSS projects like organizations

  71. Strategic vision: requires professionalization
    ❖ Full-time work
    ❖ R&D, operations, community, fundraising
    ❖ Professionalization is inclusive:
    ❖ reliance on volunteers excludes those who can’t
    afford to volunteer.

  72. Multi-stakeholder governance
    @nayafia

  73. You don’t get fruit without a tree…
