
Scientific Open Source Software: meat and bits but not papers. Is it real work?

Fernando Perez
October 08, 2019

A discussion on the role of open source software in science, its sustainability and current outlook.

Co-authored with Lindsey Heagy (https://lindseyjh.ca).

Video of the presentation available at: https://cdac.uchicago.edu/insights/fernando-perez-scientific-open-source-software


Transcript

  1. Fernando Pérez, Lindsey Heagy. Scientific Open Source Software: meat and

    bits but not papers. Is it real work?
  2. A really odd career. Physics PhD: Lattice QCD simulations. Applied

    Math Postdoc: numerical algorithms
  3. A really odd career. Physics PhD: Lattice QCD simulations. Applied

    Math Postdoc: numerical algorithms
  4. None
  5. “The purpose of computing is insight, not numbers” (Hamming, 1962)

  6. Maslow’s hierarchy of OSS: Services and content; Software; Standards and

    Protocols; Community
  7. Services/Software

  8. JupyterLab: a grand unified theory of Jupyter Huge Team Effort!

    C. Colbert, S. Corlay, A. Darian, B. Granger, J. Grout, P. Ivanov, I. Rose, S. Silvester, C. Willing, J. Zosa-Forde …
  9. Standards and Protocols

  10. Core ideas of the web: HTTP & HTML. HTML: format

    to represent content (HyperText Markup Language). HTTP: protocol to connect clients and servers (HyperText Transfer Protocol). Image credit: eviltester.com
  11. Core ideas of Jupyter. Document format: https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers Interactive computing

    protocol: ØMQ + JSON. [Diagram: a client and a kernel connected through ØMQ sockets (SUB, DEALER, ROUTER, PUB).]
  12. A language agnostic protocol

  13. A language agnostic protocol

  14. A language agnostic protocol

  15. A language agnostic protocol: ~100

    different kernels: https://github.com/jupyter/jupyter/wiki/Jupyter-kernels
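Slides 11 to 15 name two ideas: a JSON document format and a language-agnostic messaging protocol carried over ØMQ. The following is a minimal sketch of what each might look like as plain data, using only the Python standard library. The field names follow the publicly documented notebook format (version 4) and Jupyter messaging specification as best understood here; the helper names `minimal_notebook` and `execute_request`, the username `demo`, and the protocol version string are illustrative assumptions, not part of the talk. No ØMQ socket is opened and no kernel is contacted.

```python
import json
import uuid
from datetime import datetime, timezone


def minimal_notebook(code):
    """Return a minimal notebook-format-4 style document with one code cell.

    Hypothetical helper for illustration; real tools normally use a
    dedicated library rather than building the dictionary by hand.
    """
    return {
        "nbformat": 4,
        "nbformat_minor": 4,
        "metadata": {},
        "cells": [
            {
                "cell_type": "code",
                "execution_count": None,  # not yet run by any kernel
                "metadata": {},
                "outputs": [],
                "source": code,
            }
        ],
    }


def execute_request(code, session):
    """Build the JSON parts of an execute_request message (illustrative only)."""
    return {
        "header": {
            "msg_id": uuid.uuid4().hex,
            "session": session,
            "username": "demo",  # assumed placeholder
            "date": datetime.now(timezone.utc).isoformat(),
            "msg_type": "execute_request",
            "version": "5.3",  # assumed protocol version
        },
        "parent_header": {},
        "metadata": {},
        "content": {
            "code": code,
            "silent": False,
            "store_history": True,
            "user_expressions": {},
            "allow_stdin": False,
            "stop_on_error": True,
        },
    }


notebook = minimal_notebook("print(1 + 1)")
message = execute_request("print(1 + 1)", session=uuid.uuid4().hex)

# Both structures are plain JSON, which is what makes them language agnostic.
notebook_text = json.dumps(notebook, indent=1)
message_text = json.dumps(message)
print(notebook_text)
print(message_text)
```

Because the request carries the code as an opaque string, nothing in the message itself depends on the language the kernel implements, which is the point the "~100 different kernels" slide makes.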
  16. Community

  17. IPython: an afternoon hack, 2001

  18. Plus ~ 1500 more Open source contributors! A true team

    effort
  19. Formalized governance Formal fiscal sponsorship Brian Granger Cal Poly, Amazon

    Me :)
  20. Educational impact: a view from Berkeley

  21. Fall 2018 Data 100: ~800 students Data 8: ~1,300 students

  22. With these tools, we provide: ❖ Broad disciplinary reach and

    impact of statistical thinking. ❖ Drastically lowered barriers to student access - intellectual and economic. ❖ Lowered barriers for faculty (*) to engage with statistical and computational ideas. (*) Typically from non-computational/statistical domains. Organizational and intellectual leadership: Cathryn Carson, Ani Adhikari, John DeNero, … (many more)
  23. Not just teaching toys: real research tools

  24. Reproducible Research: “An article about computational science in a scientific

    publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.” Buckheit and Donoho, WaveLab and Reproducible Research, 1995
  25. mybinder.org: shareable reproducibility. github.com/freeman-lab. Explicit dependencies. Origins: Jeremy

    Freeman’s lab at Janelia Farm. That “incentives” business…
  26. LIGO: September 14, 2015

  27. The song of the universe http://bit.ly/black-holes-woop

  28. The song of the universe http://bit.ly/black-holes-woop

  29. None
  30. April 18/19, 2019: Shep Doeleman & Katie Bouman

  31. Geosciences: research & education Lindsey Heagy, Berkeley 2019 GWH Career

    Achievement Award for outstanding junior scientist SimPEG: https://simpeg.xyz http://geosci.xyz
  32. Pangeo: open geosciences (and more!) Harnessing the power of cloud

    computing to study the whole earth interactively. https://pangeo.io Ryan Abernathey Joe Hamman
  33. Pangeo: open geosciences (and more!) Harnessing the power of cloud

    computing to study the whole earth interactively. https://pangeo.io Ryan Abernathey Joe Hamman
  34. Pangeo: open geosciences (and more!) Harnessing the power of cloud

    computing to study the whole earth interactively. https://pangeo.io Ryan Abernathey Joe Hamman
  35. Jupyter meets the Earth: newly funded NSF grant - $2M/3y

    Research use-cases: • CMIP6 climate data analysis • Large scale hydrological modelling • Geophysical simulations and inversions. Tech developments: • Data discovery through JupyterLab • Interactivity: Widgets & Dashboards • JupyterHub: Using and managing shared computational infrastructure. Team: Fernando Perez, Joe Hamman, Laurel Larsen, Kevin Paul, Lindsey Heagy, Chris Holdgraf, Yuvi Panda
  36. None
  37. Jupyter - funding and resources

  38. So you want to build Data Science tools in academia…

  39. Career paths?

  40. John Hunter Pediatric Neurology

  41. John Hunter

  42. None
  43. Scientific Open Source: Despite (direct) federal $$ support ❖ “Indirectly”,

    lots of $ have supported Scientific OSS projects/tools. ❖ Under the cover of domain-focused work.
  44. Traditional software infrastructure funding Yes, it’s true, the budget is

    gone again… But you can’t deny that now, we get here in an instant! Quino (Argentinian cartoonist)
  45. Contrasts in culture and incentives (Open Source vs. Academia)

    Credit: Distributed vs. PI & hierarchy
    Output/artifacts: Continuous & project-specific vs. Discrete papers
    Collaborators: Fluid (professionals, volunteers, …) vs. Structured, funding-dependent
    Governance/decision making: Open, community based vs. Top-down, PI
    Authorship: Fluid, roles can evolve, no clear “first/senior” author vs. Need to say more?
    Peer review: Continuous, open, pervasive, friendly vs. The opposite
    Value metric: Utility, need, impact vs. “Novel and transformative”
  46. “The Stack”: a complete ecosystem

  47. “The Stack”: a complete ecosystem

  48. “The Stack”: a complete ecosystem Domain-agnostic backbone/trunk

  49. “The Stack”: a complete ecosystem Domain-agnostic backbone/trunk • Not “real

    CS” • Not “real research” • Nobody’s problem • Yet critical to everybody else
  50. Organizations that fill current gaps

  51. Skills in education The Carpentries Tracy Teal Executive Director

  52. Skills in education The Carpentries Tracy Teal Executive Director The

    Society of Research Software Engineering was founded on the belief that a world which relies on software must recognise the people who develop it. https://society-rse.org The Society of Research Software Engineering Career paths
  53. Open communities & industry Leah Silen Executive Director Andy Terrel

    President
  54. leadership, management, organization building

  55. JOSS/JOSE: academic publication credit Arfon Smith STScI, Baltimore Lorena Barba

    GWU
  56. HackWeeks: training and hacking meet domain research. Training? From undergrads

    to senior PIs. In the same room!!
  57. HackWeeks: training and hacking meet domain research. Training? From undergrads

    to senior PIs. In the same room!!
  58. HackWeeks: training and hacking meet domain research. Training? From undergrads

    to senior PIs. In the same room!!
  59. HackWeeks: training and hacking meet domain research. Training? From undergrads

    to senior PIs. In the same room!!
  60. HackWeeks: training and hacking meet domain research. Training? From undergrads

    to senior PIs. In the same room!!
  61. HackWeeks - a reproducible model https://doi.org/10.1073/pnas.1717196115 Daniela Huppenkothen http://huppenkothen.org/

  62. An economic and organizational problem

  63. Catastrophic Success: an economic problem (2015 data) https://arxiv.org/abs/1507.03989

  64. Catastrophic Success: an economic problem (2015 data) https://arxiv.org/abs/1507.03989 ❖ MathWorks:

    4,000+ employees ❖ Wolfram: 800 employees ❖ IDL/Harris: 17,000 employees
  65. Investing to hedge strategic risks ❖ It takes investment to

    have a seat at the table. ❖ Scientists (and their funders) want a voice? ❖ The code is already out - whose voices will shape it?
  66. Bang for the buck? ❖ Federal 2018 R&D budget: $176.8B

    (AAAS analysis) ❖ What fraction of R&D today depends critically on computing? 10%? 30%? 50%? ❖ $200M is ~0.1% of that. ❖ $200M annually (well spent) would have major impact.
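The "~0.1%" figure on slide 66 can be checked with a few lines of arithmetic. This is a small sketch using only the two dollar amounts quoted on the slide; the variable names are illustrative, and the 10%, 30% and 50% scenarios are the slide's own open question rather than measured values.

```python
# Figures quoted on the slide, in US dollars.
federal_rd_budget = 176.8e9   # Federal 2018 R&D budget (AAAS analysis)
proposed_investment = 200e6   # $200M annually

# Share of the whole R&D budget that $200M would represent.
share = proposed_investment / federal_rd_budget
share_percent = round(share * 100, 2)
print(f"$200M is about {share_percent}% of the 2018 federal R&D budget")

# The slide asks what fraction of R&D depends critically on computing.
# For each hypothetical fraction, compare $200M to that slice of the budget.
for dependent_fraction in (0.10, 0.30, 0.50):
    dependent_budget = federal_rd_budget * dependent_fraction
    ratio = proposed_investment / dependent_budget
    print(f"If {dependent_fraction:.0%} depends on computing: {ratio:.2%} of that slice")
```

The first ratio works out to roughly 0.11%, consistent with the slide's rounded "~0.1%".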
  67. “Well spent” That should be easy… ❖ Some features of

    successful, resilient projects ❖ Broad community engagement ❖ Actively managed pipeline for new contributions ❖ Capacity for short and long-term planning ❖ Writing code only small part of the job ❖ Treat OSS projects like real, complex organizations
  68. It’s in the air… “many projects of immense infrastructural importance

    are simultaneously fundamental to multiple business models and also chronically underfunded”
  69. Ford Foundation report, authored by Nadia Eghbal @nayafia

  70. Multi-stakeholder governance @nayafia

  71. ❖ Economic incentives and sustainability ❖ Governance models ❖ Roles

    and professional career paths ❖ Multi-stakeholder organizational structures OSS is a lot more than software
  72. Thank you (Bay Area team). Current (Berkeley, LBNL, Bloomberg): Stacey

    Dorton, Lindsey Heagy, Chris Holdgraf, Yuvi Panda, Ryan Lovett, Shreyas Cholia, Shane Canon, Rollin Thomas, Jason Grout. Former Berkeley: Min Ragan-Kelley, Paul Ivanov, Thomas Kluyver, M Pacer, Matthias Bussonnier, Jessica Hamrick, Ian Rose, Jamie Whitacre.
  73. In Memoriam - John Hunter, 1968-2012