Slide 1

Slide 1 text

Scientific Open Source Software: meat and bits but not papers. Is it real work?
Fernando Pérez, Lindsey Heagy

Slide 2

Slide 2 text

A really odd career
❖ Physics PhD: lattice QCD simulations
❖ Applied math postdoc: numerical algorithms

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

“The purpose of computing is insight, not numbers.” (Hamming, 1962)

Slide 6

Slide 6 text

Maslow’s hierarchy of OSS
❖ Services and content
❖ Software
❖ Standards and protocols
❖ Community

Slide 7

Slide 7 text

Services/Software

Slide 8

Slide 8 text

JupyterLab: a grand unified theory of Jupyter
Huge team effort! C. Colbert, S. Corlay, A. Darian, B. Granger, J. Grout, P. Ivanov, I. Rose, S. Silvester, C. Willing, J. Zosa-Forde, …

Slide 9

Slide 9 text

Standards and Protocols

Slide 10

Slide 10 text

Core ideas of the web: HTTP & HTML
❖ HTML (HyperText Markup Language): a format to represent content
❖ HTTP (Hypertext Transfer Protocol): a protocol to connect clients and servers
Image credit: eviltester.com
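A minimal sketch of the split between format and protocol, using only Python's standard library: a throwaway local server speaks HTTP and returns an HTML document, and a client fetches it. The page content, the handler name `PageHandler`, and the loopback address are illustrative assumptions, not part of the talk.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# HTML is the *format*: plain text that describes the content of a page.
PAGE = "<!DOCTYPE html><html><body><h1>Hello, web</h1></body></html>"


class PageHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET with the same HTML page."""

    def do_GET(self):
        body = PAGE.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet: no per-request log lines on stderr


def fetch_page():
    """Start a local server, make one HTTP request to it, then shut it down."""
    # Port 0 asks the OS for any free port on the loopback interface.
    server = HTTPServer(("127.0.0.1", 0), PageHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/"
        # An empty ProxyHandler makes sure the request goes straight to localhost.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        # HTTP is the *protocol*: the client sends a request, the server responds.
        with opener.open(url) as resp:
            return resp.status, resp.headers["Content-Type"], resp.read().decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()


status, content_type, html = fetch_page()
print(status, content_type)  # 200 text/html; charset=utf-8
```

The same client code would work against any server that speaks HTTP, and the same HTML could be served by any server; that independence of format and protocol is the point of the slide.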

Slide 11

Slide 11 text

Core ideas of Jupyter
❖ Document format: the notebook, e.g. https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers
❖ Interactive computing protocol: ØMQ + JSON
[Diagram: a client (DEALER and SUB sockets) connected to a kernel (ROUTER and PUB sockets)]
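As an illustration of the document format: a notebook file is plain JSON (the nbformat v4 schema), so any language with a JSON parser can read or write one. The sketch below builds a minimal two-cell notebook with the standard library only; the cell ids, the kernelspec values, and the helper name `make_notebook` are illustrative assumptions, and a real tool would validate the result against the official schema (for example with the `nbformat` package).

```python
import json


def make_notebook(markdown_text, code, stdout_text):
    """Hypothetical helper: build a minimal nbformat 4.5 notebook as a dict."""
    cells = [
        {
            "cell_type": "markdown",
            "id": "intro",  # nbformat 4.5 requires a unique id per cell
            "metadata": {},
            "source": markdown_text,
        },
        {
            "cell_type": "code",
            "id": "compute",
            "execution_count": 1,
            "metadata": {},
            "source": code,
            # Outputs are stored right next to the code that produced them.
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": stdout_text}
            ],
        },
    ]
    return {
        "nbformat": 4,
        "nbformat_minor": 5,
        "metadata": {
            "kernelspec": {"name": "python3", "display_name": "Python 3", "language": "python"},
        },
        "cells": cells,
    }


nb = make_notebook("# A tiny notebook", "print(6 * 7)", "42\n")
text = json.dumps(nb, indent=1)  # what would be written to a .ipynb file

# Any JSON-capable program can read it back and walk the cells.
for cell in json.loads(text)["cells"]:
    print(cell["cell_type"], repr(cell["source"]))
```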

Slide 12

Slide 12 text

A language-agnostic protocol
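One way to see why the protocol is language agnostic: a message is just a few JSON documents plus an HMAC signature, framed as a multipart ØMQ message. The sketch below builds and verifies an `execute_request` using only Python's standard library. No ØMQ sockets are opened, the shared key is made up, the protocol version string is an assumption, and the field list follows our reading of the Jupyter messaging specification rather than any particular kernel's code. Any language that can produce JSON and HMAC-SHA256 could do the same.

```python
import hashlib
import hmac
import json
import uuid
from datetime import datetime, timezone

DELIMITER = b"<IDS|MSG>"  # separates routing identities from the message frames
PARTS = ("header", "parent_header", "metadata", "content")


def make_execute_request(code, session):
    """Build the four JSON parts of an execute_request message."""
    header = {
        "msg_id": uuid.uuid4().hex,
        "session": session,
        "username": "user",
        "date": datetime.now(timezone.utc).isoformat(),
        "msg_type": "execute_request",
        "version": "5.3",  # assumed protocol version
    }
    content = {
        "code": code,
        "silent": False,
        "store_history": True,
        "user_expressions": {},
        "allow_stdin": False,
        "stop_on_error": True,
    }
    return {"header": header, "parent_header": {}, "metadata": {}, "content": content}


def sign(frames, key):
    """HMAC-SHA256 hex digest over the serialized parts, in order."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for frame in frames:
        mac.update(frame)
    return mac.hexdigest().encode("ascii")


def to_wire(msg, key):
    """Frames as they would be sent: delimiter, signature, then the JSON parts."""
    frames = [json.dumps(msg[part]).encode("utf-8") for part in PARTS]
    return [DELIMITER, sign(frames, key)] + frames


def verify(wire, key):
    """Recompute the signature on the receiving side; compare in constant time."""
    i = wire.index(DELIMITER)
    signature, frames = wire[i + 1], wire[i + 2 : i + 2 + len(PARTS)]
    return hmac.compare_digest(signature, sign(frames, key))


key = b"not-a-real-secret"  # in practice, read from the kernel's connection file
wire = to_wire(make_execute_request("print('hi')", session=uuid.uuid4().hex), key)
print(verify(wire, key), verify(wire, b"wrong-key"))  # True False
```

Real traffic may also carry routing identities before the delimiter and binary buffers after the content frame; the sketch leaves both out to keep the structure visible.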

Slide 15

Slide 15 text

A language-agnostic protocol
~100 different kernels: https://github.com/jupyter/jupyter/wiki/Jupyter-kernels
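How do ~100 different kernels plug into the same front ends? Each kernel ships a small `kernel.json` kernelspec that tells Jupyter how to launch it; at launch time Jupyter writes a connection file (ports, transport, HMAC key) and substitutes its path for the `{connection_file}` placeholder in `argv`. The sketch below mimics that substitution. The kernel name `mykernel`, its `--connection-file` flag, and the fixed port numbers are hypothetical (a real launcher picks free ports).

```python
import json
import secrets

# Hypothetical kernelspec for a kernel written in some other language.
KERNEL_JSON = {
    "argv": ["mykernel", "--connection-file", "{connection_file}"],
    "display_name": "My Language",
    "language": "mylang",
}

CHANNELS = ("shell_port", "iopub_port", "stdin_port", "control_port", "hb_port")


def make_connection_info(base_port=50000):
    """Connection details a kernel needs: one port per channel plus the signing key."""
    info = {name: base_port + offset for offset, name in enumerate(CHANNELS)}
    info.update(
        ip="127.0.0.1",
        transport="tcp",
        signature_scheme="hmac-sha256",
        key=secrets.token_hex(16),  # shared secret used to sign every message
    )
    return info


def launch_command(kernelspec, connection_file):
    """Fill in the {connection_file} placeholder, as a launcher would."""
    return [arg.replace("{connection_file}", connection_file) for arg in kernelspec["argv"]]


info = make_connection_info()
connection_json = json.dumps(info, indent=2)  # contents of the connection file
cmd = launch_command(KERNEL_JSON, "/tmp/kernel-1234.json")
print(cmd)  # ['mykernel', '--connection-file', '/tmp/kernel-1234.json']
```

Because the front end only talks to the kernel through these ports and signed messages, nothing in the setup depends on the language the kernel itself is written in.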

Slide 16

Slide 16 text

Community

Slide 17

Slide 17 text

IPython: an afternoon hack, 2001

Slide 18

Slide 18 text

Plus ~1,500 more open source contributors! A true team effort.

Slide 19

Slide 19 text

❖ Formalized governance
❖ Formal fiscal sponsorship
Brian Granger (Cal Poly, Amazon) and me :)

Slide 20

Slide 20 text

Educational impact: a view from Berkeley

Slide 21

Slide 21 text

Fall 2018
❖ Data 100: ~800 students
❖ Data 8: ~1,300 students

Slide 22

Slide 22 text

With these tools, we provide:
❖ Broad disciplinary reach and impact of statistical thinking.
❖ Drastically lowered barriers to student access, both intellectual and economic.
❖ Lowered barriers for faculty* to engage with statistical and computational ideas.
(*) Typically from non-computational/non-statistical domains.
Organizational and intellectual leadership: Cathryn Carson, Ani Adhikari, John DeNero, … (many more)

Slide 23

Slide 23 text

Not just teaching toys: real research tools

Slide 24

Slide 24 text

Reproducible Research
“An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.”
(Buckheit and Donoho, “WaveLab and Reproducible Research”, 1995)

Slide 25

Slide 25 text

mybinder.org: shareable reproducibility
[Diagram: a GitHub repository (github.com/freeman-lab) + explicit dependencies]
Origins: Jeremy Freeman’s lab at Janelia Farm. That “incentives” business…
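A rough sketch of the Binder recipe: a public repository plus explicitly declared dependencies is enough to build a shareable environment. The launch-link pattern below follows mybinder.org's `/v2/gh/<org>/<repo>/<ref>` URLs; `example-org/example-repo` is a placeholder, and the dependency-file list is a partial, assumed subset of what the underlying repo2docker tool recognizes (it also looks inside `binder/` or `.binder/` folders, which this sketch ignores).

```python
import os
import tempfile

BINDER_BASE = "https://mybinder.org/v2/gh"

# Partial list of files that declare an environment explicitly (assumed subset).
DEPENDENCY_FILES = (
    "environment.yml",
    "requirements.txt",
    "Pipfile",
    "install.R",
    "apt.txt",
    "postBuild",
)


def binder_url(org, repo, ref="HEAD"):
    """Launch link for a GitHub repository at a given branch, tag, or commit."""
    return f"{BINDER_BASE}/{org}/{repo}/{ref}"


def declared_dependencies(repo_dir):
    """Which dependency files exist at the top level of a local checkout."""
    return sorted(
        name for name in DEPENDENCY_FILES
        if os.path.isfile(os.path.join(repo_dir, name))
    )


with tempfile.TemporaryDirectory() as repo_dir:
    # Simulate a repository that pins its Python dependencies.
    with open(os.path.join(repo_dir, "requirements.txt"), "w") as fh:
        fh.write("numpy\nmatplotlib\n")
    found = declared_dependencies(repo_dir)

url = binder_url("example-org", "example-repo")
print(found, url)
```

Pinning a ref to a tag or commit hash, rather than a moving branch, is what makes such a link reproduce the same environment later.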

Slide 26

Slide 26 text

LIGO: September 14, 2015

Slide 27

Slide 27 text

The song of the universe http://bit.ly/black-holes-woop

Slide 29

Slide 29 text

No content

Slide 30

Slide 30 text

April 18/19, 2019: Shep Doeleman & Katie Bouman

Slide 31

Slide 31 text

Geosciences: research & education
Lindsey Heagy, Berkeley: 2019 GWH Career Achievement Award for outstanding junior scientist
❖ SimPEG: https://simpeg.xyz
❖ http://geosci.xyz

Slide 32

Slide 32 text

Pangeo: open geosciences (and more!)
Harnessing the power of cloud computing to study the whole earth interactively. https://pangeo.io
Ryan Abernathey, Joe Hamman

Slide 35

Slide 35 text

Jupyter meets the Earth: newly funded NSF grant, $2M over 3 years
Research use cases:
● CMIP6 climate data analysis
● Large-scale hydrological modelling
● Geophysical simulations and inversions
Tech developments:
● Data discovery through JupyterLab
● Interactivity: widgets & dashboards
● JupyterHub: using and managing shared computational infrastructure
Team: Fernando Pérez, Joe Hamman, Laurel Larsen, Kevin Paul, Lindsey Heagy, Chris Holdgraf, Yuvi Panda

Slide 36

Slide 36 text

No content

Slide 37

Slide 37 text

Jupyter - funding and resources

Slide 38

Slide 38 text

So you want to build Data Science tools in academia…

Slide 39

Slide 39 text

Career paths?

Slide 40

Slide 40 text

John Hunter Pediatric Neurology

Slide 41

Slide 41 text

John Hunter

Slide 42

Slide 42 text

No content

Slide 43

Slide 43 text

Scientific Open Source: despite (direct) federal $$ support
❖ “Indirectly”, lots of $ have supported scientific OSS projects/tools.
❖ Under the cover of domain-focused work.

Slide 44

Slide 44 text

Traditional software infrastructure funding
“Yes, it’s true, the budget is gone again… But you can’t deny that now, we get here in an instant!” (Quino, Argentinian cartoonist)

Slide 45

Slide 45 text

Contrasts in culture and incentives

Aspect | Open Source | Academia
Credit | Distributed | PI & hierarchy
Output/artifacts | Continuous & project-specific | Discrete papers
Collaborators | Fluid: professionals, volunteers, … | Structured, funding-dependent
Governance/decision making | Open, community based | Top-down, PI
Authorship | Fluid, roles can evolve, no clear “first/senior” author | Need to say more?
Peer review | Continuous, open, pervasive, friendly | The opposite
Value metric | Utility, need, impact | “Novel and transformative”

Slide 46

Slide 46 text

“The Stack”: a complete ecosystem

Slide 48

Slide 48 text

“The Stack”: a complete ecosystem Domain-agnostic backbone/trunk

Slide 49

Slide 49 text

“The Stack”: a complete ecosystem
Domain-agnostic backbone/trunk:
• Not “real CS”
• Not “real research”
• Nobody’s problem
• Yet critical to everybody else

Slide 50

Slide 50 text

Organizations that fill current gaps

Slide 51

Slide 51 text

Skills in education: The Carpentries
Tracy Teal, Executive Director

Slide 52

Slide 52 text

Skills in education: The Carpentries (Tracy Teal, Executive Director)
Career paths: The Society of Research Software Engineering, https://society-rse.org
“The Society of Research Software Engineering was founded on the belief that a world which relies on software must recognise the people who develop it.”

Slide 53

Slide 53 text

Open communities & industry
Leah Silen, Executive Director; Andy Terrel, President

Slide 54

Slide 54 text

leadership, management, organization building

Slide 55

Slide 55 text

JOSS/JOSE (Journal of Open Source Software / Journal of Open Source Education): academic publication credit
Arfon Smith (STScI, Baltimore); Lorena Barba (GWU)

Slide 56

Slide 56 text

HackWeeks: training and hacking meet domain research
Training? From undergrads to senior PIs, in the same room!!

Slide 61

Slide 61 text

HackWeeks: a reproducible model
https://doi.org/10.1073/pnas.1717196115
Daniela Huppenkothen, http://huppenkothen.org/

Slide 62

Slide 62 text

An economic and organizational problem

Slide 63

Slide 63 text

Catastrophic Success: an economic problem (2015 data) https://arxiv.org/abs/1507.03989

Slide 64

Slide 64 text

Catastrophic Success: an economic problem (2015 data) https://arxiv.org/abs/1507.03989
❖ MathWorks: 4,000+ employees
❖ Wolfram: 800 employees
❖ IDL/Harris: 17,000 employees

Slide 65

Slide 65 text

Investing to hedge strategic risks
❖ It takes investment to have a seat at the table.
❖ Scientists (and their funders) want a voice?
❖ The code is already out: whose voices will shape it?

Slide 66

Slide 66 text

Bang for the buck?
❖ Federal 2018 R&D budget: $176.8B (AAAS analysis)
❖ What fraction of R&D today depends critically on computing? 10%? 30%? 50%?
❖ $200M is ~0.1% of the total federal R&D budget.
❖ $200M annually (well spent) would have a major impact.
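The arithmetic behind the slide, as a quick check: $200M against the $176.8B total is about 0.11%, and the share rises if only part of R&D is counted as computing-dependent. The 10/30/50% fractions are the slide's own guesses, not measurements.

```python
FEDERAL_RD_2018 = 176.8e9  # USD, AAAS analysis quoted on the slide
INVESTMENT = 200e6         # USD per year, the proposed OSS investment


def pct(part, whole):
    """part as a percentage of whole."""
    return 100 * part / whole


share_of_total = pct(INVESTMENT, FEDERAL_RD_2018)
print(f"Share of total R&D: {share_of_total:.2f}%")  # Share of total R&D: 0.11%

# Share of the computing-dependent slice, for the slide's guessed fractions.
shares = {f: pct(INVESTMENT, f * FEDERAL_RD_2018) for f in (0.1, 0.3, 0.5)}
for fraction, share in shares.items():
    print(f"{fraction:.0%} computing-dependent: {share:.2f}%")
# 10% computing-dependent: 1.13%
# 30% computing-dependent: 0.38%
# 50% computing-dependent: 0.23%
```

Even under the most pessimistic guess (only 10% of R&D depends on computing), $200M is roughly 1% of the computing-dependent slice.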

Slide 67

Slide 67 text

“Well spent”: that should be easy…
❖ Some features of successful, resilient projects:
❖ Broad community engagement
❖ Actively managed pipeline for new contributions
❖ Capacity for short- and long-term planning
❖ Writing code is only a small part of the job
❖ Treat OSS projects like real, complex organizations

Slide 68

Slide 68 text

It’s in the air… “many projects of immense infrastructural importance are simultaneously fundamental to multiple business models and also chronically underfunded”

Slide 69

Slide 69 text

Ford Foundation report by Nadia Eghbal (@nayafia)

Slide 70

Slide 70 text

Multi-stakeholder governance (@nayafia)

Slide 71

Slide 71 text

OSS is a lot more than software
❖ Economic incentives and sustainability
❖ Governance models
❖ Roles and professional career paths
❖ Multi-stakeholder organizational structures

Slide 72

Slide 72 text

Thank you (Bay Area team)
Current (Berkeley, LBNL, Bloomberg): Stacey Dorton, Lindsey Heagy, Chris Holdgraf, Yuvi Panda, Ryan Lovett, Shreyas Cholia, Shane Canon, Rollin Thomas, Jason Grout
Former (Berkeley): Min Ragan-Kelley, Paul Ivanov, Thomas Kluyver, M Pacer, Matthias Bussonnier, Jessica Hamrick, Ian Rose, Jamie Whitacre

Slide 73

Slide 73 text

In Memoriam - John Hunter, 1968-2012