Slide 1

Slide 1 text

Fernando Pérez, Lindsey Heagy. Open source, academic science and the public mission of research: reflections from the field

Slide 2

Slide 2 text

–Hamming'62 “The purpose of computing is insight, not numbers”

Slide 3

Slide 3 text

What is ❖ Software ❖ Standards and Protocols ❖ Community

Slide 4

Slide 4 text

Software

Slide 5

Slide 5 text

JupyterLab: a grand unified theory of Jupyter Huge Team Effort! C. Colbert, S. Corlay, A. Darian, B. Granger, J. Grout, P. Ivanov, I. Rose, S. Silvester, C. Willing, J. Zosa-Forde …

Slide 6

Slide 6 text

Standards and Protocols

Slide 7

Slide 7 text

Core ideas of the web: HTTP & HTML. HTML (HyperText Markup Language): format to represent content. HTTP (HyperText Transfer Protocol): protocol to connect clients and servers. Image credit: eviltester.com

Slide 8

Slide 8 text

Core ideas of Jupyter. Document Format: https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers Interactive Computing Protocol: ØMQ + JSON [Diagram: Client and Kernel connected through paired ØMQ sockets (SUB/PUB, DEALER/ROUTER)]
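To make "ØMQ + JSON" concrete, here is a minimal sketch of what a single protocol message might look like before it is sent to a kernel. The field names follow the public Jupyter messaging specification as best I recall it; the helper name `make_execute_request`, the username, and the protocol version string are illustrative assumptions, and the sketch leaves out the routing identities and HMAC signature that surround a real message on the wire.

```python
import json
import uuid
from datetime import datetime, timezone


def make_execute_request(code, session_id, username="demo-user"):
    """Build an execute_request message as a plain dict (hypothetical helper)."""
    header = {
        "msg_id": uuid.uuid4().hex,       # unique id for this message
        "session": session_id,            # id shared by all messages of one client session
        "username": username,             # illustrative value
        "date": datetime.now(timezone.utc).isoformat(),
        "msg_type": "execute_request",
        "version": "5.3",                 # assumed protocol version
    }
    content = {
        "code": code,                     # source text the kernel should run
        "silent": False,
        "store_history": True,
        "user_expressions": {},
        "allow_stdin": False,
        "stop_on_error": True,
    }
    return {
        "header": header,
        "parent_header": {},              # empty: this message is not a reply
        "metadata": {},
        "content": content,
    }


message = make_execute_request("1 + 1", session_id=uuid.uuid4().hex)

# Each of the four dicts is serialized to JSON and sent as its own ØMQ frame.
wire_parts = [
    json.dumps(message[key])
    for key in ("header", "parent_header", "metadata", "content")
]
print(wire_parts[0])
```

Because every part is plain JSON, any language with a ØMQ binding and a JSON library can sit on either side of the connection.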

Slide 9

Slide 9 text

Jupyter Protocol web-age capture of the process of interactive computing any mime-type output

Slide 10

Slide 10 text

Jupyter Protocol web-age capture of the process of interactive computing any mime-type output ❖ text

Slide 11

Slide 11 text

Jupyter Protocol web-age capture of the process of interactive computing any mime-type output ❖ text ❖ svg, png, jpeg

Slide 12

Slide 12 text

Jupyter Protocol web-age capture of the process of interactive computing any mime-type output ❖ text ❖ svg, png, jpeg ❖ latex, pdf

Slide 13

Slide 13 text

Jupyter Protocol web-age capture of the process of interactive computing any mime-type output ❖ text ❖ svg, png, jpeg ❖ latex, pdf ❖ html, javascript

Slide 14

Slide 14 text

Jupyter Protocol web-age capture of the process of interactive computing any mime-type output ❖ text ❖ svg, png, jpeg ❖ latex, pdf ❖ html, javascript ❖ interactive widgets
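As a rough illustration of "any mime-type output", the sketch below shows an object that offers several representations of itself keyed by MIME type. IPython's display machinery looks for a `_repr_mimebundle_` method on displayed objects; the `Temperature` class and its contents are invented for this example, and the code calls the method directly so it runs without a notebook.

```python
class Temperature:
    """Toy object (hypothetical) that can describe itself in several formats."""

    def __init__(self, celsius):
        self.celsius = celsius

    def __repr__(self):
        return f"Temperature({self.celsius} C)"

    def _repr_mimebundle_(self, include=None, exclude=None):
        # Map each MIME type to one representation of the same value.
        return {
            "text/plain": repr(self),
            "text/html": f"<b>{self.celsius}</b> &deg;C",
            "text/latex": f"${self.celsius}^\\circ\\mathrm{{C}}$",
        }


bundle = Temperature(21.5)._repr_mimebundle_()
for mime_type, payload in bundle.items():
    print(mime_type, "->", payload)
```

A frontend that receives such a bundle can pick the richest type it knows how to render, which is how a terminal and a web notebook can show the same result differently.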

Slide 15

Slide 15 text

A language agnostic protocol

Slide 16

Slide 16 text

A language agnostic protocol

Slide 18

Slide 18 text

A language agnostic protocol. ~100 different kernels: https://github.com/jupyter/jupyter/wiki/Jupyter-kernels
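One way to see why the protocol is language agnostic: a kernel is registered through a small JSON description that simply says how to start a process, and that process can be written in any language. The sketch below builds such a kernel spec for the Python kernel. The `argv`, `display_name` and `language` keys follow the kernel spec format as I understand it; the display name, the connection file path, and the `launch_command` helper are assumptions made for illustration.

```python
import json

# A kernel spec: the command to start the kernel, plus labels for the UI.
kernel_spec = {
    "argv": ["python", "-m", "ipykernel_launcher", "-f", "{connection_file}"],
    "display_name": "Python 3 (example)",   # illustrative label
    "language": "python",
}


def launch_command(spec, connection_file):
    """Fill in the connection file placeholder (hypothetical helper)."""
    return [arg.replace("{connection_file}", connection_file) for arg in spec["argv"]]


print(json.dumps(kernel_spec, indent=2))
print(launch_command(kernel_spec, "/tmp/kernel-123.json"))
```

Swapping the `argv` for a command that starts an R, Julia or other runtime is, in principle, all a client needs in order to talk to a different language.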

Slide 19

Slide 19 text

Community

Slide 20

Slide 20 text

IPython: an afternoon hack, 2001

Slide 21

Slide 21 text

Plus ~1,500 more open source contributors! A true team effort

Slide 22

Slide 22 text

Formalized governance Formal fiscal sponsorship Brian Granger Cal Poly, Amazon Me :)

Slide 23

Slide 23 text

Educational impact: a view from Berkeley

Slide 24

Slide 24 text

Fall 2018 Data 100: ~800 students Data 8: ~1,300 students

Slide 25

Slide 25 text

With these tools, we provide: ❖ Broad disciplinary reach and impact of statistical thinking. ❖ Drastically lowered barriers to student access - intellectual and economic. ❖ Lowered barriers for faculty* to engage with statistical and computational ideas. (*) typically from non-computational/statistical domains. Organizational and intellectual leadership: Cathryn Carson, Ani Adhikari, John DeNero, … (many more)

Slide 26

Slide 26 text

Not just teaching toys: real research tools

Slide 27

Slide 27 text

Reproducible Research: “An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.” Buckheit and Donoho, WaveLab and Reproducible Research, 1995

Slide 28

Slide 28 text

24 years later… https://twitter.com/KMS_Meltzy/status/1161008572634927104

Slide 29

Slide 29 text

mybinder.org: shareable reproducibility. github.com/freeman-lab Explicit Dependencies. Origins: Jeremy Freeman’s lab at Janelia Farm. That “incentives” business…
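As a small sketch of what "shareable" means in practice: a public repository that declares its dependencies explicitly (for instance in a `requirements.txt` or `environment.yml` file) can be launched from a single link. The helper below assembles such a link using the `mybinder.org/v2/gh/<org>/<repo>/<ref>` pattern; the function name `binder_url` and the example repository are illustrative choices, not something taken from the slides.

```python
from urllib.parse import quote


def binder_url(org, repo, ref="HEAD"):
    """Build a mybinder.org launch link for a GitHub repository (hypothetical helper)."""
    # quote() guards against characters that are not valid in a URL path segment.
    return (
        "https://mybinder.org/v2/gh/"
        f"{quote(org, safe='')}/{quote(repo, safe='')}/{quote(ref, safe='')}"
    )


link = binder_url("binder-examples", "requirements")
print(link)
```

Anyone who opens the link gets the environment built from the declared dependencies, without installing anything locally.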

Slide 30

Slide 30 text

LIGO: September 14, 2015

Slide 31

Slide 31 text

No content

Slide 32

Slide 32 text

The song of the universe http://bit.ly/black-holes-woop

Slide 34

Slide 34 text

No content

Slide 35

Slide 35 text

April 18/19, 2019: Shep Doeleman & Katie Bouman

Slide 36

Slide 36 text

LSST: one of the largest line items in NSF budget https://docushare.lsst.org/docushare/dsweb/Get/LSE-319

Slide 37

Slide 37 text

High Energy Physics

Slide 38

Slide 38 text

HEP & reproducibility. An ML-based likelihood-free inference tool for particle physics: a new Python-based implementation of a statistical tool used for Higgs discovery. Kyle Cranmer, NYU

Slide 40

Slide 40 text

Geosciences: research & education Lindsey Heagy, Berkeley 2019 GWH Career Achievement Award for outstanding junior scientist SimPEG: https://simpeg.xyz http://geosci.xyz

Slide 41

Slide 41 text

interactivity

Slide 42

Slide 42 text

Pangeo: open geosciences (and more!) Harnessing the power of cloud computing to study the whole earth interactively. https://pangeo.io Ryan Abernathey

Slide 45

Slide 45 text

The “pangeo pattern” https://twitter.com/SurfTasmania/status/1126264352435097601 ❖ Reuse, integrate, document ❖ Engage communities on their terms ❖ Contribute back upstream

Slide 46

Slide 46 text

The Pangeo Principles. Robinson, Hamman & Abernathey: https://arxiv.org/abs/1908.03356

Slide 47

Slide 47 text

No content

Slide 48

Slide 48 text

Jupyter - funding and resources

Slide 49

Slide 49 text

So you want to build Data Science tools in academia…

Slide 50

Slide 50 text

No content

Slide 51

Slide 51 text

No content

Slide 52

Slide 52 text

Scientific Open Source: Despite (direct) federal $$ support ❖ Note: “indirectly”, lots of $ have supported Scientific OSS projects/tools. ❖ Under the cover of domain-focused work. ❖ Recently recommended for funding: “Jupyter meets the Earth” (Jupyter + Pangeo team) NSF grant (EarthCube/Shree Mishra) ❖ FP, Laurel Larsen, Lindsey Heagy (Berkeley), Joe Hamman (NCAR) ❖ Thank you!!!

Slide 53

Slide 53 text

Traditional software infrastructure funding Yes, it’s true, the budget is gone again… But you can’t deny that now, we get here in an instant! Quino (Argentinian cartoonist)

Slide 54

Slide 54 text

Contrasts in culture and incentives
Aspect | Open Source | Academia
Credit | Distributed | PI & hierarchy
Output/artifacts | Continuous & project-specific | Discrete papers
Collaborators | Fluid: professionals, volunteers, … | Structured, funding-dependent
Governance/decision making | Open, community based | Top-down, PI
Authorship | Fluid, roles can evolve, no clear “first/senior” author | Need to say more?
Peer review | Continuous, open, pervasive, friendly | The opposite
Value metric | Utility, need, impact | “Novel and transformative”

Slide 55

Slide 55 text

“The Stack”: a complete ecosystem

Slide 57

Slide 57 text

“The Stack”: a complete ecosystem Domain-agnostic backbone/trunk

Slide 58

Slide 58 text

“The Stack”: a complete ecosystem Domain-agnostic backbone/trunk • Not “real CS” • Not “real research” • Nobody’s problem • Yet critical to everybody else

Slide 59

Slide 59 text

What do we do now?

Slide 60

Slide 60 text

Software in science: I know the NSF cares!! https://diana-hep.org https://iris-hep.org http://urssi.us

Slide 61

Slide 61 text

Organizations that fill current gaps

Slide 62

Slide 62 text

Skills in education The Carpentries Tracy Teal Executive Director

Slide 63

Slide 63 text

Skills in education: The Carpentries. Tracy Teal, Executive Director. Career paths: The Society of Research Software Engineering. “The Society of Research Software Engineering was founded on the belief that a world which relies on software must recognise the people who develop it.” https://society-rse.org

Slide 64

Slide 64 text

Open communities & industry Leah Silen Executive Director Andy Terrel President

Slide 65

Slide 65 text

leadership, management, organization building

Slide 66

Slide 66 text

JOSS/JOSE: academic publication credit Arfon Smith STScI, Baltimore Lorena Barba GWU

Slide 67

Slide 67 text

JOSS review checklist: a social hack

Slide 68

Slide 68 text

JOSS experience is positive, and yet…

Slide 69

Slide 69 text

JOSS experience is positive, and yet… Yet! Not indexed by Google Scholar https://github.com/openjournals/joss/issues/130

Slide 70

Slide 70 text

HackWeeks: training and hacking meet domain research. Training? From undergrads to senior PIs, in the same room!!

Slide 75

Slide 75 text

HackWeeks - a reproducible model https://doi.org/10.1073/pnas.1717196115 Daniela Huppenkothen http://huppenkothen.org/

Slide 76

Slide 76 text

Bang for the buck? ❖ Federal 2018 R&D budget: $176.8B (AAAS analysis) ❖ What fraction of R&D today depends critically on computing? 10%? 30%? 50%? ❖ $200M is ~0.1% of that. ❖ $200M annually (well spent) would have major impact.
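The arithmetic behind the "~0.1%" figure can be checked in a few lines. The two dollar amounts come from the slide; the variable names, and the extra step of comparing $200M against only the computing-dependent share of R&D (using the slide's 10%, 30% and 50% guesses), are illustrative additions.

```python
federal_rd_budget = 176.8e9   # 2018 federal R&D budget in dollars (AAAS analysis, from the slide)
proposed_annual = 200e6       # proposed annual investment in dollars (from the slide)

# Share of the whole federal R&D budget.
share = proposed_annual / federal_rd_budget
print(f"Share of total R&D budget: {share:.3%}")

# Share relative to the part of R&D that depends critically on computing,
# for each of the fractions floated on the slide.
for fraction in (0.10, 0.30, 0.50):
    computing_dependent = fraction * federal_rd_budget
    relative_share = proposed_annual / computing_dependent
    print(f"If {fraction:.0%} of R&D depends on computing: {relative_share:.2%} of that portion")
```

Even under the most conservative guess, the proposed amount stays on the order of one percent of the research spending that relies on this infrastructure.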

Slide 77

Slide 77 text

“Well spent” That should be easy… ❖ Some features of successful, resilient projects ❖ Broad community engagement ❖ Actively managed pipeline for new contributions ❖ Capacity for short and long-term planning ❖ Writing code only small part of the job ❖ Treat OSS projects like organizations

Slide 78

Slide 78 text

Strategic vision: requires professionalization ❖ Full-time work ❖ R&D, operations, community, fundraising ❖ Professionalization is inclusive: ❖ reliance on volunteers excludes those who can’t afford to volunteer.

Slide 79

Slide 79 text

Multi-stakeholder governance @nayafia

Slide 80

Slide 80 text

You don’t get fruit without a tree…