Slide 1

Slide 1 text

Open Source Software in Science: Beyond the Code
Fernando Pérez, Lindsey Heagy

Slide 2

Slide 2 text

OSS: more than software
❖ Services and content
❖ Software
❖ Standards and protocols
❖ Community

Slide 3

Slide 3 text

Content/Services

Slide 4

Slide 4 text

A language-agnostic protocol

Slide 5

Slide 5 text

A language-agnostic protocol

Slide 6

Slide 6 text

A language-agnostic protocol

Slide 7

Slide 7 text

A language-agnostic protocol: ~100 different kernels (https://github.com/jupyter/jupyter/wiki/Jupyter-kernels)
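What makes the protocol language-agnostic is that a kernel only needs to exchange a documented set of JSON messages; it does not need to be written in any particular language. As a minimal sketch, assuming version 5.x of the Jupyter messaging specification, the code below builds an `execute_request` message and packs it into HMAC-SHA256-signed wire frames using only the Python standard library. The helper names (`make_execute_request`, `to_frames`, `verify`), the shared key, the session string, and the exact protocol version number are illustrative assumptions. The ZeroMQ transport and routing identities are omitted, and this is not the code used by `jupyter_client`.

```python
import hashlib
import hmac
import json
import uuid
from datetime import datetime, timezone

DELIM = b"<IDS|MSG>"  # separates routing identities from the signed message parts
PARTS = ("header", "parent_header", "metadata", "content")  # order covered by the signature


def make_execute_request(code: str, session: str, username: str = "user") -> dict:
    """Hypothetical helper: build an execute_request message as a plain dict."""
    header = {
        "msg_id": uuid.uuid4().hex,
        "session": session,
        "username": username,
        "date": datetime.now(timezone.utc).isoformat(),
        "msg_type": "execute_request",
        "version": "5.3",  # assumed protocol version; check the spec you target
    }
    content = {
        "code": code,
        "silent": False,
        "store_history": True,
        "user_expressions": {},
        "allow_stdin": False,
        "stop_on_error": True,
    }
    return {"header": header, "parent_header": {}, "metadata": {}, "content": content}


def to_frames(msg: dict, key: bytes) -> list:
    """Serialize each part once, then sign exactly the bytes that would be sent."""
    payload = [json.dumps(msg[part]).encode("utf-8") for part in PARTS]
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for frame in payload:
        mac.update(frame)
    return [DELIM, mac.hexdigest().encode("ascii"), *payload]


def verify(frames: list, key: bytes) -> bool:
    """Recompute the HMAC over the four JSON frames and compare in constant time."""
    idx = frames.index(DELIM)
    signature, payload = frames[idx + 1], frames[idx + 2 : idx + 6]
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for frame in payload:
        mac.update(frame)
    return hmac.compare_digest(signature, mac.hexdigest().encode("ascii"))


if __name__ == "__main__":
    key = b"hypothetical-shared-secret"  # placeholder, normally read from a connection file
    frames = to_frames(make_execute_request("print('hello')", session=uuid.uuid4().hex), key)
    print(verify(frames, key))
```

Each frame is plain JSON plus a hex digest, so a kernel written in Julia, R, or any other language can produce and check the same frames. That shared wire format, rather than a shared implementation, is what lets so many kernels plug into the same front ends.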

Slide 8

Slide 8 text

Community: formalized governance and formal fiscal sponsorship
Brian Granger (Cal Poly, Amazon) and me :)

Slide 9

Slide 9 text

More than software, woven into science
❖ Services and content: impact
❖ Software
❖ Standards and protocols: ecosystem
❖ Community: innovation & resiliency
People, ideas, tools, stories

Slide 10

Slide 10 text

OSS supports CORE Science*

Slide 11

Slide 11 text

OSS supports CORE Science*
❖ Collaborative
❖ Open
❖ Reproducible
❖ Extensible
* With a nod to the FAIR principles of open data. (Lindsey Heagy)

Slide 12

Slide 12 text

Collaborative?

Slide 13

Slide 13 text

Multiple stakeholders, team effort
❖ Academic scientists
❖ Educators
❖ Industry
❖ Government
❖ Media/journalism
❖ 1500+ community volunteers!

Slide 14

Slide 14 text

Jupyter meets the Earth: newly funded NSF grant ($2M over 3 years)
Research use cases:
● Climate data analysis
● Hydrology
● Geophysics
Tech developments:
● Data discovery
● Interactivity
● Cloud/HPC infrastructure
Fernando Pérez, Joe Hamman, Laurel Larsen, Kevin Paul, Lindsey Heagy, Chris Holdgraf, Yuvi Panda

Slide 15

Slide 15 text

Open?

Slide 16

Slide 16 text

Dimensions of openness
❖ Open source code
❖ Open (FAIR) data
❖ Open access publications & artifacts
❖ Open standards: interoperability (even with proprietary tools)
❖ Open community: all welcome (and we mean it!)
❖ …

Slide 17

Slide 17 text

Reproducible? The foundation of collaboration!

Slide 18

Slide 18 text

mybinder.org: shareable reproducibility
github.com/freeman-lab + explicit dependencies
Origins: Jeremy Freeman's lab at Janelia Farm. That "incentives" business…
Key contributor: Tim Head (@betatim)

Slide 19

Slide 19 text

Black holes! LIGO, Sept 14, 2015 http://bit.ly/black-holes-woop

Slide 20

Slide 20 text

Black holes! LIGO, Sept 14, 2015 http://bit.ly/black-holes-woop

Slide 21

Slide 21 text

Extensible?

Slide 22

Slide 22 text

JupyterLab: a grand unified theory of Jupyter
Huge team effort! C. Colbert, S. Corlay, A. Darian, B. Granger, J. Grout, P. Ivanov, I. Rose, S. Silvester, C. Willing, J. Zosa-Forde …

Slide 23

Slide 23 text

JupyterLab is extensible: FlyBrainLab, an interactive computing platform for the fly brain
BIONET Group, Columbia University (http://www.bionet.ee.columbia.edu)
Aurel A. Lazar (PI), Tingkai Liu, Mehmet K. Turkcan, Chung-Heng Yeh, Yiyin Zhou
http://fruitflybrain.org

Slide 24

Slide 24 text

No content

Slide 25

Slide 25 text

No content

Slide 26

Slide 26 text

Teaching with Programmable Notebooks
Launched in September, NOTO (http://noto.epfl.ch), EPFL's new JupyterLab platform for education, allows teachers and students to create and share programmable notebooks.
https://actu.epfl.ch/news/teaching-with-online-programmable-notebooks

Slide 27

Slide 27 text

National infrastructure, from K-12 to HPC
J. Colliander, I. Allison, B. Carra

Slide 28

Slide 28 text

No content

Slide 29

Slide 29 text

Impact: Research and Education

Slide 30

Slide 30 text

Data 8: Foundations of Data Science
Cathryn Carson, Ani Adhikari, John DeNero
+ Data 100, Prob 140, Data 102, … + a large team!

Slide 31

Slide 31 text

April 18/19, 2019: Shep Doeleman & Katie Bouman

Slide 32

Slide 32 text

So you want to build Data Science tools in academia…

Slide 33

Slide 33 text

Jupyter - funding and resources

Slide 34

Slide 34 text

No content

Slide 35

Slide 35 text

Contrasts in culture and incentives

| | Open source | Academia |
|---|---|---|
| Credit | Distributed | PI & hierarchy |
| Output/artifacts | Continuous & project-specific | Discrete papers |
| Collaborators | Fluid: professionals, volunteers, … | Structured, funding-dependent |
| Governance/decision making | Open, community-based | Top-down, PI |
| Authorship | Fluid, roles can evolve, no clear "first/senior" author | Need to say more? |
| Peer review | Continuous, open, pervasive, friendly | The opposite |
| Value metric | Utility, need, impact | "Novel and transformative" |

Slide 36

Slide 36 text

Catastrophic Success: an economic problem (2015 data) https://arxiv.org/abs/1507.03989

Slide 37

Slide 37 text

Catastrophic Success: an economic problem (2015 data) https://arxiv.org/abs/1507.03989
❖ MathWorks: 4,000+ employees
❖ Wolfram: 800 employees
❖ IDL/Harris: 17,000 employees

Slide 38

Slide 38 text

Thank you (Bay Area team)
Current (Berkeley, LBNL, Bloomberg): Stacey Dorton, Lindsey Heagy, Chris Holdgraf, Yuvi Panda, Ryan Lovett, Shreyas Cholia, Shane Canon, Rollin Thomas, Jason Grout
Former (Berkeley): Min Ragan-Kelley, Paul Ivanov, Thomas Kluyver, M Pacer, Matthias Bussonnier, Jessica Hamrick, Ian Rose, Jamie Whitacre

Slide 39

Slide 39 text

Scientific OSS at scale: complex challenges
❖ Economic incentives & sustainability
❖ Governance models
❖ Roles and professional career paths
❖ Multi-stakeholder organizations

Slide 40

Slide 40 text

Scientific OSS at scale: complex challenges
❖ Economic incentives & sustainability
❖ Governance models
❖ Roles and professional career paths
❖ Multi-stakeholder organizations
No scientist is trained for any of this!!
Thank you!