Slide 1

Slide 1 text

Fernando Pérez Building an open platform for research and education in data science Project Jupyter

Slide 2

Slide 2 text

A few bits about me Medellín, Colombia University of Colorado, Boulder Physics Applied Math Computation

Slide 3

Slide 3 text

Statistics & me: then and now “If your result needs a statistician then you should design a better experiment” (prob. mis-attributed) E. Rutherford PhD: Lattice QCD Simulations

Slide 4

Slide 4 text

Why?

Slide 5

Slide 5 text

Why? ❖ Ethical: openness as fairness ❖ Human/social: openness fosters collaboration. ❖ Epistemological: proprietary science is an oxymoron. ❖ Technical: Python was cool :)

Slide 6

Slide 6 text

Python - The Beginning the most important lesson I learned was about sharing – Guido van Rossum http://neopythonic.blogspot.com/2016/04/kings-day-speech.html Slide credit: C. Willing

Slide 7

Slide 7 text

Designed for Learning In reality, programming languages are how programmers express and communicate ideas — and the audience for those ideas is other programmers, not computers. http://neopythonic.blogspot.com/2016/04/kings-day-speech.html – Guido van Rossum Slide credit: C. Willing

Slide 8

Slide 8 text

What?

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

IPython: Interactive Python, 2001 A humble start: IPython 0.0.1, 259 LOC “Just an afternoon hack” https://gist.github.com/fperez/1579699

Slide 11

Slide 11 text

Team today: where all the credit goes Plus ~ 1500 more Open source contributors!

Slide 12

Slide 12 text

The IPython/Jupyter Notebook ❖ Rich web client ❖ Text & math ❖ Code ❖ Results ❖ Share, reproduce.

Slide 13

Slide 13 text

Core ideas of the web: HTTP & HTML HTML: format to represent content HyperText Markup Language HTTP: protocol to connect clients and servers Hypertext Transfer Protocol Image credit: eviltester.com

Slide 14

Slide 14 text

Core ideas of Jupyter Document Format https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers Interactive Computing Protocol [Diagram: clients connect SUB and DEALER sockets to the kernel's PUB and ROUTER sockets] ØMQ + JSON
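
The two ideas on this slide can be sketched with nothing but JSON. The block below is a minimal illustration, not the reference implementation: it builds a tiny notebook document in the nbformat 4 layout and the JSON parts of an `execute_request` message as a client might send it to a kernel. The cell contents, the `demo` username, and the helper name `make_execute_request` are assumptions made for the example; the ØMQ transport and message signing are left out.

```python
import json
import uuid
from datetime import datetime, timezone

# Document format: a notebook is a JSON document holding a list of cells.
# The cell contents here are made up for illustration.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "# A tiny notebook"},
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": "print(1 + 1)",
        },
    ],
}


def make_execute_request(code, session):
    """Build the JSON parts of an execute_request message (client to kernel).

    Hypothetical helper: real clients also sign the message and send it
    over a ZeroMQ socket, which this sketch omits.
    """
    return {
        "header": {
            "msg_id": uuid.uuid4().hex,
            "session": session,
            "username": "demo",  # assumed placeholder user
            "date": datetime.now(timezone.utc).isoformat(),
            "msg_type": "execute_request",
            "version": "5.3",
        },
        "parent_header": {},
        "metadata": {},
        "content": {
            "code": code,
            "silent": False,
            "store_history": True,
            "user_expressions": {},
            "allow_stdin": False,
            "stop_on_error": True,
        },
    }


notebook_json = json.dumps(notebook, indent=1)
request = make_execute_request(notebook["cells"][1]["source"], session=uuid.uuid4().hex)
print(json.dumps(request["content"], indent=1))
```

Because both the document and the messages are plain JSON, any language that can read JSON and speak ØMQ can take part on either side.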

Slide 15

Slide 15 text

Jupyter Protocol web-age capture of the process of interactive computing any mime-type output ❖ text ❖ svg, png, jpeg ❖ latex, pdf ❖ html, javascript ❖ interactive widgets
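
As a rough sketch of what "any mime-type output" means in practice: an object can offer several representations, and the frontend picks the richest one it can render. The `Ratio` class and the `mime_bundle` helper below are hypothetical; `_repr_html_` and `_repr_latex_` are the hook names IPython looks for, but the bundling loop only imitates what IPython's display machinery does.

```python
class Ratio:
    """Hypothetical example type that knows several ways to display itself."""

    def __init__(self, num, den):
        self.num, self.den = num, den

    def __repr__(self):
        # Plain-text fallback, always available.
        return f"Ratio({self.num}, {self.den})"

    def _repr_html_(self):
        # Hook name recognised by IPython for text/html output.
        return f"<sup>{self.num}</sup>&frasl;<sub>{self.den}</sub>"

    def _repr_latex_(self):
        # Hook name recognised by IPython for text/latex output.
        return rf"$\frac{{{self.num}}}{{{self.den}}}$"


def mime_bundle(obj):
    """Collect the available representations into a mime-type keyed dict."""
    bundle = {"text/plain": repr(obj)}
    hooks = (("text/html", "_repr_html_"), ("text/latex", "_repr_latex_"))
    for mime, method in hooks:
        fn = getattr(obj, method, None)
        if callable(fn):
            bundle[mime] = fn()
    return bundle


bundle = mime_bundle(Ratio(1, 2))
for mime, value in bundle.items():
    print(mime, "->", value)
```

A terminal client would fall back to `text/plain`, while a notebook frontend could choose the HTML or LaTeX entry from the same bundle.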

Slide 16

Slide 16 text

Jupyter Protocol is language agnostic u a l j i ~100 different kernels: https://github.com/jupyter/jupyter/wiki/Jupyter-kernels

Slide 17

Slide 17 text

No content

Slide 18

Slide 18 text

Classic ‘Notebook’…

Slide 19

Slide 19 text

JupyterLab: a grand unified theory of Jupyter Huge Team Effort! C. Colbert, S. Corlay, A. Darian, B. Granger, J. Grout, P. Ivanov, I. Rose, S. Silvester, C. Willing, J. Zosa-Forde …

Slide 20

Slide 20 text

Live Demo!

Slide 21

Slide 21 text

Reproducible Research An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures. Buckheit and Donoho, WaveLab and Reproducible Research, 1995

Slide 22

Slide 22 text

No content

Slide 23

Slide 23 text

No content

Slide 24

Slide 24 text

No content

Slide 25

Slide 25 text

No content

Slide 26

Slide 26 text

JupyterHub: multiuser support

Slide 27

Slide 27 text

CODING ENVIRONMENT AUTHENTICATION Slides credit: C. Holdgraf

Slide 28

Slide 28 text

What does this mean for science + education? ❖ Can utilize… ❖ ...shared hardware/compute for running code ❖ ...shared data storage for big datasets ❖ ...shared environments for doing work ❖ ...shared workflows, ideas, and results

Slide 29

Slide 29 text

CODING ENVIRONMENT AUTHENTICATION FANCY HARDWARE

Slide 30

Slide 30 text

CODING ENVIRONMENT AUTHENTICATION CONTENT ON THE WEB

Slide 31

Slide 31 text

mybinder.org: shareable reproducibility github.com/freeman-lab Explicit Dependencies + + Origins
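
A small sketch of how a shareable Binder link is formed, assuming the public `https://mybinder.org/v2/gh/<owner>/<repo>/<ref>` launch pattern. The owner, repository, and ref used below are placeholders, and `binder_url` is a hypothetical helper; the repository itself is expected to declare its dependencies explicitly (for example in a `requirements.txt`).

```python
from urllib.parse import quote


def binder_url(owner, repo, ref):
    """Return a mybinder.org launch link for a GitHub repository.

    Hypothetical helper; owner, repo and ref are supplied by the caller.
    """
    parts = (quote(owner), quote(repo), quote(ref))
    return "https://mybinder.org/v2/gh/" + "/".join(parts)


# Placeholder names, not a real repository.
link = binder_url("example-user", "example-repo", "main")
print(link)
```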

Slide 32

Slide 32 text

CONTENT ON THE WEB ON-DEMAND ENVIRONMENTS BinderHub

Slide 33

Slide 33 text

No content

Slide 34

Slide 34 text

A long time ago in a galaxy far, far away…
$$R_{\mu\nu} - \frac{1}{2} R\, g_{\mu\nu} + \Lambda g_{\mu\nu} = \frac{8\pi G}{c^4} T_{\mu\nu}$$
Einstein’s Field Equations of General Relativity Annalen der Physik, 1916

Slide 35

Slide 35 text

LIGO: a feat of science & engineering Two identical detectors: Hanford, WA and Livingston, LA Detection problem: • Arm length change of ~ 1/1000 of a proton's width over 4 km. • Strain sensitivity ~ 1e-21 • Milky Way: 1e+21 m across!
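
The numbers on this slide can be checked with a few lines of arithmetic. This is a back-of-the-envelope sketch only: the proton diameter of about 1.7e-15 m is an assumed approximate value that does not appear on the slide, and the other figures are taken from the slide as stated.

```python
# Assumed approximate proton diameter in metres (not from the slide).
PROTON_DIAMETER_M = 1.7e-15
# Figures quoted on the slide.
ARM_LENGTH_M = 4_000.0        # 4 km detector arms
STRAIN_SENSITIVITY = 1e-21    # dimensionless strain
MILKY_WAY_M = 1e21            # rough diameter of the Milky Way

# A length change of 1/1000 of a proton, expressed as a strain over one arm.
displacement_m = PROTON_DIAMETER_M / 1000
strain = displacement_m / ARM_LENGTH_M

# The same strain applied across the whole galaxy.
galaxy_change_m = STRAIN_SENSITIVITY * MILKY_WAY_M

print(f"arm length change: {displacement_m:.2e} m")
print(f"implied strain:    {strain:.2e}")
print(f"Milky Way change:  {galaxy_change_m:.1f} m")
```

Under these assumptions the implied strain is a few times 1e-22, the same order as the quoted sensitivity, and a strain of 1e-21 corresponds to measuring the width of the Milky Way to within about a metre.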

Slide 36

Slide 36 text

September 14, 2015

Slide 37

Slide 37 text

The song of the universe Using the IPython.display.Audio object
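
A minimal sketch of making a signal audible in a notebook. It synthesizes an artificial rising tone (a "chirp") with the standard library; this is not LIGO data, and the sample rate, duration, and frequency range are arbitrary assumptions. If IPython is installed, the resulting WAV bytes are wrapped in `IPython.display.Audio`, which renders as a player when it is the last expression in a notebook cell.

```python
import io
import math
import struct
import wave

RATE = 8000                    # samples per second (assumed)
DURATION = 1.0                 # seconds (assumed)
F_START, F_END = 50.0, 400.0   # sweep range in Hz (assumed)


def chirp_samples(rate=RATE, duration=DURATION, f0=F_START, f1=F_END):
    """Return a linear chirp with rising amplitude, values in [-1, 1]."""
    samples = []
    for i in range(int(rate * duration)):
        t = i / rate
        # Phase is the integral of the linearly increasing frequency.
        phase = 2 * math.pi * (f0 * t + (f1 - f0) * t * t / (2 * duration))
        samples.append((t / duration) * math.sin(phase))
    return samples


def to_wav_bytes(samples, rate=RATE):
    """Encode float samples as 16-bit mono PCM WAV data."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        clipped = (max(-1.0, min(1.0, s)) for s in samples)
        w.writeframes(b"".join(struct.pack("<h", int(s * 32767)) for s in clipped))
    return buf.getvalue()


samples = chirp_samples()
wav_bytes = to_wav_bytes(samples)

try:
    from IPython.display import Audio
    player = Audio(data=wav_bytes)  # shows a player in a notebook
except ImportError:
    player = None                   # outside IPython, only the bytes are produced
```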

Slide 38

Slide 38 text

LIGO: Open Science with Jupyter

Slide 39

Slide 39 text

Binder: reproducible, executable scholarship from averaging ~150 people per week to averaging ~2,900 people per week Berkeley: Yuvi Panda, Chris Holdgraf Cal Poly: Carol Willing Simula: Min Ragan-Kelley Jessica Zosa-Forde, Tim Head

Slide 40

Slide 40 text

A tool FOR research, a subject OF research

Slide 41

Slide 41 text

Anatomy of a notebook http://adamrule.com/files/papers/chi_2018_computational_notebooks_final_web.pdf https://blog.jupyter.org/we-analyzed-1-million-jupyter-notebooks-now-you-can-too-guest-post-8116a964b536 Structure and design • Adam Rule et al. (UCSD) • analyzed 1 million notebooks • design opportunities • Dataset is PUBLIC! Slide credit: C. Willing

Slide 42

Slide 42 text

Education

Slide 43

Slide 43 text

Berkeley’s Data Science Courses http://data8.org ❖ Freshmen & upper division ❖ Interactive textbooks: Jupyter Notebooks ❖ Course deployment: JupyterHub http://ds100.org

Slide 44

Slide 44 text

DataHub datahub.berkeley.edu Supporting 2,500+ users Being used for Data 8, as well as several other courses Requires @berkeley.edu to access Running on Azure with almost zero maintenance Slide: C. Holdgraf

Slide 45

Slide 45 text

Data 8 & Data100: massive uptake D100 Sp18: ~650 students D8 Sp18: ~1,100 students

Slide 46

Slide 46 text

Fastest growing courses in Berkeley history Thanks to Yuvi Panda (DSEP), Ryan Lovett (Statistics), DSEP team

Slide 47

Slide 47 text

Berkeley in a few years… “We are witnessing a monumental phase shift in data science knowledge on campus - undergrads are extremely well trained…” Ciera Martinez, BIDS Fellow

Slide 48

Slide 48 text

Today! (April 17, 2018)

Slide 49

Slide 49 text

From K-12 to HPC!

Slide 50

Slide 50 text

Wide industrial adoption

Slide 51

Slide 51 text

2018! Save 20% with PJ20 jupytercon.com @JupyterCon

Slide 52

Slide 52 text

You may have seen this last week :) https://www.theatlantic.com/science/archive/2018/04/the-scientific-paper-is-obsolete/556676

Slide 53

Slide 53 text

The world of science and education wants open platforms https://github.com/parente/nbestimate ~1.7M notebooks on GitHub in Jan 2018

Slide 54

Slide 54 text

Back to openness: ethics and inclusion

Slide 55

Slide 55 text

Jupyter @ Berkeley and LBNL

Slide 56

Slide 56 text

Funding and resources

Slide 57

Slide 57 text

Thank You!