Fernando Pérez
[email protected]
Building an open platform for
research and education in data
science
Project Jupyter
Slide 2
Slide 2 text
A few bits about me
Medellín, Colombia
University of Colorado, Boulder
Physics
Applied Math
Computation
Slide 3
Slide 3 text
Statistics & me: then and now
If your result needs a statistician then
you should design a better experiment
(probably misattributed)
E. Rutherford
PhD: Lattice QCD
Simulations
Slide 4
Slide 4 text
Why?
Slide 5
Slide 5 text
Why?
❖ Ethical: openness as fairness
❖ Human/social: openness fosters collaboration.
❖ Epistemological: proprietary science is an oxymoron.
❖ Technical: Python was cool :)
Slide 6
Slide 6 text
Python - The Beginning
the most important
lesson I learned
was about sharing
– Guido van Rossum
http://neopythonic.blogspot.com/2016/04/kings-day-speech.html
Slide credit: C. Willing
Slide 7
Slide 7 text
Designed for Learning
In reality, programming languages
are how programmers express and
communicate ideas — and the
audience for those ideas is other
programmers, not computers.
http://neopythonic.blogspot.com/2016/04/kings-day-speech.html
– Guido van Rossum
Slide credit: C. Willing
Slide 8
Slide 8 text
What?
Slide 9
Slide 9 text
No content
Slide 10
Slide 10 text
IPython: Interactive Python, 2001
A humble start:
IPython 0.0.1, 259 LOC
“Just an afternoon hack”
https://gist.github.com/fperez/1579699
Slide 11
Slide 11 text
Team today: where all the credit goes
Plus ~1,500 more open source contributors!
Slide 12
Slide 12 text
The IPython/Jupyter Notebook
❖ Rich web client
❖ Text & math
❖ Code
❖ Results
❖ Share, reproduce.
Slide 13
Slide 13 text
Core ideas of the web: HTTP & HTML
HTML: format to represent content
HyperText Markup Language
HTTP: protocol to connect clients and servers
HyperText Transfer Protocol
Image credit: eviltester.com
Slide 14
Slide 14 text
Core ideas of Jupyter
Document Format
https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers
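As a rough illustration of the document format, a notebook file is plain JSON: a list of cells plus metadata. The sketch below builds a minimal notebook by hand, assuming the nbformat v4 layout; the cell contents are invented for illustration and are not taken from the repository linked above.

```python
import json

# A minimal notebook in the nbformat v4 layout (assumed here):
# top-level version fields, metadata, and an ordered list of cells.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            # Narrative text and math live in markdown cells.
            "cell_type": "markdown",
            "metadata": {},
            "source": "# Coin flips\nSome text and math: $p = 0.5$",
        },
        {
            # Code cells keep the source together with its stored results.
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": "1 + 1",
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 1,
                    "metadata": {},
                    "data": {"text/plain": "2"},
                }
            ],
        },
    ],
}

# Round-trip through JSON text, which is what an .ipynb file contains.
text = json.dumps(notebook, indent=1)
restored = json.loads(text)
cell_types = [cell["cell_type"] for cell in restored["cells"]]
print(cell_types)
```

Because text, code, and results sit in one JSON file, the same document can be shared, rendered, and re-executed.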
Interactive Computing Protocol
[Diagram: a client with SUB and DEALER sockets connected to a kernel with PUB and ROUTER sockets]
ØMQ + JSON
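To make the "ØMQ + JSON" idea concrete, here is a hedged sketch of what a client might send to a kernel: an `execute_request` message assembled as JSON-serializable dictionaries and signed with HMAC-SHA256. The field names follow the Jupyter messaging specification as best understood here; the session id, username, protocol version string, and signing key are hypothetical values chosen for illustration, and no socket is actually opened.

```python
import hashlib
import hmac
import json
import uuid
from datetime import datetime, timezone


def make_execute_request(code, session, username="demo"):
    """Build the four dictionaries that make up one protocol message."""
    header = {
        "msg_id": uuid.uuid4().hex,
        "session": session,
        "username": username,  # hypothetical user name
        "date": datetime.now(timezone.utc).isoformat(),
        "msg_type": "execute_request",
        "version": "5.3",  # assumed protocol version
    }
    content = {
        "code": code,
        "silent": False,
        "store_history": True,
        "user_expressions": {},
        "allow_stdin": False,
        "stop_on_error": True,
    }
    return {"header": header, "parent_header": {}, "metadata": {}, "content": content}


def sign(message, key):
    """HMAC-SHA256 over the serialized parts, in their wire order."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for part in ("header", "parent_header", "metadata", "content"):
        mac.update(json.dumps(message[part]).encode("utf-8"))
    return mac.hexdigest()


msg = make_execute_request("1 + 1", session=uuid.uuid4().hex)
signature = sign(msg, key=b"hypothetical-key")
print(msg["header"]["msg_type"], len(signature))
```

In a real deployment the client would send these parts as ØMQ frames on its DEALER socket and read results published by the kernel on its SUB socket.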
Slide 15
Slide 15 text
Jupyter Protocol
web-age capture of the process of interactive computing
any mime-type output
❖ text
❖ svg, png, jpeg
❖ latex, pdf
❖ html, javascript
❖ interactive widgets
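One way to picture "any mime-type output": an object can offer several representations, and the frontend picks the richest one it can render. The sketch below defines `_repr_svg_` and `_repr_latex_`, which are method names IPython looks for; the `Circle` class and the `mime_bundle` helper are hypothetical, written only to show the resulting dictionary keyed by MIME type without requiring IPython.

```python
class Circle:
    """Toy object with plain-text, SVG, and LaTeX representations."""

    def __init__(self, radius):
        self.radius = radius

    def __repr__(self):
        return f"Circle(radius={self.radius})"

    def _repr_svg_(self):
        size = 2 * self.radius + 4
        center = self.radius + 2
        return (
            f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">'
            f'<circle cx="{center}" cy="{center}" r="{self.radius}" '
            f'fill="none" stroke="black"/></svg>'
        )

    def _repr_latex_(self):
        return rf"$A = \pi \cdot {self.radius}^2$"


def mime_bundle(obj):
    """Collect representations keyed by MIME type (illustrative helper, not IPython API)."""
    bundle = {"text/plain": repr(obj)}
    hooks = (("image/svg+xml", "_repr_svg_"), ("text/latex", "_repr_latex_"))
    for mime, method in hooks:
        if hasattr(obj, method):
            bundle[mime] = getattr(obj, method)()
    return bundle


bundle = mime_bundle(Circle(10))
print(sorted(bundle))
```

A terminal client would fall back to `text/plain`, while a web client could draw the SVG or typeset the LaTeX from the same result.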
Slide 16
Slide 16 text
Jupyter Protocol
is language agnostic
~100 different kernels: https://github.com/jupyter/jupyter/wiki/Jupyter-kernels
Slide 17
Slide 17 text
No content
Slide 18
Slide 18 text
Classic ‘Notebook’…
Slide 19
Slide 19 text
JupyterLab: a grand unified theory of Jupyter
Huge Team Effort!
C. Colbert, S. Corlay, A. Darian, B. Granger, J. Grout, P.
Ivanov, I. Rose, S. Silvester, C. Willing, J. Zosa-Forde …
Slide 20
Slide 20 text
Live Demo!
Slide 21
Slide 21 text
Reproducible Research
An article about computational science in a scientific
publication is not the scholarship itself, it is merely
advertising of the scholarship. The actual scholarship is
the complete software development environment and the
complete set of instructions which generated the figures.
Buckheit and Donoho, WaveLab and Reproducible Research, 1995
Slide 22
Slide 22 text
No content
Slide 23
Slide 23 text
No content
Slide 24
Slide 24 text
No content
Slide 25
Slide 25 text
No content
Slide 26
Slide 26 text
JupyterHub: multiuser support
Slide 27
Slide 27 text
CODING
ENVIRONMENT
AUTHENTICATION
Slides credit: C. Holdgraf
Slide 28
Slide 28 text
What does this mean for science + education?
❖ Can utilize…
❖ ...shared hardware/compute for running code
❖ ...shared data storage for big datasets
❖ ...shared environments for doing work
❖ ...shared workflows, ideas, and results
Slide 29
Slide 29 text
CODING
ENVIRONMENT
AUTHENTICATION
FANCY HARDWARE
Slide 30
Slide 30 text
CODING
ENVIRONMENT
AUTHENTICATION
CONTENT ON
THE WEB
CONTENT ON
THE WEB
ON-DEMAND
ENVIRONMENTS
BinderHub
Slide 33
Slide 33 text
No content
Slide 34
Slide 34 text
A long time ago in a galaxy far, far away…
$$R_{\mu\nu} - \frac{1}{2} R\, g_{\mu\nu} + \Lambda g_{\mu\nu} = \frac{8\pi G}{c^4}\, T_{\mu\nu}$$
Einstein’s Field Equations of General Relativity
Annalen der Physik, 1916
Slide 35
Slide 35 text
Two identical detectors: Hanford, WA and Livingston, LA
LIGO: a feat of science &
engineering
Detection problem:
• ~1/1000 of a proton's width over 4 km.
• Strain sensitivity ~1e-21
• Milky Way: 1e+21 m across!
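The scale of these numbers can be checked with one line of arithmetic, assuming the sensitivity figure is a dimensionless strain (change in length divided by length). The sketch below applies the slide's 1e-21 to the 4 km arm and to the slide's 1e+21 m figure for the Milky Way; the function name is hypothetical.

```python
def length_change(strain, length_m):
    """Length change in meters for a given dimensionless strain."""
    return strain * length_m


# Values quoted on the slide.
STRAIN = 1e-21
ARM_LENGTH_M = 4_000.0
MILKY_WAY_M = 1e21

arm_change = length_change(STRAIN, ARM_LENGTH_M)
galaxy_change = length_change(STRAIN, MILKY_WAY_M)

print(f"4 km arm:  {arm_change:.1e} m")
print(f"Milky Way: {galaxy_change:.1e} m")
```

Under these assumptions the arm changes by about 4e-18 m, and the same strain across the whole galaxy amounts to roughly one meter.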
Slide 36
Slide 36 text
September 14, 2015
Slide 37
Slide 37 text
The song of the universe
Using the IPython.display.Audio object
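A minimal sketch of how a signal can be turned into sound in a notebook. The waveform here is a synthetic rising tone made up for illustration, not LIGO data, and the start and end frequencies are arbitrary assumptions. `IPython.display.Audio` accepts a sequence of samples together with a sample rate; the import is guarded so the rest of the code also runs outside IPython.

```python
import math


def chirp(duration_s=1.0, rate=8000, f_start=35.0, f_end=250.0):
    """Synthetic tone whose pitch and loudness rise over time (not real detector data)."""
    n_samples = int(duration_s * rate)
    samples = []
    phase = 0.0
    for i in range(n_samples):
        progress = i / n_samples
        # Frequency sweeps upward, slowly at first and then quickly.
        freq = f_start + (f_end - f_start) * progress ** 3
        phase += 2 * math.pi * freq / rate
        amplitude = 0.2 + 0.8 * progress
        samples.append(amplitude * math.sin(phase))
    return samples


RATE = 8000
samples = chirp(rate=RATE)

try:
    from IPython.display import Audio

    # In a notebook, displaying `player` renders an audio control.
    player = Audio(samples, rate=RATE)
except ImportError:
    player = None
```

In a notebook cell, ending the cell with `player` would show a playable widget next to the code that produced it.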
Slide 38
Slide 38 text
LIGO: Open Science with Jupyter
Slide 39
Slide 39 text
Binder: reproducible, executable scholarship
From an average of ~150 people per week to ~2,900 people per week
Berkeley: Yuvi Panda, Chris Holdgraf
Cal Poly: Carol Willing
Simula: Min Ragan-Kelley
Jessica Zosa-Forde, Tim Head
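As a sketch of how "content on the web" maps to an on-demand environment, a Binder launch link can be built from a GitHub repository name. The URL pattern below follows the public mybinder.org convention as understood here; the `binder_url` helper is hypothetical, `HEAD` as the default ref is an assumption, and the repository (the one linked earlier in this deck) is used only as a placeholder without any claim that it is configured for Binder.

```python
from urllib.parse import quote


def binder_url(org, repo, ref="HEAD", host="https://mybinder.org"):
    """Build a launch URL of the form <host>/v2/gh/<org>/<repo>/<ref>."""
    # Refs may contain slashes (for example branch names), so encode them fully.
    return f"{host}/v2/gh/{quote(org)}/{quote(repo)}/{quote(ref, safe='')}"


url = binder_url(
    "CamDavidsonPilon",
    "Probabilistic-Programming-and-Bayesian-Methods-for-Hackers",
)
print(url)
```

Sharing such a link lets a reader open the repository's notebooks in a temporary, running environment without installing anything locally.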
Slide 40
Slide 40 text
A tool FOR research,
a subject OF research
Slide 41
Slide 41 text
Anatomy of
a notebook
http://adamrule.com/files/papers/chi_2018_computational_notebooks_final_web.pdf
https://blog.jupyter.org/we-analyzed-1-million-jupyter-notebooks-now-you-can-too-guest-post-8116a964b536
Structure and design
• Adam Rule et al. (UCSD)
• analyzed 1 million
notebooks
• design opportunities
• Dataset is PUBLIC!
Slide credit: C. Willing
DataHub
datahub.berkeley.edu
Supporting 2,500+ users
Being used for Data 8, as well
as several other courses
Requires a @berkeley.edu account to
access
Running on Azure with almost
zero maintenance
Slide: C. Holdgraf
Slide 45
Slide 45 text
Data 8 & Data100: massive uptake
D100 Sp18: ~650
students
D8 Sp18: ~1,100 students
Slide 46
Slide 46 text
Fastest growing courses
in Berkeley history
Thanks to
Yuvi Panda (DSEP), Ryan Lovett (Statistics),
DSEP team
Slide 47
Slide 47 text
Berkeley in a few years…
“We are witnessing a monumental phase shift in data science knowledge on campus -
undergrads are extremely well trained…”
Ciera Martinez, BIDS Fellow
Slide 48
Slide 48 text
Today! (April 17, 2018)
Slide 49
Slide 49 text
From K-12 to HPC
Slide 50
Slide 50 text
Wide industrial adoption
Slide 51
Slide 51 text
2018!
Save 20% with PJ20
jupytercon.com @JupyterCon
Slide 52
Slide 52 text
You may have seen this last week :)
https://www.theatlantic.com/science/archive/2018/04/the-scientific-paper-is-obsolete/556676
Slide 53
Slide 53 text
The world of science and education
wants open platforms
https://github.com/parente/nbestimate
~1.7M notebooks
on GitHub in Jan 2018