Open Infrastructure
for open science
How Binder Powers an Open Stack in the Cloud
Chris Holdgraf, UC Berkeley and Project Jupyter
@choldgraf
Slide 2
Slide 2 text
you???
Slide 3
Slide 3 text
No content
Slide 4
Slide 4 text
A bit about me then...
Slide 5
Slide 5 text
A bit about me now...
Slide 6
Slide 6 text
a community of people and an ecosystem of open
tools and standards for interactive computing
Slide 7
Slide 7 text
create things that are language-agnostic and modular.
Empower people to use other open tools.
Slide 8
Slide 8 text
Aside: Jupyter and the
last mile problem
Slide 9
Slide 9 text
You
San Jose
Coffee!
Slide 10
Slide 10 text
You
San Jose
Coffee!
Slide 11
Slide 11 text
You
San Jose
Coffee!
Slide 12
Slide 12 text
BART Bus
You
San Jose
Coffee!
Slide 13
Slide 13 text
BART Bus
The
“last mile”
You
San Jose
Coffee!
Slide 14
Slide 14 text
Public infrastructure gets us
closer to our goal.
It makes the last mile shorter.
Slide 15
Slide 15 text
How does Jupyter
fit into this?
Slide 16
Slide 16 text
You
Your
awesome
report
Slide 17
Slide 17 text
You
Your
awesome
report
Slide 18
Slide 18 text
You
Your
awesome
report
Jupyter shortens the last mile by creating
and leveraging public infrastructure
server
.ipynb
package ecosystem
Notebook document specification
Jupyter server protocol
Interactive kernels
Notebook interfaces
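The notebook document specification named on this slide is, in practice, a JSON layout. As a rough sketch using only the standard library, a version 4 `.ipynb` document with a single code cell can be assembled as below. The helper name `minimal_notebook` is hypothetical, and the `nbformat_minor` value is an assumption chosen for illustration.

```python
import json


def minimal_notebook(source: str) -> dict:
    """Return a dict shaped like a version 4 notebook with one code cell."""
    return {
        "nbformat": 4,
        "nbformat_minor": 2,  # assumed minor version for this sketch
        "metadata": {},
        "cells": [
            {
                "cell_type": "code",
                "execution_count": None,  # the cell has not been run yet
                "metadata": {},
                "outputs": [],
                # Source is stored as a list of lines, newlines kept.
                "source": source.splitlines(keepends=True),
            }
        ],
    }


if __name__ == "__main__":
    # Print the JSON text that would be saved as a .ipynb file.
    print(json.dumps(minimal_notebook("print('hello')\n"), indent=1))
```

Because the document is plain JSON, any tool or language can read and write it, which is what makes it usable as shared infrastructure.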
Slide 19
Slide 19 text
Back to our talk...
Slide 20
Slide 20 text
No content
Slide 21
Slide 21 text
Our mission for this talk
Us
.ipynb
Open,
reproducible,
sharable
environments
Slide 22
Slide 22 text
Part 1: from your laptop
to the cloud with
JupyterHub
Slide 23
Slide 23 text
(some) data science should
be taught to everyone
(no, really)
Slide 24
Slide 24 text
No content
Slide 25
Slide 25 text
How can we connect people
with computation?
Slide 26
Slide 26 text
What is JupyterHub?
Host pre-configured data science environments on
shared infrastructure
jupyter.org/hub
Slide 27
Slide 27 text
myhub.org
My fancy machine
in the cloud
Slide 28
Slide 28 text
myhub.org
Slide 29
Slide 29 text
myhub.org
environments
Slide 30
Slide 30 text
myhub.org
interfaces
environments
Slide 31
Slide 31 text
myhub.org
interfaces
environments
documents
Slide 32
Slide 32 text
compute
data
Slide 33
Slide 33 text
AUTHENTICATION
Slide 34
Slide 34 text
Chris Is Trying A Live Demo
Hopefully he doesn’t embarrass himself too badly.
Textbook link
Interact link
Slide 35
Slide 35 text
Example + shout-out to
Open Humans
notebooks.openhumans.org
Slide 36
Slide 36 text
Us
.ipynb
Open,
reproducible,
sharable
environments
Slide 37
Slide 37 text
How can users
package and share their work?
Slide 38
Slide 38 text
Part 2: packaging and sharing
your environment with
repo2docker
Slide 39
Slide 39 text
What is repo2docker?
Convert a repository into a Docker image
that runs the code inside.
repo2docker.readthedocs.io
Slide 40
Slide 40 text
github.com/minrk/ligo-binder
Slide 41
Slide 41 text
No content
Slide 42
Slide 42 text
repo2docker
what does it
do?
$ jupyter repo2docker \
> https://github.com/minrk/ligo-binder
Slide 43
Slide 43 text
repo2docker
what does it
do?
$ jupyter repo2docker \
> https://github.com/minrk/ligo-binder
Cloning into
'/var/folders/.../T/repo2dockermu6z66sd'...
Using CondaBuildPack builder
Step 1/31 : FROM buildpack-deps:bionic
---> 29f4eef41002
Step 2/31 : ENV DEBIAN_FRONTEND=noninteractive
---> Using cache
---> ee1ba7c4f5f4
Step 3/31 : RUN apt-get update && apt-get
install --yes --no-install-recommends locales &&
apt-get purge && apt-get clean && rm -rf
/var/lib/apt/lists/*
Slide 44
Slide 44 text
repo2docker
what does it
do?
Step 1: get the repo
git clone https://github.com/me/myproject
Slide 45
Slide 45 text
repo2docker
what does it
do?
Step 2: Identify requirements
Slide 46
Slide 46 text
Step 2: Identify requirements
repo2docker
what does it
do?
Slide 47
Slide 47 text
Step 2: Identify requirements
repo2docker
what does it
do?
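The "identify requirements" step can be pictured as scanning the repository for well-known configuration files. The sketch below is a heavily simplified illustration of that idea, not repo2docker's actual code: the function `detect_builder`, the lookup table, and the precedence order are assumptions, and every label except `CondaBuildPack` (which appears in the build log) is a placeholder.

```python
import tempfile
from pathlib import Path

# Checked in order; the first file found decides the builder.
# Labels other than "CondaBuildPack" are placeholders for this sketch.
CANDIDATES = [
    ("Dockerfile", "docker"),
    ("environment.yml", "CondaBuildPack"),
    ("requirements.txt", "pip"),
]


def detect_builder(repo_dir: str) -> str:
    """Return a label for the first known configuration file in repo_dir."""
    root = Path(repo_dir)
    for filename, builder in CANDIDATES:
        if (root / filename).is_file():
            return builder
    return "default"  # nothing recognised: fall back to a base environment


if __name__ == "__main__":
    # Try it on a throwaway directory that contains an environment.yml.
    with tempfile.TemporaryDirectory() as repo:
        Path(repo, "environment.yml").write_text("dependencies: []\n")
        print(detect_builder(repo))
```

Keying off files that projects already ship means a repository needs no Binder-specific metadata to be buildable.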
Slide 48
Slide 48 text
...
COPY conda/install-miniconda.bash /tmp/install-miniconda.bash
COPY conda/environment.py-3.6.frozen.yml /tmp/environment.yml
RUN bash /tmp/install-miniconda.bash && \
rm /tmp/install-miniconda.bash /tmp/environment.yml
...
Step 3: generate Dockerfile
repo2docker
what does it
do?
Slide 49
Slide 49 text
Step 3: generate Dockerfile
...
# Copy and chown stuff.
COPY src/ ${HOME}
RUN chown -R ${NB_USER}:${NB_USER} ${HOME}
# Run assemble scripts! These will actually build the spec
# in the repository into the image.
USER ${NB_USER}
RUN ${KERNEL_PYTHON_PREFIX}/bin/pip install --no-cache-dir \
-r "requirements.txt"
...
repo2docker
what does it
do?
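Generating the Dockerfile amounts to filling a template with what was found in the repository. As a minimal sketch, the function below reproduces the excerpt on this slide for a given requirements file. The name `render_dockerfile_tail` is hypothetical, and real generation covers far more than these few lines.

```python
def render_dockerfile_tail(requirements_file: str = "requirements.txt") -> str:
    """Render the copy-and-install lines shown in the slide excerpt."""
    lines = [
        "# Copy the repository contents and hand them to the notebook user.",
        "COPY src/ ${HOME}",
        "RUN chown -R ${NB_USER}:${NB_USER} ${HOME}",
        "USER ${NB_USER}",
        "RUN ${KERNEL_PYTHON_PREFIX}/bin/pip install --no-cache-dir \\",
        # Only this line varies with the repository's contents.
        f'    -r "{requirements_file}"',
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_dockerfile_tail(), end="")
```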
Slide 50
Slide 50 text
Step 4: build (& push) image
docker build -t myimage .
docker push myimage
repo2docker
what does it
do?
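The build and push step can be scripted. The sketch below only assembles the two command lines rather than running them, so it works without Docker installed. The helper `build_and_push_commands` is hypothetical, and it assumes the current directory (`.`) as the build context, which `docker build` requires as an argument.

```python
import shlex


def build_and_push_commands(image: str, context: str = ".") -> list:
    """Return argv lists for building an image and pushing it to a registry."""
    return [
        ["docker", "build", "-t", image, context],
        ["docker", "push", image],
    ]


if __name__ == "__main__":
    # Show the commands; nothing is executed here.
    for cmd in build_and_push_commands("myimage"):
        print(shlex.join(cmd))
```

To actually run them, each list could be passed to `subprocess.run(cmd, check=True)`.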
Slide 51
Slide 51 text
repo2docker
what does it
do?
$ jupyter repo2docker \
> https://github.com/minrk/ligo-binder
Cloning into
'/var/folders/.../T/repo2dockermu6z66sd'...
Using CondaBuildPack builder
Step 1/31 : FROM buildpack-deps:bionic
---> 29f4eef41002
Step 2/31 : ENV DEBIAN_FRONTEND=noninteractive
---> Using cache
---> ee1ba7c4f5f4
Step 3/31 : RUN apt-get update && apt-get
install --yes --no-install-recommends locales &&
apt-get purge && apt-get clean && rm -rf
/var/lib/apt/lists/*
Guiding principles of repo2docker
● Repos should be human and machine readable
● Use existing specifications and standards
● Support many languages and interfaces
● Be lightweight and tightly scoped, but extensible
Slide 54
Slide 54 text
Us
.ipynb
Open,
reproducible,
sharable
environments
Slide 55
Slide 55 text
Part 3: tying this together with
BinderHub
Slide 56
Slide 56 text
What is BinderHub?
One-click sharable, interactive, reproducible
environments from your public git repository
mybinder.org
binderhub.readthedocs.io
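The "one click" is an ordinary link that encodes which repository and revision to build. As a minimal sketch, assuming the `/v2/gh/<org>/<repo>/<ref>` URL layout used for GitHub repositories on mybinder.org, such a link can be constructed as follows. The function name `binder_url` is hypothetical, and the `master` ref in the usage example is an assumption about that repository.

```python
from urllib.parse import quote


def binder_url(org: str, repo: str, ref: str,
               host: str = "https://mybinder.org") -> str:
    """Build a launch link for a public GitHub repository."""
    # Percent-encode the ref fully so branch names containing "/" survive.
    return f"{host}/v2/gh/{quote(org)}/{quote(repo)}/{quote(ref, safe='')}"


if __name__ == "__main__":
    print(binder_url("binder-examples", "requirements", "master"))
```

Passing a different `host` points the same link format at another BinderHub deployment.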
Slide 57
Slide 57 text
No content
Slide 58
Slide 58 text
BinderHub is open tech...
Slide 59
Slide 59 text
One example: mybinder.org
Slide 60
Slide 60 text
Chris Is Trying A Live Demo
Hopefully he doesn’t embarrass himself too badly.
mybinder.org
binder-examples/requirements
Slide 61
Slide 61 text
No content
Slide 62
Slide 62 text
mybinder.org is an open service...
Slide 63
Slide 63 text
mybinder.org weekly sessions, last ~year
Slide 64
Slide 64 text
mybinder.org sessions, last month
Slide 65
Slide 65 text
No content
Slide 66
Slide 66 text
Some cool new projects
(or, stuff you can help out with)
Slide 67
Slide 67 text
Interactive books with
Jupyter-Book
github.com/jupyter/jupyter-book
HTML dashboards with voila
github.com/QuantStack/voila
Slide 70
Slide 70 text
In summary
jupyter.org
Jupyter makes the “last-mile” problem as small as
possible by building modular, open tools.
Us
.ipynb
Open,
reproducible,
sharable
environments
Slide 71
Slide 71 text
In summary
JupyterHub lets you create a shared, interactive
analytics environment
Us
.ipynb
Open,
reproducible,
sharable
environments
mybinder.org
Slide 72
Slide 72 text
In summary
repo2docker creates reproducible Docker images
from a repository
Us
.ipynb
Open,
reproducible,
sharable
environments
mybinder.org
Slide 73
Slide 73 text
In summary
BinderHub is an open web application to create
shareable, reproducible coding environments
Us
.ipynb
Open,
reproducible,
sharable
environments
mybinder.org
Slide 74
Slide 74 text
Get involved with Jupyter
● All of these projects are open source, run by
open communities
● Jupyter is a place where *anybody* can
participate
● If you’d like to get involved:
jupyterhub-team-compass.readthedocs.io
discourse.jupyter.org
@choldgraf