reproducibility as the engine of science:
tools for reproducible research
Lindsey Heagy
UC Berkeley
@lindsey_jh
Slide 2
Slide 2 text
hello (a bit about me)
geophysical inversions
open-source software
open research & education
geoscience + data science
Slide 3
Slide 3 text
questions in the geosciences
Observations / Data
Theory & Ideas
Simulations, Computation
After Hamman, 2018
EMAG2: Earth Magnetic Anomaly Grid (2-arc-minute resolution). Image credit: Dom Fournier (toolkit.geosci.xyz)
Slide 4
Slide 4 text
what are the ingredients?
● questions
● domain knowledge
● software
● data
● infrastructure
Slide 5
Slide 5 text
evolving research outputs & audiences
Variety of “consumers”:
● peers
● students
● decision makers & the public
Drives diversity in outputs
● journal publications
● web apps
● educational resources
● ...
Slide 6
Slide 6 text
on reproducibility
start here
on extensibility
“publish”
contribution
Slide 7
Slide 7 text
tools and platforms for researchers
Slide 8
Slide 8 text
scientific software
*Python ecosystem
Slide 9
Slide 9 text
interactive, exploratory computing
a community of people and an ecosystem of open
tools and standards for interactive computing
Slide 10
Slide 10 text
Jupyter notebooks
Slide 11
Slide 11 text
using notebooks
Slide 12
Slide 12 text
JupyterLab: a grand unified theory of Jupyter
Huge Team Effort!
C. Colbert, S. Corlay, A. Darian, B. Granger, J. Grout,
P. Ivanov, I. Rose, S. Silvester, C. Willing, J. Zosa-Forde …
Slide 13
Slide 13 text
JupyterLab and notebooks ++
Slide 14
Slide 14 text
JupyterLab: more than notebooks
Slide 15
Slide 15 text
JupyterLab: data
Slide 16
Slide 16 text
JupyterLab is extensible: FlyBrainLab
An Interactive Computing Platform for the Fly Brain
BIONET Group, Columbia University
http://www.bionet.ee.columbia.edu
Aurel A. Lazar (PI)
Tingkai Liu
Mehmet K. Turkcan
Chung-Heng Yeh
Yiyin Zhou
http://fruitflybrain.org
Slide 18
Slide 18 text
notebooks on a cloud or HPC
jupyter.org/hub
host pre-configured environments on shared
infrastructure
Slide 19
Slide 19 text
JupyterHub
Slide 20
Slide 20 text
myhub.org
fancy machine in
the cloud
Slide 21
Slide 21 text
myhub.org
Slide 22
Slide 22 text
myhub.org
environments
Slide 23
Slide 23 text
myhub.org
interfaces
environments
Slide 24
Slide 24 text
AUTHENTICATION
myhub.org
interfaces
environments
Slide 25
Slide 25 text
JupyterHub distributions
The Littlest JupyterHub
tljh.jupyter.org
JupyterHub on Kubernetes
z2jh.jupyter.org
A pre-configured JupyterHub setup with sensible defaults and
lots of documentation, fit for many use-cases
Slide 26
Slide 26 text
Zero to JupyterHub for Kubernetes
z2jh.jupyter.org
Scalable in both users and in resources
Uses Docker for environment management
Agnostic to the provider and hardware configuration
Slide 27
Slide 27 text
Data 8 at UC Berkeley
~2800 students (Spring/Fall 2019)
Slide 28
Slide 28 text
tools and platforms for researchers
Slide 29
Slide 29 text
National infrastructure from K-12 to HPC
J. Colliander,
I. Allison,
B. Carra
Slide 30
Slide 30 text
Harnessing the power of cloud
computing to study the whole
Earth interactively
Interactivity
Distributed computing
Data models / numerics
Slide 31
Slide 31 text
Pangeo architecture
Slide 32
Slide 32 text
Jupyter meets the Earth: an NSF grant ($2M / 3 years)!
Fernando Pérez
Joe Hamman
Laurel Larsen
Kevin Paul
Lindsey Heagy
Chris Holdgraf
Yuvi Panda
Research use-cases
● Climate data analysis
● Hydrology
● Geophysics
Tech developments
● Data discovery
● Interactivity
● Cloud/HPC infrastructure
For more: http://bit.ly/jupytearth
Slide 33
Slide 33 text
Publication
(sharing research with other people)
Slide 34
Slide 34 text
the science more than the paper
An article about computational science in a scientific
publication is not the scholarship itself, it is merely advertising
of the scholarship. The actual scholarship is the complete
software development environment and the complete set of
instructions which generated the figures.
-- Buckheit and Donoho (paraphrasing Claerbout)
WaveLab and Reproducible Research, 1995
Slide 35
Slide 35 text
An article about computational science in a scientific
publication is not the scholarship itself, it is merely advertising
of the scholarship. The actual scholarship is the complete
software development environment and the complete set of
instructions which generated the figures.
(and a place to run the code?)
the science more than the paper
-- Buckheit and Donoho (paraphrasing Claerbout)
WaveLab and Reproducible Research, 1995
Slide 36
Slide 36 text
mybinder.org
shareable, interactive, reproducible environments from your public git repository
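As a rough illustration of how a public repository becomes a shareable link, the sketch below builds a mybinder.org launch URL for a GitHub repository. The helper name `binder_url` and the repository `example-org/example-repo` are hypothetical. The `/v2/gh/<org>/<repo>/<ref>` path and the `filepath` query parameter follow the commonly seen launch-link form; newer links may use other parameters (for example `labpath` or `urlpath`), so treat this as an assumption and not a definitive reference.

```python
from urllib.parse import quote, urlencode


def binder_url(org, repo, ref="HEAD", filepath=None, base="https://mybinder.org"):
    """Build a Binder launch URL for a public GitHub repository.

    org, repo : GitHub organisation (or user) and repository name
    ref       : branch, tag, or commit to build (slashes are percent-encoded)
    filepath  : optional notebook to open once the environment starts
    """
    url = f"{base}/v2/gh/{quote(org)}/{quote(repo)}/{quote(ref, safe='')}"
    if filepath:
        # urlencode percent-encodes the path, including "/" as %2F
        url += "?" + urlencode({"filepath": filepath})
    return url


# Hypothetical repository, used for illustration only
print(binder_url("example-org", "example-repo", "main", "notebooks/demo.ipynb"))
```

Anyone who opens the resulting link gets a fresh environment built from the repository at that ref, provided the repository is public and declares its dependencies in a file Binder recognises (such as `requirements.txt` or `environment.yml`).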
Slide 37
Slide 37 text
binder: repo2docker + JupyterHub
Slide 38
Slide 38 text
http://bit.ly/black-holes-woop
Black holes! LIGO, Sept 14, 2015
Slide 39
Slide 39 text
Jupyter Book: computation + context
publish a collection of notebooks as an online textbook
inferentialthinking.com
Slide 40
Slide 40 text
New development: publishing executable books
QuantEcon IAB Jupyter Book
PDF
HTML
...
execution and text content sync
citations, cross-refs, rich metadata
Slide 41
Slide 41 text
reaching new audiences
Slide 42
Slide 42 text
on reproducibility
start here
on extensibility
“publish”
contribution
Groundwater in Myanmar
● Bring DC resistivity equipment to Mon state
● Train local stakeholders
● Provide open-source software and educational resources
Slide 48
Slide 48 text
Reaching new audiences
Diverse research outputs:
● Papers
● Notebooks
● Apps
● Web-based textbooks
“Consumers” of science
● Scientists
● Students
● Public
Slide 49
Slide 49 text
Revisiting “publishing”
Slide 50
Slide 50 text
pdf model
capture research in a pdf, peer review, accepted(!)
scientist
consumers
Slide 51
Slide 51 text
pdf model
extending or building on ideas?
scientist
scientist?
Slide 52
Slide 52 text
pdf model
extending or building on ideas?
scientist
scientist
consumers
Slide 53
Slide 53 text
Blurring the line between scientists and audience?
● Open tools are
○ accessible
○ explorable
○ extensible
Slide 54
Slide 54 text
An open ecosystem supports the engine of science
● Open tools are a starting point for…
○ reproducibility of work
○ collaboration at the level of computation
○ extension of ideas
● And provide a trajectory for “consumers” to become creators
Slide 55
Slide 55 text
Thank you!
@lheagy
@lindsey_jh
Special Thanks:
Rowan Cockett
Chris Holdgraf
Fernando Pérez