Slide 1

Slide 1 text

reproducibility as the engine of science: tools for reproducible research Lindsey Heagy UC Berkeley @lindsey_jh

Slide 2

Slide 2 text

hello (a bit about me) geophysical inversions open-source software open research & education geoscience + data science +

Slide 3

Slide 3 text

questions in the geosciences: Observations / Data, Theory & Ideas, Simulations / Computation (after Hamman, 2018). Image: EMAG2: Earth Magnetic Anomaly Grid (2-arc-minute resolution). Image credit: Dom Fournier (toolkit.geosci.xyz)

Slide 4

Slide 4 text

what are the ingredients? ● questions ● domain knowledge ● software ● data ● infrastructure

Slide 5

Slide 5 text

evolving research outputs & audiences Variety of “consumers”: ● peers ● students ● decision makers & the public Drives diversity in outputs ● journal publications ● web apps ● educational resources ● ...

Slide 6

Slide 6 text

on reproducibility start here on extensibility “publish” contribution

Slide 7

Slide 7 text

tools and platforms for researchers

Slide 8

Slide 8 text

scientific software *Python ecosystem

Slide 9

Slide 9 text

interactive, exploratory computing a community of people and an ecosystem of open tools and standards for interactive computing

Slide 10

Slide 10 text

Jupyter notebooks
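
Sketch (not from the slides): one of the open standards behind notebooks is the file format itself. A notebook is a JSON document with a list of cells, so other tools can read and write it. The example below builds a minimal notebook in the version 4 layout using only the Python standard library. The function name `minimal_notebook` and the cell contents are made up for illustration.

```python
import json


def minimal_notebook(code_lines):
    """Return a dict laid out like a version 4.4 notebook file.

    `code_lines` is a list of source lines for a single code cell.
    The name and contents here are illustrative assumptions.
    """
    return {
        "nbformat": 4,
        "nbformat_minor": 4,
        "metadata": {},
        "cells": [
            {
                # Narrative text lives in markdown cells.
                "cell_type": "markdown",
                "metadata": {},
                "source": ["# A tiny notebook\n"],
            },
            {
                # Code cells also carry their outputs and an execution counter.
                "cell_type": "code",
                "execution_count": None,  # None means the cell has not been run
                "metadata": {},
                "outputs": [],
                "source": code_lines,
            },
        ],
    }


if __name__ == "__main__":
    nb = minimal_notebook(["print('hello, geoscience')\n"])
    # Saved under a name ending in .ipynb, this text is what a notebook file contains.
    print(json.dumps(nb, indent=1))
```

Because the format is plain JSON, text, code, and results travel together in one file that can be shared and versioned.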

Slide 11

Slide 11 text

using notebooks

Slide 12

Slide 12 text

JupyterLab: a grand unified theory of Jupyter Huge Team Effort! C. Colbert, S. Corlay, A. Darian, B. Granger, J. Grout, P. Ivanov, I. Rose, S. Silvester, C. Willing, J. Zosa-Forde …

Slide 13

Slide 13 text

JupyterLab and notebooks ++

Slide 14

Slide 14 text

JupyterLab: more than notebooks

Slide 15

Slide 15 text

JupyterLab: data

Slide 16

Slide 16 text

JupyterLab is extensible: FlyBrainLab An Interactive Computing Platform for the Fly Brain BIONET Group, Columbia University http://www.bionet.ee.columbia.edu Aurel A. Lazar (PI) Tingkai Liu Mehmet K. Turkcan Chung-Heng Yeh Yiyin Zhou http://fruitflybrain.org

Slide 17

Slide 17 text

No content

Slide 18

Slide 18 text

notebooks on the cloud or HPC (jupyter.org/hub): host pre-configured environments on shared infrastructure

Slide 19

Slide 19 text

JupyterHub

Slide 20

Slide 20 text

myhub.org fancy machine in the cloud

Slide 21

Slide 21 text

myhub.org

Slide 22

Slide 22 text

myhub.org environments

Slide 23

Slide 23 text

myhub.org interfaces environments

Slide 24

Slide 24 text

AUTHENTICATION myhub.org interfaces environments

Slide 25

Slide 25 text

JupyterHub distributions ● The Littlest JupyterHub (tljh.jupyter.org) ● JupyterHub on Kubernetes (z2jh.jupyter.org) A pre-configured JupyterHub setup with sensible defaults and lots of documentation, fit for many use-cases

Slide 26

Slide 26 text

Zero to JupyterHub for Kubernetes (z2jh.jupyter.org) ● Scalable in both users and resources ● Uses Docker for environment management ● Agnostic to the provider and hardware configuration

Slide 27

Slide 27 text

Data 8 at UC Berkeley ~2800 students (Spring/Fall 2019)

Slide 28

Slide 28 text

tools and platforms for researchers

Slide 29

Slide 29 text

National infrastructure from K-12 to HPC J. Colliander, I. Allison, B. Carra

Slide 30

Slide 30 text

Harnessing the power of cloud computing to study the whole Earth interactively Interactivity Distributed computing Data models / numerics

Slide 31

Slide 31 text

Pangeo architecture

Slide 32

Slide 32 text

Jupyter meets the Earth: an NSF grant (2M / 3Y)! Team: Fernando Pérez, Joe Hamman, Laurel Larsen, Kevin Paul, Lindsey Heagy, Chris Holdgraf, Yuvi Panda Research use-cases: ● Climate data analysis ● Hydrology ● Geophysics Tech developments: ● Data discovery ● Interactivity ● Cloud/HPC infrastructure For more: http://bit.ly/jupytearth

Slide 33

Slide 33 text

Publication (sharing research with other people)

Slide 34

Slide 34 text

the science more than the paper “An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.” -- Buckheit and Donoho (paraphrasing Claerbout), WaveLab and Reproducible Research, 1995

Slide 35

Slide 35 text

the science more than the paper “An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.” (and a place to run the code?) -- Buckheit and Donoho (paraphrasing Claerbout), WaveLab and Reproducible Research, 1995

Slide 36

Slide 36 text

mybinder.org shareable, interactive, reproducible environments from your public git repository
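
Sketch (not from the slides): sharing a Binder environment comes down to sharing a link that names the repository and a git ref. The helper below assembles such a link for a GitHub repository, assuming the `v2/gh/<org>/<repo>/<ref>` pattern used by mybinder.org launch links and the optional `urlpath` query parameter for opening a specific file in JupyterLab. The function name `binder_url` and its arguments are hypothetical, and `binder-examples/requirements` is used only as an example repository.

```python
from urllib.parse import quote, urlencode


def binder_url(org, repo, ref="HEAD", notebook=None, host="https://mybinder.org"):
    """Build a Binder launch link for a public GitHub repository.

    `ref` is a branch, tag, or commit. `notebook` is an optional path
    inside the repository to open in JupyterLab once the server starts.
    """
    # Encode "/" inside refs so a branch like "feature/x" stays one path segment.
    url = f"{host}/v2/gh/{quote(org)}/{quote(repo)}/{quote(ref, safe='')}"
    if notebook is not None:
        # Assumption: `urlpath` points the launched session at a JupyterLab path.
        url += "?" + urlencode({"urlpath": f"lab/tree/{notebook}"})
    return url


if __name__ == "__main__":
    print(binder_url("binder-examples", "requirements"))
    print(binder_url("binder-examples", "requirements", notebook="index.ipynb"))
```

Pinning `ref` to a tag or commit rather than a moving branch keeps the link pointing at the same code and environment specification over time.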

Slide 37

Slide 37 text

binder: repo2docker + JupyterHub

Slide 38

Slide 38 text

http://bit.ly/black-holes-woop Black holes! LIGO, Sept 14, 2015

Slide 39

Slide 39 text

JupyterBook: computation + context publish a collection of notebooks as an online textbook inferentialthinking.com

Slide 40

Slide 40 text

New development: publishing executable books QuantEcon IAB Jupyter Book PDF HTML ... execution and text content sync citations, cross-refs, rich metadata

Slide 41

Slide 41 text

reaching new audiences

Slide 42

Slide 42 text

on reproducibility start here on extensibility “publish” contribution

Slide 43

Slide 43 text

Reaching new audiences: educational resources

Slide 44

Slide 44 text

Reaching new audiences: educational resources

Slide 45

Slide 45 text

a medium for conversation

Slide 46

Slide 46 text

GeoSci.xyz https://geosci.xyz EOSC 350 26 locations worldwide

Slide 47

Slide 47 text

Groundwater in Myanmar ● Bring DC resistivity equipment to Mon state ● Train local stakeholders ● Provide open-source software and educational resources

Slide 48

Slide 48 text

Reaching new audiences Diverse research outputs: ● Papers ● Notebooks ● Apps ● Web-based textbooks “Consumers” of science ● Scientists ● Students ● Public

Slide 49

Slide 49 text

Revisiting “publishing”

Slide 50

Slide 50 text

pdf model capture research in a pdf, peer review, accepted(!) scientist consumers

Slide 51

Slide 51 text

pdf model extending or building on ideas? scientist scientist?

Slide 52

Slide 52 text

pdf model extending or building on ideas? scientist scientist consumers

Slide 53

Slide 53 text

Blurring the line between scientists and audience? ● Open tools are ○ accessible ○ explorable ○ extensible

Slide 54

Slide 54 text

An open ecosystem supports the engine of science ● Open tools are a starting point for… ○ reproducibility of work ○ collaboration at the level of computation ○ extension of ideas ● And provide a trajectory for “consumers” to become creators

Slide 55

Slide 55 text

Thank you! @lheagy [email protected] @lindsey_jh Special Thanks: Rowan Cockett, Chris Holdgraf, Fernando Pérez
