Slide 1

Slide 1 text

Scaling Reproducible Research with Jupyter
4th Workshop on Open Science in Big Data (OSBD), IEEE Big Data, Los Angeles, December 9, 2019
Carol Willing (@WillingCarol)
DOI: 10.5281/zenodo.3567219

Slide 2

Slide 2 text

Reproducible Research
Using data responsibly to solve real-world issues and improve human lives

Slide 3

Slide 3 text

San Diego, CA

Slide 4

Slide 4 text

Tokyo

Slide 5

Slide 5 text

Sunday, Oct 6
Source: ECMWF

Slide 6

Slide 6 text

Super Typhoon Hagibis
View of Super Typhoon Hagibis south-west of Japan, as captured by the Copernicus Sentinel-3 satellite on 08 October at 00:16 UTC.
Copyright: 2019 European Union, contains modified Copernicus Sentinel data 2019, processed by EUMETSAT

Slide 7

Slide 7 text

Title: Typhoon Hagibis
Released: 10/10/2019 4:45 pm
Copyright: contains modified Copernicus Sentinel data (2019), processed by ESA, CC BY-SA 3.0 IGO

Slide 8

Slide 8 text

Source: Twitter

Slide 9

Slide 9 text


Slide 10

Slide 10 text


Slide 11

Slide 11 text

A sign is partially submerged as the Tama River floods during Typhoon Hagibis.
Source: Getty Images
Source: Japan Times

Slide 12

Slide 12 text

Preparation, Evacuation, Safety

Slide 13

Slide 13 text

Lives depend on

Slide 14

Slide 14 text

scaling reproducible research

Slide 15

Slide 15 text

Tools, Processes, Communication

Slide 16

Slide 16 text

jupyter.org

Slide 17

Slide 17 text

[Chart: Jupyter citations in research by year, 2015 to 2019 (2019 projected); vertical axis: number of citations, 0 to 4000]

Slide 18

Slide 18 text


Millions of Notebooks
Over 5 million on GitHub
https://github.com/trending/jupyter-notebook

Slide 19

Slide 19 text

‣ Growth
‣ ACM Award
‣ Industry adoption
‣ Creative uses
‣ Open Source Book

Slide 20

Slide 20 text

JupyterLab

Slide 21

Slide 21 text

Demo of JupyterLab (jupyter.org)

Slide 22

Slide 22 text

Healthy Best Practices

Slide 23

Slide 23 text

Ten Simple Rules for Reproducible Research in Jupyter Notebooks
Adam Rule et al.
https://github.com/jupyter-guide/ten-rules-jupyter
https://github.com/jupyter-guide/jupyter-guide

Slide 24

Slide 24 text

Keep up with changes
https://tinyletter.com/TrackingJupyter

Slide 25

Slide 25 text

Proceed cautiously with pseudo-open projects

Slide 26

Slide 26 text

Ask why

Slide 27

Slide 27 text

Tools, Processes, Communication

Slide 28

Slide 28 text

A pictorial representation of the different tools constituting BinderHub. This image was created by Scriberia for The Turing Way community and is used under a CC-BY licence. Zenodo record.
https://blog.jupyter.org/diving-into-leadership-to-build-push-button-code-df2a075c9914
zero-to-jupyterhub.readthedocs.io

Slide 29

Slide 29 text

Production data at scale
nteract: Papermill, Scrapbook, Bookstore, Commuter
https://medium.com/netflix-techblog/notebook-innovation-591ee3221233

Slide 30

Slide 30 text

Production data at scale
‣ Papermill: parameterize / run
‣ Scrapbook: recording / reading
‣ Bookstore: store notebooks
‣ Commuter: share notebooks

Slide 31

Slide 31 text

Papermill: Parameterize and Run
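The "parameterize and run" idea can be sketched roughly as follows. This is a minimal illustration, not code from the talk: the notebook name `forecast.ipynb`, the `runs` directory, the parameter names, and the helper functions `plan_runs` and `execute_runs` are all hypothetical. The only Papermill call used is `papermill.execute_notebook(input_path, output_path, parameters=...)`, and it is imported lazily so the planning step can be tried without Papermill installed.

```python
from pathlib import Path


def plan_runs(template, out_dir, parameter_sets):
    """Pair each parameter set with its own output notebook path."""
    template = Path(template)
    out_dir = Path(out_dir)
    jobs = []
    for i, params in enumerate(parameter_sets):
        # Each run writes to a separate file so the executed notebook is kept as a record.
        output = out_dir / f"{template.stem}_run{i}.ipynb"
        jobs.append((str(template), str(output), dict(params)))
    return jobs


def execute_runs(jobs):
    """Execute each planned run with Papermill (needs papermill and a Jupyter kernel)."""
    import papermill as pm

    for input_path, output_path, params in jobs:
        # Papermill injects these values after the notebook cell tagged "parameters".
        pm.execute_notebook(input_path, output_path, parameters=params)


# Hypothetical template notebook and parameters, for illustration only.
jobs = plan_runs(
    "forecast.ipynb",
    "runs",
    [{"region": "Tokyo", "days": 3}, {"region": "San Diego", "days": 3}],
)
for job in jobs:
    print(job)

# execute_runs(jobs)  # uncomment once forecast.ipynb and the runs directory exist
```

Keeping the planning separate from the execution means the same template notebook can be rerun with new inputs without editing it, and each output notebook records the parameters it was run with.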

Slide 32

Slide 32 text

Create a Reproducibility Pipeline

Slide 33

Slide 33 text

Decouple steps for flexibility

Slide 34

Slide 34 text

Plan, Execute, Change
https://jupyterhub-team-compass.readthedocs.io
https://github.com/jupyterhub/team-compass

Slide 35

Slide 35 text

Tools, Processes, Communication

Slide 36

Slide 36 text


Slide 37

Slide 37 text

‣ Deploy your own BinderHub
‣ mybinder.org
‣ Binder 2.0 blog post
‣ elifesciences: Share your interactive research environment
‣ Nature article about Binder

Slide 38

Slide 38 text

Drawing credit: Juliette Taka

Slide 39

Slide 39 text

Drawing credit: Juliette Taka

Slide 40

Slide 40 text

Drawing credit: Juliette Taka

Slide 41

Slide 41 text

Drawing credit: Juliette Taka

Slide 42

Slide 42 text

Drawing credit: Juliette Taka

Slide 43

Slide 43 text

Drawing credit: Juliette Taka

Slide 44

Slide 44 text

Binder

Slide 45

Slide 45 text

Pangeo
https://pangeo.io

Slide 46

Slide 46 text


Slide 47

Slide 47 text

Canadian Open Neuroscience Platform
https://simexp.github.io/vcog_hps_ad_book/intro.html
Jupyter Book, Binder, Jupyter, pandas, scipy, scikit-learn, matplotlib, numpy, seaborn

Slide 48

Slide 48 text

Build Communities

Slide 49

Slide 49 text

jupyter.org

Slide 50

Slide 50 text

Leverage solutions across disciplines

Slide 51

Slide 51 text

Share binders. Foster scientific research.

Slide 52

Slide 52 text

Tools, Processes, Communication

Slide 53

Slide 53 text

Why strive for reproducible research?

Slide 54

Slide 54 text

Reproducible research improves prediction

Slide 55

Slide 55 text

prediction = impact

Slide 56

Slide 56 text

Scaling reproducible research improves science and our world

Slide 57

Slide 57 text

Thank you
‣ Big Data (OSBD) Workshop Organizers
‣ Project Jupyter Team
‣ Min Ragan-Kelley

Slide 58

Slide 58 text

Attributions

References to published research, projects, and drawings (and marked on slides)
[3] Statistics: https://fivethirtyeight.com/features/which-city-has-the-most-unpredictable-weather/
[5,9] ECMWF
[6] Copyright: 2019 European Union, contains modified Copernicus Sentinel data 2019, processed by EUMETSAT
[7] Copyright contains modified Copernicus Sentinel data (2019), processed by ESA, CC BY-SA 3.0 IGO
[23] Adam Rule et al.
[38-43] Juliette Taka
[45] Pangeo
[46] Lindsey Heagy
[47] Canadian Open Neuroscience Platform

Photos
[3, 4, 57] Source: Carol Willing and Linnea Willing
[8] Twitter
[10] Getty Images
[29-31] nteract and Netflix