Scaling Reproducible Research with Jupyter

Carol Willing
October 15, 2019

Keynote at the ECMWF Workshop on Building Reproducible Workflows for Earth Sciences, Reading, UK (#REPWORK19). Looking through the lens of weather forecasting, the keynote discusses tools, processes, and communication that help scale reproducible research. Examples from Project Jupyter include Jupyter Notebook, JupyterLab, JupyterHub, BinderHub, and the Binder service. Additional examples come from the nteract project, Python, Dagster, and Pangeo.

Transcript

  1. Scaling Reproducible Research. Workshop: Building reproducible workflows for earth sciences, ECMWF, October 15, 2019. Carol Willing @WillingCarol

  2. San Diego, CA

  3. @WillingCarol 3

  4. @WillingCarol 4

  5. @WillingCarol 5

  6. @WillingCarol 6 Tokyo

  7. "The typhoon came out of the sea first as a deep hollow roar." –Pearl S. Buck
  8. @WillingCarol 8

  9. @WillingCarol 9

  10. @WillingCarol 10

  11. "I was surrounded by the madness, the unreason, of uncontrolled, undisciplined energy." –Pearl S. Buck
  12. Super Typhoon Hagibis. View of Super Typhoon Hagibis south-west of Japan, as captured by the Copernicus Sentinel-3 satellite on 08 October at 00:16 UTC. Copyright: 2019 European Union, contains modified Copernicus Sentinel data 2019, processed by EUMETSAT.

  13. Title: Typhoon Hagibis. Released: 10/10/2019 4:45 pm. Copyright: contains modified Copernicus Sentinel data (2019), processed by ESA, CC BY-SA 3.0 IGO.

  14. Source: Twitter

  15. A sign is partially submerged as the Tama River floods during Typhoon Hagibis. Source: Getty Images
  16. @WillingCarol 16

  17. @WillingCarol 17

  18. @WillingCarol 18

  19. Lives depend on

  20. scaling reproducible research

  21. Tools, Processes, Communication

  22. @WillingCarol 22

  23. Jupyter Notebook: a Jupyter Notebook document with a visualization of measles data.

  24. Research: Jupyter Citations. [Chart: Number (0 to 4000) by year, 2015 to 2019; legend: Projected]

  25. Millions of Notebooks: over 5 million on GitHub. https://github.com/trending/jupyter-notebook

  26. ‣ Growth ‣ ACM Award ‣ Industry adoption ‣ Creative uses ‣ Open Source Book
  27. JupyterLab

  28. jupyter.org demo

  29. jupyter.org demo

  30. https://github.com/data-exp-lab/rust-yt-tools/ (npm package @data-exp-lab/yt-tools). Irber Junior LC. Oxidizing Python: writing extensions in Rust [version 1; not peer reviewed]. F1000Research 2018, 7(ISCB Comm J):955 (poster) (https://doi.org/10.7490/f1000research.1115726.1). https://github.com/munkm/widgyts (yt and jupyter widgets)

  31. ipyvolume: https://towardsdatascience.com/multivolume-rendering-in-jupyter-with-ipyvolume-cross-language-3d-visualization-64389047634a

  32. Healthy Best Practices

  33. Ten Simple Rules for Reproducible Research in Jupyter Notebooks. Adam Rule et al. https://github.com/jupyter-guide/ten-rules-jupyter https://github.com/jupyter-guide/jupyter-guide

  34. Keep up with changes

  35. Proceed cautiously with pseudo-open projects

  36. Ask why

  37. Tools, Processes, Communication

  38. zero-to-jupyterhub.readthedocs.io

  39. Papermill: Parameterize and Run

  40. Data at scale - Netflix. https://medium.com/netflix-techblog/notebook-innovation-591ee3221233 nteract: Papermill, Scrapbook, Bookstore, Commuter

  41. Pipelines: https://medium.com/dagster-io/dagster-0-6-0-impossible-princess-898b459375e0

  42. Create a Reproducibility Pipeline

  43. Decouple steps for flexibility

  44. Plan, Execute, Change. https://jupyterhub-team-compass.readthedocs.io https://github.com/jupyterhub/team-compass

  45. Tools, Processes, Communication

  46. Notebooks to web: https://blog.jupyter.org/and-voil%C3%A0-f6a2c08a4a93

  47. Binder: mybinder.org. Binder 2.0 blog post. elifesciences: Share your interactive research environment. Nature article about Binder.
  48. Juliette Taka

  49. Juliette Taka

  50. Juliette Taka

  51. Juliette Taka

  52. Juliette Taka

  53. Juliette Taka

  54. Binder

  55. Binder: mybinder.org

  56. From a phone in the park!

  57. Pangeo: https://pangeo.io

  58. @WillingCarol 58

  59. https://simexp.github.io/vcog_hps_ad_book/intro.html Jupyter Book, Binder, Jupyter, pandas, scipy, scikit-learn, matplotlib, numpy, seaborn. Canadian Open Neuroscience Platform

  60. Build Communities

  61. jupyter.org

  62. Leverage solutions across disciplines

  63. Share binders. Foster scientific research.

  64. Tools, Processes, Communication

  65. Why strive for reproducible research?

  66. Reproducible research improves prediction

  67. prediction = impact

  68. Scaling reproducible research improves science and our world

  69. None
  70. Thank you: ECMWF Workshop Organizers, Claudia Vitolo, Project Jupyter Team, Min Ragan-Kelley
  71. Attributions. References to published research, projects, and drawings (and marked on slides): [2] Statistics: https://fivethirtyeight.com/features/which-city-has-the-most-unpredictable-weather/; [7, 11] A Bridge for Passing, Pearl S. Buck; [8, 9, 18] ECMWF; [12] Copyright: 2019 European Union, contains modified Copernicus Sentinel data 2019, processed by EUMETSAT; [13] Copyright: contains modified Copernicus Sentinel data (2019), processed by ESA, CC BY-SA 3.0 IGO; [30] Madicken Munk; [31] Maarten Breddels; [33] Adam Rule et al.; [46] QuantStack - Voila; [48-53] Juliette Taka; [57] Pangeo; [58] Lindsey Heagy; [59] Canadian Open Neuroscience Platform. Photos: [2-6, 10, 16-17, 69, 70] Carol Willing and Linnea Willing; [14] Twitter; [15] Getty Images; [55, 56] Kirstie Whitaker; [23-29, 38, 44, 47, 54, 61] Project Jupyter; [39-40] nteract and Netflix; [41] Nick Schrock, Dagster.
  72. @WillingCarol 72