Scaling Reproducible Research with Jupyter

Keynote delivered on December 9, 2019, at the 4th Workshop on Open Science in Big Data (OSBD), held at the 2019 IEEE Big Data Conference.

Jupyter Notebooks have taken the scientific and open data world by storm over the past five years. Telling a computational narrative that combines prose, code, media, and rich visualizations has made it easier for researchers to collaborate, share research in a reproducible way, and educate others in their scientific discipline and beyond.

A suite of tools, processes that scale, and modern ways to communicate openly about scientific research have grown rapidly within Project Jupyter’s open source community. Beyond the Jupyter Notebook, open source projects including JupyterLab, JupyterHub, Binder, and nteract’s Papermill offer new pipelines and services that allow open research to scale and reach people worldwide.

Carol Willing

December 09, 2019

Transcript

  1. Scaling Reproducible Research with Jupyter. 4th Workshop on Open Science in Big Data (OSBD), IEEE Big Data, Los Angeles, December 9, 2019. Carol Willing, @WillingCarol. DOI: 10.5281/zenodo.3567219

  2. Reproducible Research: Using data responsibly to solve real world issues and improve human lives

  3. San Diego, CA

  4. Tokyo

  5. Sunday Oct 6. Source: ECMWF

  6. Super Typhoon Hagibis. View of Super Typhoon Hagibis south-west of Japan, as captured by the Copernicus Sentinel-3 satellite on 08 October at 00:16 UTC. Copyright: 2019 European Union, contains modified Copernicus Sentinel data 2019, processed by EUMETSAT

  7. Title: Typhoon Hagibis. Released: 10/10/2019 4:45 pm. Copyright: contains modified Copernicus Sentinel data (2019), processed by ESA, CC BY-SA 3.0 IGO

  8. Source: Twitter

  9. (no slide text)

  10. (no slide text)

  11. A sign is partially submerged as the Tama River floods during Typhoon Hagibis. Source: Getty Images. Source: Japan Times

  12. Preparation, Evacuation, Safety

  13. Lives depend on

  14. scaling reproducible research

  15. Tools, Processes, Communication

  16. jupyter.org

  17. Research. [Chart: number of Jupyter citations per year, 2015 to 2019 (2019 projected); vertical axis from 0 to 4000]

  18. Millions of Notebooks. Over 5 million on GitHub. https://github.com/trending/jupyter-notebook

  19. ‣ Growth ‣ ACM Award ‣ Industry adoption ‣ Creative uses ‣ Open Source Book

  20. JupyterLab

  21. jupyter.org demo of JupyterLab

  22. Healthy Best Practices

  23. Ten Simple Rules for Reproducible Research in Jupyter Notebooks. Adam Rule et al. https://github.com/jupyter-guide/ten-rules-jupyter https://github.com/jupyter-guide/jupyter-guide

  24. Keep up with changes. https://tinyletter.com/TrackingJupyter

  25. Proceed cautiously with pseudo-open projects

  26. Ask why

  27. Tools, Processes, Communication

  28. A pictorial representation of the different tools constituting BinderHub. This image was created by Scriberia for The Turing Way community and is used under a CC-BY licence. Zenodo record. https://blog.jupyter.org/diving-into-leadership-to-build-push-button-code-df2a075c9914 zero-to-jupyterhub.readthedocs.io
  29. Production data at scale. nteract: Papermill, Scrapbook, Bookstore, Commuter. https://medium.com/netflix-techblog/notebook-innovation-591ee3221233

  30. Production data at scale. Papermill: parameterize / run. Scrapbook: recording / reading. Bookstore: store notebooks. Commuter: share notebooks.

  31. Papermill: Parameterize and Run

  32. Create a Reproducibility Pipeline

  33. Decouple steps for flexibility

  34. Plan, Execute, Change. https://jupyterhub-team-compass.readthedocs.io https://github.com/jupyterhub/team-compass

  35. Tools, Processes, Communication

  36. (no slide text)

  37. Deploy your own BinderHub; mybinder.org; Binder 2.0 blog post; elifesciences: Share your interactive research environment; Nature article about Binder
  38. Juliette Taka

  39. Juliette Taka

  40. Juliette Taka

  41. Juliette Taka

  42. Juliette Taka

  43. Juliette Taka

  44. Binder

  45. Pangeo. https://pangeo.io

  46. (no slide text)

  47. Canadian Open Neuroscience Platform. https://simexp.github.io/vcog_hps_ad_book/intro.html Jupyter Book, Binder, Jupyter, pandas, scipy, scikit-learn, matplotlib, numpy, seaborn

  48. Build Communities

  49. jupyter.org

  50. Leverage solutions across disciplines

  51. Share binders. Foster scientific research.

  52. Tools, Processes, Communication

  53. Why strive for reproducible research?

  54. Reproducible research improves prediction

  55. prediction = impact

  56. Scaling reproducible research improves science and our world

  57. Thank you: Big Data (OSBD) Workshop Organizers, Project Jupyter Team, Min Ragan-Kelley
  58. Attributions. References to published research, projects, and drawings (also marked on slides): [3] Statistics: https://fivethirtyeight.com/features/which-city-has-the-most-unpredictable-weather/; [5, 9] ECMWF; [6] Copyright: 2019 European Union, contains modified Copernicus Sentinel data 2019, processed by EUMETSAT; [7] Copyright: contains modified Copernicus Sentinel data (2019), processed by ESA, CC BY-SA 3.0 IGO; [23] Adam Rule et al.; [38-43] Juliette Taka; [45] Pangeo; [46] Lindsey Heagy; [47] Canadian Open Neuroscience Platform. Photos: [3, 4, 57] Carol Willing and Linnea Willing; [8] Twitter; [10] Getty Images; [29-31] nteract and Netflix
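
Slides 30 to 33 mention parameterizing and running notebooks with Papermill, building a reproducibility pipeline, and decoupling its steps. The following is a minimal sketch of how those ideas might fit together; it is not taken from the talk. The notebook name `forecast.ipynb`, the output directory `runs`, and the parameters `region` and `year` are hypothetical. The only Papermill call used is `papermill.execute_notebook(input_path, output_path, parameters=...)`, and the sketch assumes the template notebook has a cell tagged `parameters` holding default values for Papermill to override.

```python
from itertools import product
from pathlib import Path


def plan_runs(template, out_dir, grid):
    """Planning step: expand a parameter grid into (input, output, parameters) specs.

    Pure Python with no Jupyter dependency, so the plan can be inspected
    or tested separately from execution.
    """
    keys = sorted(grid)  # fixed key order gives stable output file names
    runs = []
    for values in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        tag = "_".join(f"{k}-{v}" for k, v in params.items())
        output = Path(out_dir) / f"{Path(template).stem}_{tag}.ipynb"
        runs.append((str(template), str(output), params))
    return runs


def execute_runs(runs):
    """Execution step: run each planned notebook with Papermill.

    Papermill is imported here so that planning works without it installed.
    Requires `pip install papermill` and a Jupyter kernel for the notebook.
    """
    import papermill as pm

    for input_path, output_path, params in runs:
        Path(output_path).parent.mkdir(parents=True, exist_ok=True)
        # Writes an executed copy of the template with the parameters injected.
        pm.execute_notebook(input_path, output_path, parameters=params)


if __name__ == "__main__":
    # Hypothetical template and parameters, for illustration only.
    grid = {"region": ["tokyo", "san_diego"], "year": [2019]}
    planned = plan_runs("forecast.ipynb", "runs", grid)
    for _, output, params in planned:
        print(output, params)
    # execute_runs(planned)  # uncomment once forecast.ipynb exists
```

Keeping the plan separate from the execution means the same list of runs could be handed to a scheduler or a different runner, and each executed notebook is kept as a record of the exact parameters it ran with.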