Scaling Reproducible Research with Jupyter

Keynote delivered on December 9, 2019, at the 4th Workshop on Open Science in Big Data (OSBD), held at the 2019 IEEE Big Data Conference.

Jupyter Notebooks have taken the scientific and open data world by storm over the past five years. The ability to tell a computational narrative that combines prose, code, media, and rich visualizations has increased researchers' ability to collaborate with others, share research in a reproducible way, and educate others in their scientific discipline and beyond.

A suite of tools, processes that scale, and modern ways to communicate openly about scientific research have grown rapidly within Project Jupyter's open source community. Beyond the Jupyter Notebook, open source projects including JupyterLab, JupyterHub, Binder, and nteract's Papermill offer new pipelines and services that let open research scale and reach others worldwide.

Carol Willing

December 09, 2019

Transcript

  1. @WillingCarol
    Scaling Reproducible Research
    with Jupyter
    4th Workshop on Open Science in Big Data (OSBD)
    IEEE Big Data, Los Angeles
    December 9, 2019
    1
    Carol Willing
    @WillingCarol
    DOI: 10.5281/zenodo.3567219

  2. @WillingCarol 2
    Using data responsibly to
    solve real world issues and
    improve human lives
    Reproducible Research

  3. @WillingCarol 3
    San Diego, CA

  4. @WillingCarol 4
    Tokyo

  5. @WillingCarol 5
    Sunday Oct 6
    Source: ECMWF

  6. Copyright: 2019 European
    Union, contains modified
    Copernicus Sentinel data
    2019, processed by
    EUMETSAT
    Super Typhoon
    Hagibis
    View of Super Typhoon
    Hagibis south-west of
    Japan, as captured by
    the Copernicus
    Sentinel-3 satellite on
    08 October at 00:16
    UTC.

  7. Title Typhoon Hagibis
    Released 10/10/2019 4:45 pm
    Copyright contains modified Copernicus
    Sentinel data (2019), processed by ESA,
    CC BY-SA 3.0 IGO

  8. Source: Twitter

  9. @WillingCarol 9

  10. @WillingCarol 10

  11. A sign is partially submerged as the Tama River floods during Typhoon Hagibis.
    Source: Getty Images
    Source: Japan Times

  12. @WillingCarol
    Preparation
    Evacuation
    Safety
    12

  13. @WillingCarol
    Lives depend on
    13

  14. @WillingCarol
    scaling reproducible
    research
    14

  15. @WillingCarol
    Tools
    Processes
    Communication
    15

  16. @WillingCarol 16
    jupyter.org

  17. @WillingCarol
    Research
    17
    [Chart: Jupyter Citations. Y-axis: Number (0 to 4000). X-axis: 2015, 2016, 2017, 2018, 2019 Projected]

  18. Millions of
    Notebooks
    https://github.com/trending/jupyter-notebook
    Over 5 million on GitHub

  19. @WillingCarol 19
    ‣ Growth
    ‣ ACM Award
    ‣ Industry adoption
    ‣ Creative uses
    ‣ Open Source Book

  20. @WillingCarol 20
    JupyterLab

  21. @WillingCarol 21
    jupyter.org demo of JupyterLab

  22. @WillingCarol
    Healthy Best Practices
    22

  23. @WillingCarol 23
    Ten Simple Rules for
    Reproducible Research in
    Jupyter Notebooks
    Adam Rule et al.
    https://github.com/jupyter-guide/ten-rules-jupyter
    https://github.com/jupyter-guide/jupyter-guide

  24. @WillingCarol
    Keep up with changes
    24
    https://tinyletter.com/TrackingJupyter

  25. @WillingCarol
    Proceed cautiously
    with pseudo-open
    projects
    25

  26. @WillingCarol
    Ask why
    26

  27. @WillingCarol
    Tools
    Processes
    Communication
    27

  28. A pictorial representation of the different tools constituting BinderHub. This image was created by Scriberia for The Turing Way
    community and is used under a CC-BY licence. Zenodo record.
    https://blog.jupyter.org/diving-into-leadership-to-build-push-button-code-df2a075c9914
    zero-to-jupyterhub.readthedocs.io

  29. @WillingCarol 29
    nteract
    Papermill
    Scrapbook
    Bookstore
    Commuter
    Production data at scale
    https://medium.com/netflix-techblog/notebook-innovation-591ee3221233

  30. @WillingCarol 30
    Papermill - parameterize / run
    Scrapbook - recording / reading
    Bookstore - store notebooks
    Commuter - share notebooks
    Production data at scale

  31. @WillingCarol 31
    Papermill
    Parameterize and Run
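
The "parameterize" idea can be illustrated with a minimal sketch. This is not Papermill's implementation: it uses only the Python standard library and a hand-built notebook dictionary, and the helper name `inject_parameters` and the example values are hypothetical. It assumes the convention of a cell tagged `parameters` that holds defaults, and adds a new cell right after it that overrides those defaults.

```python
import copy

def inject_parameters(notebook: dict, parameters: dict) -> dict:
    """Return a copy of `notebook` with an override cell placed after the
    first cell tagged "parameters" (or at the top if no cell has that tag)."""
    result = copy.deepcopy(notebook)  # leave the input notebook untouched
    source = "# Parameters\n" + "".join(
        f"{name} = {value!r}\n" for name, value in parameters.items()
    )
    injected = {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {"tags": ["injected-parameters"]},
        "outputs": [],
        "source": source,
    }
    cells = result["cells"]
    # Index of the first cell tagged "parameters"; -1 means "not found".
    index = next(
        (i for i, cell in enumerate(cells)
         if "parameters" in cell.get("metadata", {}).get("tags", [])),
        -1,
    )
    cells.insert(index + 1, injected)
    return result

# A tiny hand-written notebook: one defaults cell and one analysis cell.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "code", "execution_count": None, "outputs": [],
         "metadata": {"tags": ["parameters"]},
         "source": "region = 'tokyo'\nyear = 2018\n"},
        {"cell_type": "code", "execution_count": None, "outputs": [],
         "metadata": {},
         "source": "summary = f'{region}-{year}'\n"},
    ],
}

parameterized = inject_parameters(notebook, {"region": "san_diego", "year": 2019})
```

Because the override cell runs after the defaults cell, later cells see the new values. With Papermill itself, the corresponding step is to tag a cell `parameters` and call `papermill.execute_notebook(input_path, output_path, parameters={...})`, which also executes the notebook.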

  32. @WillingCarol
    Create a
    Reproducibility
    Pipeline
    32

  33. @WillingCarol
    Decouple steps for
    flexibility
    33

  34. @WillingCarol
    Plan
    Execute
    Change
    34
    https://jupyterhub-team-compass.readthedocs.io
    https://github.com/jupyterhub/team-compass

  35. @WillingCarol
    Tools
    Processes
    Communication
    35

  36. @WillingCarol 36

  37. @WillingCarol 37
    Deploy your own BinderHub
    mybinder.org
    Binder 2.0 blog post
    elifesciences: Share
    your interactive
    research environment
    Nature article about
    Binder
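
Sharing a repository through mybinder.org comes down to sharing a launch link. The sketch below builds such a link for a GitHub repository. It assumes the `/v2/gh/<org>/<repo>/<ref>` URL layout and the `filepath` query parameter used by mybinder.org; the function name `binder_url` is hypothetical, and a self-deployed BinderHub would use its own host.

```python
from typing import Optional
from urllib.parse import quote, urlencode

def binder_url(org: str, repo: str, ref: str = "HEAD",
               filepath: Optional[str] = None,
               host: str = "https://mybinder.org") -> str:
    """Build a Binder launch URL for a GitHub repository.

    `ref` is a branch, tag, or commit; `filepath` optionally names a
    notebook to open on launch (assumed query parameter).
    """
    # safe="" also percent-encodes "/" so a ref such as "feature/x"
    # stays a single path segment.
    url = f"{host}/v2/gh/{quote(org)}/{quote(repo)}/{quote(ref, safe='')}"
    if filepath:
        url += "?" + urlencode({"filepath": filepath})
    return url

# Example repository from the binder-examples organization on GitHub.
link = binder_url("binder-examples", "requirements", "master",
                  filepath="index.ipynb")
```

A link like this can be placed in a README or a paper so that readers open the same environment and notebook without installing anything locally.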

  38. 38
    Juliette Taka

  39. 39
    Juliette Taka

  40. 40
    Juliette Taka

  41. 41
    Juliette Taka

  42. 42
    Juliette Taka

  43. 43
    Juliette Taka

  44. @WillingCarol
    Binder
    44

  45. @WillingCarol
    Pangeo
    45
    https://pangeo.io

  46. @WillingCarol 46

  47. @WillingCarol 47
    https://simexp.github.io/vcog_hps_ad_book/intro.html
    Jupyter Book
    Binder
    Jupyter
    pandas
    scipy
    scikit learn
    matplotlib
    numpy
    seaborn
    Canadian Open Neuroscience Platform

  48. @WillingCarol
    Build Communities
    48

  49. jupyter.org

  50. @WillingCarol
    Leverage solutions
    across disciplines
    50

  51. @WillingCarol
    Share binders.
    Foster scientific
    research.
    51

  52. @WillingCarol
    Tools
    Processes
    Communication
    52

  53. @WillingCarol
    Why strive for
    reproducible research?
    53

  54. @WillingCarol
    Reproducible research
    improves prediction
    54

  55. @WillingCarol
    prediction =
    impact
    55

  56. @WillingCarol 56
    Scaling reproducible
    research improves science
    and our world

  57. @WillingCarol 57
    Thank you
    Big Data (OSBD) Workshop Organizers
    Project Jupyter Team
    Min Ragan-Kelley

  58. @WillingCarol
    Attributions
    58
    References to published research, projects, and drawings (also marked on slides)
    [3] Statistics: https://fivethirtyeight.com/features/which-city-has-the-most-unpredictable-weather/
    [5,9] ECMWF
    [6] Copyright: 2019 European Union, contains modified Copernicus Sentinel data 2019, processed by EUMETSAT
    [7] Copyright contains modified Copernicus Sentinel data (2019), processed by ESA, CC BY-SA 3.0 IGO
    [23] Adam Rule et al.
    [38-43] Juliette Taka
    [45] Pangeo
    [46] Lindsey Heagy
    [47] Canadian Open Neuroscience Platform
    Photos
    [3, 4, 57] Source: Carol Willing and Linnea Willing
    [8] Twitter
    [10] Getty Images
    [29-31] nteract and Netflix
