Scaling Reproducible Research with Jupyter

Keynote at the ECMWF Workshop on Building Reproducible Workflows for Earth Sciences, Reading, UK (#REPWORK19). Looking through the lens of weather forecasting, the keynote discusses tools, processes, and communication that help scale reproducible research. Examples from Project Jupyter include Jupyter Notebook, JupyterLab, JupyterHub, BinderHub, and the Binder service. Additional examples come from the nteract project, Python, Dagster, and Pangeo.

Carol Willing

October 15, 2019

Transcript

  1. Scaling Reproducible Research
    Workshop: Building reproducible workflows for earth sciences
    ECMWF
    October 15, 2019
    Carol Willing
    @WillingCarol

  2. San Diego, CA

  3. @WillingCarol 3

  4. @WillingCarol 4

  5. @WillingCarol 5

  6. Tokyo

  7. "The typhoon came out of the sea first as a deep hollow roar."
    –Pearl S. Buck

  8. @WillingCarol 8

  9. @WillingCarol 9

  10. @WillingCarol 10

  11. "I was surrounded by the madness, the unreason, of uncontrolled, undisciplined energy."
    –Pearl S. Buck

  12. Super Typhoon Hagibis
    View of Super Typhoon Hagibis south-west of Japan, as captured by the Copernicus Sentinel-3 satellite on 08 October at 00:16 UTC.
    Copyright: 2019 European Union, contains modified Copernicus Sentinel data 2019, processed by EUMETSAT

  13. Title: Typhoon Hagibis
    Released: 10/10/2019 4:45 pm
    Copyright: contains modified Copernicus Sentinel data (2019), processed by ESA, CC BY-SA 3.0 IGO

  14. Source: Twitter

  15. A sign is partially submerged as the Tama River floods during Typhoon Hagibis. Source: Getty Images

  16. @WillingCarol 16

  17. @WillingCarol 17

  18. @WillingCarol 18

  19. Lives depend on

  20. scaling reproducible research

  21. Tools
    Processes
    Communication

  22. @WillingCarol 22

  23. Jupyter Notebook
    A Jupyter Notebook document with a visualization of measles data.

  24. Research
    [Chart: Jupyter Citations; y-axis: Number, 0 to 4000; x-axis: 2015, 2016, 2017, 2018, 2019 Projected]

  25. Millions of Notebooks
    Over 5 million on GitHub
    https://github.com/trending/jupyter-notebook

  26. ‣ Growth
    ‣ ACM Award
    ‣ Industry adoption
    ‣ Creative uses
    ‣ Open Source Book

  27. JupyterLab

  28. jupyter.org demo

  29. jupyter.org demo

  30. yt and Jupyter widgets
    https://github.com/munkm/widgyts
    https://github.com/data-exp-lab/rust-yt-tools/
    npm package @data-exp-lab/yt-tools
    Irber Junior LC. Oxidizing Python: writing extensions in Rust [version 1; not peer reviewed]. F1000Research 2018, 7(ISCB Comm J):955 (poster) (https://doi.org/10.7490/f1000research.1115726.1)

  31. ipyvolume
    https://towardsdatascience.com/multivolume-rendering-in-jupyter-with-ipyvolume-cross-language-3d-visualization-64389047634a

  32. Healthy Best Practices

  33. Ten Simple Rules for Reproducible Research in Jupyter Notebooks
    Adam Rule et al.
    https://github.com/jupyter-guide/ten-rules-jupyter
    https://github.com/jupyter-guide/jupyter-guide

  34. Keep up with changes

  35. Proceed cautiously with pseudo-open projects

  36. Ask why

  37. Tools
    Processes
    Communication

  38. zero-to-jupyterhub.readthedocs.io

  39. Papermill
    Parameterize and Run

  40. Data at scale - Netflix
    https://medium.com/netflix-techblog/notebook-innovation-591ee3221233
    nteract
    Papermill
    Scrapbook
    Bookstore
    Commuter

  41. Pipelines
    https://medium.com/dagster-io/dagster-0-6-0-impossible-princess-898b459375e0

  42. Create a Reproducibility Pipeline

  43. Decouple steps for flexibility

  44. Plan
    Execute
    Change
    https://jupyterhub-team-compass.readthedocs.io
    https://github.com/jupyterhub/team-compass

  45. Tools
    Processes
    Communication

  46. Notebooks to web
    https://blog.jupyter.org/and-voil%C3%A0-f6a2c08a4a93

  47. Binder
    mybinder.org
    Binder 2.0 blog post
    elifesciences: Share your interactive research environment
    Nature article about Binder

  48. Juliette Taka

  49. Juliette Taka

  50. Juliette Taka

  51. Juliette Taka

  52. Juliette Taka

  53. Juliette Taka

  54. Binder

  55. Binder
    mybinder.org

  56. From a phone in the park!

  57. Pangeo
    https://pangeo.io

  58. @WillingCarol 58

  59. https://simexp.github.io/vcog_hps_ad_book/intro.html
    Jupyter Book
    Binder
    Jupyter
    pandas
    scipy
    scikit-learn
    matplotlib
    numpy
    seaborn
    Canadian Open Neuroscience Platform

  60. Build Communities

  61. jupyter.org

  62. Leverage solutions across disciplines

  63. Share binders. Foster scientific research.

  64. Tools
    Processes
    Communication

  65. Why strive for reproducible research?

  66. Reproducible research improves prediction

  67. prediction = impact

  68. Scaling reproducible research improves science and our world

  69. [Photo]

  70. Thank you
    ECMWF Workshop Organizers
    Claudia Vitolo
    Project Jupyter Team
    Min Ragan-Kelley

  71. Attributions
    References to published research, projects, and drawings (and marked on slides)
    [2] Statistics: https://fivethirtyeight.com/features/which-city-has-the-most-unpredictable-weather/
    [7, 11] A Bridge for Passing, Pearl S. Buck
    [8, 9, 18] ECMWF
    [12] Copyright: 2019 European Union, contains modified Copernicus Sentinel data 2019, processed by EUMETSAT
    [13] Copyright contains modified Copernicus Sentinel data (2019), processed by ESA, CC BY-SA 3.0 IGO
    [30] Madicken Munk
    [31] Maarten Breddels
    [33] Adam Rule et al.
    [46] QuantStack - Voilà
    [48-53] Juliette Taka
    [57] Pangeo
    [58] Lindsey Heagy
    [59] Canadian Open Neuroscience Platform
    Photos
    [2-6, 10, 16-17, 69, 70] Source: Carol Willing and Linnea Willing
    [14] Twitter
    [15] Getty Images
    [55, 56] Kirstie Whitaker
    [23-29, 38, 44, 47, 54, 61] Project Jupyter
    [39-40] nteract and Netflix
    [41] Nick Schrock, Dagster

  72. @WillingCarol 72
