
Jupyter in Production - Rev 3


Presented at the Rev 3 MLOps Conference in New York City on May 5, 2022.

Patrick Harrison

May 05, 2022



Transcript

  1. Jupyter in Production


  2. whois
    Patrick Harrison


  3. whois
    Patrick Harrison
    Data Theoretic


  4. whois
    Patrick Harrison
    Data Theoretic
    Previously: Led AI Engineering at a major
    financial data company


  5. Source: https://ipython.org/ipython-doc/rel-0.12/whatsnew/version0.12.html


  6. Jupyter Notebooks just turned ten years old
    Source: https://ipython.org/ipython-doc/rel-0.12/whatsnew/version0.12.html


  7. Jupyter Notebooks just turned ten years old
    The original IPython Notebook was
    first released on December 19, 2011
    Source: https://ipython.org/ipython-doc/rel-0.12/whatsnew/version0.12.html


  8. Source: https://github.com/parente/nbestimate/blob/master/estimate.ipynb
    Public Jupyter Notebooks on GitHub


  9. Source: https://github.com/parente/nbestimate/blob/master/estimate.ipynb
    Public Jupyter Notebooks on GitHub
    ≈0


  10. Source: https://github.com/parente/nbestimate/blob/master/estimate.ipynb
    Public Jupyter Notebooks on GitHub
    ≈0
    ≈10,000,000


  11. 8,000+
    new public Jupyter Notebooks posted on GitHub every day in 2022, on average
    Source: https://github.com/parente/nbestimate/blob/master/ipynb_counts.csv


  12. Jupyter Notebooks have been used to do some amazing things


  13. Source: https://blog.jupyter.org/congratulations-to-the-ligo-and-virgo-collaborations-from-project-jupyter-5923247be019
    On behalf of the entire Project Jupyter team, we’d like to say congratulations to
    Rainer Weiss, Barry C. Barish, Kip S. Thorne and the rest of the LIGO and VIRGO
    teams for the Nobel Prize in Physics 2017. Since 2015, the LIGO and VIRGO
    Collaborations have observed multiple instances of gravitational waves due to
    colliding black holes (and more recently neutron stars). These observations
    represent decades of work and confirm what Einstein had theorized a hundred
    years ago.


    ...


    To communicate to the broader community, the LIGO/VIRGO Collaboration
    has created tutorials with Jupyter Notebooks that describe how to use LIGO/
    VIRGO data and reproduce analyses related to their academic publications.


  14. Source: https://blog.jupyter.org/jupyter-receives-the-acm-software-system-award-d433b0dfe3a2
    It is our pleasure to announce that Project Jupyter has been awarded the 2017
    ACM Software System Award, a significant honor for the project. We are
    humbled to join an illustrious list of projects that contains major highlights of
    computing history, including Unix, TeX, S (R’s predecessor), the Web, Mosaic,
    Java, INGRES (modern databases) and more.


  15. Jupyter Notebooks have some compelling strengths


  16. Interactive, exploratory programming with immediate feedback
    #1


  17. Build a computational narrative bringing together


    code, results, explanatory prose, plots, images, widgets,
    and more in a single, human-friendly document
    #2


  18. Lower barriers to entry


  19. ...many more people and roles can access, use, and
    collaborate on programming and data analysis in their work
    Lower barriers to entry


  20. Increased productivity


  21. Increased productivity
    ...for programmers of all skill levels


  22. "We’ve found that we’re 2x-3x more productive using
    [notebook-based development] than using traditional
    programming tools...
    Source: https://www.fast.ai/2019/12/02/nbdev/


  23. "We’ve found that we’re 2x-3x more productive using
    [notebook-based development] than using traditional
    programming tools...
    ...this is a big surprise, since I have coded nearly every day
    for over 30 years, and in that time have tried dozens of tools,
    libraries, and systems for building programs."
    Source: https://www.fast.ai/2019/12/02/nbdev/


  24. "We’ve found that we’re 2x-3x more productive using
    [notebook-based development] than using traditional
    programming tools...
    ...this is a big surprise, since I have coded nearly every day
    for over 30 years, and in that time have tried dozens of tools,
    libraries, and systems for building programs."
    Source: https://www.fast.ai/2019/12/02/nbdev/
    — Jeremy Howard, fast.ai


  25. Jupyter Notebooks have become


    an essential part of the data scientist's toolkit


  26. But, a story you've probably heard before...


  27. The magic words...


  28. "Let's put this in production"
    The magic words...


  29. "You can't use Jupyter Notebooks in production"


  30. "It's not supported."


  31. This is a pain to version control.


  32. This is a pain to version control.
    This is monolithic. How will we collaborate effectively?


  33. This is a pain to version control.
    This is monolithic. How will we collaborate effectively?
    How can we share and reuse this code?


  34. This is a pain to version control.
    This is monolithic. How will we collaborate effectively?
    How can we share and reuse this code?
    How do we apply our code quality standards?


  35. This is a pain to version control.
    This is monolithic. How will we collaborate effectively?
    How can we share and reuse this code?
    How do we apply our code quality standards?
    How do we test this code?


  36. This is a pain to version control.
    This is monolithic. How will we collaborate effectively?
    How can we share and reuse this code?
    How do we apply our code quality standards?
    How do we test this code?
    Will this work with our continuous integration system?


  37. This is a pain to version control.
    This is monolithic. How will we collaborate effectively?
    How can we share and reuse this code?
    How do we apply our code quality standards?
    How do we test this code?
    Will this work with our continuous integration system?
    How do we schedule and trigger automatic execution?


  38. This is a pain to version control.
    This is monolithic. How will we collaborate effectively?
    How can we share and reuse this code?
    How do we apply our code quality standards?
    How do we test this code?
    Will this work with our continuous integration system?
    How do we schedule and trigger automatic execution?
    Out-of-order cell execution!


  39. This is a pain to version control.
    This is monolithic. How will we collaborate effectively?
    How can we share and reuse this code?
    How do we apply our code quality standards?
    How do we test this code?
    Will this work with our continuous integration system?
    How do we schedule and trigger automatic execution?
    Out-of-order cell execution!
    ...


  40. OK, how should we get this work into production?


  41. OK, how should we get this work into production?
    "It looks like there's a lot going on in your notebook…"


  42. Your notebook has reusable code...
    How should we get this work into production?


  43. Your notebook has reusable code...
    ... you're going to need to reimplement this code as proper
    software libraries,
    How should we get this work into production?


  44. Your notebook has reusable code...
    ... you're going to need to reimplement this code as proper
    software libraries,
    ... subject to our company-wide software engineering
    standards,
    How should we get this work into production?


  45. Your notebook has reusable code...
    ... you're going to need to reimplement this code as proper
    software libraries,
    ... subject to our company-wide software engineering
    standards,
    ... with reimplemented tests using our company's preferred
    testing framework,
    How should we get this work into production?


  46. Your notebook has reusable code...
    ... you're going to need to reimplement this code as proper
    software libraries,
    ... subject to our company-wide software engineering
    standards,
    ... with reimplemented tests using our company's preferred
    testing framework,
    ... using our preferred enterprise continuous integration system,
    How should we get this work into production?


  47. Your notebook has reusable code...
    ... you're going to need to reimplement this code as proper
    software libraries,
    ... subject to our company-wide software engineering
    standards,
    ... with reimplemented tests using our company's preferred
    testing framework,
    ... using our preferred enterprise continuous integration system,
    ... and deploy to our preferred enterprise artifact repository.
    How should we get this work into production?


  48. Your notebook is accessing and transforming data...
    How should we get this work into production?


  49. Your notebook is accessing and transforming data...
    ... you're going to need to reimplement this logic as data
    pipelines in our preferred enterprise data pipeline
    framework,
    How should we get this work into production?


  50. Your notebook is accessing and transforming data...
    ... you're going to need to reimplement this logic as data
    pipelines in our preferred enterprise data pipeline
    framework,
    ... which has its own engineering practices and conventions,
    How should we get this work into production?


  51. Your notebook is accessing and transforming data...
    ... you're going to need to reimplement this logic as data
    pipelines in our preferred enterprise data pipeline
    framework,
    ... which has its own engineering practices and conventions,
    ... and may not even use the same programming language.
    How should we get this work into production?


  52. Your notebook generates predictions...
    How should we get this work into production?


  53. Your notebook generates predictions...
    ... you're going to need to reimplement the model as a
    web service,
    How should we get this work into production?


  54. Your notebook generates predictions...
    ... you're going to need to reimplement the model as a
    web service,
    ... wrap it in a Docker container,
    How should we get this work into production?


  55. Your notebook generates predictions...
    ... you're going to need to reimplement the model as a
    web service,
    ... wrap it in a Docker container,
    ... store it in our preferred enterprise container registry,
    How should we get this work into production?


  56. Your notebook generates predictions...
    ... you're going to need to reimplement the model as a
    web service,
    ... wrap it in a Docker container,
    ... store it in our preferred enterprise container registry,
    ... and deploy it to our preferred enterprise container
    orchestration platform.
    How should we get this work into production?


  57. Your notebook presents results to end users...
    How should we get this work into production?


  58. Your notebook presents results to end users...
    ... you're going to need to reimplement these reports in
    our preferred enterprise business intelligence platform,
    How should we get this work into production?


  59. Your notebook presents results to end users...
    ... you're going to need to reimplement these reports in
    our preferred enterprise business intelligence platform,
    ... which has its own engineering practices and
    conventions,
    How should we get this work into production?


  60. Your notebook presents results to end users...
    ... you're going to need to reimplement these reports in
    our preferred enterprise business intelligence platform,
    ... which has its own engineering practices and
    conventions,
    ... and may not even use the same programming language.
    How should we get this work into production?


  61. So you're telling me that if we're going to get our
    work in production, either:


  62. So you're telling me that if we're going to get our
    work in production, either:
    1. Our data science teams have to be stacked with unicorns,


  63. So you're telling me that if we're going to get our
    work in production, either:
    1. Our data science teams have to be stacked with unicorns,
    or


  64. So you're telling me that if we're going to get our
    work in production, either:
    1. Our data science teams have to be stacked with unicorns,
    or
    2. We have to loop in a bunch of other teams and create
    dependencies between them


  65. My teams went through this process


    so many times we had a name for it


  66. de • notebook • ification


  67. de • notebook • ification
    The long, painful process of exploding a Jupyter
    Notebook that definitely works into a constellation
    of disparate production artifacts that maybe don't


  68. ⚠ WARNING: De-notebook-ification has been shown to have
    side effects including increased complexity, elongated timelines,
    unhappy stakeholders, frustrated data scientists, increased risk of
    project cancelation, and loss of data science team credibility.


  69. Additional problem:


  70. Additional problem:
    If Jupyter is only for demos and prototypes...


  71. Additional problem:
    If Jupyter is only for demos and prototypes...
    Why bother writing good code in notebooks?


  72. "Maybe you shouldn't use Jupyter in the first place"


  73. "Maybe you shouldn't use Jupyter in the first place"
    There has to be a better answer


  74. enter the
    Jupyter in Production
    ecosystem


  75. But first... what does in production mean, anyway?


  76. For this talk, we'll focus on:
    What does in production mean, anyway?


  77. For this talk, we'll focus on:
    •Developing and distributing software libraries
    What does in production mean, anyway?


  78. For this talk, we'll focus on:
    •Developing and distributing software libraries
    •Building and running data pipelines
    What does in production mean, anyway?


  79. For this talk, we'll focus on:
    •Developing and distributing software libraries
    •Building and running data pipelines
    •Creating interactive reports and dashboards
    What does in production mean, anyway?


  80. For each of these tools, I'll try to answer...


  81. ... what is it?
    For each of these tools, I'll try to answer...


  82. ... what is it?
    ... what do I have to do to use it?
    For each of these tools, I'll try to answer...


  83. ... what is it?
    ... what do I have to do to use it?
    ... what's in it for me?
    For each of these tools, I'll try to answer...


  84. Developing and distributing software libraries


  85. nbdev
    •Initial Release: 2019


    •GitHub Stars: 3.2k 🌟


    •GitHub: https://github.com/fastai/nbdev/


  86. What is it?
    nbdev


  87. A collection of tools that let you use Jupyter Notebooks
    as the source code for Python software libraries
    nbdev


  88. What do I have to do to use it?
    nbdev


  89. Setup
    • pip install nbdev or conda install nbdev -c fastai
    nbdev


  90. Setup
    • pip install nbdev or conda install nbdev -c fastai
    • Initialize your git repository as an nbdev project: nbdev_new

    (Or, copy the official nbdev template repo on GitHub)
    nbdev


  91. Setup
    • pip install nbdev or conda install nbdev -c fastai
    • Initialize your git repository as an nbdev project: nbdev_new

    (Or, copy the official nbdev template repo on GitHub)
    • Install the nbdev git hooks: nbdev_install_git_hooks
    nbdev


  92. Setup
    • pip install nbdev or conda install nbdev -c fastai
    • Initialize your git repository as an nbdev project: nbdev_new

    (Or, copy the official nbdev template repo on GitHub)
    • Install the nbdev git hooks: nbdev_install_git_hooks
    • Enter some basic project information in settings.ini
    nbdev


  93. Basic Usage
    • Start with exploratory programming in Jupyter Notebooks, as usual
    nbdev


  94. Basic Usage
    • Start with exploratory programming in Jupyter Notebooks, as usual
    • As you go, notice when it would make sense to reuse or share bits of the
    code you write
    nbdev


  95. Basic Usage
    • Start with exploratory programming in Jupyter Notebooks, as usual
    • As you go, notice when it would make sense to reuse or share bits of the
    code you write
    • Reshape this code into functions and classes in a notebook
    nbdev


  96. Basic Usage
    • Start with exploratory programming in Jupyter Notebooks, as usual
    • As you go, notice when it would make sense to reuse or share bits of the
    code you write
    • Reshape this code into functions and classes in a notebook
    • Add the #export flag (code comment) at the start of your main code cells
    nbdev


  97. Basic Usage
    • Start with exploratory programming in Jupyter Notebooks, as usual
    • As you go, notice when it would make sense to reuse or share bits of the
    code you write
    • Reshape this code into functions and classes in a notebook
    • Add the #export flag (code comment) at the start of your main code cells
    • Next to your main code cells, add rich explanatory text, images, code
    usage examples, sample output, and assert statements
    nbdev
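
As a rough illustration of the usage steps above, a pair of notebook cells might look like the sketch below. The function `say_hello` and its behavior are hypothetical, chosen only for this example. The `#export` comment is the flag named on the slide, and the plain `assert` in the second cell doubles as a usage example and a test.

```python
# --- Cell 1: flagged for export into the library ---
#export
def say_hello(to: str) -> str:
    """Return a short greeting for `to` (hypothetical example function)."""
    return f"Hello {to}!"


# --- Cell 2: not exported; documents usage and acts as a test ---
greeting = say_hello("Rev 3")
assert greeting == "Hello Rev 3!"
```

In a real notebook the two parts would live in separate cells, with explanatory prose in Markdown cells around them.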


  98. Source: https://nbdev.fast.ai/example.html


  99. Source: https://nbdev.fast.ai/example.html


  100. Source: https://nbdev.fast.ai/example.html


  101. Source: https://nbdev.fast.ai/example.html


  102. Source: https://nbdev.fast.ai/example.html


  103. What's in it for me?
    nbdev


  104. Quite a bit, actually.
    nbdev


  105. Automatically export the code from your Jupyter
    Notebooks into a fully-functional Python package:
    nbdev
    nbdev_build_lib
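
To make the idea of exporting concrete, here is a minimal sketch of collecting flagged cells from a notebook file. It is not nbdev's implementation. It only assumes that an .ipynb file is JSON with a list of cells, and the helper name `export_flagged_cells` and the sample notebook are hypothetical.

```python
import json


def export_flagged_cells(nb_json: str, flag: str = "#export") -> str:
    """Collect the source of code cells whose first line is `flag`."""
    notebook = json.loads(nb_json)
    exported = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # .ipynb files may store cell source as a list of lines or one string.
        if isinstance(source, list):
            source = "".join(source)
        lines = source.splitlines()
        if lines and lines[0].strip() == flag:
            exported.append("\n".join(lines[1:]).strip())
    return "\n\n".join(exported) + "\n"


# Hypothetical two-cell notebook: one exported cell, one test-only cell.
example_notebook = json.dumps({
    "cells": [
        {"cell_type": "code",
         "source": ["#export\n", "def add(a, b):\n", "    return a + b\n"]},
        {"cell_type": "code", "source": "assert add(1, 2) == 3"},
    ]
})

module_source = export_flagged_cells(example_notebook)
print(module_source)
```

Only the flagged cell ends up in the module text; the assert stays behind in the notebook as a test.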


  106. Source: https://nbdev.fast.ai/example.html


  107. Automatically publish new releases of your package
    to PyPI and conda:
    nbdev
    make release


  108. Automatically generate a rich documentation website
    for your package from your Jupyter Notebooks:
    nbdev
    nbdev_build_docs


  109. Source: https://nbdev.fast.ai/example.html


  110. Avoid common version control conflicts and
    resolve them when they occur:
    nbdev
    nbdev_clean_nbs & nbdev_fix_merge
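
One common source of notebook merge conflicts is volatile data such as execution counts, which change every time a cell is re-run. The sketch below strips those fields so that two runs of the same code produce identical files. It illustrates the general idea only; exactly what nbdev_clean_nbs removes should be checked against its documentation, and the helper name `strip_volatile_fields` is hypothetical.

```python
import json


def strip_volatile_fields(nb_json: str, clear_outputs: bool = False) -> str:
    """Return notebook JSON with execution counts (and optionally outputs) removed."""
    notebook = json.loads(nb_json)
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        cell["execution_count"] = None
        if clear_outputs:
            cell["outputs"] = []
        else:
            for output in cell.get("outputs", []):
                # Only some output types carry an execution count.
                if "execution_count" in output:
                    output["execution_count"] = None
    # Stable key order and indentation keep textual diffs small.
    return json.dumps(notebook, indent=1, sort_keys=True)


# The same hypothetical cell, saved after two different runs.
run_once = json.dumps({"cells": [{"cell_type": "code", "execution_count": 3,
                                  "outputs": [], "source": "x = 1"}]})
run_twice = json.dumps({"cells": [{"cell_type": "code", "execution_count": 7,
                                   "outputs": [], "source": "x = 1"}]})

same_after_cleaning = strip_volatile_fields(run_once) == strip_volatile_fields(run_twice)
print(same_after_cleaning)
```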


  111. Source: https://nbdev.fast.ai/merge.html


  112. Automatically run tests on your notebooks:
    nbdev
    nbdev_test_nbs


  113. Source: https://nbdev.fast.ai/example.html


  114. nbdev
    $ nbdev_test_nbs


    testing: card.ipynb


    testing: deck.ipynb


    All tests are passing!
    Source: https://nbdev.fast.ai/tutorial.html
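
Conceptually, testing a notebook means running its code cells from top to bottom and failing if any cell raises. The sketch below shows that idea with the standard library only. It is not how nbdev_test_nbs is implemented (a real runner would use a Jupyter kernel), and the helper `run_code_cells` and the sample notebook are hypothetical.

```python
import json


def run_code_cells(nb_json: str) -> int:
    """Execute code cells top to bottom in one namespace; return how many ran."""
    notebook = json.loads(nb_json)
    namespace = {}
    executed = 0
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):
            source = "".join(source)
        # A failing assert (or any other exception) propagates to the caller.
        exec(compile(source, f"<cell {executed}>", "exec"), namespace)
        executed += 1
    return executed


passing_notebook = json.dumps({"cells": [
    {"cell_type": "markdown", "source": "# Hypothetical notebook"},
    {"cell_type": "code", "source": "def double(x):\n    return 2 * x"},
    {"cell_type": "code", "source": "assert double(2) == 4"},
]})

cells_run = run_code_cells(passing_notebook)
print(f"{cells_run} code cells ran without errors")
```

Because cells always run in document order here, this kind of check also catches notebooks that only work when cells are executed out of order.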


  115. Continuous integration out-of-the-box


    with git hooks and GitHub Actions
    nbdev


  116. Conceptual shift
    nbdev


  117. With nbdev, your source code, tests, and documentation
    all live together in one place
    nbdev


  118. Source: https://nbdev.fast.ai/example.html
    Code


  119. Source: https://nbdev.fast.ai/example.html
    Code
    Docs
    Docs


  120. Source: https://nbdev.fast.ai/example.html
    Code
    Tests
    Docs
    Docs


  121. "The magic of nbdev is that it doesn’t actually change
    programming that much; you add a #export or #hide tag
    to your notebook cells once in a while, and you run
    nbdev_build_lib and nbdev_build_docs when you
    finish up your code.

    Source: https://www.overstory.com/blog/how-nbdev-helps-us-structure-our-data-science-workflow-in-jupyter-notebooks
    nbdev


  122. "The magic of nbdev is that it doesn’t actually change
    programming that much; you add a #export or #hide tag
    to your notebook cells once in a while, and you run
    nbdev_build_lib and nbdev_build_docs when you
    finish up your code.

    That’s it! There’s nothing new to learn, nothing to unlearn.
    It’s just notebooks."
    Source: https://www.overstory.com/blog/how-nbdev-helps-us-structure-our-data-science-workflow-in-jupyter-notebooks
    nbdev


  123. “[nbdev] incentives us to write clear code, use proper Git
    version control and document and test our codebase
    continuously... [while] preserving the benefits of having
    interactive Jupyter notebooks in which it is easy to
    experiment."
    Source: https://www.overstory.com/blog/how-nbdev-helps-us-structure-our-data-science-workflow-in-jupyter-notebooks
    nbdev


  124. “[nbdev] incentives us to write clear code, use proper Git
    version control and document and test our codebase
    continuously... [while] preserving the benefits of having
    interactive Jupyter notebooks in which it is easy to
    experiment."
    Source: https://www.overstory.com/blog/how-nbdev-helps-us-structure-our-data-science-workflow-in-jupyter-notebooks
    nbdev
    — Overstory


  125. Visually compare notebook versions


  126. nbdime and ReviewNB
    Visually compare notebook versions
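
Part of why notebook-aware comparison helps: a raw text diff of two .ipynb files mixes real code changes with JSON noise such as execution counts. The sketch below diffs only the code-cell source using the standard library. It is a simplified illustration, not how nbdime or ReviewNB work internally, and the helper names and sample notebooks are hypothetical.

```python
import difflib
import json


def code_cell_lines(nb_json: str) -> list:
    """Return the source lines of all code cells, ignoring outputs and metadata."""
    notebook = json.loads(nb_json)
    lines = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):
            source = "".join(source)
        lines.extend(source.splitlines())
    return lines


def diff_code_cells(old_json: str, new_json: str) -> list:
    """Unified diff computed over code-cell source only."""
    return list(difflib.unified_diff(
        code_cell_lines(old_json), code_cell_lines(new_json),
        fromfile="old.ipynb", tofile="new.ipynb", lineterm="",
    ))


# Two hypothetical versions: one source line and the execution count changed.
old_version = json.dumps({"cells": [
    {"cell_type": "code", "execution_count": 1, "source": "x = 1\nprint(x)"}]})
new_version = json.dumps({"cells": [
    {"cell_type": "code", "execution_count": 5, "source": "x = 2\nprint(x)"}]})

for line in diff_code_cells(old_version, new_version):
    print(line)
```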


  127. Source: https://nbdime.readthedocs.io


  128. Source: https://www.reviewnb.com/


  129. Run your favorite code quality tools on notebooks


  130. nbQA
    Run your favorite code quality tools on notebooks


  131. $ nbqa black my_notebook.ipynb


    reformatted my_notebook.ipynb


    All done! ✨ 🍰 ✨


    1 files reformatted.
    Source: https://nbqa.readthedocs.io/en/latest/examples.html
    nbQA


  132. Building and running data pipelines


  133. Source: https://docs.ploomber.io/en/latest/use-cases/ml.html


  134. “We’re currently in the process of migrating all 10,000 of
    the scheduled jobs running on the Netflix Data Platform to
    use notebook-based execution…

    Source: https://netflixtechblog.com/scheduling-notebooks-348e6c14cfd6


  135. “We’re currently in the process of migrating all 10,000 of
    the scheduled jobs running on the Netflix Data Platform to
    use notebook-based execution…

    When we’re done, more than 150,000 [pipeline executions]
    will be running through notebooks on our platform every
    single day.”
    Source: https://netflixtechblog.com/scheduling-notebooks-348e6c14cfd6


  136. “We’re currently in the process of migrating all 10,000 of
    the scheduled jobs running on the Netflix Data Platform to
    use notebook-based execution…

    When we’re done, more than 150,000 [pipeline executions]
    will be running through notebooks on our platform every
    single day.”
    Source: https://netflixtechblog.com/scheduling-notebooks-348e6c14cfd6
    — Netflix (2018)


  137. ploomber
    •Initial Release: 2020


    •GitHub Stars: 2.3k 🌟


    •GitHub: https://github.com/ploomber/ploomber


  138. What is it?
    ploomber


  139. A framework to build and execute data pipelines
    made out of Jupyter Notebooks
    ploomber


  140. What do I have to do to use it?
    ploomber


  141. Setup
    • pip install ploomber or

    conda install ploomber -c conda-forge
    ploomber


  142. Setup
    • pip install ploomber or

    conda install ploomber -c conda-forge
    • Initialize your git repository as a ploomber project:
    ploomber


  143. Setup
    • pip install ploomber or

    conda install ploomber -c conda-forge
    • Initialize your git repository as a ploomber project:
    • ploomber scaffold --empty
    ploomber


  144. Setup
    • pip install ploomber or

    conda install ploomber -c conda-forge
    • Initialize your git repository as a ploomber project:
    • ploomber scaffold --empty
    • Add information about your pipeline to pipeline.yaml as you go
    ploomber


  145. Basic Usage
    • Start with exploratory programming in Jupyter Notebooks, as usual
    ploomber


  146. Basic Usage
    • Start with exploratory programming in Jupyter Notebooks, as usual
    • As you go, notice when chunks of your code would make sense as
    modular "tasks" in a data transformation workflow
    ploomber


  147. Basic Usage
    • Start with exploratory programming in Jupyter Notebooks, as usual
    • As you go, notice when chunks of your code would make sense as
    modular "tasks" in a data transformation workflow
    • Move the code for each task into its own dedicated notebook
    ploomber


  148. Basic Usage
    • Start with exploratory programming in Jupyter Notebooks, as usual
    • As you go, notice when chunks of your code would make sense as
    modular "tasks" in a data transformation workflow
    • Move the code for each task into its own dedicated notebook
    • Next to your code cells, add rich explanatory text, images, example
    expected output, and data quality checks
    ploomber


  149. Basic Usage
    • Record information about your task notebooks in pipeline.yaml
    ploomber


  150. Basic Usage
    • Record information about your task notebooks in pipeline.yaml
    • Add a few variables to your task notebooks to define upstream
    dependencies
    ploomber


  151. Basic Usage
    • Record information about your task notebooks in pipeline.yaml
    • Add a few variables to your task notebooks to define upstream
    dependencies
    • Run your pipeline with ploomber build
    ploomber
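
Once each task declares its upstream dependencies, a valid run order can be derived with a topological sort. The sketch below shows that general technique using Python's standard library. It is not ploomber's own code. The task names follow the notebooks in the talk's pipeline.yaml example, and the dependency edges (clean depends on raw, plot depends on clean) are an assumption made for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical upstream declarations: each task maps to the tasks it depends on.
upstream = {
    "raw": [],
    "clean": ["raw"],
    "plot": ["clean"],
}

# static_order() yields every task only after all of its upstream tasks.
execution_order = list(TopologicalSorter(upstream).static_order())
print(execution_order)
```

With an explicit dependency graph, a pipeline runner can also detect cycles and skip tasks whose inputs have not changed, which is hard to do with one monolithic notebook.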


  152. Source: https://docs.ploomber.io/en/latest/get-started/basic-concepts.html


  153. Source: https://docs.ploomber.io/en/latest/get-started/basic-concepts.html
    .ipynb


  154. Source: https://docs.ploomber.io/en/latest/get-started/basic-concepts.html
    .ipynb .ipynb .ipynb


  155. pipeline.yaml
    ploomber
    Source: https://docs.ploomber.io/en/latest/get-started/first-pipeline.html


  156. pipeline.yaml
    tasks:
    ploomber
    Source: https://docs.ploomber.io/en/latest/get-started/first-pipeline.html


  157. pipeline.yaml
    tasks:
      # source is the code you want to execute
      - source: raw.ipynb
    ploomber
    Source: https://docs.ploomber.io/en/latest/get-started/first-pipeline.html


  158. pipeline.yaml
    tasks:
      # source is the code you want to execute
      - source: raw.ipynb
        # products are task's outputs
        product:
    ploomber
    Source: https://docs.ploomber.io/en/latest/get-started/first-pipeline.html


  159. pipeline.yaml
    tasks:
      # source is the code you want to execute
      - source: raw.ipynb
        # products are task's outputs
        product:
          # tasks generate executed notebooks as outputs
          nb: output/raw.ipynb
    ploomber
    Source: https://docs.ploomber.io/en/latest/get-started/first-pipeline.html


  160. pipeline.yaml
    tasks:
      # source is the code you want to execute
      - source: raw.ipynb
        # products are task's outputs
        product:
          # tasks generate executed notebooks as outputs
          nb: output/raw.ipynb
          # you can define as many outputs as you want
          data: output/raw_data.csv

    ploomber
    Source: https://docs.ploomber.io/en/latest/get-started/first-pipeline.html


  161. pipeline.yaml
    tasks:
      # source is the code you want to execute
      - source: raw.ipynb
        # products are task's outputs
        product:
          # tasks generate executed notebooks as outputs
          nb: output/raw.ipynb
          # you can define as many outputs as you want
          data: output/raw_data.csv

      - source: clean.ipynb
    ploomber
    Source: https://docs.ploomber.io/en/latest/get-started/first-pipeline.html


  162. pipeline.yaml
    tasks:
      # source is the code you want to execute
      - source: raw.ipynb
        # products are task's outputs
        product:
          # tasks generate executed notebooks as outputs
          nb: output/raw.ipynb
          # you can define as many outputs as you want
          data: output/raw_data.csv

      - source: clean.ipynb
        product:
    ploomber
    Source: https://docs.ploomber.io/en/latest/get-started/first-pipeline.html

    View full-size slide

  163. pipeline.yaml
    tasks:
      # source is the code you want to execute
      - source: raw.ipynb
        # products are task's outputs
        product:
          # tasks generate executed notebooks as outputs
          nb: output/raw.ipynb
          # you can define as many outputs as you want
          data: output/raw_data.csv

      - source: clean.ipynb
        product:
          nb: output/clean.ipynb
    ploomber
    Source: https://docs.ploomber.io/en/latest/get-started/first-pipeline.html

  164. pipeline.yaml
    tasks:
      # source is the code you want to execute
      - source: raw.ipynb
        # products are task's outputs
        product:
          # tasks generate executed notebooks as outputs
          nb: output/raw.ipynb
          # you can define as many outputs as you want
          data: output/raw_data.csv

      - source: clean.ipynb
        product:
          nb: output/clean.ipynb
          data: output/clean_data.parquet
    ploomber
    Source: https://docs.ploomber.io/en/latest/get-started/first-pipeline.html

  165. pipeline.yaml
    tasks:
      # source is the code you want to execute
      - source: raw.ipynb
        # products are task's outputs
        product:
          # tasks generate executed notebooks as outputs
          nb: output/raw.ipynb
          # you can define as many outputs as you want
          data: output/raw_data.csv

      - source: clean.ipynb
        product:
          nb: output/clean.ipynb
          data: output/clean_data.parquet

      - source: plot.ipynb
    ploomber
    Source: https://docs.ploomber.io/en/latest/get-started/first-pipeline.html

  166. pipeline.yaml
    tasks:
      # source is the code you want to execute
      - source: raw.ipynb
        # products are task's outputs
        product:
          # tasks generate executed notebooks as outputs
          nb: output/raw.ipynb
          # you can define as many outputs as you want
          data: output/raw_data.csv

      - source: clean.ipynb
        product:
          nb: output/clean.ipynb
          data: output/clean_data.parquet

      - source: plot.ipynb
        product: output/plot.ipynb
    ploomber
    Source: https://docs.ploomber.io/en/latest/get-started/first-pipeline.html

    View full-size slide
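    The finished pipeline.yaml can be mirrored as a plain Python structure to see what each task declares. This is only a sketch: the `PIPELINE` dict and the `declared_products` helper are hypothetical names, not part of ploomber's API, and the nesting is inferred from the comments on the slide.

```python
# Hypothetical mirror of the pipeline.yaml from the slides (not ploomber's API).
PIPELINE = {
    "tasks": [
        {
            "source": "raw.ipynb",
            "product": {"nb": "output/raw.ipynb", "data": "output/raw_data.csv"},
        },
        {
            "source": "clean.ipynb",
            "product": {
                "nb": "output/clean.ipynb",
                "data": "output/clean_data.parquet",
            },
        },
        # A task with a single output gives the path directly.
        {"source": "plot.ipynb", "product": "output/plot.ipynb"},
    ]
}


def declared_products(pipeline):
    """Map each task's source file to the list of output paths it declares."""
    products = {}
    for task in pipeline["tasks"]:
        product = task["product"]
        # A product is either one path or a mapping of named outputs.
        paths = [product] if isinstance(product, str) else list(product.values())
        products[task["source"]] = paths
    return products


if __name__ == "__main__":
    for source, paths in declared_products(PIPELINE).items():
        print(f"{source} -> {', '.join(paths)}")
```

    Every task produces an executed notebook, and the first two also produce a data file that a later task can read.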

  167. Source: https://docs.ploomber.io/en/latest/get-started/basic-concepts.html


  168. Source: https://docs.ploomber.io/en/latest/get-started/basic-concepts.html


  169. Source: https://docs.ploomber.io/en/latest/get-started/basic-concepts.html


  170. $ ploomber build
    ploomber
    Source: https://docs.ploomber.io/en/latest/get-started/first-pipeline.html

  171. $ ploomber build
    Building task 'raw': 0%| | 0/5 [00:00, ?it/s]

    Executing: 0%| | 0/6 [00:00, ?cell/s]

    Executing: 17%|█▋ | 1/6 [00:04<00:21, 4.25s/cell]

    Executing: 33%|███▎ | 2/6 [00:04<00:07, 1.82s/cell]

    Executing: 100%|██████████| 6/6 [00:05<00:00, 1.11cell/s]
    ploomber
    Source: https://docs.ploomber.io/en/latest/get-started/first-pipeline.html

  172. $ ploomber build
    Building task 'raw': 0%| | 0/5 [00:00, ?it/s]

    Executing: 0%| | 0/6 [00:00, ?cell/s]

    Executing: 17%|█▋ | 1/6 [00:04<00:21, 4.25s/cell]

    Executing: 33%|███▎ | 2/6 [00:04<00:07, 1.82s/cell]

    Executing: 100%|██████████| 6/6 [00:05<00:00, 1.11cell/s]
    Building task 'clean': 20%|██ | 1/5 [00:05<00:21, 5.47s/it]

    Executing: 0%| | 0/7 [00:00, ?cell/s]

    Executing: 14%|█▍ | 1/7 [00:01<00:10, 1.76s/cell]

    Executing: 43%|████▎ | 3/7 [00:23<00:34, 8.63s/cell]

    Executing: 71%|███████▏ | 5/7 [00:25<00:09, 4.69s/cell]

    Executing: 86%|████████▌ | 6/7 [00:28<00:04, 4.14s/cell]

    Executing: 100%|██████████| 7/7 [00:29<00:00, 4.24s/cell]
    ploomber
    Source: https://docs.ploomber.io/en/latest/get-started/first-pipeline.html

  173. $ ploomber build
    Building task 'raw': 0%| | 0/5 [00:00, ?it/s]

    Executing: 0%| | 0/6 [00:00, ?cell/s]

    Executing: 17%|█▋ | 1/6 [00:04<00:21, 4.25s/cell]

    Executing: 33%|███▎ | 2/6 [00:04<00:07, 1.82s/cell]

    Executing: 100%|██████████| 6/6 [00:05<00:00, 1.11cell/s]
    Building task 'clean': 20%|██ | 1/5 [00:05<00:21, 5.47s/it]

    Executing: 0%| | 0/7 [00:00, ?cell/s]

    Executing: 14%|█▍ | 1/7 [00:01<00:10, 1.76s/cell]

    Executing: 43%|████▎ | 3/7 [00:23<00:34, 8.63s/cell]

    Executing: 71%|███████▏ | 5/7 [00:25<00:09, 4.69s/cell]

    Executing: 86%|████████▌ | 6/7 [00:28<00:04, 4.14s/cell]

    Executing: 100%|██████████| 7/7 [00:29<00:00, 4.24s/cell]
    Building task 'plot': 40%|████ | 2/5 [00:35<00:59, 19.75s/it]

    Executing: 0%| | 0/9 [00:00, ?cell/s]

    Executing: 11%|█ | 1/9 [00:02<00:22, 2.80s/cell]

    Executing: 33%|███▎ | 3/9 [00:02<00:04, 1.28cell/s]

    Executing: 56%|█████▌ | 5/9 [00:03<00:01, 2.42cell/s]

    Executing: 100%|██████████| 9/9 [00:03<00:00, 2.26cell/s]
    ploomber
    Source: https://docs.ploomber.io/en/latest/get-started/first-pipeline.html

  174. What's in it for me?
    ploomber


  175. A human-friendly computational narrative
    of every pipeline execution
    ploomber


  176. “[W]e’ve gained a key improvement over a non-notebook
    execution pattern: our input and outputs are complete documents,
    wholly executable and shareable in the same interface.”
    Source: https://netflixtechblog.com/scheduling-notebooks-348e6c14cfd6
    — Netflix (2018)

  177. Interactive pipeline inspection and
    debugging in Jupyter Notebooks
    ploomber


  178. “Say something went wrong… How might we debug and fix the
    issue? The first place we’d want to look is the notebook output. It
    will have a stack trace, and ultimately any output information
    related to an error…”
    Source: https://netflixtechblog.com/scheduling-notebooks-348e6c14cfd6
    — Netflix (2018)

  179. “Say something went wrong… How might we debug and fix the
    issue? The first place we’d want to look is the notebook output. It
    will have a stack trace, and ultimately any output information
    related to an error…

    [W]e simply take the output notebook with our exact failed runtime
    parameterizations and load it into a notebook server… With a few
    iterations… we can quickly find a fix for the failure.”
    Source: https://netflixtechblog.com/scheduling-notebooks-348e6c14cfd6
    — Netflix (2018)

  180. Incremental builds
    ploomber

    View full-size slide
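    The idea behind an incremental build can be sketched in a few lines: re-run a task only when its product is missing or its source has changed since the last run. This is a minimal illustration under stated assumptions, not ploomber's implementation; the helper names and the JSON metadata file are hypothetical, and the sketch ignores changes in upstream tasks.

```python
import hashlib
import json
from pathlib import Path


def source_hash(path):
    """Return a SHA-256 digest of a task's source file."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def is_outdated(source, product, metadata_file):
    """A task is outdated if its product is missing or its source changed."""
    if not Path(product).exists() or not Path(metadata_file).exists():
        return True
    stored = json.loads(Path(metadata_file).read_text())
    return stored.get(str(source)) != source_hash(source)


def record_build(source, metadata_file):
    """Remember the source hash that produced the current product."""
    meta_path = Path(metadata_file)
    stored = json.loads(meta_path.read_text()) if meta_path.exists() else {}
    stored[str(source)] = source_hash(source)
    meta_path.write_text(json.dumps(stored, indent=2))
```

    A build loop would call `is_outdated` before executing each task and `record_build` after it succeeds, so unchanged tasks are skipped on the next run.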

  181. Source: https://docs.ploomber.io/en/latest/use-cases/ml.html


  182. Source: https://docs.ploomber.io/en/latest/use-cases/ml.html


  183. Test each stage of your data pipeline
    ploomber


  184. Modular pipelines →
    collaborative development
    ploomber


  185. Source: https://docs.ploomber.io/en/latest/use-cases/ml.html
    👩‍💻

  186. Source: https://docs.ploomber.io/en/latest/use-cases/ml.html
    👨‍💻
    👩‍💻

  187. Source: https://docs.ploomber.io/en/latest/use-cases/ml.html
    👩‍💻
    👨‍💻
    🧑‍💻

  188. Automated deployment

    to Airflow, AWS Batch, or Kubernetes
    ploomber

  189. Store Jupyter Notebooks as plain text

    for easier version control


  190. jupytext
    Store Jupyter Notebooks as plain text

    for easier version control

    View full-size slide
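    To see why plain text is easier to version, the sketch below turns a notebook's JSON into a script that keeps only the cell sources, using the `# %%` cell markers of the percent format that jupytext can read. It is a simplified illustration, not jupytext's own code: `notebook_to_percent_script` is a hypothetical helper, and it drops outputs and metadata entirely.

```python
import json


def notebook_to_percent_script(notebook_json):
    """Render code and markdown cells as a plain-text, percent-style script."""
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        # nbformat stores cell source as a string or a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        if cell.get("cell_type") == "markdown":
            # Markdown is kept as comments so the script stays valid Python.
            body = "\n".join(
                f"# {line}" if line else "#" for line in text.split("\n")
            )
            chunks.append("# %% [markdown]\n" + body)
        elif cell.get("cell_type") == "code":
            chunks.append("# %%\n" + text)
    return "\n\n".join(chunks) + "\n"
```

    Because outputs and execution counts never reach the text file, a diff shows only the lines of code or prose that actually changed.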

  191. Source: https://docs.ploomber.io/en/latest/get-started/basic-concepts.html


  192. Creating interactive reports and dashboards


  193. voilà
    • Initial Release: 2018
    • GitHub Stars: 4.1k 🌟
    • GitHub: https://github.com/voila-dashboards/voila

  194. What is it?
    voilà


  195. A tool for serving Jupyter Notebooks
    as clean, stand-alone web applications
    voilà


  196. What do I have to do to use it?
    voilà


  197. Not much!
    voilà


  198. Setup
    • pip install voila or conda install voila -c conda-forge
    voilà


  199. Setup
    • pip install voila or conda install voila -c conda-forge
    • To serve a single notebook: voila my_notebook.ipynb
    voilà

  200. Setup
    • pip install voila or conda install voila -c conda-forge
    • To serve a single notebook: voila my_notebook.ipynb
    • To serve a whole directory of notebooks: voila
    voilà

  201. Setup
    • pip install voila or conda install voila -c conda-forge
    • To serve a single notebook: voila my_notebook.ipynb
    • To serve a whole directory of notebooks: voila
    • Optionally specify a custom template:
    voilà

  202. Setup
    • pip install voila or conda install voila -c conda-forge
    • To serve a single notebook: voila my_notebook.ipynb
    • To serve a whole directory of notebooks: voila
    • Optionally specify a custom template:
    • voila my_notebook.ipynb --template=gridstack
    voilà

    View full-size slide
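    The three ways of launching voila listed above differ only in their arguments, which a small helper can make explicit. This is a sketch for scripting the launch; `voila_command` is a hypothetical function, it only assembles the argument list, and it assumes the named template (gridstack here) is already installed.

```python
import shlex


def voila_command(notebook=None, template=None):
    """Build the argument list for the voila CLI calls shown on the slide."""
    args = ["voila"]
    if notebook is not None:
        # Leaving the notebook out serves the whole current directory.
        args.append(notebook)
    if template is not None:
        args.append(f"--template={template}")
    return args


print(shlex.join(voila_command("my_notebook.ipynb", template="gridstack")))
```

    The returned list can be passed to `subprocess.run` to start the server from a deployment script.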

  203. What's in it for me?
    voilà


  204. Execute and serve Jupyter Notebooks
    for end users
    voilà


  205. Source: https://github.com/sysuin/covid-19-world-dashboard


  206. Source: https://github.com/sysuin/covid-19-world-dashboard


  207. Source: https://github.com/sysuin/covid-19-world-dashboard


  208. Interactive plots and widgets still work
    voilà


  209. Source: https://github.com/dhaitz/machine-learning-interactive-visualization


  210. Source: https://github.com/dhaitz/machine-learning-interactive-visualization


  211. Customize the look and feel of your
    dashboard with templates
    voilà


  212. voilà
    Source: https://github.com/voila-dashboards/voila-vuetify


  213. voilà
    Source: https://github.com/voila-dashboards/voila-vuetify


  214. Long-running notebooks
    voilà


  215. So, where does this leave us?


  216. A smoother path to production for
    work that starts in Jupyter Notebooks


  217. • Software Libraries → nbdev projects
    • Data Transformation Workflows → ploomber pipelines
    • Reports and Dashboards → voilà dashboards

  218. Data science teams can own a project end-to-end in a
    tool and environment they're already comfortable with


  219. Jupyter Notebooks become production artifacts


  220. We can retain the interactivity and computational narrative
    strengths of Jupyter Notebooks, even in production settings


  221. Where to go from here?


  222. Jupyter in Production
    Data Theoretic
