Jupyter in Production - Rev 3

Presented at the Rev 3 MLOps Conference in New York City on May 5, 2022.

Patrick Harrison

May 05, 2022

Transcript

  1. Jupyter in Production

  2. whois

  3. whois Patrick Harrison

  4. whois Patrick Harrison Data Theoretic

  5. whois Patrick Harrison Data Theoretic Previously: Led AI Engineering at

    a major financial data company
  6. Source: https://ipython.org/ipython-doc/rel-0.12/whatsnew/version0.12.html

  7. Jupyter Notebooks just turned ten years old Source: https://ipython.org/ipython-doc/rel-0.12/whatsnew/version0.12.html

  8. Jupyter Notebooks just turned ten years old The original IPython

    Notebook was first released on December 19, 2011 Source: https://ipython.org/ipython-doc/rel-0.12/whatsnew/version0.12.html
  9. None
  10. Source: https://github.com/parente/nbestimate/blob/master/estimate.ipynb Public Jupyter Notebooks on GitHub

  11. Source: https://github.com/parente/nbestimate/blob/master/estimate.ipynb Public Jupyter Notebooks on GitHub ≈0

  12. Source: https://github.com/parente/nbestimate/blob/master/estimate.ipynb Public Jupyter Notebooks on GitHub ≈0 ≈10,000,000

  13. 8,000+ new public Jupyter Notebooks posted on GitHub every day

    in 2022, on average Source: https://github.com/parente/nbestimate/blob/master/ipynb_counts.csv
  14. Jupyter Notebooks have been used to do some amazing things

  15. Source: https://blog.jupyter.org/congratulations-to-the-ligo-and-virgo-collaborations-from-project-jupyter-5923247be019 On behalf of the entire Project Jupyter team,

    we’d like to say congratulations to Rainer Weiss, Barry C. Barish, Kip S. Thorne and the rest of the LIGO and VIRGO teams for the Nobel Prize in Physics 2017. Since 2015, the LIGO and VIRGO Collaborations have observed multiple instances of gravitational waves due to colliding black holes (and more recently neutron stars). These observations represent decades of work and confirm what Einstein had theorized a hundred years ago. ... To communicate to the broader community, the LIGO/VIRGO Collaboration has created tutorials with Jupyter Notebooks that describe how to use LIGO/ VIRGO data and reproduce analyses related to their academic publications.
  16. Source: https://blog.jupyter.org/jupyter-receives-the-acm-software-system-award-d433b0dfe3a2 It is our pleasure to announce that Project

    Jupyter has been awarded the 2017 ACM Software System Award, a significant honor for the project. We are humbled to join an illustrious list of projects that contains major highlights of computing history, including Unix, TeX, S (R’s predecessor), the Web, Mosaic, Java, INGRES (modern databases) and more.
  17. Jupyter Notebooks have some compelling strengths

  18. #1

  19. Interactive, exploratory programming with immediate feedback #1

  20. #2

  21. Build a computational narrative bringing together code, results, explanatory prose,

    plots, images, widgets, and more in a single, human-friendly document #2
  22. Lower barriers to entry

  23. ...many more people and roles can access, use, and collaborate

    on programming and data analysis in their work Lower barriers to entry
  24. Increased productivity

  25. Increased productivity ...for programmers of all skill levels

  26. "We’ve found that we’re 2x-3x more productive using [notebook-based development]

    than using traditional programming tools... Source: https://www.fast.ai/2019/12/02/nbdev/
  27. "We’ve found that we’re 2x-3x more productive using [notebook-based development]

    than using traditional programming tools... ...this is a big surprise, since I have coded nearly every day for over 30 years, and in that time have tried dozens of tools, libraries, and systems for building programs." Source: https://www.fast.ai/2019/12/02/nbdev/
  28. "We’ve found that we’re 2x-3x more productive using [notebook-based development]

    than using traditional programming tools... ...this is a big surprise, since I have coded nearly every day for over 30 years, and in that time have tried dozens of tools, libraries, and systems for building programs." Source: https://www.fast.ai/2019/12/02/nbdev/ — Jeremy Howard, fast.ai
  29. Jupyter Notebooks have become an essential part of the data

    scientist's toolkit
  30. But, a story you've probably heard before...

  31. The magic words...

  32. "Let's put this in production" The magic words...

  33. "You can't use Jupyter Notebooks in production"

  34. Why not?

  35. "It's not supported."

  36. This is a pain to version control.

  37. This is a pain to version control. This is monolithic.

    How will we collaborate effectively?
  38. This is a pain to version control. This is monolithic.

    How will we collaborate effectively? How can we share and reuse this code?
  39. This is a pain to version control. This is monolithic.

    How will we collaborate effectively? How can we share and reuse this code? How do we apply our code quality standards?
  40. This is a pain to version control. This is monolithic.

    How will we collaborate effectively? How can we share and reuse this code? How do we apply our code quality standards? How do we test this code?
  41. This is a pain to version control. This is monolithic.

    How will we collaborate effectively? How can we share and reuse this code? How do we apply our code quality standards? How do we test this code? Will this work with our continuous integration system?
  42. This is a pain to version control. This is monolithic.

    How will we collaborate effectively? How can we share and reuse this code? How do we apply our code quality standards? How do we test this code? Will this work with our continuous integration system? How do we schedule and trigger automatic execution?
  43. This is a pain to version control. This is monolithic.

    How will we collaborate effectively? How can we share and reuse this code? How do we apply our code quality standards? How do we test this code? Will this work with our continuous integration system? How do we schedule and trigger automatic execution? Out-of-order cell execution!
  44. This is a pain to version control. This is monolithic.

    How will we collaborate effectively? How can we share and reuse this code? How do we apply our code quality standards? How do we test this code? Will this work with our continuous integration system? How do we schedule and trigger automatic execution? Out-of-order cell execution! ...
  45. OK, how should we get this work into production?

  46. OK, how should we get this work into production? “It

    looks like there's a lot going on in your notebook…"
  47. Your notebook has reusable code... How should we get this

    work into production?
  48. Your notebook has reusable code... ... you're going to need

    to reimplement this code as proper software libraries, How should we get this work into production?
  49. Your notebook has reusable code... ... you're going to need

    to reimplement this code as proper software libraries, ... subject to our company-wide software engineering standards, How should we get this work into production?
  50. Your notebook has reusable code... ... you're going to need

    to reimplement this code as proper software libraries, ... subject to our company-wide software engineering standards, ... with reimplemented tests using our company's preferred testing framework, How should we get this work into production?
  51. Your notebook has reusable code... ... you're going to need

    to reimplement this code as proper software libraries, ... subject to our company-wide software engineering standards, ... with reimplemented tests using our company's preferred testing framework, ... using our preferred enterprise continuous integration system, How should we get this work into production?
  52. Your notebook has reusable code... ... you're going to need

    to reimplement this code as proper software libraries, ... subject to our company-wide software engineering standards, ... with reimplemented tests using our company's preferred testing framework, ... using our preferred enterprise continuous integration system, ... and deploy to our preferred enterprise artifact repository. How should we get this work into production?
  53. Your notebook is accessing and transforming data... How should we

    get this work into production?
  54. Your notebook is accessing and transforming data... ... you're going

    to need to reimplement this logic as data pipelines in our preferred enterprise data pipeline framework, How should we get this work into production?
  55. Your notebook is accessing and transforming data... ... you're going

    to need to reimplement this logic as data pipelines in our preferred enterprise data pipeline framework, ... which has its own engineering practices and conventions, How should we get this work into production?
  56. Your notebook is accessing and transforming data... ... you're going

    to need to reimplement this logic as data pipelines in our preferred enterprise data pipeline framework, ... which has its own engineering practices and conventions, ... and may not even use the same programming language. How should we get this work into production?
  57. Your notebook generates predictions... How should we get this work

    into production?
  58. Your notebook generates predictions... ... you're going to need to

    reimplement the model as a web service, How should we get this work into production?
  59. Your notebook generates predictions... ... you're going to need to

    reimplement the model as a web service, ... wrap it in a Docker container, How should we get this work into production?
  60. Your notebook generates predictions... ... you're going to need to

    reimplement the model as a web service, ... wrap it in a Docker container, ... store it in our preferred enterprise container registry, How should we get this work into production?
  61. Your notebook generates predictions... ... you're going to need to

    reimplement the model as a web service, ... wrap it in a Docker container, ... store it in our preferred enterprise container registry, ... and deploy it to our preferred enterprise container orchestration platform. How should we get this work into production?
  62. Your notebook presents results to end users... How should we

    get this work into production?
  63. Your notebook presents results to end users... ... you're going

    to need to reimplement these reports in our preferred enterprise business intelligence platform, How should we get this work into production?
  64. Your notebook presents results to end users... ... you're going

    to need to reimplement these reports in our preferred enterprise business intelligence platform, ... which has its own engineering practices and conventions, How should we get this work into production?
  65. Your notebook presents results to end users... ... you're going

    to need to reimplement these reports in our preferred enterprise business intelligence platform, ... which has its own engineering practices and conventions, ... and may not even use the same programming language. How should we get this work into production?
  66. So you're telling me that if we're going to get

    our work in production, either:
  67. So you're telling me that if we're going to get

    our work in production, either: 1. Our data science teams have to be stacked with unicorns,
  68. So you're telling me that if we're going to get

    our work in production, either: 1. Our data science teams have to be stacked with unicorns, or
  69. So you're telling me that if we're going to get

    our work in production, either: 1. Our data science teams have to be stacked with unicorns, or 2. We have to loop in a bunch of other teams and create dependencies between them
  70. My teams went through this process so many times we

    had a name for it
  71. de • notebook • ification

  72. de • notebook • ification The long, painful

    process of exploding a Jupyter Notebook that definitely works into a constellation of disparate production artifacts that maybe don't
  73. ⚠ WARNING: De-notebook-ification has been shown to have

    side effects including increased complexity, elongated timelines, unhappy stakeholders, frustrated data scientists, increased risk of project cancelation, and loss of data science team credibility.
  74. Additional problem:

  75. Additional problem: If Jupyter is only for demos and prototypes...

  76. Additional problem: If Jupyter is only for demos and prototypes...

    Why bother writing good code in notebooks?
  77. "Maybe you shouldn't use Jupyter in the fi rst place"

  78. "Maybe you shouldn't use Jupyter in the fi rst place"

    There has to be a better answer
  79. enter the Jupyter in Production ecosystem

  80. But first... what does in production mean, anyway?

  81. For this talk, we'll focus on: What does in production

    mean, anyway?
  82. For this talk, we'll focus on: •Developing and distributing software

    libraries What does in production mean, anyway?
  83. For this talk, we'll focus on: •Developing and distributing software

    libraries •Building and running data pipelines What does in production mean, anyway?
  84. For this talk, we'll focus on: •Developing and distributing software

    libraries •Building and running data pipelines •Creating interactive reports and dashboards What does in production mean, anyway?
  85. For each of these tools, I'll try to answer...

  86. ... what is it? For each of these tools, I'll

    try to answer...
  87. ... what is it? ... what do I have to

    do to use it? For each of these tools, I'll try to answer...
  88. ... what is it? ... what do I have to

    do to use it? ... what's in it for me? For each of these tools, I'll try to answer...
  89. Developing and distributing software libraries

  90. nbdev •Initial Release: 2019 •GitHub Stars: 3.2k 🌟 •GitHub: https://github.com/fastai/nbdev/

  91. What is it? nbdev

  92. A collection of tools that let you use Jupyter Notebooks

    as the source code for Python software libraries nbdev
  93. What do I have to do to use it? nbdev

  94. Setup • pip install nbdev or conda install nbdev -c

    fastai nbdev
  95. Setup • pip install nbdev or conda install nbdev -c

    fastai • Initialize your git repository as an nbdev project: nbdev_new 
 (Or, copy the official nbdev template repo on GitHub) nbdev
  96. Setup • pip install nbdev or conda install nbdev -c

    fastai • Initialize your git repository as an nbdev project: nbdev_new 
 (Or, copy the official nbdev template repo on GitHub) • Install the nbdev git hooks: nbdev_install_git_hooks nbdev
  97. Setup • pip install nbdev or conda install nbdev -c

    fastai • Initialize your git repository as an nbdev project: nbdev_new 
 (Or, copy the official nbdev template repo on GitHub) • Install the nbdev git hooks: nbdev_install_git_hooks • Enter some basic project information in settings.ini nbdev
  98. Basic Usage • Start with exploratory programming in Jupyter Notebooks,

    as usual nbdev
  99. Basic Usage • Start with exploratory programming in Jupyter Notebooks,

    as usual • As you go, notice when it would make sense to reuse or share bits of the code you write nbdev
  100. Basic Usage • Start with exploratory programming in Jupyter Notebooks,

    as usual • As you go, notice when it would make sense to reuse or share bits of the code you write • Reshape this code into functions and classes in a notebook nbdev
  101. Basic Usage • Start with exploratory programming in Jupyter Notebooks,

    as usual • As you go, notice when it would make sense to reuse or share bits of the code you write • Reshape this code into functions and classes in a notebook • Add the #export flag (code comment) at the start of your main code cells nbdev
  102. Basic Usage • Start with exploratory programming in Jupyter Notebooks,

    as usual • As you go, notice when it would make sense to reuse or share bits of the code you write • Reshape this code into functions and classes in a notebook • Add the #export flag (code comment) at the start of your main code cells • Next to your main code cells, add rich explanatory text, images, code usage examples, sample output, and assert statements nbdev
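For illustration, a minimal sketch of what such notebook cells might look like under nbdev v1's conventions; the module name core and the function are hypothetical, not from the talk:

    # cell: tell nbdev which module this notebook exports to
    # default_exp core

    # cell: exported code
    #export
    def add_floats(a: float, b: float) -> float:
        "Add two floats; nbdev_build_lib copies this cell into the generated package."
        return a + b

    # cell: explanation and a lightweight test living next to the code
    # Example usage (rendered into the docs site by nbdev_build_docs):
    assert add_floats(2.0, 3.0) == 5.0  # executed as a test by nbdev_test_nbs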
  103. Source: https://nbdev.fast.ai/example.html

  104. Source: https://nbdev.fast.ai/example.html

  105. Source: https://nbdev.fast.ai/example.html

  106. Source: https://nbdev.fast.ai/example.html

  107. Source: https://nbdev.fast.ai/example.html

  108. What's in it for me? nbdev

  109. Quite a bit, actually. nbdev

  110. Automatically export the code from your Jupyter Notebooks into a

    fully-functional Python package: nbdev nbdev_build_lib
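For example, once nbdev_build_lib has run, the exported code can be imported like any other Python package. The package and module names below are hypothetical; in practice the package name comes from settings.ini and the module name from the notebook's # default_exp directive:

    # import the function exported from the notebook sketch above
    from my_project.core import add_floats

    add_floats(2.0, 3.0)  # -> 5.0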
  111. Source: https://nbdev.fast.ai/example.html

  112. Automatically publish new releases of your package to PyPI and

    conda: nbdev make release
  113. Automatically generate a rich documentation website for your package from

    your Jupyter Notebooks: nbdev nbdev_build_docs
  114. Source: https://nbdev.fast.ai/example.html

  115. Avoid common version control conflicts and resolve them

    when they occur: nbdev nbdev_clean_nbs & nbdev_fix_merge
  116. Source: https://nbdev.fast.ai/merge.html

  117. Automatically run tests on your notebooks: nbdev nbdev_test_nbs

  118. Source: https://nbdev.fast.ai/example.html

  119. nbdev $ nbdev_test_nbs testing: card.ipynb testing: deck.ipynb All tests are

    passing! Source: https://nbdev.fast.ai/tutorial.html
  120. Continuous integration out-of-the-box with git hooks and GitHub Actions nbdev

  121. Conceptual shift nbdev ⚠

  122. None
  123. With nbdev, your source code, tests, and documentation all live

    together in one place nbdev
  124. Source: https://nbdev.fast.ai/example.html Code

  125. Source: https://nbdev.fast.ai/example.html Code Docs Docs

  126. Source: https://nbdev.fast.ai/example.html Code Tests Docs Docs

  127. "The magic of nbdev is that it doesn’t actually change

    programming that much; you add a #export or #hide tag to your notebook cells once in a while, and you run nbdev_build_lib and nbdev_build_docs when you finish up your code. 
 Source: https://www.overstory.com/blog/how-nbdev-helps-us-structure-our-data-science-workflow-in-jupyter-notebooks nbdev
  128. "The magic of nbdev is that it doesn’t actually change

    programming that much; you add a #export or #hide tag to your notebook cells once in a while, and you run nbdev_build_lib and nbdev_build_docs when you finish up your code. 
 That’s it! There’s nothing new to learn, nothing to unlearn. It’s just notebooks." Source: https://www.overstory.com/blog/how-nbdev-helps-us-structure-our-data-science-workflow-in-jupyter-notebooks nbdev
  129. “[nbdev] incentivizes us to write clear code, use proper Git

    version control and document and test our codebase continuously... [while] preserving the benefits of having interactive Jupyter notebooks in which it is easy to experiment." Source: https://www.overstory.com/blog/how-nbdev-helps-us-structure-our-data-science-workflow-in-jupyter-notebooks nbdev
  130. “[nbdev] incentivizes us to write clear code, use proper Git

    version control and document and test our codebase continuously... [while] preserving the benefits of having interactive Jupyter notebooks in which it is easy to experiment." Source: https://www.overstory.com/blog/how-nbdev-helps-us-structure-our-data-science-workflow-in-jupyter-notebooks nbdev — Overstory
  131. Bonus Picks

  132. Visually compare notebook versions

  133. nbdime and ReviewNB Visually compare notebook versions

  134. Source: https://nbdime.readthedocs.io

  135. Source: https://www.reviewnb.com/

  136. Run your favorite code quality tools on notebooks

  137. nbQA Run your favorite code quality tools on notebooks

  138. $ nbqa black my_notebook.ipynb reformatted my_notebook.ipynb All done! ✨ 🍰

    ✨ 1 files reformatted. Source: https://nbqa.readthedocs.io/en/latest/examples.html nbQA
  139. Building and running data pipelines

  140. Source: https://docs.ploomber.io/en/latest/use-cases/ml.html

  141. “We’re currently in the process of migrating all 10,000 of

    the scheduled jobs running on the Netflix Data Platform to use notebook-based execution… 
 Source: https://netflixtechblog.com/scheduling-notebooks-348e6c14cfd6
  142. “We’re currently in the process of migrating all 10,000 of

    the scheduled jobs running on the Netflix Data Platform to use notebook-based execution… 
 When we’re done, more than 150,000 [pipeline executions] will be running through notebooks on our platform every single day.” Source: https://netflixtechblog.com/scheduling-notebooks-348e6c14cfd6
  143. “We’re currently in the process of migrating all 10,000 of

    the scheduled jobs running on the Netflix Data Platform to use notebook-based execution… 
 When we’re done, more than 150,000 [pipeline executions] will be running through notebooks on our platform every single day.” Source: https://netflixtechblog.com/scheduling-notebooks-348e6c14cfd6 — Netflix (2018)
  144. ploomber •Initial Release: 2020 •GitHub Stars: 2.3k 🌟 •GitHub: https://github.com/ploomber/ploomber

  145. What is it? ploomber

  146. A framework to build and execute data pipelines made out

    of Jupyter Notebooks ploomber
  147. What do I have to do to use it? ploomber

  148. Setup • pip install ploomber or 
 conda install ploomber

    -c conda-forge ploomber
  149. Setup • pip install ploomber or 
 conda install ploomber

    -c conda-forge • Initialize your git repository as a ploomber project: ploomber
  150. Setup • pip install ploomber or 
 conda install ploomber

    -c conda-forge • Initialize your git repository as a ploomber project: • ploomber scaffold --empty ploomber
  151. Setup • pip install ploomber or 
 conda install ploomber

    -c conda-forge • Initialize your git repository as a ploomber project: • ploomber scaffold --empty • Add information about your pipeline to pipeline.yaml as you go ploomber
  152. Basic Usage • Start with exploratory programming in Jupyter Notebooks,

    as usual ploomber
  153. Basic Usage • Start with exploratory programming in Jupyter Notebooks,

    as usual • As you go, notice when chunks of your code would make sense as modular "tasks" in a data transformation workflow ploomber
  154. Basic Usage • Start with exploratory programming in Jupyter Notebooks,

    as usual • As you go, notice when chunks of your code would make sense as modular "tasks" in a data transformation workflow • Move the code for each task into its own dedicated notebook ploomber
  155. Basic Usage • Start with exploratory programming in Jupyter Notebooks,

    as usual • As you go, notice when chunks of your code would make sense as modular "tasks" in a data transformation workflow • Move the code for each task into its own dedicated notebook • Next to your code cells, add rich explanatory text, images, example expected output, and data quality checks ploomber
  156. Basic Usage • Record information about your task notebooks in

    pipeline.yaml ploomber
  157. Basic Usage • Record information about your task notebooks in

    pipeline.yaml • Add a few variables to your task notebooks to define upstream dependencies ploomber
  158. Basic Usage • Record information about your task notebooks in

    pipeline.yaml • Add a few variables to your task notebooks to define upstream dependencies • Run your pipeline with ploomber build ploomber
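As a sketch of those variables: a task notebook in a ploomber pipeline typically declares its dependencies in a cell tagged "parameters", and ploomber injects the concrete paths at run time. The task and file names below match the example pipeline.yaml shown on the following slides:

    # cell tagged "parameters": declare this task's upstream dependencies
    upstream = ["raw"]   # run after the 'raw' task
    product = None       # ploomber replaces this with the paths from pipeline.yaml

    # later cells read the upstream task's products via the injected values
    import pandas as pd
    df = pd.read_csv(upstream["raw"]["data"])  # e.g. output/raw_data.csv after injection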
  159. Source: https://docs.ploomber.io/en/latest/get-started/basic-concepts.html

  160. Source: https://docs.ploomber.io/en/latest/get-started/basic-concepts.html .ipynb

  161. Source: https://docs.ploomber.io/en/latest/get-started/basic-concepts.html .ipynb .ipynb .ipynb

  162. pipeline.yaml ploomber Source: https://docs.ploomber.io/en/latest/get-started/first-pipeline.html

  163. pipeline.yaml tasks: ploomber Source: https://docs.ploomber.io/en/latest/get-started/first-pipeline.html

  164. pipeline.yaml tasks: # source is the code you want to

    execute 
 - source: raw.ipynb ploomber Source: https://docs.ploomber.io/en/latest/get-started/first-pipeline.html
  165. pipeline.yaml tasks: # source is the code you want to

    execute 
 - source: raw.ipynb # products are task's outputs 
 product: ploomber Source: https://docs.ploomber.io/en/latest/get-started/first-pipeline.html
  166. pipeline.yaml tasks: # source is the code you want to

    execute 
 - source: raw.ipynb # products are task's outputs 
 product: # tasks generate executed notebooks as outputs 
 nb: output/raw.ipynb ploomber Source: https://docs.ploomber.io/en/latest/get-started/first-pipeline.html
  167. pipeline.yaml tasks: # source is the code you want to

    execute 
 - source: raw.ipynb # products are task's outputs 
 product: # tasks generate executed notebooks as outputs 
 nb: output/raw.ipynb # you can define as many outputs as you want 
 data: output/raw_data.csv 
 ploomber Source: https://docs.ploomber.io/en/latest/get-started/first-pipeline.html
  168. pipeline.yaml tasks: # source is the code you want to

    execute 
 - source: raw.ipynb # products are task's outputs 
 product: # tasks generate executed notebooks as outputs 
 nb: output/raw.ipynb # you can define as many outputs as you want 
 data: output/raw_data.csv 
 - source: clean.ipynb ploomber Source: https://docs.ploomber.io/en/latest/get-started/first-pipeline.html
  169. pipeline.yaml tasks: # source is the code you want to

    execute 
 - source: raw.ipynb # products are task's outputs 
 product: # tasks generate executed notebooks as outputs 
 nb: output/raw.ipynb # you can define as many outputs as you want 
 data: output/raw_data.csv 
 - source: clean.ipynb product: ploomber Source: https://docs.ploomber.io/en/latest/get-started/first-pipeline.html
  170. pipeline.yaml tasks: # source is the code you want to

    execute 
 - source: raw.ipynb # products are task's outputs 
 product: # tasks generate executed notebooks as outputs 
 nb: output/raw.ipynb # you can define as many outputs as you want 
 data: output/raw_data.csv 
 - source: clean.ipynb product: nb: output/clean.ipynb ploomber Source: https://docs.ploomber.io/en/latest/get-started/first-pipeline.html
  171. pipeline.yaml tasks: # source is the code you want to

    execute 
 - source: raw.ipynb # products are task's outputs 
 product: # tasks generate executed notebooks as outputs 
 nb: output/raw.ipynb # you can define as many outputs as you want 
 data: output/raw_data.csv 
 - source: clean.ipynb product: nb: output/clean.ipynb data: output/clean_data.parquet 
 ploomber Source: https://docs.ploomber.io/en/latest/get-started/first-pipeline.html
  172. pipeline.yaml tasks: # source is the code you want to

    execute 
 - source: raw.ipynb # products are task's outputs 
 product: # tasks generate executed notebooks as outputs 
 nb: output/raw.ipynb # you can define as many outputs as you want 
 data: output/raw_data.csv 
 - source: clean.ipynb product: nb: output/clean.ipynb data: output/clean_data.parquet 
 - source: plot.ipynb ploomber Source: https://docs.ploomber.io/en/latest/get-started/first-pipeline.html
  173. pipeline.yaml tasks: # source is the code you want to

    execute 
 - source: raw.ipynb # products are task's outputs 
 product: # tasks generate executed notebooks as outputs 
 nb: output/raw.ipynb # you can define as many outputs as you want 
 data: output/raw_data.csv 
 - source: clean.ipynb product: nb: output/clean.ipynb data: output/clean_data.parquet 
 - source: plot.ipynb product: output/plot.ipynb ploomber Source: https://docs.ploomber.io/en/latest/get-started/first-pipeline.html
  174. Source: https://docs.ploomber.io/en/latest/get-started/basic-concepts.html

  175. Source: https://docs.ploomber.io/en/latest/get-started/basic-concepts.html

  176. Source: https://docs.ploomber.io/en/latest/get-started/basic-concepts.html

  177. $ ploomber build ploomber Source: https://docs.ploomber.io/en/latest/get-started/first-pipeline.html

  178. $ ploomber build Building task ‘raw': 0%| | 0/5 [00:00<?,

    ?it/s] 
 Executing: 0%| | 0/6 [00:00<?, ?cell/s] 
 Executing: 17%|█▋ | 1/6 [00:04<00:21, 4.25s/cell] 
 Executing: 33%|███▎ | 2/6 [00:04<00:07, 1.82s/cell] 
 Executing: 100%|██████████| 6/6 [00:05<00:00, 1.11cell/s] ploomber Source: https://docs.ploomber.io/en/latest/get-started/first-pipeline.html
  179. $ ploomber build Building task ‘raw': 0%| | 0/5 [00:00<?,

    ?it/s] 
 Executing: 0%| | 0/6 [00:00<?, ?cell/s] 
 Executing: 17%|█▋ | 1/6 [00:04<00:21, 4.25s/cell] 
 Executing: 33%|███▎ | 2/6 [00:04<00:07, 1.82s/cell] 
 Executing: 100%|██████████| 6/6 [00:05<00:00, 1.11cell/s] Building task 'clean': 20%|██ | 1/5 [00:05<00:21, 5.47s/it] 
 Executing: 0%| | 0/7 [00:00<?, ?cell/s] 
 Executing: 14%|█▍ | 1/7 [00:01<00:10, 1.76s/cell] 
 Executing: 43%|████▎ | 3/7 [00:23<00:34, 8.63s/cell] 
 Executing: 71%|███████▏ | 5/7 [00:25<00:09, 4.69s/cell] 
 Executing: 86%|████████▌ | 6/7 [00:28<00:04, 4.14s/cell] 
 Executing: 100%|██████████| 7/7 [00:29<00:00, 4.24s/cell] ploomber Source: https://docs.ploomber.io/en/latest/get-started/first-pipeline.html
  180. $ ploomber build Building task ‘raw': 0%| | 0/5 [00:00<?,

    ?it/s] 
 Executing: 0%| | 0/6 [00:00<?, ?cell/s] 
 Executing: 17%|█▋ | 1/6 [00:04<00:21, 4.25s/cell] 
 Executing: 33%|███▎ | 2/6 [00:04<00:07, 1.82s/cell] 
 Executing: 100%|██████████| 6/6 [00:05<00:00, 1.11cell/s] Building task 'clean': 20%|██ | 1/5 [00:05<00:21, 5.47s/it] 
 Executing: 0%| | 0/7 [00:00<?, ?cell/s] 
 Executing: 14%|█▍ | 1/7 [00:01<00:10, 1.76s/cell] 
 Executing: 43%|████▎ | 3/7 [00:23<00:34, 8.63s/cell] 
 Executing: 71%|███████▏ | 5/7 [00:25<00:09, 4.69s/cell] 
 Executing: 86%|████████▌ | 6/7 [00:28<00:04, 4.14s/cell] 
 Executing: 100%|██████████| 7/7 [00:29<00:00, 4.24s/cell] Building task ‘plot': 40%|████ | 2/5 [00:35<00:59, 19.75s/it] 
 Executing: 0%| | 0/9 [00:00<?, ?cell/s] 
 Executing: 11%|█ | 1/9 [00:02<00:22, 2.80s/cell] 
 Executing: 33%|███▎ | 3/9 [00:02<00:04, 1.28cell/s] 
 Executing: 56%|█████▌ | 5/9 [00:03<00:01, 2.42cell/s] 
 Executing: 100%|██████████| 9/9 [00:03<00:00, 2.26cell/s] ploomber Source: https://docs.ploomber.io/en/latest/get-started/first-pipeline.html
  181. What's in it for me? ploomber

  182. A human-friendly computational narrative of every pipeline execution ploomber

  183. “[W]e’ve gained a key improvement over a non-notebook execution pattern:

    our input and outputs are complete documents, wholly executable and shareable in the same interface.” Source: https://netflixtechblog.com/scheduling-notebooks-348e6c14cfd6 — Netflix (2018)
  184. Interactive pipeline inspection and debugging in Jupyter Notebooks ploomber

  185. “Say something went wrong… How might we debug and fix

    the issue? The first place we’d want to look is the notebook output. It will have a stack trace, and ultimately any output information related to an error… 
 Source: https://netflixtechblog.com/scheduling-notebooks-348e6c14cfd6 — Netflix (2018)
  186. “Say something went wrong… How might we debug and fix

    the issue? The first place we’d want to look is the notebook output. It will have a stack trace, and ultimately any output information related to an error… 
 [W]e simply take the output notebook with our exact failed runtime parameterizations and load it into a notebook server… With a few iterations… we can quickly find a fix for the failure. Source: https://netflixtechblog.com/scheduling-notebooks-348e6c14cfd6 — Netflix (2018)
  187. Incremental builds ploomber

  188. Source: https://docs.ploomber.io/en/latest/use-cases/ml.html

  189. Source: https://docs.ploomber.io/en/latest/use-cases/ml.html

  190. Test each stage of your data pipeline ploomber
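For instance, a stage-level check inside (or alongside) the clean task might assert properties of its output before downstream tasks run. This is a hypothetical sketch: the column name is made up, and the path is taken from the example pipeline.yaml above:

    import pandas as pd

    df = pd.read_parquet("output/clean_data.parquet")  # product of the clean task
    assert len(df) > 0, "clean task produced an empty table"
    assert df["price"].notna().all(), "clean task produced null prices"  # 'price' is a hypothetical column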

  191. Modular pipelines → collaborative development ploomber

  192. Source: https://docs.ploomber.io/en/latest/use-cases/ml.html 👩💻

  193. Source: https://docs.ploomber.io/en/latest/use-cases/ml.html 👨💻 👩💻

  194. Source: https://docs.ploomber.io/en/latest/use-cases/ml.html 👩💻 👨💻 🧑💻

  195. Automated deployment 
 to Airflow, AWS Batch, or

    Kubernetes ploomber
  196. Bonus Pick

  197. Store Jupyter Notebooks as plain text 
 for easier version

    control
  198. jupytext Store Jupyter Notebooks as plain text 
 for easier

    version control
  199. Source: https://docs.ploomber.io/en/latest/get-started/basic-concepts.html
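For reference, a notebook stored as plain text with jupytext (the py:percent format) looks roughly like the sketch below; each # %% marker starts a new cell, so the file diffs and merges cleanly under git. The cell contents here are hypothetical:

    # %% [markdown]
    # # Clean data
    # Prose in this cell renders as a Markdown cell when the file is opened as a notebook.

    # %%
    import pandas as pd

    df = pd.read_csv("output/raw_data.csv")
    df = df.dropna()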

  200. Creating interactive reports and dashboards

  201. voilà •Initial Release: 2018 •GitHub Stars: 4.1k 🌟 •GitHub: https://github.com/voila-dashboards/voila

  202. What is it? voilà

  203. A tool for serving Jupyter Notebooks as clean, stand-alone web

    applications voilà
  204. What do I have to do to use it? voilà

  205. Not much! voilà

  206. Setup • pip install voila or conda install voila -c

    conda-forge voilà
  207. Setup • pip install voila or conda install voila -c

    conda-forge • To serve a single notebook: voila my_notebook.ipynb voilà
  208. Setup • pip install voila or conda install voila -c

    conda-forge • To serve a single notebook: voila my_notebook.ipynb • To serve a whole directory of notebooks: voila voilà
  209. Setup • pip install voila or conda install voila -c

    conda-forge • To serve a single notebook: voila my_notebook.ipynb • To serve a whole directory of notebooks: voila • Optionally specify a custom template: voilà
  210. Setup • pip install voila or conda install voila -c

    conda-forge • To serve a single notebook: voila my_notebook.ipynb • To serve a whole directory of notebooks: voila • Optionally specify a custom template: • voila my_notebook.ipynb --template=gridstack voilà
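A minimal sketch of a notebook that voilà could serve as a small app, using ipywidgets; the widget and function here are hypothetical. voilà executes the notebook, hides the code cells, and renders only the outputs and widgets:

    import ipywidgets as widgets
    from ipywidgets import interact

    def show_square(x=3):
        # the rendered output updates as the user moves the slider
        print(f"{x} squared is {x * x}")

    # renders an interactive slider in the served page
    interact(show_square, x=widgets.IntSlider(min=0, max=10, value=3));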
  211. What's in it for me? voilà

  212. Execute and serve Jupyter Notebooks for end users voilà

  213. Source: https://github.com/sysuin/covid-19-world-dashboard

  214. Source: https://github.com/sysuin/covid-19-world-dashboard

  215. Source: https://github.com/sysuin/covid-19-world-dashboard

  216. Interactive plots and widgets still work voilà

  217. Source: https://github.com/dhaitz/machine-learning-interactive-visualization

  218. Source: https://github.com/dhaitz/machine-learning-interactive-visualization

  219. Customize the look and feel of your dashboard with templates

    voilà
  220. voilà Source: https://github.com/voila-dashboards/voila-vuetify

  221. voilà Source: https://github.com/voila-dashboards/voila-vuetify

  222. Long-running notebooks voilà ⚠

  223. So, where does this leave us?

  224. A smoother path to production for work that starts in

    Jupyter Notebooks
  225. • Software Libraries → nbdev projects • Data Transformation Workflows

    → ploomber pipelines • Reports and Dashboards → voilà dashboards
  226. Data science teams can own a project end-to-end in a

    tool and environment they're already comfortable with
  227. Jupyter Notebooks become production artifacts

  228. We can retain the interactivity and computational narrative strengths of

    Jupyter Notebooks, even in production settings
  229. Where to go from here?

  230. Jupyter in Production Data Theoretic