Slide 1

Slide 1 text

Jupyter in Production

Slide 2

Slide 2 text

whois

Slide 3

Slide 3 text

whois Patrick Harrison

Slide 4

Slide 4 text

whois Patrick Harrison Data Theoretic

Slide 5

Slide 5 text

whois Patrick Harrison Data Theoretic Previously: Led AI Engineering at a major financial data company

Slide 6

Slide 6 text

Source: https://ipython.org/ipython-doc/rel-0.12/whatsnew/version0.12.html

Slide 7

Slide 7 text

Jupyter Notebooks just turned ten years old Source: https://ipython.org/ipython-doc/rel-0.12/whatsnew/version0.12.html

Slide 8

Slide 8 text

Jupyter Notebooks just turned ten years old The original IPython Notebook was first released on December 19, 2011 Source: https://ipython.org/ipython-doc/rel-0.12/whatsnew/version0.12.html

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

Source: https://github.com/parente/nbestimate/blob/master/estimate.ipynb Public Jupyter Notebooks on GitHub

Slide 11

Slide 11 text

Source: https://github.com/parente/nbestimate/blob/master/estimate.ipynb Public Jupyter Notebooks on GitHub ≈0

Slide 12

Slide 12 text

Source: https://github.com/parente/nbestimate/blob/master/estimate.ipynb Public Jupyter Notebooks on GitHub ≈0 ≈10,000,000

Slide 13

Slide 13 text

8,000+ new public Jupyter Notebooks posted on GitHub every day in 2022, on average Source: https://github.com/parente/nbestimate/blob/master/ipynb_counts.csv

Slide 14

Slide 14 text

Jupyter Notebooks have been used to do some amazing things

Slide 15

Slide 15 text

Source: https://blog.jupyter.org/congratulations-to-the-ligo-and-virgo-collaborations-from-project-jupyter-5923247be019 On behalf of the entire Project Jupyter team, we’d like to say congratulations to Rainer Weiss, Barry C. Barish, Kip S. Thorne and the rest of the LIGO and VIRGO teams for the Nobel Prize in Physics 2017. Since 2015, the LIGO and VIRGO Collaborations have observed multiple instances of gravitational waves due to colliding black holes (and more recently neutron stars). These observations represent decades of work and confirm what Einstein had theorized a hundred years ago. ... To communicate to the broader community, the LIGO/VIRGO Collaboration has created tutorials with Jupyter Notebooks that describe how to use LIGO/VIRGO data and reproduce analyses related to their academic publications.

Slide 16

Slide 16 text

Source: https://blog.jupyter.org/jupyter-receives-the-acm-software-system-award-d433b0dfe3a2 It is our pleasure to announce that Project Jupyter has been awarded the 2017 ACM Software System Award, a significant honor for the project. We are humbled to join an illustrious list of projects that contains major highlights of computing history, including Unix, TeX, S (R’s predecessor), the Web, Mosaic, Java, INGRES (modern databases) and more.

Slide 17

Slide 17 text

Jupyter Notebooks have some compelling strengths

Slide 18

Slide 18 text

#1

Slide 19

Slide 19 text

Interactive, exploratory programming with immediate feedback #1

Slide 20

Slide 20 text

#2

Slide 21

Slide 21 text

Build a computational narrative bringing together code, results, explanatory prose, plots, images, widgets, and more in a single, human-friendly document #2

Slide 22

Slide 22 text

Lower barriers to entry

Slide 23

Slide 23 text

...many more people and roles can access, use, and collaborate on programming and data analysis in their work Lower barriers to entry

Slide 24

Slide 24 text

Increased productivity

Slide 25

Slide 25 text

Increased productivity ...for programmers of all skill levels

Slide 26

Slide 26 text

"We’ve found that we’re 2x-3x more productive using [notebook-based development] than using traditional programming tools... Source: https://www.fast.ai/2019/12/02/nbdev/

Slide 27

Slide 27 text

"We’ve found that we’re 2x-3x more productive using [notebook-based development] than using traditional programming tools... ...this is a big surprise, since I have coded nearly every day for over 30 years, and in that time have tried dozens of tools, libraries, and systems for building programs." Source: https://www.fast.ai/2019/12/02/nbdev/

Slide 28

Slide 28 text

"We’ve found that we’re 2x-3x more productive using [notebook-based development] than using traditional programming tools... ...this is a big surprise, since I have coded nearly every day for over 30 years, and in that time have tried dozens of tools, libraries, and systems for building programs." Source: https://www.fast.ai/2019/12/02/nbdev/ — Jeremy Howard, fast.ai

Slide 29

Slide 29 text

Jupyter Notebooks have become an essential part of the data scientist's toolkit

Slide 30

Slide 30 text

But, a story you've probably heard before...

Slide 31

Slide 31 text

The magic words...

Slide 32

Slide 32 text

"Let's put this in production" The magic words...

Slide 33

Slide 33 text

Slide 34

Slide 34 text

"You can't use Jupyter Notebooks in production"

Slide 35

Slide 35 text

Why not?

Slide 36

Slide 36 text

"It's not supported."

Slide 37

Slide 37 text

This is a pain to version control.

Slide 38

Slide 38 text

This is a pain to version control. This is monolithic. How will we collaborate effectively?

Slide 39

Slide 39 text

This is a pain to version control. This is monolithic. How will we collaborate effectively? How can we share and reuse this code?

Slide 40

Slide 40 text

This is a pain to version control. This is monolithic. How will we collaborate effectively? How can we share and reuse this code? How do we apply our code quality standards?

Slide 41

Slide 41 text

This is a pain to version control. This is monolithic. How will we collaborate effectively? How can we share and reuse this code? How do we apply our code quality standards? How do we test this code?

Slide 42

Slide 42 text

This is a pain to version control. This is monolithic. How will we collaborate effectively? How can we share and reuse this code? How do we apply our code quality standards? How do we test this code? Will this work with our continuous integration system?

Slide 43

Slide 43 text

This is a pain to version control. This is monolithic. How will we collaborate effectively? How can we share and reuse this code? How do we apply our code quality standards? How do we test this code? Will this work with our continuous integration system? How do we schedule and trigger automatic execution?

Slide 44

Slide 44 text

This is a pain to version control. This is monolithic. How will we collaborate effectively? How can we share and reuse this code? How do we apply our code quality standards? How do we test this code? Will this work with our continuous integration system? How do we schedule and trigger automatic execution? Out-of-order cell execution!

Slide 45

Slide 45 text

This is a pain to version control. This is monolithic. How will we collaborate effectively? How can we share and reuse this code? How do we apply our code quality standards? How do we test this code? Will this work with our continuous integration system? How do we schedule and trigger automatic execution? Out-of-order cell execution! ...

Slide 46

Slide 46 text

OK, how should we get this work into production?

Slide 47

Slide 47 text

OK, how should we get this work into production? “It looks like there's a lot going on in your notebook…”

Slide 48

Slide 48 text

Your notebook has reusable code... How should we get this work into production?

Slide 49

Slide 49 text

Your notebook has reusable code... ... you're going to need to reimplement this code as proper software libraries, How should we get this work into production?

Slide 50

Slide 50 text

Your notebook has reusable code... ... you're going to need to reimplement this code as proper software libraries, ... subject to our company-wide software engineering standards, How should we get this work into production?

Slide 51

Slide 51 text

Your notebook has reusable code... ... you're going to need to reimplement this code as proper software libraries, ... subject to our company-wide software engineering standards, ... with reimplemented tests using our company's preferred testing framework, How should we get this work into production?

Slide 52

Slide 52 text

Your notebook has reusable code... ... you're going to need to reimplement this code as proper software libraries, ... subject to our company-wide software engineering standards, ... with reimplemented tests using our company's preferred testing framework, ... using our preferred enterprise continuous integration system, How should we get this work into production?

Slide 53

Slide 53 text

Your notebook has reusable code... ... you're going to need to reimplement this code as proper software libraries, ... subject to our company-wide software engineering standards, ... with reimplemented tests using our company's preferred testing framework, ... using our preferred enterprise continuous integration system, ... and deploy to our preferred enterprise artifact repository. How should we get this work into production?

Slide 54

Slide 54 text

Your notebook is accessing and transforming data... How should we get this work into production?

Slide 55

Slide 55 text

Your notebook is accessing and transforming data... ... you're going to need to reimplement this logic as data pipelines in our preferred enterprise data pipeline framework, How should we get this work into production?

Slide 56

Slide 56 text

Your notebook is accessing and transforming data... ... you're going to need to reimplement this logic as data pipelines in our preferred enterprise data pipeline framework, ... which has its own engineering practices and conventions, How should we get this work into production?

Slide 57

Slide 57 text

Your notebook is accessing and transforming data... ... you're going to need to reimplement this logic as data pipelines in our preferred enterprise data pipeline framework, ... which has its own engineering practices and conventions, ... and may not even use the same programming language. How should we get this work into production?

Slide 58

Slide 58 text

Your notebook generates predictions... How should we get this work into production?

Slide 59

Slide 59 text

Your notebook generates predictions... ... you're going to need to reimplement the model as a web service, How should we get this work into production?

Slide 60

Slide 60 text

Your notebook generates predictions... ... you're going to need to reimplement the model as a web service, ... wrap it in a Docker container, How should we get this work into production?

Slide 61

Slide 61 text

Your notebook generates predictions... ... you're going to need to reimplement the model as a web service, ... wrap it in a Docker container, ... store it in our preferred enterprise container registry, How should we get this work into production?

Slide 62

Slide 62 text

Your notebook generates predictions... ... you're going to need to reimplement the model as a web service, ... wrap it in a Docker container, ... store it in our preferred enterprise container registry, ... and deploy it to our preferred enterprise container orchestration platform. How should we get this work into production?

Slide 63

Slide 63 text

Your notebook presents results to end users... How should we get this work into production?

Slide 64

Slide 64 text

Your notebook presents results to end users... ... you're going to need to reimplement these reports in our preferred enterprise business intelligence platform, How should we get this work into production?

Slide 65

Slide 65 text

Your notebook presents results to end users... ... you're going to need to reimplement these reports in our preferred enterprise business intelligence platform, ... which has its own engineering practices and conventions, How should we get this work into production?

Slide 66

Slide 66 text

Your notebook presents results to end users... ... you're going to need to reimplement these reports in our preferred enterprise business intelligence platform, ... which has its own engineering practices and conventions, ... and may not even use the same programming language. How should we get this work into production?

Slide 67

Slide 67 text

So you're telling me that if we're going to get our work in production, either:

Slide 68

Slide 68 text

So you're telling me that if we're going to get our work in production, either: 1. Our data science teams have to be stacked with unicorns,

Slide 69

Slide 69 text

So you're telling me that if we're going to get our work in production, either: 1. Our data science teams have to be stacked with unicorns, or

Slide 70

Slide 70 text

So you're telling me that if we're going to get our work in production, either: 1. Our data science teams have to be stacked with unicorns, or 2. We have to loop in a bunch of other teams and create dependencies between them

Slide 71

Slide 71 text

My teams went through this process so many times we had a name for it

Slide 72

Slide 72 text

de • notebook • ification

Slide 73

Slide 73 text

de • notebook • ification The long, painful process of exploding a Jupyter Notebook that definitely works into a constellation of disparate production artifacts that maybe don't

Slide 74

Slide 74 text

⚠ WARNING: De-notebook-ification has been shown to have side effects including increased complexity, elongated timelines, unhappy stakeholders, frustrated data scientists, increased risk of project cancelation, and loss of data science team credibility.

Slide 75

Slide 75 text

Additional problem:

Slide 76

Slide 76 text

Additional problem: If Jupyter is only for demos and prototypes...

Slide 77

Slide 77 text

Additional problem: If Jupyter is only for demos and prototypes... Why bother writing good code in notebooks?

Slide 78

Slide 78 text

"Maybe you shouldn't use Jupyter in the fi rst place"

Slide 79

Slide 79 text

"Maybe you shouldn't use Jupyter in the fi rst place" There has to be a better answer

Slide 80

Slide 80 text

enter the Jupyter in Production ecosystem

Slide 81

Slide 81 text

But first... what does in production mean, anyway?

Slide 82

Slide 82 text

For this talk, we'll focus on: What does in production mean, anyway?

Slide 83

Slide 83 text

For this talk, we'll focus on: •Developing and distributing software libraries What does in production mean, anyway?

Slide 84

Slide 84 text

For this talk, we'll focus on: •Developing and distributing software libraries •Building and running data pipelines What does in production mean, anyway?

Slide 85

Slide 85 text

For this talk, we'll focus on: •Developing and distributing software libraries •Building and running data pipelines •Creating interactive reports and dashboards What does in production mean, anyway?

Slide 86

Slide 86 text

For each of these tools, I'll try to answer...

Slide 87

Slide 87 text

... what is it? For each of these tools, I'll try to answer...

Slide 88

Slide 88 text

... what is it? ... what do I have to do to use it? For each of these tools, I'll try to answer...

Slide 89

Slide 89 text

... what is it? ... what do I have to do to use it? ... what's in it for me? For each of these tools, I'll try to answer...

Slide 90

Slide 90 text

Developing and distributing software libraries

Slide 91

Slide 91 text

nbdev •Initial Release: 2019 •GitHub Stars: 3.2k 🌟 •GitHub: https://github.com/fastai/nbdev/

Slide 92

Slide 92 text

What is it? nbdev

Slide 93

Slide 93 text

A collection of tools that let you use Jupyter Notebooks as the source code for Python software libraries nbdev

Slide 94

Slide 94 text

What do I have to do to use it? nbdev

Slide 95

Slide 95 text

Setup • pip install nbdev or conda install nbdev -c fastai nbdev

Slide 96

Slide 96 text

Setup • pip install nbdev or conda install nbdev -c fastai • Initialize your git repository as an nbdev project: nbdev_new (Or, copy the official nbdev template repo on GitHub) nbdev

Slide 97

Slide 97 text

Setup • pip install nbdev or conda install nbdev -c fastai • Initialize your git repository as an nbdev project: nbdev_new (Or, copy the official nbdev template repo on GitHub) • Install the nbdev git hooks: nbdev_install_git_hooks nbdev

Slide 98

Slide 98 text

Setup • pip install nbdev or conda install nbdev -c fastai • Initialize your git repository as an nbdev project: nbdev_new (Or, copy the official nbdev template repo on GitHub) • Install the nbdev git hooks: nbdev_install_git_hooks • Enter some basic project information in settings.ini nbdev

Slide 99

Slide 99 text

Basic Usage • Start with exploratory programming in Jupyter Notebooks, as usual nbdev

Slide 100

Slide 100 text

Basic Usage • Start with exploratory programming in Jupyter Notebooks, as usual • As you go, notice when it would make sense to reuse or share bits of the code you write nbdev

Slide 101

Slide 101 text

Basic Usage • Start with exploratory programming in Jupyter Notebooks, as usual • As you go, notice when it would make sense to reuse or share bits of the code you write • Reshape this code into functions and classes in a notebook nbdev

Slide 102

Slide 102 text

Basic Usage • Start with exploratory programming in Jupyter Notebooks, as usual • As you go, notice when it would make sense to reuse or share bits of the code you write • Reshape this code into functions and classes in a notebook • Add the #export flag (code comment) at the start of your main code cells nbdev

Slide 103

Slide 103 text

Basic Usage • Start with exploratory programming in Jupyter Notebooks, as usual • As you go, notice when it would make sense to reuse or share bits of the code you write • Reshape this code into functions and classes in a notebook • Add the #export flag (code comment) at the start of your main code cells • Next to your main code cells, add rich explanatory text, images, code usage examples, sample output, and assert statements nbdev
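To make the last two bullets concrete, here is a minimal sketch of what a pair of notebook cells might look like. The function `greet` is a hypothetical example, not taken from the nbdev docs; the only nbdev-specific piece is the `#export` comment at the top of the first cell.

```python
#export
def greet(name: str) -> str:
    "Return a short greeting for `name`."
    return f"Hello, {name}!"


# Neighbouring cell (no #export flag): a usage example plus an assert.
# The assert doubles as a test whenever the notebook is executed top to bottom.
example = greet("Jupyter")
assert example == "Hello, Jupyter!"
```

Only the flagged cell is meant to end up in the library; the example and the assert stay in the notebook as documentation and test.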

Slide 104

Slide 104 text

Source: https://nbdev.fast.ai/example.html

Slide 105

Slide 105 text

Source: https://nbdev.fast.ai/example.html

Slide 106

Slide 106 text

Source: https://nbdev.fast.ai/example.html

Slide 107

Slide 107 text

Source: https://nbdev.fast.ai/example.html

Slide 108

Slide 108 text

Source: https://nbdev.fast.ai/example.html

Slide 109

Slide 109 text

What's in it for me? nbdev

Slide 110

Slide 110 text

Quite a bit, actually. nbdev

Slide 111

Slide 111 text

Automatically export the code from your Jupyter Notebooks into a fully-functional Python package: nbdev nbdev_build_lib

Slide 112

Slide 112 text

Source: https://nbdev.fast.ai/example.html

Slide 113

Slide 113 text

Automatically publish new releases of your package to PyPI and conda: nbdev make release

Slide 114

Slide 114 text

Automatically generate a rich documentation website for your package from your Jupyter Notebooks: nbdev nbdev_build_docs

Slide 115

Slide 115 text

Source: https://nbdev.fast.ai/example.html

Slide 116

Slide 116 text

Avoid common version control conflicts, and resolve them when they occur: nbdev nbdev_clean_nbs & nbdev_fix_merge
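Why does cleaning help? A notebook file stores volatile fields, such as execution counts and per-cell metadata, that change every time you run it, so two people touching the same notebook tend to collide. The sketch below illustrates the general idea on a notebook held as a plain dict. It is a simplified illustration, not nbdev's actual implementation, and `clean_notebook` is a hypothetical helper.

```python
import copy


def clean_notebook(nb: dict) -> dict:
    """Return a copy of a notebook dict with volatile fields reset."""
    cleaned = copy.deepcopy(nb)
    for cell in cleaned.get("cells", []):
        cell["metadata"] = {}  # drop per-cell UI state (collapsed, scrolled, ...)
        if cell.get("cell_type") == "code":
            cell["execution_count"] = None  # changes on every run
    return cleaned


notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {"collapsed": False}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "execution_count": 17,
            "metadata": {"scrolled": True},
            "outputs": [],
            "source": ["x = 1"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

cleaned = clean_notebook(notebook)
```

The source of each cell is untouched; only the fields that churn between runs are reset, which keeps diffs focused on real changes.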

Slide 117

Slide 117 text

Source: https://nbdev.fast.ai/merge.html

Slide 118

Slide 118 text

Automatically run tests on your notebooks: nbdev nbdev_test_nbs

Slide 119

Slide 119 text

Source: https://nbdev.fast.ai/example.html

Slide 120

Slide 120 text

nbdev
$ nbdev_test_nbs
testing: card.ipynb
testing: deck.ipynb
All tests are passing!
Source: https://nbdev.fast.ai/tutorial.html
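Conceptually, testing a notebook means running its code cells from top to bottom in a fresh namespace and failing if any cell raises. The sketch below shows that idea on notebooks held as plain dicts. It is an illustration only, not how nbdev_test_nbs is implemented, and `run_notebook` is a hypothetical helper.

```python
def run_notebook(nb: dict) -> int:
    """Execute every code cell, in order, in one shared fresh namespace.

    Returns the number of code cells executed. Any exception raised by a
    cell (including AssertionError) propagates to the caller.
    """
    namespace: dict = {}
    executed = 0
    for cell in nb["cells"]:
        if cell["cell_type"] != "code":
            continue  # markdown cells are documentation, not tests
        exec("".join(cell["source"]), namespace)
        executed += 1
    return executed


passing = {
    "cells": [
        {"cell_type": "code", "source": ["def double(x):\n", "    return 2 * x\n"]},
        {"cell_type": "markdown", "source": ["`double` multiplies by two."]},
        {"cell_type": "code", "source": ["assert double(4) == 8\n"]},
    ]
}
failing = {"cells": [{"cell_type": "code", "source": ["assert 1 + 1 == 3\n"]}]}

cells_run = run_notebook(passing)

try:
    run_notebook(failing)
    failing_detected = False
except AssertionError:
    failing_detected = True
```

Because every run starts from an empty namespace and follows cell order, this style of check also catches notebooks that only work thanks to out-of-order execution.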

Slide 121

Slide 121 text

Continuous integration out-of-the-box with git hooks and GitHub Actions nbdev

Slide 122

Slide 122 text

Conceptual shift nbdev ⚠

Slide 123

Slide 123 text

No content

Slide 124

Slide 124 text

With nbdev, your source code, tests, and documentation all live together in one place nbdev

Slide 125

Slide 125 text

Source: https://nbdev.fast.ai/example.html Code

Slide 126

Slide 126 text

Source: https://nbdev.fast.ai/example.html Code Docs Docs

Slide 127

Slide 127 text

Source: https://nbdev.fast.ai/example.html Code Tests Docs Docs

Slide 128

Slide 128 text

"The magic of nbdev is that it doesn’t actually change programming that much; you add a #export or #hide tag to your notebook cells once in a while, and you run nbdev_build_lib and nbdev_build_docs when you fi nish up your code. 
 Source: https://www.overstory.com/blog/how-nbdev-helps-us-structure-our-data-science-work fl ow-in-jupyter-notebooks nbdev

Slide 129

Slide 129 text

"The magic of nbdev is that it doesn’t actually change programming that much; you add a #export or #hide tag to your notebook cells once in a while, and you run nbdev_build_lib and nbdev_build_docs when you fi nish up your code. 
 That’s it! There’s nothing new to learn, nothing to unlearn. It’s just notebooks." Source: https://www.overstory.com/blog/how-nbdev-helps-us-structure-our-data-science-work fl ow-in-jupyter-notebooks nbdev

Slide 130

Slide 130 text

“[nbdev] incentives us to write clear code, use proper Git version control and document and test our codebase continuously... [while] preserving the benefits of having interactive Jupyter notebooks in which it is easy to experiment.” Source: https://www.overstory.com/blog/how-nbdev-helps-us-structure-our-data-science-workflow-in-jupyter-notebooks nbdev

Slide 131

Slide 131 text

“[nbdev] incentives us to write clear code, use proper Git version control and document and test our codebase continuously... [while] preserving the benefits of having interactive Jupyter notebooks in which it is easy to experiment.” Source: https://www.overstory.com/blog/how-nbdev-helps-us-structure-our-data-science-workflow-in-jupyter-notebooks nbdev — Overstory

Slide 132

Slide 132 text

Bonus Picks

Slide 133

Slide 133 text

Visually compare notebook versions

Slide 134

Slide 134 text

nbdime and ReviewNB Visually compare notebook versions

Slide 135

Slide 135 text

Source: https://nbdime.readthedocs.io

Slide 136

Slide 136 text

Source: https://www.reviewnb.com/

Slide 137

Slide 137 text

Run your favorite code quality tools on notebooks

Slide 138

Slide 138 text

nbQA Run your favorite code quality tools on notebooks

Slide 139

Slide 139 text

$ nbqa black my_notebook.ipynb
reformatted my_notebook.ipynb
All done! ✨ 🍰 ✨
1 files reformatted.
Source: https://nbqa.readthedocs.io/en/latest/examples.html nbQA

Slide 140

Slide 140 text

Building and running data pipelines

Slide 141

Slide 141 text

Source: https://docs.ploomber.io/en/latest/use-cases/ml.html

Slide 142

Slide 142 text

“We’re currently in the process of migrating all 10,000 of the scheduled jobs running on the Netflix Data Platform to use notebook-based execution…
Source: https://netflixtechblog.com/scheduling-notebooks-348e6c14cfd6

Slide 143

Slide 143 text

“We’re currently in the process of migrating all 10,000 of the scheduled jobs running on the Netflix Data Platform to use notebook-based execution…
When we’re done, more than 150,000 [pipeline executions] will be running through notebooks on our platform every single day.” Source: https://netflixtechblog.com/scheduling-notebooks-348e6c14cfd6

Slide 144

Slide 144 text

“We’re currently in the process of migrating all 10,000 of the scheduled jobs running on the Netflix Data Platform to use notebook-based execution…
When we’re done, more than 150,000 [pipeline executions] will be running through notebooks on our platform every single day.” Source: https://netflixtechblog.com/scheduling-notebooks-348e6c14cfd6 — Netflix (2018)

Slide 145

Slide 145 text

ploomber •Initial Release: 2020 •GitHub Stars: 2.3k 🌟 •GitHub: https://github.com/ploomber/ploomber

Slide 146

Slide 146 text

What is it? ploomber

Slide 147

Slide 147 text

A framework to build and execute data pipelines made out of Jupyter Notebooks ploomber

Slide 148

Slide 148 text

What do I have to do to use it? ploomber

Slide 149

Slide 149 text

Setup • pip install ploomber or conda install ploomber -c conda-forge ploomber

Slide 150

Slide 150 text

Setup • pip install ploomber or conda install ploomber -c conda-forge • Initialize your git repository as a ploomber project: ploomber

Slide 151

Slide 151 text

Setup • pip install ploomber or conda install ploomber -c conda-forge • Initialize your git repository as a ploomber project: ploomber scaffold --empty ploomber

Slide 152

Slide 152 text

Setup • pip install ploomber or conda install ploomber -c conda-forge • Initialize your git repository as a ploomber project: ploomber scaffold --empty • Add information about your pipeline to pipeline.yaml as you go ploomber

Slide 153

Slide 153 text

Basic Usage • Start with exploratory programming in Jupyter Notebooks, as usual ploomber

Slide 154

Slide 154 text

Basic Usage • Start with exploratory programming in Jupyter Notebooks, as usual • As you go, notice when chunks of your code would make sense as modular "tasks" in a data transformation workflow ploomber

Slide 155

Slide 155 text

Basic Usage • Start with exploratory programming in Jupyter Notebooks, as usual • As you go, notice when chunks of your code would make sense as modular "tasks" in a data transformation workflow • Move the code for each task into its own dedicated notebook ploomber

Slide 156

Slide 156 text

Basic Usage • Start with exploratory programming in Jupyter Notebooks, as usual • As you go, notice when chunks of your code would make sense as modular "tasks" in a data transformation workflow • Move the code for each task into its own dedicated notebook • Next to your code cells, add rich explanatory text, images, example expected output, and data quality checks ploomber

Slide 157

Slide 157 text

Basic Usage • Record information about your task notebooks in pipeline.yaml ploomber

Slide 158

Slide 158 text

Basic Usage • Record information about your task notebooks in pipeline.yaml • Add a few variables to your task notebooks to define upstream dependencies ploomber

Slide 159

Slide 159 text

Basic Usage • Record information about your task notebooks in pipeline.yaml • Add a few variables to your task notebooks to define upstream dependencies • Run your pipeline with ploomber build ploomber
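A rough sketch of the "few variables" step, assuming ploomber's convention (as I understand it) of a parameters cell that declares `upstream` and `product`, which ploomber then swaps for concrete paths at run time. The second pair of assignments below simulates that swap by hand; the task name and file paths follow the deck's raw/clean example and are illustrative.

```python
# Parameters cell of a hypothetical clean.ipynb task notebook.
upstream = ["raw"]  # this task depends on the task named "raw"
product = None      # placeholder; filled in from pipeline.yaml at run time

# Simulated run-time values (assumption: ploomber injects dictionaries of
# paths shaped roughly like these before executing the notebook).
upstream = {"raw": {"nb": "output/raw.ipynb", "data": "output/raw_data.csv"}}
product = {"nb": "output/clean.ipynb", "data": "output/clean_data.parquet"}

# The rest of the notebook reads inputs and writes outputs via these variables,
# so no file paths are hard-coded in the task itself.
input_path = upstream["raw"]["data"]
output_path = product["data"]
```

The payoff is that the notebook stays runnable interactively while the pipeline definition remains the single place where paths are declared.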

Slide 160

Slide 160 text

Source: https://docs.ploomber.io/en/latest/get-started/basic-concepts.html

Slide 161

Slide 161 text

Source: https://docs.ploomber.io/en/latest/get-started/basic-concepts.html .ipynb

Slide 162

Slide 162 text

Source: https://docs.ploomber.io/en/latest/get-started/basic-concepts.html .ipynb .ipynb .ipynb

Slide 163

Slide 163 text

pipeline.yaml ploomber Source: https://docs.ploomber.io/en/latest/get-started/first-pipeline.html

Slide 164

Slide 164 text

pipeline.yaml tasks: ploomber Source: https://docs.ploomber.io/en/latest/get-started/first-pipeline.html

Slide 165

Slide 165 text

pipeline.yaml
tasks:
  # source is the code you want to execute
  - source: raw.ipynb
ploomber Source: https://docs.ploomber.io/en/latest/get-started/first-pipeline.html

Slide 166

Slide 166 text

pipeline.yaml
tasks:
  # source is the code you want to execute
  - source: raw.ipynb
    # products are task's outputs
    product:
ploomber Source: https://docs.ploomber.io/en/latest/get-started/first-pipeline.html

Slide 167

Slide 167 text

pipeline.yaml
tasks:
  # source is the code you want to execute
  - source: raw.ipynb
    # products are task's outputs
    product:
      # tasks generate executed notebooks as outputs
      nb: output/raw.ipynb
ploomber Source: https://docs.ploomber.io/en/latest/get-started/first-pipeline.html

Slide 168

Slide 168 text

pipeline.yaml
tasks:
  # source is the code you want to execute
  - source: raw.ipynb
    # products are task's outputs
    product:
      # tasks generate executed notebooks as outputs
      nb: output/raw.ipynb
      # you can define as many outputs as you want
      data: output/raw_data.csv
ploomber Source: https://docs.ploomber.io/en/latest/get-started/first-pipeline.html

Slide 169

Slide 169 text

pipeline.yaml
tasks:
  # source is the code you want to execute
  - source: raw.ipynb
    # products are task's outputs
    product:
      # tasks generate executed notebooks as outputs
      nb: output/raw.ipynb
      # you can define as many outputs as you want
      data: output/raw_data.csv
  - source: clean.ipynb
ploomber Source: https://docs.ploomber.io/en/latest/get-started/first-pipeline.html

Slide 170

Slide 170 text

pipeline.yaml
tasks:
  # source is the code you want to execute
  - source: raw.ipynb
    # products are task's outputs
    product:
      # tasks generate executed notebooks as outputs
      nb: output/raw.ipynb
      # you can define as many outputs as you want
      data: output/raw_data.csv
  - source: clean.ipynb
    product:
ploomber Source: https://docs.ploomber.io/en/latest/get-started/first-pipeline.html

Slide 171

Slide 171 text

pipeline.yaml
tasks:
  # source is the code you want to execute
  - source: raw.ipynb
    # products are task's outputs
    product:
      # tasks generate executed notebooks as outputs
      nb: output/raw.ipynb
      # you can define as many outputs as you want
      data: output/raw_data.csv
  - source: clean.ipynb
    product:
      nb: output/clean.ipynb
ploomber Source: https://docs.ploomber.io/en/latest/get-started/first-pipeline.html

Slide 172

Slide 172 text

pipeline.yaml
tasks:
  # source is the code you want to execute
  - source: raw.ipynb
    # products are task's outputs
    product:
      # tasks generate executed notebooks as outputs
      nb: output/raw.ipynb
      # you can define as many outputs as you want
      data: output/raw_data.csv
  - source: clean.ipynb
    product:
      nb: output/clean.ipynb
      data: output/clean_data.parquet
ploomber Source: https://docs.ploomber.io/en/latest/get-started/first-pipeline.html

Slide 173

Slide 173 text

pipeline.yaml
tasks:
  # source is the code you want to execute
  - source: raw.ipynb
    # products are task's outputs
    product:
      # tasks generate executed notebooks as outputs
      nb: output/raw.ipynb
      # you can define as many outputs as you want
      data: output/raw_data.csv
  - source: clean.ipynb
    product:
      nb: output/clean.ipynb
      data: output/clean_data.parquet
  - source: plot.ipynb
ploomber Source: https://docs.ploomber.io/en/latest/get-started/first-pipeline.html

Slide 174

Slide 174 text

pipeline.yaml
tasks:
  # source is the code you want to execute
  - source: raw.ipynb
    # products are task's outputs
    product:
      # tasks generate executed notebooks as outputs
      nb: output/raw.ipynb
      # you can define as many outputs as you want
      data: output/raw_data.csv
  - source: clean.ipynb
    product:
      nb: output/clean.ipynb
      data: output/clean_data.parquet
  - source: plot.ipynb
    product: output/plot.ipynb
ploomber Source: https://docs.ploomber.io/en/latest/get-started/first-pipeline.html
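Once each task declares what it depends on, the execution order falls out of the dependency graph. The sketch below shows that idea with Python's standard-library `graphlib` (Python 3.9+). The dependency chain raw, then clean, then plot is an assumption based on the task names in this example, and this is not a description of how ploomber resolves order internally.

```python
from graphlib import TopologicalSorter

# Map each task to the set of tasks it depends on (assumed chain).
dependencies = {
    "raw": set(),        # no upstream tasks
    "clean": {"raw"},    # consumes the output of "raw"
    "plot": {"clean"},   # consumes the output of "clean"
}

# static_order() yields tasks so that every task comes after its dependencies.
execution_order = list(TopologicalSorter(dependencies).static_order())
```

With a linear chain like this there is only one valid order; with branching pipelines, independent tasks could in principle run in parallel.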

Slide 175

Slide 175 text

Source: https://docs.ploomber.io/en/latest/get-started/basic-concepts.html

Slide 176

Slide 176 text

Source: https://docs.ploomber.io/en/latest/get-started/basic-concepts.html

Slide 177

Slide 177 text

Source: https://docs.ploomber.io/en/latest/get-started/basic-concepts.html

Slide 178

Slide 178 text

$ ploomber build ploomber Source: https://docs.ploomber.io/en/latest/get-started/first-pipeline.html

Slide 179

Slide 179 text

$ ploomber build
Building task 'raw': 0%| | 0/5 [00:00, ?it/s]
Executing: 0%| | 0/6 [00:00, ?cell/s]
Executing: 17%|█▋ | 1/6 [00:04<00:21, 4.25s/cell]
Executing: 33%|███▎ | 2/6 [00:04<00:07, 1.82s/cell]
Executing: 100%|██████████| 6/6 [00:05<00:00, 1.11cell/s]
ploomber Source: https://docs.ploomber.io/en/latest/get-started/first-pipeline.html

Slide 180

Slide 180 text

$ ploomber build
Building task 'raw': 0%| | 0/5 [00:00, ?it/s]
Executing: 0%| | 0/6 [00:00, ?cell/s]
Executing: 17%|█▋ | 1/6 [00:04<00:21, 4.25s/cell]
Executing: 33%|███▎ | 2/6 [00:04<00:07, 1.82s/cell]
Executing: 100%|██████████| 6/6 [00:05<00:00, 1.11cell/s]
Building task 'clean': 20%|██ | 1/5 [00:05<00:21, 5.47s/it]
Executing: 0%| | 0/7 [00:00, ?cell/s]
Executing: 14%|█▍ | 1/7 [00:01<00:10, 1.76s/cell]
Executing: 43%|████▎ | 3/7 [00:23<00:34, 8.63s/cell]
Executing: 71%|███████▏ | 5/7 [00:25<00:09, 4.69s/cell]
Executing: 86%|████████▌ | 6/7 [00:28<00:04, 4.14s/cell]
Executing: 100%|██████████| 7/7 [00:29<00:00, 4.24s/cell]
ploomber Source: https://docs.ploomber.io/en/latest/get-started/first-pipeline.html

Slide 181

Slide 181 text

$ ploomber build
Building task 'raw': 0%| | 0/5 [00:00, ?it/s]
 Executing: 0%| | 0/6 [00:00, ?cell/s]
 Executing: 17%|█▋ | 1/6 [00:04<00:21, 4.25s/cell]
 Executing: 33%|███▎ | 2/6 [00:04<00:07, 1.82s/cell]
 Executing: 100%|██████████| 6/6 [00:05<00:00, 1.11cell/s]
Building task 'clean': 20%|██ | 1/5 [00:05<00:21, 5.47s/it]
 Executing: 0%| | 0/7 [00:00, ?cell/s]
 Executing: 14%|█▍ | 1/7 [00:01<00:10, 1.76s/cell]
 Executing: 43%|████▎ | 3/7 [00:23<00:34, 8.63s/cell]
 Executing: 71%|███████▏ | 5/7 [00:25<00:09, 4.69s/cell]
 Executing: 86%|████████▌ | 6/7 [00:28<00:04, 4.14s/cell]
 Executing: 100%|██████████| 7/7 [00:29<00:00, 4.24s/cell]
Building task 'plot': 40%|████ | 2/5 [00:35<00:59, 19.75s/it]
 Executing: 0%| | 0/9 [00:00, ?cell/s]
 Executing: 11%|█ | 1/9 [00:02<00:22, 2.80s/cell]
 Executing: 33%|███▎ | 3/9 [00:02<00:04, 1.28cell/s]
 Executing: 56%|█████▌ | 5/9 [00:03<00:01, 2.42cell/s]
 Executing: 100%|██████████| 9/9 [00:03<00:00, 2.26cell/s]

ploomber Source: https://docs.ploomber.io/en/latest/get-started/first-pipeline.html

Slide 182

Slide 182 text

What's in it for me? ploomber

Slide 183

Slide 183 text

A human-friendly computational narrative of every pipeline execution ploomber

Slide 184

Slide 184 text

“[W]e’ve gained a key improvement over a non-notebook execution pattern: our input and outputs are complete documents, wholly executable and shareable in the same interface.”
Netflix (2018). Source: https://netflixtechblog.com/scheduling-notebooks-348e6c14cfd6

Slide 185

Slide 185 text

Interactive pipeline inspection and debugging in Jupyter Notebooks ploomber

Slide 186

Slide 186 text

“Say something went wrong… How might we debug and fix the issue? The first place we’d want to look is the notebook output. It will have a stack trace, and ultimately any output information related to an error…”
Netflix (2018). Source: https://netflixtechblog.com/scheduling-notebooks-348e6c14cfd6

Slide 187

Slide 187 text

“Say something went wrong… How might we debug and fix the issue? The first place we’d want to look is the notebook output. It will have a stack trace, and ultimately any output information related to an error…
 [W]e simply take the output notebook with our exact failed runtime parameterizations and load it into a notebook server… With a few iterations… we can quickly find a fix for the failure.”
Netflix (2018). Source: https://netflixtechblog.com/scheduling-notebooks-348e6c14cfd6

Slide 188

Slide 188 text

Incremental builds ploomber
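The general idea behind an incremental build is to skip any task whose code and inputs have not changed since the last successful run. Below is a minimal sketch of that idea, assuming each task is fingerprinted by hashing its source together with the fingerprints of its upstream tasks. The helpers `fingerprint` and `plan_build` and the toy task sources are hypothetical and are not ploomber's implementation.

```python
import hashlib


def fingerprint(source_code, upstream_fingerprints):
    """Hash a task's source together with the fingerprints of its inputs."""
    digest = hashlib.sha256(source_code.encode("utf-8"))
    for fp in upstream_fingerprints:
        digest.update(fp.encode("utf-8"))
    return digest.hexdigest()


def plan_build(tasks, previous):
    """Return (tasks_to_run, new_fingerprints).

    tasks: list of (name, source_code, upstream_names), dependencies first.
    previous: fingerprints recorded by the last successful build.
    """
    current, to_run = {}, []
    for name, source, upstream in tasks:
        current[name] = fingerprint(source, [current[u] for u in upstream])
        if previous.get(name) != current[name]:
            to_run.append(name)
    return to_run, current


# Toy pipeline: the source strings stand in for notebook contents.
tasks = [
    ("raw", "load()", []),
    ("clean", "drop_nulls()", ["raw"]),
    ("plot", "draw()", ["clean"]),
]
first_run, state = plan_build(tasks, previous={})

# Edit only the 'clean' task, then plan again.
tasks[1] = ("clean", "drop_nulls(); dedupe()", ["raw"])
second_run, _ = plan_build(tasks, previous=state)

print(first_run)
print(second_run)
```

In this sketch the second plan leaves out 'raw' because neither its source nor its inputs changed, while 'plot' is rebuilt because its upstream task changed.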

Slide 189

Slide 189 text

Source: https://docs.ploomber.io/en/latest/use-cases/ml.html

Slide 190

Slide 190 text

Source: https://docs.ploomber.io/en/latest/use-cases/ml.html

Slide 191

Slide 191 text

Test each stage of your data pipeline ploomber
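As an illustration of what a per-stage test might look like, the sketch below validates the output of a cleaning step before anything downstream consumes it. The function name `check_clean_output` and the `age` column are hypothetical, and how such a check gets attached to a pipeline task depends on the tool and is not shown here.

```python
def check_clean_output(rows):
    """Raise AssertionError if the cleaned rows violate basic expectations."""
    assert rows, "clean stage produced no rows"
    for row in rows:
        assert row["age"] is not None, "missing age after cleaning"
        assert 0 <= row["age"] <= 120, f"implausible age: {row['age']}"


# Passes silently when the stage output looks sane.
check_clean_output([{"age": 34}, {"age": 71}])
```

A failing check stops the run at the stage that produced the bad data, rather than letting the problem surface later in a plot or a model.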

Slide 192

Slide 192 text

Modular pipelines → collaborative development ploomber

Slide 193

Slide 193 text

Source: https://docs.ploomber.io/en/latest/use-cases/ml.html 👩💻

Slide 194

Slide 194 text

Source: https://docs.ploomber.io/en/latest/use-cases/ml.html 👨💻 👩💻

Slide 195

Slide 195 text

Source: https://docs.ploomber.io/en/latest/use-cases/ml.html 👩💻 👨💻 🧑💻

Slide 196

Slide 196 text

Automated deployment to Airflow, AWS Batch, or Kubernetes ploomber

Slide 197

Slide 197 text

Bonus Pick

Slide 198

Slide 198 text

Store Jupyter Notebooks as plain text 
 for easier version control

Slide 199

Slide 199 text

jupytext Store Jupyter Notebooks as plain text 
 for easier version control
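To see why plain text helps with version control, consider that an .ipynb file is JSON that mixes code with outputs and execution counts, so even a re-run changes the diff. The sketch below, which uses only the standard library, renders a notebook as a script with `# %%` cell markers, similar in spirit to the percent format that jupytext supports. The helper `notebook_to_percent` is hypothetical and far simpler than jupytext itself, which also handles metadata and converting back to a notebook.

```python
import json


def notebook_to_percent(ipynb_text):
    """Render notebook JSON as a script with '# %%' cell markers."""
    notebook = json.loads(ipynb_text)
    chunks = []
    for cell in notebook["cells"]:
        source = "".join(cell["source"])
        if cell["cell_type"] == "code":
            chunks.append("# %%\n" + source)
        else:
            # Non-code cells become comments so the file stays valid Python.
            commented = "\n".join(
                ("# " + line).rstrip() for line in source.splitlines()
            )
            chunks.append("# %% [markdown]\n" + commented)
    return "\n\n".join(chunks) + "\n"


# A tiny two-cell notebook, including an output that should not be kept.
EXAMPLE = json.dumps({
    "nbformat": 4, "nbformat_minor": 5, "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Load data"]},
        {"cell_type": "code", "metadata": {}, "execution_count": 3,
         "outputs": [{"output_type": "stream", "name": "stdout",
                      "text": ["42\n"]}],
         "source": ["x = 41\n", "print(x + 1)"]},
    ],
})

print(notebook_to_percent(EXAMPLE))
```

The resulting text contains only the cell sources, so a diff shows code and prose changes without output or execution-count noise.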

Slide 200

Slide 200 text

Source: https://docs.ploomber.io/en/latest/get-started/basic-concepts.html

Slide 201

Slide 201 text

Creating interactive reports and dashboards

Slide 202

Slide 202 text

voilà •Initial Release: 2018 •GitHub Stars: 4.1k 🌟 •GitHub: https://github.com/voila-dashboards/voila

Slide 203

Slide 203 text

What is it? voilà

Slide 204

Slide 204 text

A tool for serving Jupyter Notebooks as clean, stand-alone web applications voilà

Slide 205

Slide 205 text

What do I have to do to use it? voilà

Slide 206

Slide 206 text

Not much! voilà

Slide 207

Slide 207 text

Setup
• pip install voila or conda install voila -c conda-forge
voilà

Slide 208

Slide 208 text

Setup
• pip install voila or conda install voila -c conda-forge
• To serve a single notebook: voila my_notebook.ipynb
voilà

Slide 209

Slide 209 text

Setup
• pip install voila or conda install voila -c conda-forge
• To serve a single notebook: voila my_notebook.ipynb
• To serve a whole directory of notebooks: voila
voilà

Slide 210

Slide 210 text

Setup
• pip install voila or conda install voila -c conda-forge
• To serve a single notebook: voila my_notebook.ipynb
• To serve a whole directory of notebooks: voila
• Optionally specify a custom template:
voilà

Slide 211

Slide 211 text

Setup
• pip install voila or conda install voila -c conda-forge
• To serve a single notebook: voila my_notebook.ipynb
• To serve a whole directory of notebooks: voila
• Optionally specify a custom template:
  • voila my_notebook.ipynb --template=gridstack
voilà

Slide 212

Slide 212 text

What's in it for me? voilà

Slide 213

Slide 213 text

Execute and serve Jupyter Notebooks for end users voilà

Slide 214

Slide 214 text

Source: https://github.com/sysuin/covid-19-world-dashboard

Slide 215

Slide 215 text

Source: https://github.com/sysuin/covid-19-world-dashboard

Slide 216

Slide 216 text

Source: https://github.com/sysuin/covid-19-world-dashboard

Slide 217

Slide 217 text

Interactive plots and widgets still work voilà

Slide 218

Slide 218 text

Source: https://github.com/dhaitz/machine-learning-interactive-visualization

Slide 219

Slide 219 text

Source: https://github.com/dhaitz/machine-learning-interactive-visualization

Slide 220

Slide 220 text

Customize the look and feel of your dashboard with templates voilà

Slide 221

Slide 221 text

voilà Source: https://github.com/voila-dashboards/voila-vuetify

Slide 222

Slide 222 text

voilà Source: https://github.com/voila-dashboards/voila-vuetify

Slide 223

Slide 223 text

Long-running notebooks voilà ⚠

Slide 224

Slide 224 text

So, where does this leave us?

Slide 225

Slide 225 text

A smoother path to production for work that starts in Jupyter Notebooks

Slide 226

Slide 226 text

• Software Libraries → nbdev projects
• Data Transformation Workflows → ploomber pipelines
• Reports and Dashboards → voilà dashboards

Slide 227

Slide 227 text

Data science teams can own a project end-to-end in a tool and environment they're already comfortable with

Slide 228

Slide 228 text

Jupyter Notebooks become production artifacts

Slide 229

Slide 229 text

We can retain the interactivity and computational narrative strengths of Jupyter Notebooks, even in production settings

Slide 230

Slide 230 text

Where to go from here?

Slide 231

Slide 231 text

Jupyter in Production Data Theoretic