Slide 1

Slide 1 text

Experimentation with Jupyter, Papermill, and MLflow
Shadab Hussain, Data Scientist
https://shadabhussain.com/

Slide 2

Slide 2 text

Jupyter: a tool for data science at scale
• Rich web client (HTML, images, videos, LaTeX)
• Code (40+ programming languages)
• Results
• Share (email, Dropbox, GitHub)
• Reproduce
https://jupyter.org/
https://towardsdatascience.com/optimizing-jupyter-notebook-tips-tricks-and-nbextensions-26d75d502663

Slide 3

Slide 3 text

Papermill: a tool for parameterizing and executing notebooks
https://papermill.readthedocs.io/en/latest/
[Diagram: the input notebook index.ipynb is parameterized and run to produce the output notebooks SSE.ipynb, GOLD.ipynb, SNP.ipynb, and NIKKEI.ipynb.]
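
The parameterize-and-run step for index.ipynb can be sketched in Python as follows. This is a minimal sketch, not the code from the talk: it assumes index.ipynb has a cell tagged `parameters` that defines a variable named `index` (a hypothetical name), and the helpers `plan_runs` and `run_all` are illustrative. Only `papermill.execute_notebook` is Papermill's own API.

```python
from pathlib import Path

# Index names taken from the output notebooks shown on the slide.
INDICES = ["SSE", "GOLD", "SNP", "NIKKEI"]


def plan_runs(input_nb, indices, out_dir="."):
    """Return one (input, output, parameters) triple per index.

    The parameter name "index" is an assumption about the input notebook.
    """
    return [
        (input_nb, str(Path(out_dir) / f"{name}.ipynb"), {"index": name})
        for name in indices
    ]


def run_all(runs):
    """Execute every planned run; each run writes its own output notebook."""
    # Imported lazily so that planning works even without Papermill installed.
    import papermill as pm

    for input_nb, output_nb, params in runs:
        pm.execute_notebook(input_nb, output_nb, parameters=params)
```

Calling `run_all(plan_runs("index.ipynb", INDICES))` would execute index.ipynb once per index and leave the input notebook unchanged.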

Slide 4

Slide 4 text

Running Papermill
[Diagram: Papermill takes a source notebook from a notebook source (database, file, or service) and parameter values as input, here the Index parameter with the values SNP, GOLD, SSE, HANGSENG, and NIKKEI. The runtime manager starts a runtime process that executes the cells, exchanges kernel messages, and streams I/O messages. The output notebook is stored in a notebook sink (database, file, or service).]

Slide 5

Slide 5 text

Papermill adds notebook isolation.
• Immutable inputs
• Immutable outputs
• Parameterization of notebook runs
• Configurable sourcing/sinking
Solves our problems for automated execution!
How does this change the notebook experience?
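
To make the parameterization point concrete: Papermill looks for a cell tagged `parameters` in the input notebook and, in the output copy, adds a new cell tagged `injected-parameters` that overrides those defaults. The input file itself is not modified. The sketch below builds such an input notebook with the standard library only. The variable name `index`, its default value, and the helper names are assumptions made for illustration.

```python
import json


def make_parameterized_notebook(default_index="SNP"):
    """Build a minimal nbformat-4 notebook whose first cell is tagged 'parameters'."""

    def code_cell(source, tags=()):
        return {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {"tags": list(tags)},
            "outputs": [],
            "source": source,
        }

    return {
        "nbformat": 4,
        "nbformat_minor": 4,
        "metadata": {
            "kernelspec": {
                "name": "python3",
                "display_name": "Python 3",
                "language": "python",
            }
        },
        "cells": [
            # Defaults live here; Papermill overrides them per run.
            code_cell(f'index = "{default_index}"', tags=["parameters"]),
            code_cell('print(f"Analysing {index}")'),
        ],
    }


def parameter_cells(notebook):
    """Return the cells that carry the 'parameters' tag."""
    return [
        cell
        for cell in notebook["cells"]
        if "parameters" in cell["metadata"].get("tags", [])
    ]


def save(notebook, path):
    """Write the notebook dictionary to disk as an .ipynb file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(notebook, fh, indent=1)
```

In practice the tag is usually added through the Jupyter UI; building the JSON by hand here only shows what Papermill looks for.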

Slide 6

Slide 6 text

MLflow: an open source platform to manage the ML lifecycle
https://mlflow.org/docs/latest/index.html
https://medium.com/faun/mlflow-on-google-cloud-platform-cd8c9b04a2d8

Components of MLflow
• Tracking: record and query experiments (code, data, config, results)
• Projects: packaging data science code for reproducible runs on any platform
• Models: general format for sending models to diverse deployment tools
• Registry: store, annotate, and manage models in a repository

Slide 7

Slide 7 text

[Diagram: notebooks, local apps, and cloud jobs log to the Tracking Server through the tracking APIs; the Tracking Server exposes a UI and an API.]
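
Logging a run to the tracking server from a notebook or script might look like the sketch below. It is an illustration under assumptions, not the demo code: the parameter name `index`, the RMSE metric, and the helper names are hypothetical. `mlflow.set_tracking_uri`, `mlflow.start_run`, `mlflow.log_param`, and `mlflow.log_metric` are part of the MLflow tracking API.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equally long sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def track_run(index, y_true, y_pred, tracking_uri=None):
    """Record one run with its parameter and metric on the tracking server."""
    # Imported lazily so the metric helper works without MLflow installed.
    import mlflow

    if tracking_uri:
        # Without a URI, MLflow falls back to its default local store.
        mlflow.set_tracking_uri(tracking_uri)

    with mlflow.start_run(run_name=index):
        mlflow.log_param("index", index)
        mlflow.log_metric("rmse", rmse(y_true, y_pred))
```

Runs recorded this way can then be compared in the tracking UI.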

Slide 8

Slide 8 text

[Diagram: a Project Spec (code, dependencies, config, data) can be run through local execution or remote execution.]
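
As a rough illustration of a project spec, the sketch below renders a small MLproject file whose entry point runs a notebook through the Papermill command line, then launches it with `mlflow.run`. Everything project-specific is an assumption: the project name, the `index` parameter and its default, the notebook file names, and the `conda.yaml` dependency file, which would have to exist and list Papermill and MLflow.

```python
# Hypothetical MLproject contents; doubled braces survive str.format as
# single braces, which MLflow later fills with the parameter value.
MLPROJECT_TEMPLATE = """\
name: {name}
conda_env: conda.yaml
entry_points:
  main:
    parameters:
      index: {{type: string, default: SNP}}
    command: "papermill index.ipynb {{index}}.ipynb -p index {{index}}"
"""


def render_mlproject(name):
    """Return the text of an MLproject file for the given project name."""
    return MLPROJECT_TEMPLATE.format(name=name)


def launch(project_dir, index):
    """Run the project's main entry point for one index."""
    # Imported lazily so rendering works without MLflow installed.
    import mlflow

    return mlflow.run(project_dir, parameters={"index": index})
```

The same project directory could also be pointed at a remote backend, which is what the remote-execution path refers to.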

Slide 9

Slide 9 text

[Diagram: run sources produce a model in the Model Format, which packages several model flavors (Flavor 1, Flavor 2); the model is consumed by inference code, batch and stream processing, and serving tools.]

Slide 10

Slide 10 text

[Diagram: a model developer registers models in the Model Registry; reviewers and CI/CD tools work with the registered models; the downstream users are automated jobs and REST serving.]
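
Registering a model that was logged during a tracked run could be sketched as follows. The run ID, the registered model name, and the artifact path `model` are placeholders, and the helper names are hypothetical. `mlflow.register_model` is the MLflow call, and `runs:/<run_id>/<artifact_path>` is the URI scheme MLflow uses for run artifacts.

```python
def model_uri(run_id, artifact_path="model"):
    """Build the URI that points at a model logged under a given run."""
    return f"runs:/{run_id}/{artifact_path}"


def register(run_id, name, artifact_path="model"):
    """Create a new version of the registered model `name` from a run's model."""
    # Imported lazily so the URI helper works without MLflow installed.
    # The registry also needs a tracking backend that supports it.
    import mlflow

    return mlflow.register_model(model_uri(run_id, artifact_path), name)
```

Downstream jobs can then refer to the model by its registered name instead of by run ID.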

Slide 11

Slide 11 text

DEMO
http://bit.ly/Experiment-with-Jupyter-Papermill-and-MLFlow

Slide 12

Slide 12 text

Useful Links:
• https://ljvmiranda921.github.io/notebook/2020/03/06/jupyter-notebooks-in-2020/
• https://ljvmiranda921.github.io/notebook/2020/03/16/jupyter-notebooks-in-2020-part-2/
• https://ljvmiranda921.github.io/notebook/2020/03/30/jupyter-notebooks-in-2020-part-3/
• https://towardsdatascience.com/introduction-to-papermill-2c61f66bea30
• https://pbpython.com/papermil-rclone-report-1.html
• https://pbpython.com/papermil-rclone-report-2.html
• https://medium.com/@jacky308082/intro-to-mlflow-in-python-d22367cbaa97
• https://towardsdatascience.com/manage-your-machine-learning-lifecycle-with-mlflow-part-1-a7252c859f72

Slide 13

Slide 13 text

Contact Me
LinkedIn: linkedin.com/in/shadabhussain96/
Website: shadabhussain.com
Twitter: twitter.com/shadabhusain786

Slide 14

Slide 14 text

THANK YOU!