Slide 1

Slide 1 text

Architecture Board
Application Monitoring Service
Phase 1 – Metrics Gateway
Jürgen Hermann
Karlsruhe · 2018-04-24
JupyterHub and Jupyter Notebook
A View Under the Hood
jhermann jhermann_

Slide 2

Slide 2 text

Agenda
● The Jupyter Project and its building blocks
● Internal magic: protocols, spawners, gateways, …
● Deploying a local data science platform
● Simple host-based deployments
● Running Jupyter on Kubernetes
● Options for publishing notebooks

This talk gives you an overview of all the little parts that together make up a Jupyter-powered application, and how they fit together.

Slide 3

Slide 3 text

It’s Simple & Shiny

Slide 4

Slide 4 text

The ‘Classic’ Notebook User Interface
Source: https://ipython.org/

Slide 5

Slide 5 text

The Jupyter Promise
● “Data Science IDE” in your browser
● File system view (notebook dashboard)
● Notebook editor with markdown + code cells
● Runtimes for Julia, Python, R, and lots more
● Wide selection of (interactive) visualizations
● Customization with widgets and ‘magics’

Slide 6

Slide 6 text

The Python Advantage
Scientific / Data Science Stack
https://www.slideshare.net/icaromedeiros/why-python-is-better-for-data-science

Slide 7

Slide 7 text

PySpark (Local)

Slide 8

Slide 8 text

Python’s Visualization Landscape
https://youtu.be/FytuB8nFHPQ

Slide 9

Slide 9 text

Seaborn

Slide 10

Slide 10 text

Bokeh

Slide 11

Slide 11 text

Altair / Vega

Slide 12

Slide 12 text

It’s Complicated

Slide 13

Slide 13 text

The Jupyter Universe
● Servers (Web UI)
● Interfaces / Clients
● Jupyter API
● IPython Reference Kernel
https://jupyter.readthedocs.io/en/latest/architecture/visual_overview.html

Slide 14

Slide 14 text

Single-User Notebook Server
● Web UI for a single notebook file
● Talks to a kernel via ØMQ (for terminal I/O)
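
The kernel link above carries messages of the Jupyter messaging protocol: four JSON parts (header, parent header, metadata, content) behind a delimiter frame and an HMAC signature. The following is a minimal sketch of that framing using only the standard library; it opens no ØMQ socket, and the user name, session id, and key are made-up values for illustration.

```python
import hashlib
import hmac
import json
import uuid
from datetime import datetime, timezone

DELIMITER = b"<IDS|MSG>"  # separates routing ids from the signed message parts


def _sign(parts, key):
    """HMAC-SHA256 hex digest over the serialized message parts, in order."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for part in parts:
        mac.update(part)
    return mac.hexdigest().encode("ascii")


def make_message(msg_type, content, session, key):
    """Build the wire frames for one message: delimiter, signature, 4 JSON parts."""
    header = {
        "msg_id": uuid.uuid4().hex,
        "session": session,
        "username": "demo",  # hypothetical user name
        "date": datetime.now(timezone.utc).isoformat(),
        "msg_type": msg_type,
        "version": "5.3",
    }
    # Order matters: header, parent_header, metadata, content.
    parts = [json.dumps(p).encode("utf-8") for p in (header, {}, {}, content)]
    return [DELIMITER, _sign(parts, key), *parts]


def verify(frames, key):
    """Check delimiter and signature of a received frame list."""
    delimiter, signature, *parts = frames
    return delimiter == DELIMITER and hmac.compare_digest(_sign(parts, key), signature)


frames = make_message(
    "execute_request", {"code": "1 + 1", "silent": False}, session="s1", key=b"secret"
)
print(verify(frames, b"secret"))
```

A receiver that does not share the key rejects the frames, which is how a kernel refuses requests from unrelated processes on the same host.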

Slide 15

Slide 15 text

JupyterHub – Multi-User Notebook Web Service
● Centrally managed notebook service
● Web interface: Classic and/or JupyterLab
● Configurable proxy for dynamic URL routing
● Authenticators for user login
● Spawner to start single-user notebooks
● Kernel gateway for remote control of runtimes
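
To make "dynamic URL routing" concrete: the Hub registers a route per running user server, and the proxy forwards each request to the target with the longest matching path prefix. The sketch below is a toy model of that idea in Python, not the proxy's actual code; the class name, ports, and user name are assumptions for illustration.

```python
class RouteTable:
    """Toy longest-prefix routing table (illustrative only)."""

    def __init__(self):
        self._routes = {}

    @staticmethod
    def _normalize(prefix):
        # "/user/alice/" and "/user/alice" are treated as the same route.
        return prefix.rstrip("/") or "/"

    def add(self, prefix, target):
        self._routes[self._normalize(prefix)] = target

    def remove(self, prefix):
        self._routes.pop(self._normalize(prefix), None)

    def resolve(self, path):
        """Return the target of the longest prefix matching whole path segments."""
        best = None
        for prefix, target in self._routes.items():
            matches = prefix == "/" or path == prefix or path.startswith(prefix + "/")
            if matches and (best is None or len(prefix) > len(best[0])):
                best = (prefix, target)
        return best[1] if best else None


table = RouteTable()
table.add("/", "http://127.0.0.1:8081")             # default route: the Hub
table.add("/user/alice", "http://127.0.0.1:50001")  # added when alice's server starts

print(table.resolve("/user/alice/lab"))
print(table.resolve("/hub/login"))

table.remove("/user/alice")                          # server stopped
print(table.resolve("/user/alice/lab"))
```

Once a user's route is removed, requests for that user fall back to the default route, so the Hub can offer to start the server again.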

Slide 16

Slide 16 text

High-Level Architecture

Slide 17

Slide 17 text

Xeus Kernel Framework
● Native kernel development is not-so-easy
● Thus IPython wrapper as a bridge to runtimes
● Meet Xeus & Xeus-Cling
● C++ kernel development framework
● Initially made for native C++ kernel
● By now almost has feature-parity (compared to IPython)
● C++ easily integrates with any tech

Slide 18

Slide 18 text

repo2docker – Create Customized Kernels
● Repeatability of shared notebooks
● Turn git repos into Jupyter Docker Images
● Build JupyterHub-ready images (custom kernels)
● Inspects repo contents for tech stacks
● environment.yml, requirements.txt, …
https://github.com/jupyter/repo2docker
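
The "inspects repo contents" step can be pictured as a lookup of well-known file names. This is a simplified sketch of the idea, not repo2docker's implementation: the `MARKERS` mapping and `detect_stacks` function are hypothetical, and only the two files named on the slide are covered.

```python
import tempfile
from pathlib import Path

# Hypothetical mapping from marker file to tech stack; the real tool
# knows more files and applies its own precedence rules.
MARKERS = {
    "environment.yml": "conda",
    "requirements.txt": "pip",
}


def detect_stacks(repo_dir):
    """Return the stacks whose marker file exists at the top of the checkout."""
    repo = Path(repo_dir)
    return [stack for name, stack in MARKERS.items() if (repo / name).is_file()]


# Demo on a throwaway directory standing in for a cloned git repository.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "requirements.txt").write_text("pandas\n")
    found = detect_stacks(tmp)

print(found)
```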

Slide 19

Slide 19 text

Binder – Advanced Technology, Almost Like Magic
Combine Docker & JupyterHub, for…
● Repeatable notebook execution from git
● Ephemeral custom Jupyter runtimes
● Builds & spawns ad-hoc kernels (repo2docker)
● Free service at https://mybinder.org/
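
A Binder launch is just a link: for GitHub repositories, mybinder.org uses the path scheme `/v2/gh/<owner>/<repo>/<ref>`. The helper below is a small sketch that builds such a link; the function name is made up, and the `master` default is an assumption about the repository's branch.

```python
from urllib.parse import quote


def binder_url(owner, repo, ref="master", host="https://mybinder.org"):
    """Build a launch link for a GitHub repository on a BinderHub instance."""
    return f"{host}/v2/gh/{quote(owner)}/{quote(repo)}/{quote(ref)}"


url = binder_url("jhermann", "jupyter-by-example")
print(url)
```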

Slide 20

Slide 20 text

Deploying a Local Data Science Platform
NASA's Juno Mission: Infrared Tour of Jupiter's North Pole

Slide 21

Slide 21 text

JupyterHub Deployment Options
● Debian Package (1and1/debianized-jupyterhub)
● The Littlest JupyterHub (TLJH · official)
● Zero to JupyterHub with Kubernetes (official)
● And lots of 3rd party projects

Slide 22

Slide 22 text

Showcase: DevOps Intelligence
● Optimize dev + ops processes
● Generate actionable insight
● Support risk analysis & decisions
● Typical use-cases:
– Migration processes of all kinds (current state, progress tracking, achievement of objectives)
– Inventory reporting for increased transparency
– Automate internal reporting, liberating scarce human expertise
https://blog.jupyter.org/devops-intelligence-3ff48a76b525

Slide 23

Slide 23 text

DevOps Intelligence Platform: Simple Single-Host JupyterHub Deployment

Slide 24

Slide 24 text

JupyterHub Debian Package
● JupyterHub’s “Installation Guide” as software
● Turn-key setup (just add Python, NodeJS, and Chromium headless)
● Debian packaging of all core components and standard dependencies
● JupyterHub, notebook server, configurable HTTP proxy (CHP)
● JupyterLab, PySpark, NumPy, SciPy, Pandas, Matplotlib, Seaborn, HoloViews, …
● Best used on Debian Stretch / Ubuntu Bionic
● Configured for PAM authentication and sudo spawner by default
● Systemd process control / NginX SSL off-loader
https://github.com/1and1/debianized-jupyterhub

Slide 25

Slide 25 text

Just 617 MB ☺

Slide 26

Slide 26 text

K8s Setup from 10 Miles High

Slide 27

Slide 27 text

Bloomberg’s Architecture (Kerberos)

Slide 28

Slide 28 text

Bloomberg’s Architecture (Docker)

Slide 29

Slide 29 text

No content

Slide 30

Slide 30 text

The “nbconvert” Tool
● CLI tool: jupyter nbconvert …
● https://github.com/jupyter/nbconvert
● Convert into…
● HTML · PDF · Markdown · ReST · Script · and more
● Basis for other tools & most simple workflow:
● Convert notebook to HTML page
● Upload HTML file to webserver (e.g. Artifactory)

Slide 31

Slide 31 text

nbreport – Clean Notebook HTML Rendering
● CLI tool or notebook extension
● Download cleaned-up single HTML page
● Remove technical ornaments
● Empty or explicitly hidden cells, hidden code
● Input / output counters, and stderr
● Add header information (author & title)
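
What "removing technical ornaments" means at the file level can be shown on the notebook's JSON structure (nbformat 4). The sketch below is an illustration of the clean-up idea, not nbreport's code: `clean_notebook` is a hypothetical helper that drops empty cells, resets the input / output counters, and discards stderr stream outputs.

```python
import copy


def _text(source):
    """Cell source may be stored as a string or as a list of lines."""
    return "".join(source) if isinstance(source, list) else source


def clean_notebook(nb):
    """Return a cleaned copy of a notebook dict; the input is left untouched."""
    cleaned = copy.deepcopy(nb)
    kept = []
    for cell in cleaned["cells"]:
        if not _text(cell.get("source", "")).strip():
            continue  # drop empty cells
        if cell["cell_type"] == "code":
            cell["execution_count"] = None  # hide the In[n] counter
            outputs = []
            for out in cell.get("outputs", []):
                if out.get("output_type") == "stream" and out.get("name") == "stderr":
                    continue  # drop warnings and other stderr noise
                if "execution_count" in out:
                    out["execution_count"] = None  # hide the Out[n] counter
                outputs.append(out)
            cell["outputs"] = outputs
        kept.append(cell)
    cleaned["cells"] = kept
    return cleaned


nb = {
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "# Report"},
        {"cell_type": "code", "metadata": {}, "source": [], "execution_count": None, "outputs": []},
        {
            "cell_type": "code",
            "metadata": {},
            "source": ["import warnings\n", "warnings.warn('x'); 42"],
            "execution_count": 7,
            "outputs": [
                {"output_type": "stream", "name": "stderr", "text": "UserWarning: x\n"},
                {"output_type": "execute_result", "execution_count": 7,
                 "data": {"text/plain": "42"}, "metadata": {}},
            ],
        },
    ],
}

cleaned = clean_notebook(nb)
print(len(cleaned["cells"]), "cells kept")
```

The cleaned dict can then be written back as JSON and rendered to HTML with the usual tooling.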

Slide 32

Slide 32 text

Pandoc – Swiss Army Knife for Publishing
● Convert from / to many document formats
● Part of many rendering pipelines
● Needed to unlock all nbconvert features (together with TeX)
https://pandoc.org/

Slide 33

Slide 33 text

Publishing Notebooks to Atlassian Confluence
https://github.com/Valassis-Digital-Media/nbconflux

Slide 34

Slide 34 text

nbviewer – A simple way to share notebooks
● Render notebooks from git repositories
● Link back to repo, and to Binder (live notebook)
● Does not execute the notebook ⇒ commit pre-rendered output cells
e.g. https://nbviewer.jupyter.org/github/jhermann/jupyter-by-example/tree/master/how-tos/

Slide 35

Slide 35 text

nbgallery – Enterprise Sharing / Collaboration Platform
● https://github.com/nbgallery/nbgallery
● Using Jupyter to Empower Enterprise Analysts https://youtu.be/9qS1U-ySwzE
● RoR web application
● MySQL / MariaDB
● Apache Solr indexing

Slide 36

Slide 36 text

Automation: Papermill by nteract
Parameterize, execute & analyze notebooks
https://medium.com/netflix-techblog/notebook-innovation-591ee3221233
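
The "parameterize" step works on cell tags: Papermill looks for the cell tagged `parameters` and places a new cell tagged `injected-parameters` right after it, so the injected values override the defaults. The sketch below mimics only that injection step on a plain notebook dict; it does not execute anything, and `inject_parameters` is a hypothetical helper, not Papermill's API.

```python
import copy


def inject_parameters(nb, parameters):
    """Return a copy of the notebook with an overriding parameters cell added."""
    result = copy.deepcopy(nb)
    source = "# Parameters\n" + "".join(
        f"{name} = {value!r}\n" for name, value in parameters.items()
    )
    new_cell = {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {"tags": ["injected-parameters"]},
        "outputs": [],
        "source": source,
    }
    cells = result["cells"]
    # Insert right after the cell tagged "parameters", or at the top if none exists.
    index = next(
        (i + 1 for i, cell in enumerate(cells)
         if "parameters" in cell.get("metadata", {}).get("tags", [])),
        0,
    )
    cells.insert(index, new_cell)
    return result


template = {
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "code", "metadata": {"tags": ["parameters"]},
         "source": "alpha = 0.1\n", "execution_count": None, "outputs": []},
        {"cell_type": "code", "metadata": {},
         "source": "print(alpha)\n", "execution_count": None, "outputs": []},
    ],
}

run = inject_parameters(template, {"alpha": 0.5, "label": "weekly"})
print(run["cells"][1]["source"])
```

Keeping the template untouched and writing each run to its own output notebook is what makes the executed notebooks usable as reports afterwards.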

Slide 37

Slide 37 text

Automation: Paperboy by Tim Paine
Web UI for scheduling notebook reports
● Runtime: Papermill
● Scheduling: Airflow
● Persistence: SQLAlchemy

Slide 38

Slide 38 text

Even More Tools…
● RISE – ‘Live’ reveal.js Jupyter / IPython Slideshow Extension https://github.com/damianavila/RISE
● QuantStack ‘Voila’ – Interactive renderer for Jupyter notebooks https://github.com/QuantStack/voila
● Anaconda’s “Exploring Data using Python Visualization” https://anaconda.org/jbednar/exploring_data/notebook
● PyViz – Make data visualization easier to use & learn, and more powerful https://pyviz.org/
● Knitty – Pandoc filter and Atom-friendly reports via Jupyter https://github.com/kiwi0fruit/knitty
● nbdime · nbstripout · jupytext · …

Slide 39

Slide 39 text

No content

Slide 40

Slide 40 text

References
● Project Jupyter Homepage https://jupyter.org/
● JupyterHub Homepage https://jupyterhub.readthedocs.io/en/stable/
● Jupyter Community Channels https://jupyter.rtfd.io/en/latest/community/content-community.html#jupyter-communications
● Jupyter learning resources and practical tips https://github.com/jhermann/jupyter-by-example

Slide 41

Slide 41 text

Acknowledgements
● https://jupyter.readthedocs.io/
● https://dailyhealthpoints.com/2016/11/23/the-importance-of-setting-a-training-goal/
● https://commons.wikimedia.org/wiki/File:Porsche_911_(997)_GT3_RS_3.6_-_capot_arri%C3%A8re_ouvert_2.jpg
● https://commons.wikimedia.org/wiki/File:Porsche_911_(997)_GT3_RS_3.6_-_d%C3%A9tail_capot_avant.jpg
● https://www.jpl.nasa.gov/news/news.php?feature=7096
● https://www.slideshare.net/SparkSummit/secured-kerberosbased-spark-notebook-for-data-science-spark-summit-east-talk-by-joy-chakraborty
● http://www.picpedia.org/highway-signs/p/publish.html
● https://commons.wikimedia.org/wiki/File:Emojione_1F44D.svg

Slide 42

Slide 42 text

Questions?
Thank you!