JupyterHub and Jupyter Notebook – A View Under the Hood

This talk tries to give you an overview of all the little parts that together make up a Jupyter-powered application, and how they fit together. Hint: For clickable links, download the PDF file and view locally.

* The Jupyter Project and its building blocks
* Internal magic: protocols, authenticators, spawners, gateways, kernels, …
* Deploying a local data science platform
  - Simple host-based deployments
  - Running Jupyter on Kubernetes
* Options for publishing notebooks

⏰ Duration: 30-45min + Q&A

PyData Südwest Meetup · April 2019 in Karlsruhe, Germany
https://www.meetup.com/PyData-Suedwest/events/258321928/

Jürgen Hermann

April 24, 2019

Transcript

  1. JupyterHub and Jupyter Notebook
    A View Under the Hood
    Jürgen Hermann
    Karlsruhe · 2019-04-24
    jhermann
    jhermann_
    [email protected]

  2. Agenda
    The Jupyter Project and its building blocks

    Internal magic: protocols, spawners, gateways, …

    Deploying a local data science platform

    Simple host-based deployments

    Running Jupyter on Kubernetes

    Options for publishing notebooks

    This talk tries to give you an overview of all the little parts that together make up a Jupyter-powered application, and how they fit together.

  3. It’s Simple & Shiny

  4. The ‘Classic’ Notebook User Interface
    Source: https://ipython.org/

  5. The Jupyter Promise
    “Data Science IDE” in your browser

    File system view (notebook dashboard)

    Notebook editor with markdown + code cells

    Runtimes for Julia, Python, R, and lots more

    Wide selection of (interactive) visualizations

    Customization with widgets and ‘magics’
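
    The “notebook editor with markdown + code cells” works on a plain JSON document. As a minimal sketch, assuming the nbformat 4 layout (the cell contents are made up for illustration and not taken from the talk), a notebook with one cell of each kind can be built with the standard library alone:

```python
import json

# Minimal notebook document, assuming the nbformat 4 on-disk layout.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Hello Jupyter\n", "Some *formatted* text."],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,  # not executed yet
            "outputs": [],            # filled in once the cell has run
            "source": ["print('hello')"],
        },
    ],
}


def cell_types(nb):
    """Return the cell types of a notebook dict, in document order."""
    return [cell["cell_type"] for cell in nb["cells"]]


# An .ipynb file is this structure serialized as JSON text.
ipynb_text = json.dumps(notebook, indent=1)
```

    Everything else in the stack (editor, converters, viewers) reads or writes this structure.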

  6. The Python Advantage
    Scientific / Data Science Stack
    https://www.slideshare.net/icaromedeiros/why-python-is-better-for-data-science

  7. PySpark (Local)

  8. Python’s Visualization Landscape
    https://youtu.be/FytuB8nFHPQ

  9. Seaborn

  10. Bokeh

  11. Altair / Vega

  12. It’s Complicated

  13. The Jupyter Universe
    Servers (Web UI)

    Interfaces / Clients

    Jupyter API

    IPython Reference Kernel
    https://jupyter.readthedocs.io/en/latest/architecture/visual_overview.html

  14. Single-User Notebook Server
    Web UI for a single notebook file

    Talks to a kernel via ØMQ
    (for terminal I/O)
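
    What travels over ØMQ between the notebook server and a kernel are multipart messages: a delimiter, an HMAC signature, and four JSON parts. The following is a rough sketch of how such a message could be assembled, assuming the HMAC-SHA256 signing scheme of the Jupyter messaging protocol. The key, session name, and user name are placeholders, the `content` dict is abridged, and no socket is opened:

```python
import hashlib
import hmac
import json
import uuid
from datetime import datetime, timezone

DELIMITER = b"<IDS|MSG>"  # separates routing identities from the message parts


def sign(key, parts):
    """HMAC-SHA256 hex digest over the serialized message parts, in order."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for part in parts:
        mac.update(part)
    return mac.hexdigest().encode("ascii")


def build_execute_request(code, key, session="demo-session"):
    """Return the list of frames a client would send for an execute_request."""
    header = {
        "msg_id": uuid.uuid4().hex,
        "session": session,          # placeholder value
        "username": "demo",          # placeholder value
        "date": datetime.now(timezone.utc).isoformat(),
        "msg_type": "execute_request",
        "version": "5.3",
    }
    content = {"code": code, "silent": False}  # abridged for this sketch
    # Order matters: header, parent_header, metadata, content.
    parts = [json.dumps(d).encode("utf-8") for d in (header, {}, {}, content)]
    return [DELIMITER, sign(key, parts)] + parts


# In a real setup the key comes from the kernel's connection file.
frames = build_execute_request("1 + 1", key=b"placeholder-key")
```

    The receiving side recomputes the digest over the last four frames and drops the message if it does not match the signature frame.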

  15. JupyterHub – Multi-User Notebook Web Service
    Centrally managed notebook service

    Web interface: Classic and/or JupyterLab

    Configurable proxy for dynamic URL routing

    Authenticators for user login

    Spawner to start single-user notebooks

    Kernel gateway for remote control of runtimes
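
    “Dynamic URL routing” boils down to a routing table that changes whenever a single-user server starts or stops. Here is a simplified sketch of the idea, assuming longest-prefix matching on the URL path. The function names, ports, and user name are hypothetical, and the real proxy is a separate process that is updated through its own API rather than a Python dict:

```python
def resolve(routes, path):
    """Pick the target of the longest route prefix that matches the path."""
    best = max(
        (prefix for prefix in routes if path.startswith(prefix)),
        key=len,
        default=None,
    )
    return routes.get(best)


def add_user_route(routes, username, port):
    """What starting a single-user server means from the proxy's viewpoint."""
    routes[f"/user/{username}/"] = f"http://127.0.0.1:{port}"


# Hypothetical table: the Hub is the default target for "/", every running
# single-user server gets its own "/user/<name>/" prefix (ports are made up).
routes = {"/": "http://127.0.0.1:8081"}
add_user_route(routes, "alice", 40001)
```

    With this table, `/user/alice/lab` goes to Alice's server, while `/hub/login` and the URLs of users without a running server fall through to the Hub.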

  16. High-Level Architecture

  17. Xeus Kernel Framework
    Native kernel development is not-so-easy

    Thus IPython wrapper as a bridge to runtimes

    Meet Xeus & Xeus-Cling

    C++ kernel development framework

    Initially made for native C++ kernel

    By now almost has feature-parity
    (compared to IPython)

    C++ easily integrates with any tech
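
    Whatever language a kernel is written in, Jupyter launches it through a small `kernel.json` spec file. As an illustrative sketch (the executable name `my-native-kernel` and its `-f` flag are assumptions, not the actual xeus-cling spec), such a file could be generated like this:

```python
import json
import tempfile
from pathlib import Path


def write_kernelspec(directory, executable, display_name, language):
    """Write a kernel.json that tells Jupyter how to launch a kernel process."""
    spec = {
        # Jupyter replaces {connection_file} with the path of a JSON file
        # holding the ports and the signing key for the new kernel.
        # The "-f" flag is a convention assumed for this hypothetical binary.
        "argv": [executable, "-f", "{connection_file}"],
        "display_name": display_name,
        "language": language,
    }
    path = Path(directory) / "kernel.json"
    path.write_text(json.dumps(spec, indent=2), encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    spec_path = write_kernelspec(tmp, "my-native-kernel", "My Kernel", "c++")
    loaded = json.loads(spec_path.read_text(encoding="utf-8"))
```

    The hard part of native kernel development is what the launched process then has to do: speak the messaging protocol on the sockets named in the connection file.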

  18. repo2docker – Create Customized Kernels
    Repeatability of shared notebooks

    Turn git repos into Jupyter Docker Images

    Build JupyterHub-ready images (custom kernels)

    Inspects repo contents for tech stacks

    environment.yml, requirements.txt, …
    https://github.com/jupyter/repo2docker
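
    The “inspects repo contents” step can be pictured as a lookup of well-known file names. This is a deliberately tiny sketch, not repo2docker's own code: `detect_stacks` and the mapping are hypothetical and only cover the two files named on the slide:

```python
import tempfile
from pathlib import Path

# File name -> tech stack hint; only the two files named on the slide.
KNOWN_FILES = {
    "environment.yml": "conda",
    "requirements.txt": "pip",
}


def detect_stacks(repo_dir):
    """Return the stack hints for every known config file found in repo_dir."""
    repo = Path(repo_dir)
    return sorted(
        stack for name, stack in KNOWN_FILES.items() if (repo / name).is_file()
    )


# Demo on a throw-away directory that stands in for a cloned repository.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "requirements.txt").write_text("pandas\n", encoding="utf-8")
    found = detect_stacks(tmp)
    empty = detect_stacks(Path(tmp) / "missing")
```

    The real tool goes on to build a Docker image that installs whatever the detected files declare.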

  19. Binder – Advanced Technology, Almost Like Magic
    Combine Docker & JupyterHub, for…

    Repeatable notebook execution from git

    Ephemeral custom Jupyter runtimes

    Builds & spawns ad-hoc kernels (repo2docker)

    Free service at https://mybinder.org/
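
    From a user's perspective, “execution from git” is just a link. Below is a small sketch of how a launch link for a GitHub repository could be put together, assuming the `/v2/gh/<owner>/<repo>/<ref>` URL layout of mybinder.org and its `filepath` query parameter. The helper name is made up, and the parameters accepted by the service may change over time:

```python
from urllib.parse import urlencode


def binder_url(owner, repo, ref="master", filepath=None):
    """Build a mybinder.org launch URL for a GitHub repository."""
    url = f"https://mybinder.org/v2/gh/{owner}/{repo}/{ref}"
    if filepath:
        # Optionally open a specific notebook once the server is up.
        url += "?" + urlencode({"filepath": filepath})
    return url


url = binder_url("jhermann", "jupyter-by-example")
```

    Opening such a URL triggers the image build (if needed) and then spawns a temporary single-user server from it.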

  20. Deploying a Local Data Science Platform
    NASA's Juno Mission: Infrared Tour of Jupiter's North Pole

  21. JupyterHub Deployment Options
    Debian Package (1and1/debianized-jupyterhub)

    The Littlest JupyterHub (TLJH · official)

    Zero to JupyterHub with Kubernetes (official)

    And lots of 3rd party projects

  22. Showcase: DevOps Intelligence
    Optimize dev + ops processes

    Generate actionable insight

    Support risk analysis & decisions

    Typical use-cases:
    – Migration processes of all kinds
    (current state, progress tracking, achievement of objectives)
    – Inventory reporting for increased transparency
    – Automate internal reporting, liberating scarce human expertise
    https://blog.jupyter.org/devops-intelligence-3ff48a76b525

  23. DevOps Intelligence Platform: Simple Single-Host JupyterHub Deployment

  24. JupyterHub Debian Package
    JupyterHub’s “Installation Guide” as software

    Turn-key setup (just add Python, Node.js, and headless Chromium)

    Debian packaging of all core components and standard dependencies

    JupyterHub, notebook server, configurable HTTP proxy (CHP)

    JupyterLab, PySpark, NumPy, SciPy, Pandas, Matplotlib, Seaborn, HoloViews, …

    Best used on Debian Stretch / Ubuntu Bionic

    Configured for PAM authentication and sudo spawner by default

    systemd process control / NGINX SSL off-loader
    https://github.com/1and1/debianized-jupyterhub

  25. Just 617 MB ☺

  26. K8s Setup from 10 Miles High

  27. Bloomberg’s Architecture (Kerberos)

  28. Bloomberg’s Architecture (Docker)

  30. The “nbconvert” Tool
    CLI tool: jupyter nbconvert …

    https://github.com/jupyter/nbconvert

    Convert into…

    HTML · PDF · Markdown · ReST · Script · and more

    Basis for other tools & simplest workflow:

    Convert notebook to HTML page

    Upload HTML file to webserver (e.g. Artifactory)
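
    The first half of that workflow can be scripted in a few lines. This is a hedged sketch that only wraps the CLI: the helper names, `report.ipynb`, and `public` are placeholders, and the upload step is left out because it depends on the web server in use:

```python
import shutil
import subprocess


def nbconvert_command(notebook, to="html", output_dir=None):
    """Build the argument list for a `jupyter nbconvert` call."""
    cmd = ["jupyter", "nbconvert", "--to", to, notebook]
    if output_dir:
        cmd += ["--output-dir", output_dir]
    return cmd


def convert(notebook, **kwargs):
    """Run the conversion; requires Jupyter with nbconvert on the PATH."""
    if shutil.which("jupyter") is None:
        raise RuntimeError("jupyter executable not found")
    subprocess.run(nbconvert_command(notebook, **kwargs), check=True)


# Only builds the command here; call convert(...) to actually run it.
cmd = nbconvert_command("report.ipynb", output_dir="public")
```

    Keeping command construction separate from execution makes the call easy to log or to reuse in a CI job.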

  31. nbreport – Clean Notebook HTML Rendering
    CLI tool or notebook extension

    Download cleaned-up single HTML page

    Remove technical ornaments

    Empty or explicitly hidden cells, hidden code

    Input / output counters, and stderr

    Add header information (author & title)
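
    To make “remove technical ornaments” concrete, here is a sketch of that kind of clean-up on a notebook dict. It is not nbreport's implementation, just an illustration under the nbformat 4 layout. It covers empty cells, counters, and stderr, and ignores hidden cells and header information:

```python
import copy


def _text(source):
    """Cell sources may be stored as a string or as a list of lines."""
    return source if isinstance(source, str) else "".join(source)


def clean_notebook(nb):
    """Return a copy without empty cells, execution counters, and stderr."""
    cleaned = copy.deepcopy(nb)
    cells = []
    for cell in cleaned["cells"]:
        if not _text(cell.get("source", "")).strip():
            continue  # drop empty cells
        if cell["cell_type"] == "code":
            cell["execution_count"] = None  # hide the input counter
            outputs = []
            for out in cell.get("outputs", []):
                if out.get("output_type") == "stream" and out.get("name") == "stderr":
                    continue  # drop stderr noise
                if "execution_count" in out:
                    out["execution_count"] = None  # hide the output counter
                outputs.append(out)
            cell["outputs"] = outputs
        cells.append(cell)
    cleaned["cells"] = cells
    return cleaned


# Made-up input: one empty cell, one cell with stdout and stderr output.
example = {
    "cells": [
        {"cell_type": "code", "source": "", "execution_count": 1, "outputs": []},
        {
            "cell_type": "code",
            "source": ["print('hi')"],
            "execution_count": 2,
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": "hi\n"},
                {"output_type": "stream", "name": "stderr", "text": "warning\n"},
            ],
        },
    ]
}
result = clean_notebook(example)
```

    The cleaned structure can then be handed to an HTML converter. The input notebook stays untouched because the function works on a deep copy.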

  32. Pandoc – Swiss Army Knife for Publishing
    Convert from / to many document formats

    Part of many rendering pipelines

    Needed to unlock all nbconvert features
    (together with TeX)
    https://pandoc.org/

  33. Publishing Notebooks to Atlassian Confluence
    https://github.com/Valassis-Digital-Media/nbconflux

  34. nbviewer – A simple way to share notebooks
    Render notebooks from git repositories

    Link back to repo, and to Binder (live notebook)

    Does not execute the notebook, so commit pre-rendered output cells

    e.g. https://nbviewer.jupyter.org/github/jhermann/jupyter-by-example/tree/master/how-tos/
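
    The example link follows a regular pattern, so links for other GitHub repositories can be generated. The following sketch assumes the `/github/<owner>/<repo>/<tree|blob>/<ref>/<path>` layout seen in that link. `nbviewer_url` is a hypothetical helper, not part of nbviewer:

```python
def nbviewer_url(owner, repo, path, ref="master", is_dir=False):
    """Build an nbviewer URL for a notebook (blob) or a directory (tree)."""
    kind = "tree" if is_dir else "blob"
    return f"https://nbviewer.jupyter.org/github/{owner}/{repo}/{kind}/{ref}/{path}"


# Reproduces the directory listing link from the slide.
listing = nbviewer_url("jhermann", "jupyter-by-example", "how-tos/", is_dir=True)
```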

  35. nbgallery – Enterprise Sharing / Collaboration Platform
    https://github.com/nbgallery/nbgallery

    Using Jupyter to Empower Enterprise Analysts
    https://youtu.be/9qS1U-ySwzE

    Ruby on Rails (RoR) web application

    MySQL / MariaDB

    Apache Solr indexing

  36. Automation: Papermill by nteract
    Parameterize, execute & analyze notebooks
    https://medium.com/netflix-techblog/notebook-innovation-591ee3221233
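
    “Parameterize” means that default values live in a cell tagged `parameters`, and overrides are placed in a new cell right after it. The sketch below mimics that idea on a plain notebook dict. It is not Papermill's code, and the fallback for notebooks without a tagged cell is a choice made for this example. With the real library the whole run goes through `papermill.execute_notebook(...)`:

```python
import copy


def inject_parameters(nb, params):
    """Insert a code cell with overrides after the cell tagged 'parameters'."""
    result = copy.deepcopy(nb)
    source = "".join(f"{name} = {value!r}\n" for name, value in params.items())
    injected = {
        "cell_type": "code",
        "metadata": {"tags": ["injected-parameters"]},
        "execution_count": None,
        "outputs": [],
        "source": source,
    }
    for index, cell in enumerate(result["cells"]):
        if "parameters" in cell.get("metadata", {}).get("tags", []):
            result["cells"].insert(index + 1, injected)
            break
    else:
        # No tagged cell: put the overrides first (a choice of this sketch).
        result["cells"].insert(0, injected)
    return result


# Made-up template notebook with a default of 7 days.
template = {
    "cells": [
        {"cell_type": "code", "metadata": {"tags": ["parameters"]},
         "source": "days = 7\n", "execution_count": None, "outputs": []},
        {"cell_type": "code", "metadata": {},
         "source": "print(days)\n", "execution_count": None, "outputs": []},
    ]
}
run = inject_parameters(template, {"days": 30})
```

    Because the injected cell runs after the defaults, the later assignment wins, and the executed copy documents the values it was run with.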

  37. Automation: Paperboy by Tim Paine
    Web UI for scheduling notebook reports

    Runtime: Papermill

    Scheduling: Airflow

    Persistence: SQLAlchemy

  38. Even More Tools…
    RISE – ‘Live’ reveal.js Jupyter / IPython Slideshow Extension
    https://github.com/damianavila/RISE

    QuantStack ‘Voila’ – Interactive renderer for Jupyter notebooks
    https://github.com/QuantStack/voila

    Anaconda’s “Exploring Data using Python Visualization”
    https://anaconda.org/jbednar/exploring_data/notebook

    PyViz – Make data visualization easier to use & learn, and more powerful
    https://pyviz.org/

    Knitty – Pandoc filter and Atom-friendly reports via Jupyter
    https://github.com/kiwi0fruit/knitty

    nbdime · nbstripout · jupytext · …

  40. References
    Project Jupyter Homepage
    https://jupyter.org/

    JupyterHub Homepage
    https://jupyterhub.readthedocs.io/en/stable/

    Jupyter Community Channels
    https://jupyter.rtfd.io/en/latest/community/content-community.html#jupyter-communications

    Jupyter learning resources and practical tips
    https://github.com/jhermann/jupyter-by-example

  41. Acknowledgements
    https://jupyter.readthedocs.io/

    https://dailyhealthpoints.com/2016/11/23/the-importance-of-setting-a-training-goal/

    https://commons.wikimedia.org/wiki/File:Porsche_911_(997)_GT3_RS_3.6_-_capot_arri%C3%A8re_ouvert_2.jpg

    https://commons.wikimedia.org/wiki/File:Porsche_911_(997)_GT3_RS_3.6_-_d%C3%A9tail_capot_avant.jpg

    https://www.jpl.nasa.gov/news/news.php?feature=7096

    https://www.slideshare.net/SparkSummit/secured-kerberosbased-spark-notebook-for-data-science-spark-summit-east-talk-by-joy-chakraborty

    http://www.picpedia.org/highway-signs/p/publish.html

    https://commons.wikimedia.org/wiki/File:Emojione_1F44D.svg

  42. Questions?
    Thank you!