JupyterHub and Jupyter Notebook – A View Under the Hood

This talk tries to give you an overview of all the little parts that together make up a Jupyter-powered application, and how they fit together. Hint: For clickable links, download the PDF file and view locally.

* The Jupyter Project and its building blocks
* Internal magic: protocols, authenticators, spawners, gateways, kernels, …
* Deploying a local data science platform
  - Simple host-based deployments
  - Running Jupyter on Kubernetes
* Options for publishing notebooks

⏰ Duration: 30-45min + Q&A

PyData Südwest Meetup · April 2019 in Karlsruhe, Germany
https://www.meetup.com/PyData-Suedwest/events/258321928/

Jürgen Hermann

April 24, 2019

Transcript

  1. JupyterHub and Jupyter Notebook – A View Under the Hood

    Jürgen Hermann · Karlsruhe · 2019-04-24 · jhermann · jhermann_ · jh@web.de
  2. Agenda

    This talk tries to give you an overview of all the little parts that together make up a Jupyter-powered application, and how they fit together.

    • The Jupyter Project and its building blocks
    • Internal magic: protocols, spawners, gateways, …
    • Deploying a local data science platform
    • Simple host-based deployments
    • Running Jupyter on Kubernetes
    • Options for publishing notebooks
  3. It's Simple & Shiny

  4. The ‘Classic’ Notebook User Interface

    Source: https://ipython.org/
  5. The Jupyter Promise

    • “Data Science IDE” in your browser
    • File system view (notebook dashboard)
    • Notebook editor with markdown + code cells
    • Runtimes for Julia, Python, R, and lots more
    • Wide selection of (interactive) visualizations
    • Customization with widgets and ‘magics’
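
Behind the notebook editor, a notebook file is a plain JSON document that holds the markdown and code cells. The following is a minimal sketch using only the Python standard library; the cell contents are made up for illustration, and real notebooks usually carry more metadata (kernel spec, language info) than shown here.

```python
import json


def markdown_cell(text):
    """Return a markdown cell in the nbformat 4 layout."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(source):
    """Return a code cell that has not been executed yet."""
    return {
        "cell_type": "code",
        "execution_count": None,  # no In[n] counter before the first run
        "metadata": {},
        "outputs": [],
        "source": source,
    }


def make_notebook(cells):
    """Wrap a list of cells into a notebook dictionary."""
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 2}


# Illustrative content only.
notebook = make_notebook([
    markdown_cell("# Hello Jupyter"),
    code_cell("print('hello')"),
])

# This string is what would be stored in an .ipynb file.
text = json.dumps(notebook, indent=1)
print(text)
```
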
  6. The Python Advantage – Scientific / Data Science Stack

    https://www.slideshare.net/icaromedeiros/why-python-is-better-for-data-science
  7. PySpark (Local)

  8. Python's Visualization Landscape https://youtu.be/FytuB8nFHPQ

  9. Seaborn

  10. Bokeh

  11. Altair / Vega

  12. It's Complicated

  13. The Jupyter Universe

    • Servers (Web UI)
    • Interfaces / Clients
    • Jupyter API
    • IPython Reference Kernel
    https://jupyter.readthedocs.io/en/latest/architecture/visual_overview.html
  14. Single-User Notebook Server

    • Web UI for a single notebook file
    • Talks to a kernel via ØMQ (for terminal I/O)
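
To make the ØMQ conversation between notebook server and kernel a bit more tangible: each message travels as a list of frames, with JSON-encoded dictionaries signed by an HMAC over a shared key. The sketch below only builds and verifies such frames with the standard library and sends nothing over a socket; the session name, user name, and key are made-up values, and the header and content fields follow the Jupyter messaging specification as far as this example needs them.

```python
import hashlib
import hmac
import json
import uuid
from datetime import datetime, timezone

DELIMITER = b"<IDS|MSG>"  # separates routing identities from the message


def make_execute_request(code, session, key):
    """Build the signed frames of an 'execute_request' message."""
    header = {
        "msg_id": uuid.uuid4().hex,
        "session": session,
        "username": "demo",  # hypothetical user name
        "date": datetime.now(timezone.utc).isoformat(),
        "msg_type": "execute_request",
        "version": "5.3",
    }
    content = {
        "code": code,
        "silent": False,
        "store_history": True,
        "user_expressions": {},
        "allow_stdin": False,
        "stop_on_error": True,
    }
    # Order of the signed parts: header, parent_header, metadata, content.
    parts = [json.dumps(d).encode("utf-8") for d in (header, {}, {}, content)]
    signature = hmac.new(key, b"".join(parts), hashlib.sha256).hexdigest()
    return [DELIMITER, signature.encode("ascii"), *parts]


def verify(frames, key):
    """Check the signature of a frame list, as a receiver would."""
    start = frames.index(DELIMITER)
    signature = frames[start + 1]
    parts = frames[start + 2:start + 6]
    expected = hmac.new(key, b"".join(parts), hashlib.sha256).hexdigest()
    return hmac.compare_digest(signature, expected.encode("ascii"))


# Made-up session id and key, for illustration only.
frames = make_execute_request("1 + 1", session="demo-session", key=b"secret")
print(verify(frames, b"secret"))
```

In a running system, the key and the socket ports come from the kernel's connection file, and a client library does the framing for you.
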
  15. JupyterHub – Multi-User Notebook Web Service

    • Centrally managed notebook service
    • Web interface: Classic and/or JupyterLab
    • Configurable proxy for dynamic URL routing
    • Authenticators for user login
    • Spawner to start single-user notebooks
    • Kernel gateway for remote control of runtimes
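
How the proxy, authenticator, and spawner from the list above play together can be shown with a toy model. This is not JupyterHub code: `ToyProxy` and `ToyHub` are hypothetical classes, the plain-text password table stands in for a real authenticator, and "spawning" only hands out a port number. The one idea it tries to capture is that the Hub registers a `/user/<name>/` route after login, and the proxy forwards everything else to the Hub.

```python
class ToyProxy:
    """Toy stand-in for the configurable proxy: longest-prefix routing."""

    def __init__(self, default_target):
        # Anything without a more specific route goes to the Hub.
        self.routes = {"/": default_target}

    def add_route(self, prefix, target):
        self.routes[prefix] = target

    def resolve(self, path):
        matches = [prefix for prefix in self.routes if path.startswith(prefix)]
        return self.routes[max(matches, key=len)]


class ToyHub:
    """Toy Hub: authenticate a user, 'spawn' a server, register its route."""

    def __init__(self, proxy, users):
        self.proxy = proxy
        self.users = users  # name -> password (toy authenticator only)
        self.next_port = 50000

    def spawn(self, name):
        # A real spawner would start a single-user notebook server here.
        port = self.next_port
        self.next_port += 1
        return f"http://127.0.0.1:{port}"

    def login(self, name, password):
        if self.users.get(name) != password:
            raise PermissionError(f"login failed for {name}")
        prefix = f"/user/{name}/"
        self.proxy.add_route(prefix, self.spawn(name))
        return prefix


# Made-up addresses and credentials.
proxy = ToyProxy(default_target="http://127.0.0.1:8081")
hub = ToyHub(proxy, users={"alice": "secret"})
home = hub.login("alice", "secret")
print(home, "->", proxy.resolve(home + "tree"))
print("/hub/login", "->", proxy.resolve("/hub/login"))
```
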
  16. High-Level Architecture

  17. Xeus Kernel Framework

    • Native kernel development is not-so-easy
    • Thus IPython wrapper as a bridge to runtimes
    • Meet Xeus & Xeus-Cling
    • C++ kernel development framework
    • Initially made for native C++ kernel
    • By now almost has feature-parity (compared to IPython)
    • C++ easily integrates with any tech
  18. repo2docker – Create Customized Kernels

    • Repeatability of shared notebooks
    • Turn git repos into Jupyter Docker Images
    • Build JupyterHub-ready images (custom kernels)
    • Inspects repo contents for tech stacks
    • environment.yml, requirements.txt, …
    https://github.com/jupyter/repo2docker
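
The "inspects repo contents" step can be pictured as a lookup of well-known file names in the repository root. The sketch below is an illustration of that idea, not repo2docker's implementation; `detect_stacks` and the descriptions in `KNOWN_FILES` are hypothetical, and only the two file names given on the slide are checked.

```python
import tempfile
from pathlib import Path

# File names taken from the slide; the descriptions are informal labels.
KNOWN_FILES = {
    "environment.yml": "conda environment",
    "requirements.txt": "pip requirements",
}


def detect_stacks(repo_dir):
    """Return the known dependency files found in the repository root."""
    repo = Path(repo_dir)
    return {
        name: description
        for name, description in KNOWN_FILES.items()
        if (repo / name).is_file()
    }


# Try it on a throw-away directory that mimics a small repository.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "requirements.txt").write_text("pandas\n", encoding="utf-8")
    (Path(tmp) / "analysis.ipynb").write_text("{}", encoding="utf-8")
    found = detect_stacks(tmp)

print(found)
```
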
  19. Binder – Advanced Technology, Almost Like Magic

    Combine Docker & JupyterHub, for…
    • Repeatable notebook execution from git
    • Ephemeral custom Jupyter runtimes
    • Builds & spawns ad-hoc kernels (repo2docker)
    • Free service at https://mybinder.org/
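
Launching a repository on the free service comes down to opening a URL that names the GitHub user, repository, and git ref. Here is a small sketch of building such a link; `binder_url` is a hypothetical helper, and the example repository is the one referenced elsewhere in this deck, without any claim that it is set up for Binder.

```python
from urllib.parse import quote


def binder_url(user, repo, ref="master"):
    """Build a mybinder.org launch URL for a GitHub repository."""
    # The ref is quoted completely, so branch names with '/' stay one segment.
    return "https://mybinder.org/v2/gh/{}/{}/{}".format(
        quote(user, safe=""), quote(repo, safe=""), quote(ref, safe="")
    )


url = binder_url("jhermann", "jupyter-by-example")
print(url)
```
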
  20. Deploying a Local Data Science Platform

    NASA's Juno Mission: Infrared Tour of Jupiter's North Pole
  21. JupyterHub Deployment Options

    • Debian Package (1and1/debianized-jupyterhub)
    • The Littlest JupyterHub (TLJH · official)
    • Zero to JupyterHub with Kubernetes (official)
    • And lots of 3rd party projects
  22. Showcase: DevOps Intelligence

    • Optimize dev + ops processes
    • Generate actionable insight
    • Support risk analysis & decisions
    • Typical use-cases:
      – Migration processes of all kinds (current state, progress tracking, achievement of objectives)
      – Inventory reporting for increased transparency
      – Automate internal reporting, liberating scarce human expertise
    https://blog.jupyter.org/devops-intelligence-3ff48a76b525
  23. DevOps Intelligence Platform: Simple Single-Host JupyterHub Deployment
  24. JupyterHub Debian Package

    • JupyterHub's “Installation Guide” as software
    • Turn-key setup (just add Python, NodeJS, and Chromium headless)
    • Debian packaging of all core components and standard dependencies
      – JupyterHub, notebook server, configurable HTTP proxy (CHP)
      – JupyterLab, PySpark, NumPy, SciPy, Pandas, Matplotlib, Seaborn, HoloViews, …
    • Best used on Debian Stretch / Ubuntu Bionic
    • Configured for PAM authentication and sudo spawner by default
    • Systemd process control / NGINX SSL off-loader
    https://github.com/1and1/debianized-jupyterhub
  25. Just 617 MB ☺

  26. K8s Setup from 10 Miles High
  27. Bloomberg's Architecture (Kerberos)

  28. Bloomberg's Architecture (Docker)

  29. None
  30. The “nbconvert” Tool

    • CLI tool: jupyter nbconvert …
    • https://github.com/jupyter/nbconvert
    • Convert into…
      – HTML · PDF · Markdown · ReST · Script · and more
    • Basis for other tools & the simplest workflow:
      – Convert notebook to HTML page
      – Upload HTML file to webserver (e.g. Artifactory)
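
The first step of that simple workflow, converting a notebook to an HTML page, can be scripted around the CLI. The sketch below builds the command line and only runs it on request; `report.ipynb` is a hypothetical file name, `nbconvert_command` and `convert` are illustrative helpers, and the upload step is left out because it depends on the web server in use.

```python
import shutil
import subprocess


def nbconvert_command(notebook, to="html"):
    """Return the argument list for converting a notebook via the CLI."""
    return ["jupyter", "nbconvert", "--to", to, notebook]


def convert(notebook, to="html"):
    """Run the conversion; needs Jupyter with nbconvert installed."""
    if shutil.which("jupyter") is None:
        raise RuntimeError("the 'jupyter' command was not found on PATH")
    subprocess.run(nbconvert_command(notebook, to), check=True)


# Hypothetical notebook name; this only prints the command line.
command = nbconvert_command("report.ipynb")
print(" ".join(command))
```

Calling `convert("report.ipynb")` would then produce the HTML page, provided the notebook exists and the tools are installed.
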
  31. nbreport – Clean Notebook HTML Rendering

    • CLI tool or notebook extension
    • Download cleaned-up single HTML page
    • Remove technical ornaments
      – Empty or explicitly hidden cells, hidden code
      – Input / output counters, and stderr
    • Add header information (author & title)
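
What "removing technical ornaments" means at the notebook level can be sketched on the JSON structure directly. This is an illustration of the clean-up idea and not nbreport's code: `clean_notebook` is a hypothetical function that drops empty cells, resets the input / output counters, and filters stderr output; hidden cells and the HTML rendering itself are not covered.

```python
import copy


def clean_notebook(nb):
    """Return a cleaned copy of a notebook dictionary (nbformat 4 layout)."""
    cleaned = copy.deepcopy(nb)
    kept = []
    for cell in cleaned["cells"]:
        source = cell.get("source", "")
        if isinstance(source, list):  # source may be stored as a list of lines
            source = "".join(source)
        if not source.strip():
            continue  # drop empty cells
        if cell.get("cell_type") == "code":
            cell["execution_count"] = None  # hide the input counter
            outputs = []
            for output in cell.get("outputs", []):
                is_stream = output.get("output_type") == "stream"
                if is_stream and output.get("name") == "stderr":
                    continue  # drop stderr output
                if "execution_count" in output:
                    output["execution_count"] = None  # hide the output counter
                outputs.append(output)
            cell["outputs"] = outputs
        kept.append(cell)
    cleaned["cells"] = kept
    return cleaned


# Made-up sample notebook with one empty cell and some stderr output.
sample = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ""},
        {
            "cell_type": "code",
            "execution_count": 3,
            "metadata": {},
            "source": "run_report()",
            "outputs": [
                {"output_type": "stream", "name": "stderr", "text": "warning\n"},
                {"output_type": "stream", "name": "stdout", "text": "ok\n"},
            ],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 2,
}

result = clean_notebook(sample)
print(len(result["cells"]), result["cells"][0]["outputs"])
```
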
  32. Pandoc – Swiss Army Knife for Publishing

    • Convert from / to many document formats
    • Part of many rendering pipelines
    • Needed to unlock all nbconvert features (together with TeX)
    https://pandoc.org/
  33. Publishing Notebooks to Atlassian Confluence

    https://github.com/Valassis-Digital-Media/nbconflux
  34. nbviewer – A simple way to share notebooks

    • Render notebooks from git repositories
    • Link back to repo, and to Binder (live notebook)
    • Does not execute the notebook ⇒ commit pre-rendered output cells
    e.g. https://nbviewer.jupyter.org/github/jhermann/jupyter-by-example/tree/master/how-tos/
  35. nbgallery – Enterprise Sharing / Collaboration Platform

    • https://github.com/nbgallery/nbgallery
    • Using Jupyter to Empower Enterprise Analysts https://youtu.be/9qS1U-ySwzE
    • RoR web application
    • MySQL / MariaDB
    • Apache Solr indexing
  36. Automation: Papermill by nteract

    Parameterize, execute & analyze notebooks
    https://medium.com/netflix-techblog/notebook-innovation-591ee3221233
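
The "parameterize" part rests on a simple convention: one cell tagged `parameters` holds the defaults, and a new cell with the overriding values is placed right after it before the notebook is executed. The sketch below imitates that step on the notebook JSON with the standard library only. It is not Papermill's code: `inject_parameters` is a hypothetical function, and the sample notebook and its `days` parameter are invented.

```python
import copy


def inject_parameters(nb, parameters):
    """Return a copy of the notebook with an added parameter-override cell."""
    result = copy.deepcopy(nb)
    source = "\n".join(
        f"{name} = {value!r}" for name, value in parameters.items()
    )
    injected = {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {"tags": ["injected-parameters"]},
        "outputs": [],
        "source": source,
    }
    # Insert after the cell tagged 'parameters', or at the top if none exists.
    position = 0
    for index, cell in enumerate(result["cells"]):
        if "parameters" in cell.get("metadata", {}).get("tags", []):
            position = index + 1
            break
    result["cells"].insert(position, injected)
    return result


def code_cell(source, tags=()):
    """Build a minimal code cell with optional tags."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {"tags": list(tags)},
        "outputs": [],
        "source": source,
    }


# Invented sample: the default of 7 days gets overridden with 30.
template = {
    "cells": [
        code_cell("days = 7", tags=["parameters"]),
        code_cell("print(days)"),
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 2,
}

report = inject_parameters(template, {"days": 30})
print([cell["source"] for cell in report["cells"]])
```

Because the injected cell runs after the defaults, the later assignment wins when the notebook is executed top to bottom.
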
  37. Automation: Paperboy by Tim Paine

    Web UI for scheduling notebook reports
    • Runtime: Papermill
    • Scheduling: Airflow
    • Persistence: SQLAlchemy
  38. Even More Tools…

    • RISE – ‘Live’ reveal.js Jupyter / IPython Slideshow Extension
      https://github.com/damianavila/RISE
    • QuantStack ‘Voila’ – Interactive renderer for Jupyter notebooks
      https://github.com/QuantStack/voila
    • Anaconda's “Exploring Data using Python Visualization”
      https://anaconda.org/jbednar/exploring_data/notebook
    • PyViz – Make data visualization easier to use & learn, and more powerful
      https://pyviz.org/
    • Knitty – Pandoc filter and Atom-friendly reports via Jupyter
      https://github.com/kiwi0fruit/knitty
    • nbdime · nbstripout · jupytext · …
  39. None
  40. References

    • Project Jupyter Homepage
      https://jupyter.org/
    • JupyterHub Homepage
      https://jupyterhub.readthedocs.io/en/stable/
    • Jupyter Community Channels
      https://jupyter.rtfd.io/en/latest/community/content-community.html#jupyter-communications
    • Jupyter learning resources and practical tips
      https://github.com/jhermann/jupyter-by-example
  41. Acknowledgements

    • https://jupyter.readthedocs.io/
    • https://dailyhealthpoints.com/2016/11/23/the-importance-of-setting-a-training-goal/
    • https://commons.wikimedia.org/wiki/File:Porsche_911_(997)_GT3_RS_3.6_-_capot_arri%C3%A8re_ouvert_2.jpg
    • https://commons.wikimedia.org/wiki/File:Porsche_911_(997)_GT3_RS_3.6_-_d%C3%A9tail_capot_avant.jpg
    • https://www.jpl.nasa.gov/news/news.php?feature=7096
    • https://www.slideshare.net/SparkSummit/secured-kerberosbased-spark-notebook-for-data-science-spark-summit-east-talk-by-joy-chakraborty
    • http://www.picpedia.org/highway-signs/p/publish.html
    • https://commons.wikimedia.org/wiki/File:Emojione_1F44D.svg
  42. Questions? Thank you!