Dask on Kubernetes

Jacob Tomlinson

May 19, 2021

Transcript

  1. Deployment Workshop
    Dask on Kubernetes
    Jacob Tomlinson

  2. Types of Dask cluster
    Fixed and Ephemeral

  3. Fixed clusters
    More traditional cluster deployments where you
    set things up and leave them running indefinitely.
    They idle while not in use but are always ready to go
    when you need them.

  4. Ephemeral clusters
    Dynamic clusters which are created at the moment of need
    and destroyed again when you’re done.
    These rely on some underlying scheduling system which
    can quickly provision resources.

  5. Dask Helm Chart
    A chart which launches a fixed-size Dask cluster alongside a
    Jupyter notebook.
    Fixed

  7. [Diagram: Dask Helm Chart. Components shown: Jupyter,
    Scheduler, three Workers, three Deployments, two Services,
    two Ingresses and the user's computer.]
    The user creates the cluster once.
    Then they can connect to it multiple times in the future.

  8. [Diagram: Dask Helm Chart. Components shown: Jupyter,
    Scheduler, three Workers, three Deployments, two Services
    and two Ingresses, without the user's computer.]
    If the user disconnects, the cluster still exists.
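
A minimal sketch of what "connect, disconnect, connect again" can look like against a fixed cluster. It assumes the scheduler is reachable at an address you supply (the address in the usage note is hypothetical; the real one depends on how the chart's Service or Ingress is exposed). `sum_on_fixed_cluster` and its `client_cls` argument are names made up for this illustration; `client_cls` exists only so the function can be exercised without a live cluster.

```python
def sum_on_fixed_cluster(scheduler_address, values, client_cls=None):
    """Connect to an already-running scheduler, run one task, then disconnect."""
    if client_cls is None:
        # Imported lazily so the sketch can be read without Dask installed.
        from distributed import Client
        client_cls = Client

    client = client_cls(scheduler_address)
    try:
        # The task is submitted to the cluster rather than run locally.
        return client.submit(sum, values).result()
    finally:
        # Closing the client only drops this connection. The Deployments
        # created by the Helm chart keep running for the next session.
        client.close()
```

Calling `sum_on_fixed_cluster("tcp://my-dask-scheduler:8786", [1, 2, 3])` twice in a row would reuse the same long-lived cluster both times. Port 8786 is the Dask scheduler default; the hostname is an assumption.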

  9. dask-kubernetes
    A collection of cluster managers and utilities for Kubernetes
    Ephemeral

  10. KubeCluster()
    Spawns ephemeral clusters by requesting Pods
    directly via the Kubernetes API.
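
As a rough illustration of requesting Pods directly, the sketch below builds a worker Pod manifest and hands it to `KubeCluster`. It assumes the classic `dask_kubernetes` API that was current when this talk was given (`KubeCluster.from_dict`, `cluster.adapt`); newer releases ship a different, operator-based `KubeCluster`, so treat the import and constructor as assumptions. The helper names, the label and the resource values are illustrative only.

```python
def worker_pod_spec(image, cpu="1", memory="4G"):
    """Build a Kubernetes Pod manifest (as a dict) for one Dask worker."""
    resources = {"cpu": cpu, "memory": memory}
    return {
        "kind": "Pod",
        "metadata": {"labels": {"app": "dask-worker"}},  # illustrative label
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "dask",
                    "image": image,
                    "args": ["dask-worker", "--nthreads", cpu, "--death-timeout", "60"],
                    # Equal requests and limits keep scheduling predictable.
                    "resources": {"limits": resources, "requests": dict(resources)},
                }
            ],
        },
    }


def launch_ephemeral_cluster(pod_spec, max_workers=10):
    """Create a cluster from the Pod spec and let it scale with load."""
    # Assumption: classic dask_kubernetes API; requires Kubernetes credentials.
    from dask_kubernetes import KubeCluster

    cluster = KubeCluster.from_dict(pod_spec)
    cluster.adapt(minimum=0, maximum=max_workers)
    return cluster
```

Typical use would be `cluster = launch_ephemeral_cluster(worker_pod_spec("daskdev/dask:latest"))`, followed by creating a `distributed.Client(cluster)` in the same session.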

  12. [Diagram: KubeCluster. Components shown: Scheduler, four
    Workers, a Service, an Ingress and the user's computer.]
    The user dynamically creates the cluster resources at
    runtime.

  13. [Diagram: KubeCluster. Components shown: Scheduler, four
    Workers, a Service and an Ingress, without the user.]
    If the user disconnects...

  14. [Diagram: only KubeCluster is shown.]
    ...the cluster is garbage collected.

  15. HelmCluster()
    Connects to an existing Helm Chart deployment and
    provides the cluster manager interface including log
    retrieval and manual scaling.
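
A small sketch of that cluster manager interface, assuming a Dask Helm chart release already exists and that `dask_kubernetes.HelmCluster` accepts `release_name` and `namespace` as in the releases contemporary with this talk. The function name and the `cluster_cls` argument are hypothetical; the latter is there only so the sketch can be tried without a real deployment.

```python
def scale_helm_release(release_name, n_workers, namespace=None, cluster_cls=None):
    """Attach to an existing Helm release, resize it, and return its logs."""
    if cluster_cls is None:
        # Imported lazily; needs dask-kubernetes plus cluster credentials.
        from dask_kubernetes import HelmCluster
        cluster_cls = HelmCluster

    # This attaches to the release; it does not create any new resources.
    cluster = cluster_cls(release_name=release_name, namespace=namespace)
    # Manual scaling: sets the number of worker replicas.
    cluster.scale(n_workers)
    # Log retrieval for the scheduler and worker Pods.
    return cluster.get_logs()
```

For a release installed under a hypothetical name such as `my-dask`, the call would be `scale_helm_release("my-dask", 5)`.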

  17. dask-gateway
    A central hub which launches Dask clusters on behalf of users.
    Can launch onto Kubernetes (and more).
    Ephemeral/Fixed
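
From the user's side, asking the hub for a cluster might look like the sketch below. It assumes the `dask_gateway` client library (`Gateway`, `new_cluster`, `scale`, `get_client`, `shutdown`) and a gateway address that you provide; the address in the usage note is hypothetical. The function name and the `gateway_cls` argument are invented for this illustration, with `gateway_cls` present only to allow a dry run without a gateway.

```python
def sum_via_gateway(gateway_address, values, n_workers=2, gateway_cls=None):
    """Ask the gateway for a cluster, run one task on it, then release it."""
    if gateway_cls is None:
        # Imported lazily so the sketch can be read without dask-gateway installed.
        from dask_gateway import Gateway
        gateway_cls = Gateway

    gateway = gateway_cls(gateway_address)
    # The gateway launches the scheduler and workers on the user's behalf.
    cluster = gateway.new_cluster()
    try:
        cluster.scale(n_workers)
        client = cluster.get_client()
        # Traffic to the scheduler is proxied through the gateway.
        return client.submit(sum, values).result()
    finally:
        # Shut down explicitly so resources are released for other users.
        cluster.shutdown()
```

With a hypothetical endpoint, this would be called as `sum_via_gateway("http://gateway.example.com", [1, 2, 3])`. Authentication and cluster options vary per deployment and are omitted here.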

  18. [Diagram: Dask Gateway. Components shown: Dask Gateway,
    Dask Proxy, an Ingress, two Services, three Schedulers,
    seven Workers and four Nodes.]
    The user connects to the gateway and requests cluster
    resources.
    Dask Gateway launches the cluster on their behalf and
    proxies traffic through.

  20. DaskHub Helm Chart
    JupyterHub and Dask Gateway packaged as a single Helm
    chart. Provides a central portal for launching Jupyter and Dask
    together.
    Ephemeral/Fixed

  21. [Diagram: DaskHub Helm Chart. Components shown: JupyterHub,
    Jupyter Proxy, Jupyter Auth, Jupyter Spawner, Database,
    Jupyter servers, Dask Gateway, Schedulers, Workers, three
    Services, an Ingress and the user's computer.]
    Lines omitted for clarity. Things got a bit crazy.
    Just imagine lines from basically everything to
    everything.

  22. Deployment Workshop
    Stay tuned for
    Deploy JupyterHub with Dask
    Gateway on Kubernetes in 15 minutes
    Amit Kumar, Adam Lewis
    17:20 UTC

  23. Dask Deployments on Kubernetes

    Dask Helm Chart (Fixed)
    Deploys a Jupyter server and a Dask cluster via Kubernetes
    Deployments. Can be manually scaled via kubectl, helm or
    dask_kubernetes.HelmCluster().
    https://github.com/dask/helm-chart

    dask_kubernetes.KubeCluster() (Ephemeral)
    Dynamically launches Dask clusters onto Kubernetes and
    scales them adaptively. Gets garbage collected when idle.
    https://github.com/dask/dask-kubernetes

    Dask Gateway (Ephemeral/Fixed/Centralized)
    Central hub for spawning Dask clusters. Great for teams and
    organizations who want to run many clusters for many users.
    https://gateway.dask.org/

  24. Deployment Workshop
    Thank you!
    @_jacobtomlinson
