Slide 1

Slide 1 text

Overview and Kubernetes integration. Jacob Tomlinson, Dask developer, Senior Software Engineer at NVIDIA.

Slide 2

Slide 2 text

Jacob Tomlinson Senior Software Engineer NVIDIA

Slide 3

Slide 3 text

Dask’s Features Overview

Slide 4

Slide 4 text

● General purpose Python library for parallelism
● Scales existing libraries, like NumPy, Pandas, and Scikit-Learn
● Flexible enough to build complex and custom systems
● Accessible for beginners, secure and trusted for institutions

Slide 5

Slide 5 text

PyData Community adoption “Once Dask was working properly with NumPy, it became clear that there was huge demand for a lightweight parallelism solution for Pandas DataFrames and machine learning tools, such as Scikit-Learn. Dask then evolved quickly to support these other projects where appropriate.” Matthew Rocklin Dask Creator Source https://coiled.io/blog/history-dask/ Image from Jake VanderPlas’ keynote, PyCon 2017

Slide 6

Slide 6 text

Deferring Python execution

import dask

@dask.delayed
def inc(x):
    return x + 1

@dask.delayed
def double(x):
    return x * 2

@dask.delayed
def add(x, y):
    return x + y

data = [1, 2, 3, 4, 5]

output = []
for x in data:
    a = inc(x)
    b = double(x)
    c = add(a, b)
    output.append(c)

total = dask.delayed(sum)(output)

Dask allows users to construct custom graphs with the delayed and futures APIs.
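Calling the decorated functions above only builds a task graph; nothing runs until the result is requested. A minimal sketch of triggering execution, where total is the delayed object built on this slide and visualize assumes the optional graphviz dependency is installed:

# Nothing has executed yet; total is a lazy Dask Delayed object.
# Trigger execution of the whole graph:
result = total.compute()
print(result)  # 50 for data = [1, 2, 3, 4, 5]

# Optionally render the task graph (assumes graphviz is installed):
total.visualize()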

Slide 7

Slide 7 text

Distributed Task Graphs Constructing tasks in a DAG allows tasks to be executed by a selection of schedulers. The distributed scheduler allows a DAG to be shared by many workers running over many machines to spread out work.
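A minimal sketch of running the same graph on different schedulers (the delayed function is illustrative and the scheduler address is a placeholder):

import dask

@dask.delayed
def square(x):
    return x ** 2

total = dask.delayed(sum)([square(i) for i in range(10)])

# Same graph, different local schedulers:
total.compute(scheduler="threads")      # local thread pool
total.compute(scheduler="processes")    # local process pool

# Or hand the graph to the distributed scheduler to spread work across machines:
from dask.distributed import Client
client = Client("tcp://scheduler-address:8786")  # placeholder address
total.compute()                         # now runs on the distributed cluster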

Slide 8

Slide 8 text

Out-of-core computation Dask’s data structures are chunked or partitioned, allowing them to be swapped in and out of memory. Operations run on chunks independently and only communicate intermediate results when necessary.
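A minimal sketch of how partitioning looks in practice (the file pattern is a placeholder):

import dask.dataframe as dd

# Each CSV file becomes one or more partitions; partitions are read lazily,
# so the full dataset never has to sit in memory at once.
df = dd.read_csv("data-*.csv")   # placeholder file pattern

# The groupby runs partition by partition and only small intermediate
# results (per-group sums and counts) are combined at the end.
result = df.groupby("x").y.mean()
print(result.compute())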

Slide 9

Slide 9 text

Dask’s distributed scheduler “For the first year of Dask’s life it was focused on single-machine parallelism. But inevitably, Dask was used on problems that didn’t fit on a single machine. This led us to develop a distributed-memory scheduler for Dask that supported the same API as the existing single-machine scheduler. For Dask users this was like magic. Suddenly their existing workloads on 50GB datasets could run comfortably on 5TB (and then 50TB a bit later).” Matthew Rocklin Dask Creator Source https://coiled.io/blog/history-dask/

Slide 10

Slide 10 text

Scheduler Dashboard

# Connect a Dask client
>>> from dask.distributed import Client
>>> client = Client(cluster)

# Do some computation
>>> import dask.array as da
>>> arr = da.random.random((10_000, 1_000, 1_000), chunks=(1000, 1000, 100))
>>> result = arr.mean().compute()

Slide 11

Slide 11 text

Dashboard Dask’s dashboard gives you key insights into how your cluster is performing. You can view it in a browser or directly within JupyterLab to see how your graphs are executing. You can also use the built-in profiler to understand where the slow parts of your code are.
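A minimal sketch of locating the dashboard and saving a profiling report from a script (cluster and arr are the objects from the previous slides; the report filename is illustrative):

from dask.distributed import Client, performance_report

client = Client(cluster)              # cluster from the earlier slides
print(client.dashboard_link)          # URL of the live dashboard

# Save the dashboard's profiling information for a block of work to HTML
with performance_report(filename="dask-report.html"):
    result = arr.mean().compute()     # arr from the previous slide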

Slide 12

Slide 12 text

Elastic scaling Dask’s adaptive scaling allows a Dask scheduler to request additional workers via whatever resource manager you are using (Kubernetes, Cloud, etc). This allows computations to burst out onto more machines and complete the overall graph in less time. This is particularly effective when you have multiple people running interactive and embarrassingly parallel workloads on shared resources.
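A minimal sketch of enabling adaptive scaling on a cluster manager (the bounds are illustrative; the cluster object could equally be a KubeCluster or a dask-jobqueue cluster):

from dask.distributed import LocalCluster, Client

cluster = LocalCluster()               # stand-in for any cluster manager
client = Client(cluster)

# Let the scheduler request between 2 and 20 workers from the
# underlying resource manager based on the current workload.
cluster.adapt(minimum=2, maximum=20)

# Or scale to a fixed number of workers explicitly:
# cluster.scale(10)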

Slide 13

Slide 13 text

Dask accelerates the existing Python ecosystem
Built alongside the current community

NumPy
import numpy as np
x = np.ones((1000, 1000))
x + x.T - x.mean(axis=0)

Pandas
import pandas as pd
df = pd.read_csv("file.csv")
df.groupby("x").y.mean()

Scikit-Learn
from sklearn.linear_model import LogisticRegression
lr = LogisticRegression()
lr.fit(data, labels)
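For comparison, a minimal sketch of the Dask equivalents of the snippets above (dask.array and dask.dataframe ship with Dask; the scikit-learn style estimator comes from the separate dask-ml package, which is assumed to be installed):

import dask.array as da
x = da.ones((1000, 1000), chunks=(100, 100))
(x + x.T - x.mean(axis=0)).compute()

import dask.dataframe as dd
df = dd.read_csv("file.csv")
df.groupby("x").y.mean().compute()

from dask_ml.linear_model import LogisticRegression  # assumes dask-ml is installed
lr = LogisticRegression()
lr.fit(data, labels)  # data and labels as Dask arrays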

Slide 14

Slide 14 text

RAPIDS
https://github.com/rapidsai
Jacob Tomlinson, Cloud Deployment Lead, RAPIDS

Slide 15

Slide 15 text

Minor Code Changes for Major Benefits
Abstracting Accelerated Compute through Familiar Interfaces

CPU (pandas):
In [1]: import pandas as pd
In [2]: df = pd.read_csv('filepath')

CPU (scikit-learn):
In [1]: from sklearn.ensemble import RandomForestClassifier
In [2]: clf = RandomForestClassifier(n_estimators=100, max_depth=8, random_state=0)
In [3]: clf.fit(x, y)

CPU (NetworkX):
In [1]: import networkx as nx
In [2]: page_rank = nx.pagerank(graph)

GPU (cuDF):
In [1]: import cudf
In [2]: df = cudf.read_csv('filepath')

GPU (cuML):
In [1]: from cuml.ensemble import RandomForestClassifier
In [2]: cuclf = RandomForestClassifier(n_estimators=100, max_depth=8, random_state=0)
In [3]: cuclf.fit(x, y)

GPU (cuGraph):
In [1]: import cugraph
In [2]: page_rank = cugraph.pagerank(graph)

Average Speed-Ups: 150x (cuDF vs pandas), 250x (cuML vs scikit-learn), 50x (cuGraph vs NetworkX)

Slide 16

Slide 16 text

Lightning-Fast End-to-End Performance
Reducing Data Science Processes from Hours to Seconds

16 A100s Provide More Power than 100 CPU Nodes
20x More Cost-Effective than Similar CPU Configuration
70x Faster Performance than Similar CPU Configuration

*CPU approximate to n1-highmem-8 (8 vCPUs, 52GB memory) on Google Cloud Platform. TCO calculations based on Cloud instance costs.

Slide 17

Slide 17 text

RAPIDS on Kubernetes
Unified Cloud Deployments
[Diagram: Kubernetes cluster with the GPU Operator managing GPU nodes]

Slide 18

Slide 18 text

Deploying Dask A brief history and some context

Slide 19

Slide 19 text

Creating a Cluster manually

Slide 20

Slide 20 text

LocalCluster
● Convenience class to create subprocesses
● Inspects local system and creates workers to maximise hardware use
● Has helper methods for managing the cluster
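A minimal usage sketch (the worker counts shown are illustrative; by default LocalCluster inspects the machine and picks sensible values):

from dask.distributed import LocalCluster, Client

# Start a scheduler plus worker subprocesses on this machine.
cluster = LocalCluster()          # or LocalCluster(n_workers=4, threads_per_worker=2)
client = Client(cluster)

print(cluster.dashboard_link)     # dashboard for the local cluster

# Helper methods for managing the cluster:
cluster.scale(4)                  # resize to 4 workers
cluster.close()                   # shut everything down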

Slide 21

Slide 21 text

dask-jobqueue
● Convenience class to create HPC Dask Clusters
● Intended to be used from the head node of an HPC
● Scheduler runs in subprocess on the head node
● Workers are submitted as HPC jobs to the queue
● Assumes network connectivity between all nodes and head node
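A minimal sketch using the SLURM backend (dask-jobqueue also provides PBS, SGE, LSF and other cluster classes; the queue name and resources here are placeholders):

from dask_jobqueue import SLURMCluster
from dask.distributed import Client

# Scheduler starts locally on the head node; each worker is one SLURM job.
cluster = SLURMCluster(
    queue="general",        # placeholder partition/queue name
    cores=8,
    memory="32GB",
    walltime="01:00:00",
)
cluster.scale(jobs=10)      # submit 10 worker jobs to the queue

client = Client(cluster)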

Slide 22

Slide 22 text

dask-kubernetes (classic)
● Convenience class to create Kubernetes Dask Clusters
● Intended to be used from within the Kubernetes cluster
● Scheduler runs as subprocess in user Pod
● Workers are created as Pods (via service account auth)
● Assumes network connectivity between all Pod IPs
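A minimal sketch of the classic API (in later dask-kubernetes releases this class lives under dask_kubernetes.classic; the image and resource sizes are illustrative):

from dask_kubernetes import KubeCluster, make_pod_spec

# Scheduler runs in this process; each worker becomes a Pod.
pod_spec = make_pod_spec(
    image="ghcr.io/dask/dask:latest",
    memory_limit="4G",
    memory_request="4G",
    cpu_limit=1,
    cpu_request=1,
)
cluster = KubeCluster(pod_spec)
cluster.scale(5)   # create 5 worker Pods via the Kubernetes API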

Slide 23

Slide 23 text

Helm Chart
● Chart deploys a Dask Cluster and a Jupyter service
● Scheduler, Workers and Jupyter are all Deployments
● Jupyter is preconfigured to connect to the Dask cluster
● Dask worker Deployment presents a scaling challenge due to semi-stateful nature of Dask Workers
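Outside the bundled Jupyter service, a client can reach the chart's scheduler through its Kubernetes Service. A minimal sketch, assuming a hypothetical release whose scheduler Service is named my-dask-scheduler (the actual hostname depends on your release name and namespace):

from dask.distributed import Client

# The chart exposes the scheduler on port 8786 via a Service.
client = Client("tcp://my-dask-scheduler:8786")  # placeholder Service name
print(client)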

Slide 24

Slide 24 text

dask-gateway
● Dask cluster provisioning service
● Has multiple backends including HPC, Kubernetes and Hadoop
● All Dask traffic is proxied via a single ingress
● Users are abstracted away from the underlying platform
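A minimal sketch from the user's point of view (the gateway address is a placeholder; authentication depends on how the gateway is deployed):

from dask_gateway import Gateway

# Users talk only to the gateway; the backend (Kubernetes, HPC, Hadoop)
# stays hidden behind it.
gateway = Gateway("https://gateway.example.com")  # placeholder address
cluster = gateway.new_cluster()
cluster.scale(4)

client = cluster.get_client()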

Slide 25

Slide 25 text

Dask Operator Kubernetes Native

Slide 26

Slide 26 text

Built with kopf

Dask is a Python community, so it made sense to build the controller in Python too. We also evaluated the Operator Framework for Golang, but using it would hugely reduce the number of active Dask maintainers who could contribute.
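For flavour, a minimal sketch of what a kopf-based controller looks like (this illustrates the framework's style only; the handler name and body are hypothetical, not the actual Dask operator code):

import kopf

# Called whenever a DaskCluster custom resource is created.
@kopf.on.create("kubernetes.dask.org", "v1", "daskclusters")
def daskcluster_created(spec, name, namespace, logger, **kwargs):
    logger.info(f"Creating Dask cluster {name} in {namespace}")
    # ...create the scheduler Pod/Service and worker Pods here...
    return {"phase": "Created"}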

Slide 27

Slide 27 text

Create Dask Clusters with kubectl

The Dask Operator has four custom resource types that you can create via kubectl:
● DaskCluster to create whole clusters.
● DaskWorkerGroup to create additional groups of workers with various configurations (high memory, GPUs, etc).
● DaskJob to run end-to-end tasks like a Kubernetes Job, but with an adjacent DaskCluster.
● DaskAutoscaler behaves like an HPA but interacts with the Dask scheduler to make scaling decisions.

# cluster.yaml
apiVersion: kubernetes.dask.org/v1
kind: DaskCluster
metadata:
  name: simple-cluster
spec:
  worker:
    replicas: 3
    spec:
      containers:
        - name: worker
          image: "ghcr.io/dask/dask:latest"
          imagePullPolicy: "IfNotPresent"
          args:
            - dask-worker
            - --name
            - $(DASK_WORKER_NAME)
  scheduler:
    spec:
      containers:
        - name: scheduler
          image: "ghcr.io/dask/dask:latest"
          imagePullPolicy: "IfNotPresent"
          args:
            - dask-scheduler
          ports:
            - name: tcp-comm
              containerPort: 8786
              protocol: TCP
            - name: http-dashboard
              containerPort: 8787
              protocol: TCP
          readinessProbe:
            httpGet:
              port: http-dashboard
              path: /health
            initialDelaySeconds: 5
…

Slide 28

Slide 28 text

DaskJob
● Inspired by Kubeflow PyTorchJob, et al.
● DaskJob contains a Pod spec to run the workload and a nested DaskCluster resource
● Workload Pod is preconfigured to connect to the DaskCluster
● Users can submit a batch job with an attached autoscaling Dask Cluster via kubectl

Slide 29

Slide 29 text

Create Dask Clusters with Python

# Install dask-kubernetes
$ pip install dask-kubernetes

# Launch a cluster
>>> from dask_kubernetes.operator import KubeCluster
>>> cluster = KubeCluster(name="demo")

# List the DaskCluster custom resource that was created for us under the hood
$ kubectl get daskclusters
NAME   AGE
demo   6m3s
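Once created, the cluster object behaves like any other Dask cluster manager. A minimal follow-on sketch (the worker counts are illustrative):

>>> cluster.scale(5)                      # add worker Pods
>>> cluster.adapt(minimum=1, maximum=10)  # or let it autoscale

>>> client = cluster.get_client()         # connect a Dask client
>>> client.submit(lambda x: x + 1, 10).result()
11

>>> cluster.close()                       # delete the DaskCluster resource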

Slide 30

Slide 30 text

Flyte Integration success

Slide 31

Slide 31 text


Slide 32

Slide 32 text


Slide 33

Slide 33 text

Read Documentation: docs.dask.org
See Examples: examples.dask.org
Engage Community: github.com/dask