
Deploying multi-GPU workloads on Kubernetes in Python

The RAPIDS suite of open source software libraries gives you the freedom to execute end-to-end data science and analytics pipelines entirely on GPUs with minimal code changes and no new tools to learn.

Dask provides advanced parallelism for Python by breaking computations into a task graph that a scheduler then executes across many workers.

By using Dask to scale out RAPIDS workloads on Kubernetes you can accelerate your workloads across many GPUs on many machines. In this talk we will discuss how to install and configure Dask on your Kubernetes cluster and use it to run accelerated GPU workloads on your cluster.

Jacob Tomlinson

February 02, 2023

Transcript

1. Deploying Multi-GPU Workloads on Kubernetes in Python. PyData DC, February 2023. Jacob Tomlinson, Software Engineering Lead, NVIDIA.
2. Minor Code Changes for Major Benefits: Abstracting Accelerated Compute through Familiar Interfaces.

CPU (pandas, scikit-learn, NetworkX):

In [1]: import pandas as pd
In [2]: df = pd.read_csv('filepath')

In [1]: from sklearn.ensemble import RandomForestClassifier
In [2]: clf = RandomForestClassifier(n_estimators=100, max_depth=8, random_state=0)
In [3]: clf.fit(x, y)

In [1]: import networkx as nx
In [2]: page_rank = nx.pagerank(graph)

GPU (cuDF, cuML, cuGraph):

In [1]: import cudf
In [2]: df = cudf.read_csv('filepath')

In [1]: from cuml.ensemble import RandomForestClassifier
In [2]: cuclf = RandomForestClassifier(n_estimators=100, max_depth=8, random_state=0)
In [3]: cuclf.fit(x, y)

In [1]: import cugraph
In [2]: page_rank = cugraph.pagerank(graph)

Average speed-ups of 150x, 250x, and 50x across these libraries.
3. Lightning-Fast End-to-End Performance: Reducing Data Science Processes from Hours to Seconds. 16 A100s provide more power than 100 CPU nodes; 20x more cost-effective than a similar CPU configuration; 70x faster performance than a similar CPU configuration. *CPU approximate to n1-highmem-8 (8 vCPUs, 52GB memory) on Google Cloud Platform. TCO calculations based on Cloud instance costs.
4. Dask: a general-purpose Python library for parallelism. Scales existing libraries like NumPy, pandas, and scikit-learn. Flexible enough to build complex and custom systems. Accessible for beginners, secure and trusted for institutions. Jacob Tomlinson, Dask Core Developer.
5. Dask accelerates the existing Python ecosystem, built alongside the current community.

NumPy:
import numpy as np
x = np.ones((1000, 1000))
x + x.T - x.mean(axis=0)

pandas:
import pandas as pd
df = pd.read_csv("file.csv")
df.groupby("x").y.mean()

scikit-learn:
from sklearn.linear_model import LogisticRegression
lr = LogisticRegression()
lr.fit(data, labels)
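For comparison, here is a minimal sketch (not part of the slide) of the Dask equivalents, which mirror these APIs while splitting the work into chunks that the scheduler can run in parallel; dask-ml offers similar drop-in estimators for the scikit-learn case.

import dask.array as da
import dask.dataframe as dd

# NumPy-style array, split into 250x250 chunks that are scheduled in parallel
x = da.ones((1000, 1000), chunks=(250, 250))
(x + x.T - x.mean(axis=0)).compute()

# pandas-style dataframe, read and processed lazily in partitions
df = dd.read_csv("file.csv")
df.groupby("x").y.mean().compute()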
6. Open Source Software Has Democratized Data Science: highly accessible, easy-to-use tools abstract complexity. A typical CPU-memory pipeline, orchestrated with Apache Spark / Dask: pre-processing and data preparation with pandas, model training with scikit-learn, graph analytics with NetworkX, deep learning with TensorFlow, PyTorch, and MXNet, visualization with matplotlib.
7. Accelerated Data Science with RAPIDS: powering popular data science ecosystems with NVIDIA GPUs. The same pipeline in GPU memory, orchestrated with Spark / Dask: pre-processing with cuIO and cuDF, model training with cuML and XGBoost, graph analytics with cuGraph, deep learning with TensorFlow, PyTorch, and MXNet, visualization with cuXfilter, pyViz, and Plotly.
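To show how the Dask and cuDF layers of this stack fit together, here is a minimal sketch (not from the slides) that assumes a machine with at least one NVIDIA GPU and the dask-cuda and dask-cudf packages installed.

from dask_cuda import LocalCUDACluster
from dask.distributed import Client
import dask_cudf

# Start one Dask worker per visible GPU on this machine
cluster = LocalCUDACluster()
client = Client(cluster)

# A cuDF-backed dataframe partitioned across the GPU workers
df = dask_cudf.read_csv("file.csv")
print(df.groupby("x").y.mean().compute())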
8. XGBoost + RAPIDS: Better Together.
• RAPIDS comes paired with XGBoost 1.6.0
• XGBoost provides zero-copy data import from cuDF, CuPy, Numba, PyTorch and more
• The official Dask API makes it easy to scale to multiple nodes or multiple GPUs
• The GPU tree builder delivers huge performance gains
• Now supports learning to rank, categorical variables, and SHAP explainability
• Use models directly in Triton for high-performance inference
All RAPIDS changes are integrated upstream and provided to all XGBoost users, via PyPI or RAPIDS conda. "XGBoost is All You Need" – Bojan Tunguz, 4x Kaggle Grandmaster.
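As an illustration of the Dask API mentioned above, the following is a minimal sketch (not from the slides) of distributed training with xgboost.dask; it assumes a client connected to GPU-capable workers, and the random dataset is invented for illustration.

import xgboost as xgb
import dask.array as da
from dask.distributed import Client

# Connect to a Dask cluster; in this talk's setting that would be the
# KubeCluster or LocalCUDACluster created elsewhere rather than a bare Client()
client = Client()

# Random training data, partitioned across the workers
X = da.random.random((100_000, 20), chunks=(10_000, 20))
y = (da.random.random(100_000, chunks=(10_000,)) > 0.5).astype("int")

# DaskDMatrix references the distributed partitions without pulling them to the client
dtrain = xgb.dask.DaskDMatrix(client, X, y)

# gpu_hist selects the GPU tree builder (the XGBoost 1.6-era parameter name)
output = xgb.dask.train(
    client,
    {"tree_method": "gpu_hist", "objective": "binary:logistic"},
    dtrain,
    num_boost_round=100,
)
booster = output["booster"]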
9. RAPIDS in the Cloud: Current Focus Areas.
• Kubernetes: Helm charts, operator, Kubeflow
• Cloud ML platforms: Amazon SageMaker Studio, Google Vertex AI
• Cloud compute: Amazon EC2, ECS, Fargate, EKS; Google Compute Engine, Dataproc, GKE
• Cloud ML examples gallery
New deployment documentation website:
Deployment documentation: docs.rapids.ai/deployment/stable
Kubernetes deployment: docs.rapids.ai/deployment/stable/platforms/kubernetes.html
Dask Kubernetes: kubernetes.dask.org
10. Live Demo. Murphy's First Law: anything that can go wrong will go wrong. Murphy's Second Law: nothing is as easy as it looks. Murphy's Third Law: everything takes longer than you think it will.
11. Launch a Kubernetes cluster.

# Launch a Kubernetes cluster with GPUs
$ gcloud container clusters create jtomlinson-rapids-demo \
    --accelerator type=nvidia-tesla-a100,count=2 \
    --machine-type a2-highgpu-2g \
    --zone us-central1-c
12. Install NVIDIA drivers.

# Install the NVIDIA drivers
$ kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded-latest.yaml
13. Install the Dask operator.

# Install the Dask Operator
$ helm install --repo https://helm.dask.org \
    --create-namespace -n dask-operator \
    --generate-name dask-kubernetes-operator
14. Installing the operator.

# Check that we can list daskcluster resources
$ kubectl get daskclusters
No resources found in default namespace.

# Check that the operator pod is running
$ kubectl get pods -A -l application=dask-kubernetes-operator
NAMESPACE       NAME                                        READY   STATUS    RESTARTS   AGE
dask-operator   dask-kubernetes-operator-775b8bbbd5-zdrf7   1/1     Running   0          74s

# 🚀 done!
15. Get a Jupyter notebook.

# Create a notebook Pod for us to drive the workload from
$ kubectl apply -f notebook.yaml

Source for notebook.yaml: https://gist.github.com/jacobtomlinson/397b277e6cc4b717d9ff04759f350b4a#file-notebook-yaml
16. Create RAPIDS clusters within notebooks, with on-prem or cloud-managed Kubernetes.

# Install dask-kubernetes
$ pip install dask-kubernetes

# Launch a cluster
>>> from dask_kubernetes.operator import KubeCluster
>>> cluster = KubeCluster(name="demo")

# List the DaskCluster custom resource that was created for us under the hood
$ kubectl get daskclusters
NAME           AGE
demo-cluster   6m3s
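Once the cluster object exists, work is submitted to it with an ordinary Dask client. A minimal sketch (not from the slides):

from dask.distributed import Client
import dask.array as da

# Connect a client to the KubeCluster created above
client = Client(cluster)

# Scale to three workers, then run a computation on them
cluster.scale(3)
x = da.random.random((10_000, 10_000), chunks=(1_000, 1_000))
print(x.mean().compute())

# Shut the cluster down when finished (removes the DaskCluster resource)
cluster.close()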
17. Create RAPIDS clusters with kubectl.

The Dask operator has three custom resource types that you can create via kubectl:
• DaskCluster to create whole clusters.
• DaskWorkerGroup to create additional groups of workers with various configurations (high memory, GPUs, etc).
• DaskJob to run end-to-end tasks like a Kubernetes Job, but with an adjacent Dask cluster.

# cluster.yaml
apiVersion: kubernetes.dask.org/v1
kind: DaskCluster
metadata:
  name: simple-cluster
spec:
  worker:
    replicas: 3
    spec:
      containers:
        - name: worker
          image: "ghcr.io/dask/dask:latest"
          imagePullPolicy: "IfNotPresent"
          args:
            - dask-worker
            - --name
            - $(DASK_WORKER_NAME)
  scheduler:
    spec:
      containers:
        - name: scheduler
          image: "ghcr.io/dask/dask:latest"
          imagePullPolicy: "IfNotPresent"
          args:
            - dask-scheduler
          ports:
            - name: tcp-comm
              containerPort: 8786
              protocol: TCP
            - name: http-dashboard
              containerPort: 8787
              protocol: TCP
          readinessProbe:
            httpGet:
              port: http-dashboard
              path: /health
            initialDelaySeconds: 5
…
18. Parallel HPO: Computational Parallelism Beyond a Single Node.

On a GCP T4 instance (LocalCUDACluster, one cuda-worker per GPU):

X, y = …  # NumPy arrays
# Optimize in parallel on your Dask cluster
with parallel_backend("dask"):
    study.optimize(lambda trial: objective(trial, X, y),
                   n_trials=100,
                   n_jobs=4)  # number of GPUs on the system

On a GKE cluster with GPU Pods (KubeCluster, cuda-workers spread across nodes):

X, y = …  # NumPy arrays
# Optimize in parallel on your Dask cluster
with parallel_backend("dask"):
    study.optimize(lambda trial: objective(trial, X, y),
                   n_trials=100,
                   n_jobs=20)  # number of GPUs on the K8s cluster
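Filling in the elided pieces, the full pattern from this slide might look like the sketch below (not from the slides; the dataset, hyperparameter ranges, and the CPU-only scikit-learn objective are invented for illustration).

import optuna
from joblib import parallel_backend
from dask.distributed import Client
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Creating a Client registers the "dask" joblib backend; in the talk's setting
# this would be Client(cluster) pointing at the KubeCluster or LocalCUDACluster
client = Client()

X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)

def objective(trial, X, y):
    # Each trial trains a model with sampled hyperparameters and returns CV accuracy
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 50, 300),
        "max_depth": trial.suggest_int("max_depth", 2, 12),
    }
    clf = RandomForestClassifier(**params, random_state=0)
    return cross_val_score(clf, X, y, cv=3).mean()

study = optuna.create_study(direction="maximize")

# Route the study's joblib-based parallelism to the Dask workers
with parallel_backend("dask"):
    study.optimize(lambda trial: objective(trial, X, y),
                   n_trials=100, n_jobs=20)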
  19. 27

20. How to Get Started with RAPIDS: A Variety of Ways to Get Up and Running.
More about RAPIDS: learn more at RAPIDS.ai, read the API docs, check out the RAPIDS blog, read the NVIDIA DevBlog.
Self-start resources: get started with RAPIDS, deploy on the Cloud today, start with Google Colab, look at the cheat sheets.
Discussion and support: check the RAPIDS GitHub, use the NVIDIA Forums, reach out on Slack, talk to NVIDIA Services.
Get engaged: @RAPIDSai, https://github.com/rapidsai, https://rapids-goai.slack.com/join, https://rapids.ai