
Deploying multi-GPU workloads on Kubernetes in Python

The RAPIDS suite of open source software libraries gives you the freedom to execute end-to-end data science and analytics pipelines entirely on GPUs with minimal code changes and no new tools to learn.

Dask provides advanced parallelism for Python by breaking work into a task graph that a task scheduler evaluates across many workers.

By using Dask to scale out RAPIDS workloads on Kubernetes, you can accelerate your workloads across many GPUs on many machines. In this talk we will discuss how to install and configure Dask on your Kubernetes cluster and use it to run accelerated GPU workloads there.
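
To make the task-graph idea concrete, here is a minimal sketch using dask.delayed (the inc and add functions are illustrative, not from the talk):

import dask

@dask.delayed
def inc(x):
    return x + 1

@dask.delayed
def add(x, y):
    return x + y

# Calling the functions records a task graph instead of computing eagerly
total = add(inc(1), inc(2))

# compute() hands the graph to a scheduler, which can spread the tasks
# across many workers (threads, processes, or whole machines)
print(total.compute())  # 5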

Jacob Tomlinson

February 02, 2023

Transcript

  1. Deploying Multi-GPU workloads on
    Kubernetes in Python
    PyData DC - Feb 2023
    Jacob Tomlinson
    Software Engineering Lead
    NVIDIA


  2. Jake VanderPlas - PyCon 2017
    Jacob Tomlinson
    Former Research Software Engineer
    UK Met Office


  3. RAPIDS
    https://github.com/rapidsai
    Jacob Tomlinson
    Cloud Lead
    RAPIDS


  4. Minor Code Changes for Major Benefits
    Abstracting Accelerated Compute through Familiar Interfaces

    CPU (pandas, scikit-learn, NetworkX):

    In [1]: import pandas as pd
    In [2]: df = pd.read_csv('filepath')

    In [1]: from sklearn.ensemble import RandomForestClassifier
    In [2]: clf = RandomForestClassifier(n_estimators=100, max_depth=8, random_state=0)
    In [3]: clf.fit(x, y)

    In [1]: import networkx as nx
    In [2]: page_rank = nx.pagerank(graph)

    GPU (cuDF, cuML, cuGraph):

    In [1]: import cudf
    In [2]: df = cudf.read_csv('filepath')

    In [1]: from cuml.ensemble import RandomForestClassifier
    In [2]: cuclf = RandomForestClassifier(n_estimators=100, max_depth=8, random_state=0)
    In [3]: cuclf.fit(x, y)

    In [1]: import cugraph
    In [2]: page_rank = cugraph.pagerank(graph)

    Average speed-ups: 150x (cuDF), 250x (cuML), 50x (cuGraph)


  5. Lightning-Fast End-to-End Performance
    Reducing Data Science Processes from Hours to Seconds

    16   A100s provide more power than 100 CPU nodes
    20x  More cost-effective than a similar CPU configuration
    70x  Faster performance than a similar CPU configuration

    *CPU approximate to n1-highmem-8 (8 vCPUs, 52GB memory) on Google Cloud Platform. TCO calculations based on Cloud instance costs.


  6. General purpose Python library for parallelism
    Scales existing libraries, like NumPy, Pandas, and Scikit-Learn
    Flexible enough to build complex and custom systems
    Accessible for beginners, secure and trusted for institutions
    Jacob Tomlinson
    Core Developer
    Dask


  7. Dask accelerates the existing Python ecosystem
    Built alongside the current community

    import numpy as np
    x = np.ones((1000, 1000))
    x + x.T - x.mean(axis=0)

    import pandas as pd
    df = pd.read_csv("file.csv")
    df.groupby("x").y.mean()

    from sklearn.linear_model import LogisticRegression
    lr = LogisticRegression()
    lr.fit(data, labels)

    NumPy, Pandas, Scikit-Learn
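
    For comparison, a hedged sketch of the Dask equivalents of the
    snippets above (assumes dask and dask-ml are installed; the chunk
    size is illustrative):

    import dask.array as da
    x = da.ones((1000, 1000), chunks=(100, 100))
    (x + x.T - x.mean(axis=0)).compute()

    import dask.dataframe as dd
    df = dd.read_csv("file.csv")
    df.groupby("x").y.mean().compute()

    # data and labels as in the scikit-learn snippet above
    from dask_ml.linear_model import LogisticRegression
    lr = LogisticRegression()
    lr.fit(data, labels)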


  8. Open Source Software Has Democratized Data Science
    Highly Accessible, Easy to Use Tools Abstract Complexity

    [Diagram: the CPU-memory data science stack: pre-processing and
    data preparation with pandas, machine learning with scikit-learn,
    graph analytics with NetworkX, deep learning with TensorFlow,
    PyTorch and MXNet, visualization with matplotlib, coordinated by
    Apache Spark / Dask]


  9. Accelerated Data Science with RAPIDS
    Powering Popular Data Science Ecosystems with NVIDIA GPUs

    [Diagram: the same stack in GPU memory: pre-processing with cuIO
    and cuDF, machine learning with cuML and XGBoost, graph analytics
    with cuGraph, deep learning with TensorFlow, PyTorch and MXNet,
    visualization with cuXfilter, pyViz and Plotly, coordinated by
    Spark / Dask]


  10. XGBoost + RAPIDS: Better Together

    ● RAPIDS comes paired with XGBoost 1.6.0
    ● XGBoost provides zero-copy data import from cuDF, CuPy, Numba,
      PyTorch and more (see the sketch below)
    ● Official Dask API makes it easy to scale to multiple nodes or
      multiple GPUs
    ● GPU tree builder delivers huge perf gains
    ● Now supports Learning to Rank, categorical variables, and SHAP
      explainability
    ● Use models directly in Triton for high-performance inference

    "XGBoost is All You Need" – Bojan Tunguz, 4x Kaggle Grandmaster

    All RAPIDS changes are integrated upstream and provided to all
    XGBoost users, via PyPI or RAPIDS conda.
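
    As a hedged illustration of the zero-copy cuDF integration (the
    file name and label column are placeholders; assumes a GPU build
    of XGBoost 1.6):

    # Train XGBoost directly on GPU data held in cuDF
    import cudf
    import xgboost as xgb

    df = cudf.read_csv("train.csv")  # placeholder file name
    X, y = df.drop(columns=["label"]), df["label"]

    # DMatrix ingests the cuDF objects without a round trip through host memory
    dtrain = xgb.DMatrix(X, label=y)

    # gpu_hist selects the GPU tree builder
    params = {"tree_method": "gpu_hist", "max_depth": 8}
    booster = xgb.train(params, dtrain, num_boost_round=100)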


  11. Deploying RAPIDS on the cloud


  12. RAPIDS in the Cloud
    Current Focus Areas
    • Kubernetes
    • Helm Charts
    • Operator
    • Kubeflow
    • Cloud ML Platforms
    • Amazon SageMaker Studio
    • Google Vertex AI
    • Cloud Compute
    • Amazon EC2, ECS, Fargate, EKS
    • Google Compute Engine, Dataproc, GKE
    • Cloud ML examples gallery
    New Deployment documentation website
    Deployment Documentation: docs.rapids.ai/deployment/stable
    Kubernetes Deployment: docs.rapids.ai/deployment/stable/platforms/kubernetes.html
    Dask Kubernetes: kubernetes.dask.org


  13. RAPIDS on Kubernetes
    Unified Cloud Deployments

    [Diagram: a Kubernetes cluster with the NVIDIA GPU Operator
    exposing GPUs on every node]


  14. Live Demo
    Murphy's First Law: Anything that can go wrong will go wrong.
    Murphy's Second Law: Nothing is as easy as it looks.
    Murphy's Third Law: Everything takes longer than you think it will.


  15. Launch a Kubernetes Cluster
    # Launch a Kubernetes Cluster with GPUs
    $ gcloud container clusters create jtomlinson-rapids-demo \
    --accelerator type=nvidia-tesla-a100,count=2 \
    --machine-type a2-highgpu-2g \
    --zone us-central1-c
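
    One step not shown on the slide but standard with GKE: fetch
    credentials so kubectl talks to the new cluster.

    # Point kubectl at the new cluster
    $ gcloud container clusters get-credentials jtomlinson-rapids-demo \
      --zone us-central1-c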


  16. Install NVIDIA Drivers

    # Install the NVIDIA drivers
    $ kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded-latest.yaml


  17. Install the Dask operator
    # Install the Dask Operator
    $ helm install --repo https://helm.dask.org \
    --create-namespace -n dask-operator \
    --generate-name dask-kubernetes-operator


  18. Installing the operator

    # Check that we can list daskcluster resources
    $ kubectl get daskclusters
    No resources found in default namespace.

    # Check that the operator pod is running
    $ kubectl get pods -A -l application=dask-kubernetes-operator
    NAMESPACE       NAME                                        READY   STATUS    RESTARTS   AGE
    dask-operator   dask-kubernetes-operator-775b8bbbd5-zdrf7   1/1     Running   0          74s

    # 🚀 done!


  19. Get a Jupyter notebook

    # Create a notebook Pod for us to drive the workload from
    $ kubectl apply -f notebook.yaml

    Source for notebook.yaml: https://gist.github.com/jacobtomlinson/397b277e6cc4b717d9ff04759f350b4a#file-notebook-yaml
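
    The real manifest lives in the gist above; purely as a hedged
    sketch, such a notebook Pod might look something like this (the
    image tag and port are assumptions, not the gist's contents):

    # Hypothetical notebook Pod; see the gist above for the actual manifest
    apiVersion: v1
    kind: Pod
    metadata:
      name: rapids-notebook
    spec:
      containers:
        - name: notebook
          image: rapidsai/rapidsai:latest  # assumed image
          resources:
            limits:
              nvidia.com/gpu: 1  # request one GPU
          ports:
            - containerPort: 8888  # Jupyter's default port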


  20. Create RAPIDS Clusters within Notebooks
    With on-prem or cloud-managed Kubernetes

    # Install dask-kubernetes
    $ pip install dask-kubernetes

    # Launch a cluster
    >>> from dask_kubernetes.operator import KubeCluster
    >>> cluster = KubeCluster(name="demo")

    # List the DaskCluster custom resource that was created for us under the hood
    $ kubectl get daskclusters
    NAME           AGE
    demo-cluster   6m3s
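
    From the notebook you can then connect a client and resize the
    cluster; a minimal sketch using the standard dask.distributed and
    dask-kubernetes APIs:

    # Connect a client so work runs on the new cluster, then scale it
    >>> from dask.distributed import Client
    >>> client = Client(cluster)
    >>> cluster.scale(3)  # ask the operator for three workers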


  21. Create RAPIDS Clusters with kubectl

    The Dask Operator has three custom resource types that you can
    create via kubectl:

    ● DaskCluster to create whole clusters.
    ● DaskWorkerGroup to create additional groups of workers with
      various configurations (high memory, GPUs, etc).
    ● DaskJob to run end-to-end tasks like a Kubernetes Job, but with
      an adjacent Dask cluster.

    # cluster.yaml
    apiVersion: kubernetes.dask.org/v1
    kind: DaskCluster
    metadata:
      name: simple-cluster
    spec:
      worker:
        replicas: 3
        spec:
          containers:
            - name: worker
              image: "ghcr.io/dask/dask:latest"
              imagePullPolicy: "IfNotPresent"
              args:
                - dask-worker
                - --name
                - $(DASK_WORKER_NAME)
      scheduler:
        spec:
          containers:
            - name: scheduler
              image: "ghcr.io/dask/dask:latest"
              imagePullPolicy: "IfNotPresent"
              args:
                - dask-scheduler
              ports:
                - name: tcp-comm
                  containerPort: 8786
                  protocol: TCP
                - name: http-dashboard
                  containerPort: 8787
                  protocol: TCP
              readinessProbe:
                httpGet:
                  port: http-dashboard
                  path: /health
                initialDelaySeconds: 5
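
    Applying the manifest then creates the cluster, which shows up
    alongside any clusters created from Python:

    # Create the cluster defined in cluster.yaml
    $ kubectl apply -f cluster.yaml

    # The new DaskCluster resource appears as before
    $ kubectl get daskclusters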


  22. Workload demo


  23. Typical ML workflows


  24. Typical ML workflows


  25. Parallel HPO
    Computational Parallelism Beyond a Single Node

    On a single GCP T4 instance, a LocalCUDACluster drives one
    cuda-worker per GPU:

    from joblib import parallel_backend

    X, y = …  # NumPy arrays

    # Optimize in parallel on your Dask cluster
    with parallel_backend("dask"):
        study.optimize(lambda trial: objective(trial, X, y),
                       n_trials=100,
                       n_jobs=4)  # number of GPUs on the system

    On a GKE cluster with GPU pods, a KubeCluster of cuda-workers runs
    the same code; only n_jobs changes:

    X, y = …  # NumPy arrays

    # Optimize in parallel on your Dask cluster
    with parallel_backend("dask"):
        study.optimize(lambda trial: objective(trial, X, y),
                       n_trials=100,
                       n_jobs=20)  # number of GPUs on the K8s cluster
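
    The slides leave study and objective undefined; a hedged sketch of
    what they might look like using Optuna with a cuML estimator (the
    hyperparameter range and scoring are illustrative, not from the talk):

    # Hypothetical definitions for the study and objective used above
    import optuna
    from cuml.ensemble import RandomForestClassifier

    def objective(trial, X, y):
        # Sample a hyperparameter and fit a GPU random forest with it
        max_depth = trial.suggest_int("max_depth", 4, 16)
        clf = RandomForestClassifier(max_depth=max_depth)
        clf.fit(X, y)
        return clf.score(X, y)  # maximize training accuracy, for illustration

    study = optuna.create_study(direction="maximize")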


  26. Example Notebook
    github.com/rapidsai/cloud-ml-examples



  28. Wrap up


  29. RAPIDS Community
    Join us

    [Logo wall: open source projects, contributors, and adopters]


  30. How to Get Started with RAPIDS
    A Variety of Ways to Get Up & Running

    More about RAPIDS:
    ● Learn more at RAPIDS.ai
    ● Read the API docs
    ● Check out the RAPIDS blog
    ● Read the NVIDIA DevBlog

    Self-Start Resources:
    ● Get started with RAPIDS
    ● Deploy on the Cloud today
    ● Start with Google Colab
    ● Look at the cheat sheets

    Discussion & Support:
    ● Check the RAPIDS GitHub
    ● Use the NVIDIA Forums
    ● Reach out on Slack
    ● Talk to NVIDIA Services

    Get Engaged: @RAPIDSai · https://github.com/rapidsai · https://rapids-goai.slack.com/join · https://rapids.ai


  31. THANK YOU
    Jacob Tomlinson
    [email protected]
    @_jacobtomlinson
