Deploying multi-GPU workloads on Kubernetes in Python

By using Dask to scale out RAPIDS workloads on Kubernetes you can accelerate your workloads across many GPUs on many machines. In this talk, we will discuss how to install and configure Dask on your Kubernetes cluster and use it to run accelerated GPU workloads on your cluster.

The RAPIDS suite of open-source software libraries gives you the freedom to execute end-to-end data science and analytics pipelines entirely on GPUs with minimal code changes and no new tools to learn.

Dask is an open-source library which provides advanced parallelism for Python by breaking functions into a task graph that can be evaluated by a task scheduler that has many workers.
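
To make the task-graph idea concrete, here is a minimal sketch that uses only the Python standard library. It is not Dask's scheduler: the helper names (`is_task`, `dependencies`, `run_graph`) are hypothetical, and the dict-of-tuples layout is only loosely modelled on how a task graph can be written down. Each wave of tasks whose inputs are ready is handed to a pool of worker threads.

```python
from concurrent.futures import ThreadPoolExecutor
from operator import add, mul


def is_task(value):
    """A task is a tuple whose first element is callable: (func, *args)."""
    return isinstance(value, tuple) and len(value) > 0 and callable(value[0])


def dependencies(graph, value):
    """Return the graph keys that a task refers to in its arguments."""
    if not is_task(value):
        return []
    return [a for a in value[1:] if isinstance(a, str) and a in graph]


def run_graph(graph, max_workers=4):
    """Evaluate every key in `graph`, running independent tasks in parallel."""
    results = {}
    pending = dict(graph)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending:
            # A key is ready once everything it depends on has a result.
            ready = [
                key for key, value in pending.items()
                if all(dep in results for dep in dependencies(graph, value))
            ]
            if not ready:
                raise ValueError("graph contains a cycle")
            futures = {}
            for key in ready:
                value = pending.pop(key)
                if is_task(value):
                    func, *args = value
                    # Swap argument keys for the results computed so far.
                    resolved = [
                        results[a] if isinstance(a, str) and a in graph else a
                        for a in args
                    ]
                    futures[key] = pool.submit(func, *resolved)
                else:
                    results[key] = value  # plain data, nothing to compute
            for key, future in futures.items():
                results[key] = future.result()
    return results


graph = {
    "x": 1,
    "y": 2,
    "sum": (add, "x", "y"),           # 1 + 2
    "double": (mul, "sum", 2),        # (1 + 2) * 2
    "total": (add, "sum", "double"),  # 3 + 6
}
results = run_graph(graph)
print(results["total"])
```

A real scheduler starts each task as soon as its own inputs finish rather than waiting for a whole wave, and it ships tasks to workers on other machines; the sketch only shows the graph-plus-workers shape.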

Jacob Tomlinson

August 17, 2023

Transcript

  1. Deploying Multi-GPU workloads on Kubernetes in Python
    EuroSciPy - Aug 2023
    Jacob Tomlinson
    RAPIDS Cloud Deployment Lead
    NVIDIA

  2. RAPIDS
    https://github.com/rapidsai
    Jacob Tomlinson
    Cloud Deployment Lead
    RAPIDS

  3. Minor Code Changes for Major Benefits
    Abstracting Accelerated Compute through Familiar Interfaces
    CPU: pandas, scikit-learn, NetworkX
    In [1]: import pandas as pd
    In [2]: df = pd.read_csv('filepath')
    In [1]: from sklearn.ensemble import RandomForestClassifier
    In [2]: clf = RandomForestClassifier(n_estimators=100, max_depth=8, random_state=0)
    In [3]: clf.fit(x, y)
    In [1]: import networkx as nx
    In [2]: page_rank = nx.pagerank(graph)
    GPU: cuDF, cuML, cuGraph
    In [1]: import cudf
    In [2]: df = cudf.read_csv('filepath')
    In [1]: from cuml.ensemble import RandomForestClassifier
    In [2]: cuclf = RandomForestClassifier(n_estimators=100, max_depth=8, random_state=0)
    In [3]: cuclf.fit(x, y)
    In [1]: import cugraph
    In [2]: page_rank = cugraph.pagerank(graph)
    Average Speed-Ups: 150x Average Speed-Ups: 250x
    Average Speed-Ups: 50x

  4. Lightning-Fast End-to-End Performance
    Reducing Data Science Processes from Hours to Seconds
    16 A100s Provide More Power than 100 CPU Nodes
    20x More Cost-Effective than Similar CPU Configuration
    70x Faster Performance than Similar CPU Configuration
    *CPU approximate to n1-highmem-8 (8 vCPUs, 52GB memory) on Google Cloud Platform. TCO calculations based on cloud instance costs.

  5. General purpose Python library for parallelism
    Scales existing libraries, like Numpy, Pandas, and Scikit-Learn
    Flexible enough to build complex and custom systems
    Accessible for beginners, secure and trusted for institutions
    Jacob Tomlinson
    Core Developer
    Dask

  6. Dask accelerates the existing Python ecosystem
    Built alongside the current community
    NumPy
    import numpy as np
    x = np.ones((1000, 1000))
    x + x.T - x.mean(axis=0)
    pandas
    import pandas as pd
    df = pd.read_csv("file.csv")
    df.groupby("x").y.mean()
    scikit-learn
    from sklearn.linear_model import LogisticRegression
    lr = LogisticRegression()
    lr.fit(data, labels)

  7. Open Source Software Has Democratized Data Science
    Highly Accessible, Easy to Use Tools Abstract Complexity
    Stages: Data Preparation, Model Training, Visualization
    Pre-Processing: pandas
    Machine Learning: scikit-learn
    Graph Analytics: NetworkX
    Deep Learning: TensorFlow, PyTorch, MxNet
    Visualization: matplotlib
    Apache Spark / Dask
    CPU Memory

  8. Accelerated Data Science with RAPIDS
    Powering Popular Data Science Ecosystems with NVIDIA GPUs
    Stages: Data Preparation, Model Training, Visualization
    Pre-Processing: cuIO & cuDF
    Machine Learning: cuML, XGBoost
    Graph Analytics: cuGraph
    Deep Learning: TensorFlow, PyTorch, MxNet
    Visualization: cuXfilter, pyViz, Plotly
    Dask
    GPU Memory
    Spark / Dask

  9. XGBoost + RAPIDS: Better Together
    ● RAPIDS comes paired with XGBoost 1.6.0
    ● XGBoost provides zero-copy data import from cuDF, CuPy, Numba, PyTorch and more
    ● Official Dask API makes it easy to scale to multiple nodes or multiple GPUs
    ● GPU tree builder delivers huge perf gains
    ● Now supports Learning to Rank, categorical variables, and SHAP Explainability
    ● Use models directly in Triton for high-performance inference
    “XGBoost is All You Need” – Bojan Tunguz, 4x Kaggle Grandmaster
    All RAPIDS changes are integrated upstream and provided to all XGBoost users – via PyPI or RAPIDS conda

  10. Deploying RAPIDS on the cloud

  11. RAPIDS in the Cloud
    Current Focus Areas
    • NVIDIA DGX Cloud
    • Kubernetes
      • Helm Charts
      • Operator
      • Kubeflow
    • Cloud ML Platforms
      • Amazon SageMaker Studio
      • Google Vertex AI
    • Cloud Compute
      • Amazon EC2, ECS, Fargate, EKS
      • Google Compute Engine, Dataproc, GKE
    • Cloud ML examples gallery
    New Deployment documentation website
    Deployment Documentation: docs.rapids.ai/deployment/stable
    Kubernetes Deployment: docs.rapids.ai/deployment/stable/platforms/kubernetes.html
    Dask Kubernetes: kubernetes.dask.org

  12. RAPIDS on Kubernetes
    Unified Cloud Deployments
    [Diagram labels: GPU Operator, Kubernetes, and eight GPUs]

  13. Live Demo
    Murphy's First Law: Anything that can go wrong will go wrong.
    Murphy's Second Law: Nothing is as easy as it looks.
    Murphy's Third Law: Everything takes longer than you think it will.

  14. Launch a Kubernetes Cluster
    # Launch a Kubernetes Cluster with GPUs
    $ gcloud container clusters create jtomlinson-rapids-demo \
    --accelerator type=nvidia-tesla-a100,count=2 \
    --machine-type a2-highgpu-2g \
    --zone us-central1-c

  15. Install NVIDIA Drivers
    # Install the NVIDIA Drivers
    $ kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded-latest.yaml

  16. Install the Dask operator
    # Install the Dask Operator
    $ helm install --repo https://helm.dask.org \
    --create-namespace -n dask-operator \
    --generate-name dask-kubernetes-operator

  17. Installing the operator
    # Check that we can list daskcluster resources
    $ kubectl get daskclusters
    No resources found in default namespace.
    # Check that the operator pod is running
    $ kubectl get pods -A -l application=dask-kubernetes-operator
    NAMESPACE NAME READY STATUS RESTARTS AGE
    dask-operator dask-kubernetes-operator-775b8bbbd5-zdrf7 1/1 Running 0 74s
    # 🚀 done!

  18. Create RAPIDS Clusters with kubectl
    The Dask Operator has some custom resource types that you can create via kubectl, e.g.
    ● DaskCluster to create whole clusters.
    ● DaskWorkerGroup to create additional groups of workers with various configurations (high memory, GPUs, etc).
    ● DaskJob to run end-to-end tasks like a Kubernetes Job but with an adjacent Dask Cluster.

    # cluster.yaml
    apiVersion: kubernetes.dask.org/v1
    kind: DaskCluster
    metadata:
      name: simple-cluster
    spec:
      worker:
        replicas: 3
        spec:
          containers:
          - name: worker
            image: "ghcr.io/dask/dask:latest"
            imagePullPolicy: "IfNotPresent"
            args:
              - dask-worker
              - --name
              - $(DASK_WORKER_NAME)
      scheduler:
        spec:
          containers:
          - name: scheduler
            image: "ghcr.io/dask/dask:latest"
            imagePullPolicy: "IfNotPresent"
            args:
              - dask-scheduler
            ports:
              - name: tcp-comm
                containerPort: 8786
                protocol: TCP
              - name: http-dashboard
                containerPort: 8787
                protocol: TCP
            readinessProbe:
              httpGet:
                port: http-dashboard
                path: /health
              initialDelaySeconds: 5

    Tip: Use dask kubernetes gen cluster to generate this YAML for you
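
    As an illustration of the DaskWorkerGroup resource mentioned on this slide, the sketch below builds a GPU worker group manifest as a plain Python dict and prints it as JSON, which kubectl also accepts. The helper name `gpu_worker_group` and the group name are hypothetical, and the field layout (`spec.cluster`, `spec.worker`) is an assumption modelled on the worker section of the DaskCluster YAML; check it against the operator documentation before applying it.

```python
import json


def gpu_worker_group(name, cluster, image, replicas=2, gpus_per_worker=1):
    """Build a DaskWorkerGroup manifest as a plain dict (assumed field layout)."""
    return {
        "apiVersion": "kubernetes.dask.org/v1",
        "kind": "DaskWorkerGroup",
        "metadata": {"name": name},
        "spec": {
            # Name of the existing DaskCluster these workers should join.
            "cluster": cluster,
            "worker": {
                "replicas": replicas,
                "spec": {
                    "containers": [
                        {
                            "name": "worker",
                            "image": image,
                            "args": [
                                "dask-cuda-worker",
                                "--name",
                                "$(DASK_WORKER_NAME)",
                            ],
                            # Ask Kubernetes for GPUs on each worker pod.
                            "resources": {
                                "limits": {"nvidia.com/gpu": str(gpus_per_worker)}
                            },
                        }
                    ]
                },
            },
        },
    }


manifest = gpu_worker_group(
    name="simple-cluster-gpu-workers",  # hypothetical group name
    cluster="simple-cluster",
    image="rapidsai/notebooks:23.08-cuda12.0-py3.10",
)
print(json.dumps(manifest, indent=2))
```

    Saving the printed JSON to a file and passing it to `kubectl apply -f` would be one way to try it against a cluster that already runs the operator.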

  19. Create RAPIDS Clusters within Notebooks
    With on-prem or cloud-managed Kubernetes
    # Install dask-kubernetes
    $ pip install dask-kubernetes
    # Launch a cluster
    >>> from dask_kubernetes.operator import KubeCluster
    >>> cluster = KubeCluster(name="rapids")
    # List the DaskCluster custom resource that was created for us under the hood
    $ kubectl get daskclusters
    NAME AGE
    rapids 6m3s
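
    A possible next step after creating the cluster is to ask for GPU workers and connect a client. The sketch below is an assumption about how that might look, not code from the talk: the helper names are hypothetical, the image tag and the `dask-cuda-worker` command are borrowed from the Jupyter slide, and the imports are deferred so the block can be read on a machine without dask-kubernetes installed.

```python
def gpu_cluster_kwargs(name="rapids", n_workers=2,
                       image="rapidsai/notebooks:23.08-cuda12.0-py3.10"):
    """Keyword arguments for a KubeCluster whose workers each get one GPU."""
    return {
        "name": name,
        "image": image,
        "n_workers": n_workers,
        # One GPU per worker pod, requested through Kubernetes resource limits.
        "resources": {"limits": {"nvidia.com/gpu": "1"}},
        # Start GPU-aware workers instead of the default CPU workers.
        "worker_command": "dask-cuda-worker",
    }


def launch_gpu_cluster(**overrides):
    """Create the cluster and a connected client (needs a live Kubernetes cluster)."""
    from dask.distributed import Client
    from dask_kubernetes.operator import KubeCluster

    cluster = KubeCluster(**gpu_cluster_kwargs(**overrides))
    return cluster, Client(cluster)


# Inspect the settings without touching a cluster.
print(gpu_cluster_kwargs(n_workers=4))

# With the Dask operator installed, usage might look like:
#   cluster, client = launch_gpu_cluster(n_workers=4)
#   cluster.scale(8)   # ask for more workers later
#   cluster.close()
```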

  20. Get a Jupyter notebook on your Dask cluster
    # Create a cluster with Jupyter running alongside the scheduler
    $ dask kubernetes gen cluster \
    --name rapids \
    --image rapidsai/notebooks:23.08-cuda12.0-py3.10 \
    --worker-command dask-cuda-worker \
    --resources='{"limits": {"nvidia.com/gpu": "1"}}' \
    --jupyter \
    | kubectl apply -f -

  21. Workload demo

  22. Typical ML workflows

  23. Typical ML workflows

  24. Parallel HPO
    Computational Parallelism Beyond a Single Node
    GCP T4 Instance
    [Diagram: a LocalCUDA cluster with four cuda-workers, one per GPU]
    X, y = … # NumPy Arrays
    # Optimize in parallel on your Dask cluster
    with parallel_backend("dask"):
        study.optimize(lambda trial: objective(trial, X, y),
                       n_trials=100,
                       n_jobs=4) # NGPUs on system
    GKE Cluster with GPU Pods
    [Diagram: a KubeCluster with cuda-workers, one per GPU]
    X, y = … # NumPy Arrays
    # Optimize in parallel on your Dask cluster
    with parallel_backend("dask"):
        study.optimize(lambda trial: objective(trial, X, y),
                       n_trials=100,
                       n_jobs=20) # NGPUs on K8s cluster
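
    The pattern behind both snippets is the same: many independent trials are farmed out to however many workers are available. The sketch below shows that pattern with the standard library only. It is a stand-in, not the code from the talk: it uses plain random search rather than the `study` object on the slide, a toy `objective` instead of model training, and a thread pool instead of a Dask cluster. All names in it are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def objective(params):
    """Toy stand-in for training a model and returning a validation loss."""
    return (params["max_depth"] - 6) ** 2 + (params["learning_rate"] - 0.1) ** 2


def sample_params(rng):
    """Draw one hyperparameter candidate from a small search space."""
    return {
        "max_depth": rng.randint(2, 12),
        "learning_rate": rng.uniform(0.01, 0.3),
    }


def random_search(executor, n_trials=100, seed=0):
    """Evaluate n_trials candidates on the executor and return the best one."""
    rng = random.Random(seed)
    candidates = [sample_params(rng) for _ in range(n_trials)]
    # Trials are independent, so they can run on any number of workers.
    losses = list(executor.map(objective, candidates))
    best = min(range(n_trials), key=losses.__getitem__)
    return candidates[best], losses[best]


# max_workers plays the role of n_jobs: one slot per available GPU.
with ThreadPoolExecutor(max_workers=4) as pool:
    best_params, best_loss = random_search(pool)
print(best_params, best_loss)
```

    Because `random_search` only needs an object with a `map` method, swapping the thread pool for an executor backed by a cluster (for example one obtained from a Dask client, if that fits your setup) changes where the trials run without changing the search code.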

  25. Example Notebook
    You can find all the code for this parallel HPO example in our deployment docs repo.
    https://docs.rapids.ai/deployment/stable/examples/xgboost-gpu-hpo-job-parallel-k8s/notebook/

  26. RAPIDS Community
    Join us
    OPEN SOURCE
    CONTRIBUTORS
    ADOPTERS

  27. How to Get Started with RAPIDS
    A Variety of Ways to Get Up & Running
    More about RAPIDS
    ● Learn more at RAPIDS.ai
    ● Read the API docs
    ● Check out the RAPIDS blog
    ● Read the NVIDIA DevBlog
    Self-Start Resources
    ● Get started with RAPIDS
    ● Deploy on the Cloud today
    ● Start with Google Colab
    ● Look at the cheat sheets
    Discussion & Support
    ● Check the RAPIDS GitHub
    ● Use the NVIDIA Forums
    ● Reach out on Slack
    ● Talk to NVIDIA Services
    Get Engaged
    @RAPIDSai https://github.com/rapidsai https://rapids-goai.slack.com/join https://rapids.ai

  28. If you take a picture of any slide make it this one!
    https://docs.rapids.ai/deployment/stable/
    https://kubernetes.dask.org/en/latest/
    https://jacobtomlinson.dev/talks/

  29. THANK YOU
    Jacob Tomlinson
    @_jacobtomlinson
