The Art of Wrangling Your GPU Python Environment

1 The Art of Wrangling Your GPU Python Environments
Melody Wang, NVIDIA Intern
Jacob Tomlinson, Senior Software Engineer
PyData Global 2024

2 Introduction

3 Introductions
Jacob Tomlinson: Jacob Tomlinson is a senior software engineer at NVIDIA. His work involves maintaining open source projects including RAPIDS and Dask. He also tinkers with kr8s in his spare time. He lives in Exeter, UK.
Melody Wang: Melody is an intern at NVIDIA on the RAPIDS Cloud Deployment Team. She is currently a senior studying Statistics & Machine Learning, CS, and Human-Computer Interaction at Carnegie Mellon University. She is super excited to be attending PyData and getting involved in the open source community!

4 RAPIDS https://github.com/rapidsai

5 Modern Enterprise Applications Need Accelerated Computing
Internet-scale data | Massive models | Real-time performance
Example workloads: LLMs, forecasting, fraud detection, genomic analysis, cybersecurity, recommendations
[Figure: growth curves labeled "Single-threaded perf", "1.5X per year", "1.1X per year", and "Accelerated Computing", on a log-scale axis from 10^1 to 10^7]

6 Accelerated Computing Swim Lanes
RAPIDS makes accelerated computing more seamless while enabling specialization for maximum performance.

7 Accelerated pandas
cudf.pandas: the zero-code-change GPU accelerator for pandas, built on cuDF
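A minimal sketch of what "zero code change" can look like in practice, assuming cuDF is installed on a machine with a supported GPU. `cudf.pandas.install()` is the accelerator's programmatic entry point; the helper name `enable_gpu_pandas` is hypothetical and exists only for this illustration. The fallback keeps the script usable on CPU-only machines.

```python
import importlib.util


def enable_gpu_pandas() -> bool:
    """Try to turn on the cudf.pandas accelerator; return True on success.

    Call this before pandas is imported anywhere in the process.
    """
    if importlib.util.find_spec("cudf") is None:
        return False  # cuDF is not installed; stay on CPU pandas
    try:
        import cudf.pandas

        cudf.pandas.install()
        return True
    except Exception:
        # cuDF is present but unusable (for example, no GPU or a driver mismatch)
        return False


if __name__ == "__main__":
    accelerated = enable_gpu_pandas()
    print("cudf.pandas active:", accelerated)
    # Existing pandas code stays unchanged after this point.
```

The same effect is available without editing the script at all, by running it as `python -m cudf.pandas script.py`, or with `%load_ext cudf.pandas` in a notebook.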

8 cuML
Accelerated machine learning with a scikit-learn API
CPU (scikit-learn):
>>> from sklearn.ensemble import RandomForestClassifier
>>> clf = RandomForestClassifier()
>>> clf.fit(x, y)
GPU (cuML):
>>> from cuml.ensemble import RandomForestClassifier
>>> clf = RandomForestClassifier()
>>> clf.fit(x, y)
50+ GPU-accelerated algorithms: time series, preprocessing, classification, tree models, cross validation, clustering, explainability, dimensionality reduction, regression
Benchmark setup: A100 GPU vs. AMD EPYC 7642 (96 logical cores); cuML 23.04, scikit-learn 1.2.2, umap-learn 0.5.3

9 RAPIDS Adopted Across Industries
• RAPIDS | Dask | XGBoost: 100x faster feature engineering, 20x faster model training, increased forecast accuracy
• cuGraph: processing relationships between 10 million biological entities through more than a billion edges
• RAPIDS Accelerator for Apache Spark: 70% cost savings, 33% performance improvement

10 Powering Modern Data Teams
Battle tested on the most challenging workloads, integrated with the most innovative tools, and backed by a huge community
• 350+ RAPIDS contributors on GitHub
• 100+ open-source and commercial software integrations
• 25% of Fortune 100 companies using RAPIDS

11 The Status Quo of GPU Environments

12 Environmental Stack
In many GPU environments, some layers of the stack are predefined.

13 Where Things Go Wrong
Scenarios:
• Incompatible NVIDIA driver
  ◦ Installed: NVIDIA driver 510.
  ◦ Required: RAPIDS 23.10 with CUDA 12.1 requires NVIDIA driver 525+.
• Multiple CUDA versions installed
  ◦ Issue: CUDA 11.2 and CUDA 12.1 are both installed, leading to conflicts in dynamic library loading.
  ◦ Fix: uninstall the lower version of CUDA.
• Unsupported hardware
  ◦ Issue: The GPU (e.g., GTX 960M) does not support the required CUDA compute capability for RAPIDS (minimum 6.0 for most RAPIDS libraries).
• Improperly configured environment variables
  ◦ Issue: $LD_LIBRARY_PATH and $PATH point to an old CUDA installation (e.g., CUDA 10.2).
  ◦ Fix: re-export the environment variables to point to the new path.
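Two of the scenarios above can be detected with a few lines of standard-library Python. This is a sketch only: the function names are hypothetical, the 525 threshold is the one quoted on the slide, and the regular expression assumes CUDA lives in directories named like `cuda-12.1`, which is common but not guaranteed.

```python
import re


def driver_major(version: str) -> int:
    """Return the major component of a driver string such as '510.47.03'."""
    return int(version.strip().split(".")[0])


def driver_is_sufficient(installed: str, minimum_major: int) -> bool:
    """Compare the installed driver's major version against a minimum."""
    return driver_major(installed) >= minimum_major


# Matches path components such as 'cuda-11.2' or 'cuda-12.1' (assumed layout)
_CUDA_DIR = re.compile(r"cuda-(\d+\.\d+)")


def cuda_versions_on_path(path_value: str) -> list[str]:
    """List distinct CUDA versions named in a colon-separated path variable,
    in order of first appearance."""
    seen: list[str] = []
    for entry in path_value.split(":"):
        match = _CUDA_DIR.search(entry)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


if __name__ == "__main__":
    print(driver_is_sufficient("510.47.03", 525))
    print(cuda_versions_on_path(
        "/usr/local/cuda-10.2/lib64:/usr/local/cuda-12.1/lib64"
    ))
```

In real use the driver string could come from `nvidia-smi --query-gpu=driver_version --format=csv,noheader`, and the path value from `os.environ.get("LD_LIBRARY_PATH", "")`. More than one version in the result is a hint that library loading may pick up the wrong toolkit.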

14 What We’ve Tried

15 Virtual Packages
• Represent system-level features (like CUDA) without explicitly installing large system libraries via Conda.
• Ensure RAPIDS libraries are compatible with the underlying GPU setup.
• When Conda detects a GPU driver, it creates a virtual package (e.g., __cuda) that records the CUDA version the driver supports.
• These virtual packages allow Conda to resolve dependencies without actually bundling the entire CUDA toolkit or drivers.
• Examples: __cuda, __glibc, __linux, __archspec, etc.
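To see which virtual packages Conda has detected, one option is to read the output of `conda info --json`. The sketch below assumes that output contains a "virtual_pkgs" key holding [name, version, build] entries; the function names and the sample dictionary are illustrative, not real output from any machine.

```python
import json
import subprocess
from typing import Optional


def cuda_virtual_package(info: dict) -> Optional[str]:
    """Return the version of the __cuda virtual package, or None if absent.

    Assumes `info` has the shape of `conda info --json`, where "virtual_pkgs"
    is a list of [name, version, build] entries.
    """
    for entry in info.get("virtual_pkgs", []):
        if entry and entry[0] == "__cuda":
            return entry[1]
    return None


def read_conda_info() -> dict:
    """Run `conda info --json` and parse the result (requires conda on PATH)."""
    out = subprocess.run(
        ["conda", "info", "--json"], check=True, capture_output=True, text=True
    ).stdout
    return json.loads(out)


# Illustrative sample only
sample = {"virtual_pkgs": [["__glibc", "2.35", "0"], ["__cuda", "12.1", "0"]]}

if __name__ == "__main__":
    print(cuda_virtual_package(sample))
```

When solving an environment on a machine without a GPU (a CI runner, for instance), the `CONDA_OVERRIDE_CUDA` environment variable can be set to tell Conda which `__cuda` version to assume.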

16 conda-forge
• Provides consistent builds across platforms and architectures (Windows, macOS, Linux, ARM).
• Ensures that dependencies between packages are correctly managed to avoid conflicts.
• Uses a centralized dependency graph to coordinate version updates across packages.

17 On the Horizon…

18 Build Infrastructure
▪ PEP for index priority
▪ Arbitrary metadata
▪ Shared C++ dependencies
▪ Pre-installations in Google Colab

19 Introducing RAPIDS Doctor…

20 RAPIDS Doctor
How RAPIDS Doctor bridges it all

21 DEMO

22 ✅ Healthy Environment ❌ Broken Environment

23 Design Highlights
• Different types of checks
  ◦ System Requirements & Recommendations
  ◦ GPU, CUDA Drivers, & OS
• Diagnosis & Prescription
• Library entrypoint plugins
  ◦ cuDF, cuML
  ◦ Morpheus
  ◦ etc.

24 Design Highlights: System & Hardware Checks
Required | Recommended

25 Design Highlights: Diagnosis & Prescription
RAPIDS Doctor goes beyond identifying problems by offering specific, actionable solutions.
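One way to picture the diagnosis-and-prescription idea is a check that returns both what is wrong and what to do about it. The sketch below is not RAPIDS Doctor's actual implementation; `CheckResult`, `check_driver`, and `render` are hypothetical names, and the 525 default is the driver threshold quoted earlier in the talk.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckResult:
    name: str
    passed: bool
    required: bool = True  # False would mark a "recommended" check
    diagnosis: str = ""
    prescription: Optional[str] = None  # only set when there is something to fix


def check_driver(installed_major: int, minimum_major: int = 525) -> CheckResult:
    """Diagnose the driver version and, on failure, prescribe a fix."""
    if installed_major >= minimum_major:
        return CheckResult(
            "nvidia-driver", True,
            diagnosis=f"Driver {installed_major} is new enough.",
        )
    return CheckResult(
        "nvidia-driver", False,
        diagnosis=f"Driver {installed_major} is older than the required {minimum_major}.",
        prescription=f"Upgrade the NVIDIA driver to {minimum_major} or newer.",
    )


def render(result: CheckResult) -> str:
    """Format a result as a status line plus an optional fix line."""
    mark = "✅" if result.passed else "❌"
    lines = [f"{mark} {result.name}: {result.diagnosis}"]
    if result.prescription:
        lines.append(f"   Fix: {result.prescription}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(check_driver(510)))
    print(render(check_driver(535)))
```

Keeping the prescription as data rather than printing it inside the check lets the same result feed a terminal report, a log line, or an automated fixer.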

26 Design Highlights: Library Entrypoint Plugins
RAPIDS Doctor plugins: cuDF, cuML, Morpheus, with more to come
Clean, modular, extendable design

27 Future Roadmap
• Platform checks
  ◦ Docker
  ◦ Kubernetes
• Integrated checks with additional libraries
• Cloud integrations
  ◦ SageMaker
  ◦ Vertex
  ◦ Databricks, etc.

28 Q & A

29 Thank you! Learn more at https://rapids.ai
