The Art of Wrangling Your GPU Python Environments

Slide 1

Slide 1 text

1 The Art of Wrangling Your GPU Python Environments
Melody Wang, NVIDIA Intern
Jacob Tomlinson, Senior Software Engineer
PyData Global 2024

Slide 2

Slide 2 text

2 Introduction

Slide 3

Slide 3 text

3 Introductions

Jacob Tomlinson
Jacob Tomlinson is a senior software engineer at NVIDIA. His work involves maintaining open source projects including RAPIDS and Dask. He also tinkers with kr8s in his spare time. He lives in Exeter, UK.

Melody Wang
Melody is an intern at NVIDIA on the RAPIDS Cloud Deployment Team. She is currently a senior studying Statistics & Machine Learning, CS, and Human-Computer Interaction at Carnegie Mellon University. She is super excited to be attending PyData and getting involved in the open source community!

Slide 4

Slide 4 text

4 RAPIDS https://github.com/rapidsai

Slide 5

Slide 5 text

5 Modern Enterprise Applications Need Accelerated Computing
Internet scale data | Massive models | Real-time performance
Use cases: LLMs, Forecasting, Fraud Detection, Genomic Analysis, Cybersecurity, Recommendations
[Figure: log-scale chart (10^1 to 10^7) contrasting accelerated computing with single-threaded performance, annotated "1.5X per year" and "1.1X per year"]

Slide 6

Slide 6 text

6 Accelerated Computing Swim Lanes
RAPIDS makes accelerated computing more seamless while enabling specialization for maximum performance

Slide 7

Slide 7 text

7 Accelerated pandas
cudf.pandas: the zero code change GPU accelerator for pandas built on cuDF
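As a rough illustration of what "zero code change" means in practice, the sketch below turns the accelerator on programmatically and falls back to ordinary CPU pandas when cuDF is not installed. The helper name `enable_gpu_pandas` is hypothetical and not part of cuDF; the slide does not show how the speakers enabled it.

```python
def enable_gpu_pandas() -> bool:
    """Try to switch pandas to the cuDF-backed accelerator.

    Returns True if cudf.pandas was installed, False if cuDF is not
    importable (the program then simply keeps using CPU pandas).
    This helper is a hypothetical convenience wrapper, not a cuDF API.
    """
    try:
        import cudf.pandas  # only present in a RAPIDS environment
    except ImportError:
        return False
    cudf.pandas.install()  # must run before the first `import pandas`
    return True


accelerated = enable_gpu_pandas()
print("cudf.pandas active:", accelerated)
# Import pandas only after the call above so the accelerator can intercept it.
```

The same switch can be made without touching the script at all, using `python -m cudf.pandas script.py` on the command line or `%load_ext cudf.pandas` in a Jupyter notebook.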

Slide 8

Slide 8 text

8 cuML
Accelerated machine learning with a scikit-learn API

CPU (scikit-learn):
>>> from sklearn.ensemble import RandomForestClassifier
>>> clf = RandomForestClassifier()
>>> clf.fit(x, y)

GPU (cuML):
>>> from cuml.ensemble import RandomForestClassifier
>>> clf = RandomForestClassifier()
>>> clf.fit(x, y)

50+ GPU-Accelerated Algorithms: Time Series, Preprocessing, Classification, Tree Models, Cross Validation, Clustering, Explainability, Dimensionality Reduction, Regression
Benchmark setup: A100 GPU vs. AMD EPYC 7642 (96 logical cores); cuML 23.04, scikit-learn 1.2.2, umap-learn 0.5.3

Slide 9

Slide 9 text

9 RAPIDS Adopted Across Industries

RAPIDS | Dask | XGBoost
• 100x faster feature engineering
• 20x faster model training
• Increased forecast accuracy

cuGraph
• Processing relationships between 10 million biological entities through more than a billion edges.

RAPIDS Accelerator for Apache Spark
• 70% cost savings
• 33% performance improvement

Slide 10

Slide 10 text

10 Powering Modern Data Teams
Battle tested on the most challenging workloads, integrated with the most innovative tools, and backed by a huge community
350+ RAPIDS contributors on GitHub
100+ open-source and commercial software integrations
25% of Fortune 100 companies using RAPIDS

Slide 11

Slide 11 text

11 The Status Quo of GPU Environments

Slide 12

Slide 12 text

12 Environmental Stack
In many GPU environments, some layers of the stack are predefined.

Slide 13

Slide 13 text

13 Where Things Go Wrong: Scenarios

Incompatible NVIDIA Driver
● Installed: NVIDIA Driver 510.
● Required: RAPIDS 23.10 with CUDA 12.1 requires NVIDIA Driver 525+.

Multiple CUDA Versions Installed
● Issue: CUDA 11.2 and CUDA 12.1 are both installed, leading to conflicts in dynamic library loading.
● Fix: uninstall the older CUDA version.

Unsupported Hardware
● Issue: The GPU (e.g., GTX 960M) does not support the required CUDA compute capability for RAPIDS (minimum 6.0 for most RAPIDS libraries).

Improperly Configured Environment Variables
● Issue: $LD_LIBRARY_PATH and $PATH point to an old CUDA installation (e.g., CUDA 10.2).
● Fix: re-export the environment variables so they point to the new CUDA installation.
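The scenarios on this slide boil down to a few comparisons that are easy to script. The sketch below is one possible way to express them, assuming the installed values have already been read from the system (for example from `nvidia-smi`); the function names are hypothetical, and the path check assumes directories are named like `cuda-10.2`, which is common but not guaranteed.

```python
import os
from typing import List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '525.60.13' into (525, 60, 13) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """True if `installed` is at least `minimum` (driver, CUDA, or compute capability)."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '525' and '525.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def stale_cuda_entries(path_value: str, expected: str) -> List[str]:
    """Return PATH-style entries naming a CUDA version other than `expected`.

    Assumes versioned directories look like '.../cuda-10.2/...'.
    """
    stale = []
    for entry in path_value.split(os.pathsep):
        lowered = entry.lower()
        marker = lowered.find("cuda-")
        if marker == -1:
            continue  # not a versioned CUDA directory
        version = lowered[marker + len("cuda-"):].split("/")[0]
        if version and version != expected:
            stale.append(entry)
    return stale


# Values taken from the slide's scenarios.
print(meets_minimum("510", "525"))   # driver 510 vs required 525+
print(meets_minimum("5.0", "6.0"))   # GTX 960M compute capability vs minimum 6.0
print(stale_cuda_entries(os.environ.get("LD_LIBRARY_PATH", ""), "12.1"))
```

Both `meets_minimum` calls print `False`, matching the broken setups described above; the last line lists any `LD_LIBRARY_PATH` entries on the current machine that point at a CUDA version other than 12.1.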

Slide 14

Slide 14 text

14 What We’ve Tried

Slide 15

Slide 15 text

15 Virtual Packages
• Represent system-level features (like CUDA) without explicitly installing large system libraries via Conda.
• Ensure RAPIDS libraries are compatible with the underlying GPU setup.
• When Conda detects an NVIDIA driver, it exposes a virtual package (e.g., __cuda) whose version is the highest CUDA version that driver supports.
• These virtual packages allow Conda to resolve dependencies without actually bundling the entire CUDA toolkit or drivers.
Examples: __cuda, __glibc, __linux, __archspec, etc.
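To see which virtual packages Conda has detected on a machine, one option is to parse the output of `conda info --json`. The sketch below assumes that output contains a `virtual_pkgs` list of `[name, version, build]` entries; the `SAMPLE` payload is made-up illustrative data, not output captured from a real system.

```python
import json
import subprocess
from typing import Dict


def virtual_packages(info: dict) -> Dict[str, str]:
    """Map virtual package name -> version from a parsed `conda info --json` payload.

    Assumes each entry of `virtual_pkgs` is [name, version, build].
    """
    return {entry[0]: entry[1] for entry in info.get("virtual_pkgs", [])}


def read_conda_info() -> dict:
    """Ask the local conda installation for its info (requires conda on PATH)."""
    result = subprocess.run(
        ["conda", "info", "--json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


# Illustrative payload only; real values depend on the machine.
SAMPLE = json.loads(
    '{"virtual_pkgs": [["__cuda", "12.2", "0"],'
    ' ["__glibc", "2.35", "0"],'
    ' ["__linux", "5.15.0", "0"]]}'
)

pkgs = virtual_packages(SAMPLE)
print("__cuda" in pkgs, pkgs.get("__cuda"))
```

On a real system, `virtual_packages(read_conda_info())` would return the detected values; a missing `__cuda` key suggests Conda did not find an NVIDIA driver. Conda also documents a `CONDA_OVERRIDE_CUDA` environment variable for overriding the detected version, which can help when solving an environment on a machine without a GPU.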

Slide 16

Slide 16 text

16 conda-forge
• Provides consistent builds across platforms and architectures (Windows, macOS, Linux, ARM).
• Ensures that dependencies between packages are correctly managed to avoid conflicts.
• Uses a centralized dependency graph to coordinate version updates across packages.

Slide 17

Slide 17 text

17 On the Horizon…

Slide 18

Slide 18 text

18 Build Infrastructure
■ PEP for index priority
■ Arbitrary metadata
■ Shared C++ dependencies
■ Pre-Installations in Google Colab

Slide 19

Slide 19 text

19 Introducing RAPIDS Doctor…

Slide 20

Slide 20 text

20 How RAPIDS Doctor bridges it all

Slide 21

Slide 21 text

21 DEMO

Slide 22

Slide 22 text

22 ✅ Healthy Environment ❌ Broken Environment

Slide 23

Slide 23 text

23 Design Highlights
● Different types of checks
  ○ System Requirements & Recommendations
  ○ GPU, CUDA Drivers, & OS
● Diagnosis & Prescription
● Library entrypoint plugins
  ○ cuDF, cuML
  ○ Morpheus
  ○ etc.

Slide 24

Slide 24 text

24 Design Highlights: System & Hardware Checks
Required | Recommended

Slide 25

Slide 25 text

25 Design Highlights: Diagnosis & Prescription
RAPIDS Doctor goes beyond identifying problems by offering specific, actionable solutions.
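One way to picture "diagnosis and prescription" is a check result that carries both what is wrong and what to do about it, together with a required/recommended level. The sketch below is a guess at that shape and not RAPIDS Doctor's actual code; `Finding`, `check_driver`, and `report` are hypothetical names, and the 525 threshold is borrowed from the driver scenario earlier in the deck.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Finding:
    """Outcome of one environment check (hypothetical structure)."""
    name: str
    required: bool          # True: hard requirement, False: recommendation
    ok: bool
    diagnosis: str = ""     # what is wrong
    prescription: str = ""  # what to do about it


def check_driver(installed_major: int, minimum_major: int = 525) -> Finding:
    """Compare the installed NVIDIA driver major version with a minimum."""
    if installed_major >= minimum_major:
        return Finding("nvidia-driver", required=True, ok=True)
    return Finding(
        "nvidia-driver",
        required=True,
        ok=False,
        diagnosis=f"NVIDIA driver {installed_major} is older than the required {minimum_major}.",
        prescription=f"Upgrade the NVIDIA driver to {minimum_major} or newer.",
    )


def report(findings: Iterable[Finding]) -> List[str]:
    """Render findings as lines, pairing each problem with its fix."""
    lines = []
    for finding in findings:
        if finding.ok:
            lines.append(f"OK {finding.name}")
            continue
        level = "ERROR" if finding.required else "WARN"
        lines.append(
            f"{level} {finding.name}: {finding.diagnosis} Fix: {finding.prescription}"
        )
    return lines


for line in report([check_driver(510), check_driver(535)]):
    print(line)
```

Keeping the prescription next to the diagnosis means the tool that prints the report never needs to know how any individual check works.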

Slide 26

Slide 26 text

26 Design Highlights: Library Entrypoint Plugins
RAPIDS Doctor plugins: cuDF, cuML, Morpheus, more to come...
Clean, modular, extendable design
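Entry point plugins are a standard Python packaging mechanism: each library advertises a callable under a shared group name, and the host tool discovers them at runtime with `importlib.metadata`. The sketch below shows the general pattern only. The group name `rapids_doctor_check` and the function names are assumptions made for illustration, not the names RAPIDS Doctor uses.

```python
from importlib.metadata import entry_points
from typing import Callable, Dict

# Hypothetical group name. A library would register a check in its
# pyproject.toml, for example (module path also hypothetical):
#   [project.entry-points.rapids_doctor_check]
#   cudf = "cudf._doctor:check"
GROUP = "rapids_doctor_check"


def discover_checks(group: str = GROUP) -> Dict[str, Callable[[], None]]:
    """Load every callable registered under `group` by installed packages."""
    eps = entry_points()
    if hasattr(eps, "select"):
        selected = eps.select(group=group)   # Python 3.10 and newer
    else:
        selected = eps.get(group, [])        # older dict-style return value
    return {ep.name: ep.load() for ep in selected}


def run_checks(checks: Dict[str, Callable[[], None]]) -> Dict[str, str]:
    """Run each check; a check signals failure by raising an exception."""
    failures = {}
    for name, check in checks.items():
        try:
            check()
        except Exception as exc:  # a plugin may raise anything
            failures[name] = str(exc)
    return failures


print(run_checks(discover_checks()))
```

With this pattern the host tool needs no knowledge of cuDF, cuML, or Morpheus: installing a library that registers a check is enough for it to be picked up on the next run.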

Slide 27

Slide 27 text

27 Future Roadmap
Platform checks
  ○ Docker
  ○ Kubernetes
Integrated checks with additional libraries
Cloud Integrations
  ○ SageMaker
  ○ Vertex
  ○ Databricks, etc.