The Art of Wrangling Your GPU Python Environment

Abstract

Debugging software is hard, and debugging GPU software environments can be harder still: you need to understand the intricate interactions between hardware, drivers, CUDA, C++ dependencies, and Python libraries.

In this talk we will dig into how these different layers interact and how you can address some of the common pitfalls that folks run into when configuring GPU Python environments. We will also introduce a new tool, RAPIDS Doctor, that aims to take the challenge out of ensuring your software environments are in good shape. RAPIDS Doctor checks and diagnoses environmental health issues straight from the command line, ensuring that your setup is fully functional and optimized for performance.

Description

Projects like RAPIDS, a rapidly growing suite of GPU-accelerated ML and data science libraries, along with communities like PyTorch, TensorFlow, and others, are continuously working to simplify the setup required to use GPUs in your PyData workflows.

Many users want to install and use RAPIDS but are unclear about the system requirements it depends on. To install RAPIDS you generally need a GPU, NVIDIA drivers, the CUDA Toolkit, and the RAPIDS packages (with compatible dependencies). While most of the software can be installed via conda or pip, the drivers must be installed outside of your Python environment and must be compatible with your GPU.

RAPIDS Doctor is a new command line tool designed to check for multiple system dependencies. Because users frequently install RAPIDS in a variety of cloud environments, it is particularly useful for getting a quick rundown of incompatibilities that may cause issues down the line. RAPIDS Doctor also prescribes a treatment for diagnosed issues, such as quick-fix suggestions in your terminal or even autofixes.

In this talk we will demonstrate the range of use cases for RAPIDS Doctor and the variety of health checks it can perform. Whether you're a seasoned developer or just starting out with Python software development on GPUs, this tool streamlines the setup process and enhances your productivity, allowing you to focus on your data science and machine learning projects without the headaches of environment troubleshooting.

Melody Wang

December 04, 2024

Transcript

  1. 1 The Art of Wrangling Your GPU Python Environments

    Melody Wang, NVIDIA Intern
    Jacob Tomlinson, Senior Software Engineer
    PyData Global 2024
  2. 3 Introductions

    Jacob Tomlinson: Jacob Tomlinson is a senior software engineer at NVIDIA. His work involves maintaining open source projects including RAPIDS and Dask. He also tinkers with kr8s in his spare time. He lives in Exeter, UK.

    Melody Wang: Melody is an intern at NVIDIA on the RAPIDS Cloud Deployment Team. She is currently a senior studying Statistics & Machine Learning, CS, and Human-Computer Interaction at Carnegie Mellon University. She is super excited to be attending PyData and getting involved in the open source community!
  3. 5 Modern Enterprise Applications Need Accelerated Computing

    Internet-scale data | Massive models | Real-time performance
    Use cases: LLMs, Forecasting, Fraud Detection, Genomic Analysis, Cybersecurity, Recommendations
    [Chart labels: single-threaded perf, 1.5X per year, 1.1X per year, accelerated computing; axis ticks from 10^1 to 10^7]
  4. 6 Accelerated Computing Swim Lanes RAPIDS makes accelerated computing more

    seamless while enabling specialization for maximum performance
  5. 8 cuML: Accelerated machine learning with a scikit-learn API

    CPU (scikit-learn):
    >>> from sklearn.ensemble import RandomForestClassifier
    >>> clf = RandomForestClassifier()
    >>> clf.fit(x, y)

    GPU (cuML):
    >>> from cuml.ensemble import RandomForestClassifier
    >>> clf = RandomForestClassifier()
    >>> clf.fit(x, y)

    50+ GPU-Accelerated Algorithms: Time Series, Preprocessing, Classification, Tree Models, Cross Validation, Clustering, Explainability, Dimensionality Reduction, Regression

    Benchmark configuration: A100 GPU vs. AMD EPYC 7642 (96 logical cores); cuML 23.04, scikit-learn 1.2.2, umap-learn 0.5.3
  6. 9 RAPIDS Adopted Across Industries

    RAPIDS | Dask | XGBoost
    • 100x faster feature engineering
    • 20x faster model training
    • Increased forecast accuracy

    cuGraph
    • Processing relationships between 10 million biological entities through more than a billion edges.

    RAPIDS Accelerator for Apache Spark
    • 70% cost savings
    • 33% performance improvement
  7. 10 Powering Modern Data Teams

    Battle tested on the most challenging workloads, integrated with the most innovative tools, and backed by a huge community
    • 350+ RAPIDS contributors on GitHub
    • 100+ open-source and commercial software integrations
    • 25% of Fortune 100 companies using RAPIDS
  8. 13 Where Things Go Wrong: Scenarios

    Incompatible NVIDIA Driver
    • Installed: NVIDIA Driver 510.
    • Required: RAPIDS 23.10 with CUDA 12.1 requires NVIDIA Driver 525+.

    Multiple CUDA Versions Installed
    • Issue: CUDA 11.2 and CUDA 12.1 are both installed, leading to conflicts in dynamic library loading.
    • Fix: uninstall the lower version of CUDA.

    Unsupported Hardware
    • Issue: The GPU (e.g., GTX 960M) does not support the required CUDA compute capability for RAPIDS (minimum 6.0 for most RAPIDS libraries).

    Improperly Configured Environment Variables
    • Issue: $LD_LIBRARY_PATH and $PATH point to an old CUDA installation (e.g., CUDA 10.2).
    • Fix: re-export the environment variables to point to the new path.
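
The scenarios on this slide can be expressed as small, testable checks. The following is a minimal sketch, not RAPIDS Doctor's implementation: the function names are hypothetical, the thresholds (driver 525, compute capability 6.0) are taken from the slide, and the `cuda-X.Y` directory naming is an assumption about a conventional CUDA install layout.

```python
import os
import re


def parse_version(text):
    """Turn a dotted version such as '525.60.13' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_ok(installed, minimum="525"):
    """True if the installed driver version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def compute_capability_ok(capability, minimum="6.0"):
    """True if the GPU compute capability meets the minimum from the slide."""
    return parse_version(capability) >= parse_version(minimum)


def stale_cuda_paths(path_value, expected_major):
    """Return path entries that name a CUDA major version other than expected.

    Assumes directories follow the 'cuda-X.Y' naming convention; entries
    without such a component are ignored.
    """
    stale = []
    for entry in path_value.split(os.pathsep):
        match = re.search(r"cuda-(\d+)\.\d+", entry)
        if match and int(match.group(1)) != expected_major:
            stale.append(entry)
    return stale


if __name__ == "__main__":
    print(driver_ok("510"))               # driver from the slide's scenario
    print(compute_capability_ok("5.0"))   # an example value below the minimum
    print(stale_cuda_paths(os.environ.get("LD_LIBRARY_PATH", ""), 12))
```

Keeping each check a pure function of its inputs means it can be exercised without a GPU present; reading the real driver version or environment is left to the caller.
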
  9. 15 Virtual Packages

    • Represent system-level features (like CUDA) without explicitly installing large system libraries via Conda.
    • Ensure RAPIDS libraries are compatible with the underlying GPU setup.
    • When Conda detects a GPU with a compatible CUDA version, it creates a virtual package (e.g., __cuda).
    • These virtual packages allow Conda to resolve dependencies without actually bundling the entire CUDA toolkit or drivers.

    Examples: __cuda, __glibc, __linux, __archspec, etc.
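
To see which virtual packages Conda has detected, one option is to parse the JSON printed by `conda info --json`. The sketch below assumes that output contains a `virtual_pkgs` list of `[name, version, build]` entries; the sample string is illustrative data made up for this example, not output captured from a real machine.

```python
import json

# Illustrative sample only; real output contains many more keys.
SAMPLE = """
{"virtual_pkgs": [["__archspec", "1", "x86_64"],
                  ["__cuda", "12.1", "0"],
                  ["__glibc", "2.35", "0"],
                  ["__linux", "5.15.0", "0"]]}
"""


def virtual_packages(conda_info_json):
    """Map virtual package name to version from `conda info --json` text."""
    info = json.loads(conda_info_json)
    return {entry[0]: entry[1] for entry in info.get("virtual_pkgs", [])}


if __name__ == "__main__":
    pkgs = virtual_packages(SAMPLE)
    # A missing __cuda entry would suggest Conda did not detect a usable driver.
    print(pkgs.get("__cuda", "no __cuda virtual package detected"))
```

On a real system the input would come from running `conda info --json` (for example through `subprocess.run`) rather than from the sample string.
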
  10. 16 conda-forge

    • Provides consistent builds across platforms and architectures (Windows, macOS, Linux, ARM).
    • Ensures that dependencies between packages are correctly managed to avoid conflicts.
    • Uses a centralized dependency graph to coordinate version updates across packages.
  11. 18 Build Infrastructure

    ▪ PEP for index priority
    ▪ Arbitrary metadata
    ▪ Shared C++ dependencies
    ▪ Pre-Installations in Google Colab
  12. 23 Design Highlights

    • Different types of checks
      ◦ System Requirements & Recommendations
      ◦ GPU, CUDA Drivers, & OS
    • Diagnosis & Prescription
    • Library entrypoint plugins
      ◦ cuDF, cuML
      ◦ Morpheus
      ◦ etc.
  13. 25 Design Highlights: Diagnosis & Prescription RAPIDS DOCTOR Goes beyond

    identifying problems by offering specific, actionable solutions
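
One way to picture "diagnosis and prescription" is a check that returns both what it found and what to do about it. This is a hypothetical sketch: `CheckResult`, `check_driver`, and `report` are names invented here for illustration and are not RAPIDS Doctor's API. The driver threshold of 525 comes from the scenario slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckResult:
    name: str
    ok: bool
    diagnosis: str                       # what the check observed
    prescription: Optional[str] = None   # suggested fix, only when not ok


def check_driver(installed_major, required_major=525):
    """Compare an installed driver major version with the required one."""
    if installed_major >= required_major:
        return CheckResult(
            "nvidia-driver", True,
            f"Driver {installed_major} meets the {required_major}+ requirement.",
        )
    return CheckResult(
        "nvidia-driver", False,
        f"Driver {installed_major} is older than the required {required_major}.",
        prescription=f"Upgrade the NVIDIA driver to {required_major} or newer.",
    )


def report(results):
    """Render results as terminal-friendly text, one fix line per failure."""
    lines = []
    for result in results:
        status = "OK" if result.ok else "FAIL"
        lines.append(f"[{status}] {result.name}: {result.diagnosis}")
        if result.prescription:
            lines.append(f"    fix: {result.prescription}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(report([check_driver(510), check_driver(535)]))
```

Separating the result object from its rendering keeps the door open for other consumers, such as an autofix step that acts on the prescription instead of printing it.
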
  14. 26 Design Highlights: Library Entrypoints Plugins

    Clean, modular, extendable design
    [Diagram: RAPIDS DOCTOR connected to plugins for cuDF, cuML, Morpheus; more to come]
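
A plugin design like the one on this slide is commonly built on Python packaging entry points, where each library registers its own checks and the tool discovers them at runtime. The sketch below uses the standard library's `importlib.metadata`; the group name `example_doctor.checks` is a hypothetical placeholder, not the group RAPIDS Doctor actually uses, and the convention that a check raises on failure is an assumption made for this example.

```python
from importlib.metadata import entry_points

GROUP = "example_doctor.checks"  # hypothetical entry point group name


def discover_checks(group=GROUP):
    """Load every callable registered under the given entry point group."""
    eps = entry_points()
    if hasattr(eps, "select"):           # Python 3.10 and newer
        selected = eps.select(group=group)
    else:                                # Python 3.8 and 3.9 return a dict
        selected = eps.get(group, [])
    return {ep.name: ep.load() for ep in selected}


def run_checks(checks):
    """Run each check; a check signals failure by raising an exception."""
    results = {}
    for name, check in checks.items():
        try:
            check()
            results[name] = "ok"
        except Exception as exc:
            results[name] = f"failed: {exc}"
    return results


if __name__ == "__main__":
    # With no plugins installed under the placeholder group, this prints {}.
    print(run_checks(discover_checks()))
```

A library would opt in by declaring an entry point in that group in its own packaging metadata, so the tool never needs to import or even know about the library ahead of time.
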
  15. 27 Future Roadmap

    • Platform checks
      ◦ Docker
      ◦ Kubernetes
    • Integrated checks with additional libraries
    • Cloud integrations
      ◦ SageMaker
      ◦ Vertex
      ◦ Databricks, etc.