GPU Acceleration in the PyData community

Jacob Tomlinson

November 12, 2024

Transcript

  1. GPU Acceleration in the PyData community

    Jacob Tomlinson, Dask Maintainer and RAPIDS Developer, Pangeo CNES 2024
  2. Modern Applications Need Accelerated Computing

    Petabyte-scale data | Massive models | Real-time performance

    Workloads: LLMs, Forecasting, Fraud Detection, Genomic Analysis, Cybersecurity, Recommenders

    [Chart: single-threaded performance growth slowing from 1.5X per year to 1.1X per year, set against accelerated computing; log-scale axis from 10^1 to 10^7]
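To make the chart's growth rates concrete, here is a small worked comparison of what 1.5X versus 1.1X per year compounds to. The ten-year horizon and the helper name `compounded_speedup` are illustrative choices for this sketch, not figures from the talk.

```python
def compounded_speedup(rate_per_year: float, years: int) -> float:
    """Total speedup after improving by `rate_per_year` every year for `years` years."""
    return rate_per_year ** years


if __name__ == "__main__":
    # Compare the historical and the slowed single-threaded growth rates.
    for rate in (1.5, 1.1):
        print(f"{rate}X per year for 10 years -> {compounded_speedup(rate, 10):.1f}X")
```

Ten years at 1.5X per year compounds to roughly 58X, while 1.1X per year yields only about 2.6X, so the two curves end up more than 20x apart.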
  3. Accelerated Computing Swim Lanes

    RAPIDS makes accelerated computing more seamless while enabling specialization for maximum performance.
  4. RAPIDS Adopted Across Industries

    • RAPIDS | Dask | XGBoost: 100x faster feature engineering, 20x faster model training, increased forecast accuracy
    • cuGraph: processing relationships between 10 million biological entities through more than a billion edges
    • RAPIDS Accelerator for Apache Spark: 70% cost savings, 33% performance improvement
  5. Bringing NVIDIA accelerated computing to Polars

    Polars GPU Engine Powered by RAPIDS cuDF
    https://developer.nvidia.com/blog/polars-gpu-engine-powered-by-rapids-cudf-now-available-in-open-beta/
  6. Accelerated Dask

    Just set “cudf” and “cupy” as the backend and use Dask-CUDA workers.
    • Configurable Backend and GPU-Aware Workers
    • Memory Spilling (GPU->CPU->Disk)
    • Optimized Memory Management
    • Accelerated RDMA and Networking (UCX)
    • Community tools like cupy-xarray
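A minimal sketch of that setup, assuming `dask`, `distributed`, `dask-cuda`, `cudf`, and `cupy` are installed on a machine with NVIDIA GPUs. The `dataframe.backend` and `array.backend` configuration keys select cuDF and CuPy for newly created Dask collections, and `LocalCUDACluster` starts one GPU-aware worker per visible GPU. The helper names `gpu_stack_available` and `gpu_groupby_demo` are hypothetical, and the toy data is illustrative.

```python
import importlib.util

# Dask config keys that switch the default backend of new collections.
GPU_BACKENDS = {"dataframe.backend": "cudf", "array.backend": "cupy"}

REQUIRED = ("dask", "distributed", "dask_cuda", "cudf", "cupy")


def gpu_stack_available() -> bool:
    """Return True if every optional GPU package is importable."""
    return all(importlib.util.find_spec(name) is not None for name in REQUIRED)


def gpu_groupby_demo():
    """Run a small groupby on GPU-backed Dask workers (requires NVIDIA GPUs)."""
    import dask
    import dask.dataframe as dd
    from dask.distributed import Client
    from dask_cuda import LocalCUDACluster

    # One worker per visible GPU. device_memory_limit is the fraction of GPU
    # memory at which a worker starts spilling data to host memory.
    with LocalCUDACluster(device_memory_limit=0.8) as cluster, Client(cluster):
        with dask.config.set(GPU_BACKENDS):
            # With the cudf backend active, from_dict builds a cuDF-backed collection.
            ddf = dd.from_dict(
                {"key": [i % 10 for i in range(1_000)], "value": list(range(1_000))},
                npartitions=4,
            )
            return ddf.groupby("key")["value"].sum().compute()


if __name__ == "__main__" and gpu_stack_available():
    print(gpu_groupby_demo())
```

The DataFrame code inside the `with` block is ordinary Dask; only the configuration and the cluster type change.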
  7. Accelerated Apache Spark

    Zero-code-change acceleration for Spark DataFrames and SQL

    CPU Spark:
      spark.sql("""select count(*) as order_count from orders""")

    GPU Spark:
      spark.conf.set("spark.plugins", "com.nvidia.spark.SQLPlugin")
      spark.sql("""select count(*) as order_count from orders""")

    Average Speed-Ups: >5x
    • Operates as a software plugin to the popular Apache Spark platform
    • Automatically accelerates supported operations (with CPU fallback if needed)
    • Requires no code changes
    • Works with Spark standalone, YARN clusters, and Kubernetes clusters

    Deploy on: Apache Spark 3.4.1, RAPIDS Spark release 24.04
    See GTC session S62257 for details.
    [Chart: CPU Spark vs GPU Spark on the NVIDIA Decision Support Benchmark 3TB (Public Cloud), Amazon EMR and Google Cloud Dataproc]
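Because `spark.plugins` is read when the Spark application starts, the plugin is normally supplied at launch (for example with `--conf` on `spark-submit`) rather than set on an already running session. A minimal PySpark sketch of that approach, assuming the RAPIDS Accelerator jar matching your Spark and CUDA versions is already on the classpath. `build_gpu_session` is a hypothetical helper, and `spark.rapids.sql.explain=NOT_ON_GPU` asks the plugin to log which parts of a query fell back to the CPU.

```python
# Settings for the RAPIDS Accelerator for Apache Spark.
# Assumption: the RAPIDS Accelerator jar is already on the driver and
# executor classpath (for example via --jars); it is not downloaded here.
RAPIDS_SPARK_CONF = {
    # Load the RAPIDS SQL plugin when the application starts.
    "spark.plugins": "com.nvidia.spark.SQLPlugin",
    # Log the parts of each query that could not run on the GPU.
    "spark.rapids.sql.explain": "NOT_ON_GPU",
}


def build_gpu_session(app_name: str = "gpu-spark-demo"):
    """Create a SparkSession with the RAPIDS plugin configured at startup."""
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in RAPIDS_SPARK_CONF.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

The query itself is unchanged: on the resulting session, `spark.sql("select count(*) as order_count from orders")` runs as before, with supported operators executed on the GPU.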
  8. cuML

    Accelerated machine learning with a scikit-learn API

    CPU (scikit-learn):
      >>> from sklearn.ensemble import RandomForestClassifier
      >>> clf = RandomForestClassifier()
      >>> clf.fit(x, y)

    GPU (cuML):
      >>> from cuml.ensemble import RandomForestClassifier
      >>> clf = RandomForestClassifier()
      >>> clf.fit(x, y)

    50+ GPU-Accelerated Algorithms: Time Series, Preprocessing, Classification, Tree Models, Cross Validation, Clustering, Explainability, Dimensionality Reduction, Regression

    [Chart: scikit-learn (CPU) vs cuML (GPU); A100 GPU vs. AMD EPYC 7642 (96 logical cores); cuML 23.04, scikit-learn 1.2.2, umap-learn 0.5.3]
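Because cuML mirrors the scikit-learn estimator API, one common pattern is to choose the implementation at import time and keep the rest of the code identical. A minimal sketch, assuming scikit-learn and NumPy are installed and cuML optionally so; the synthetic `make_classification` dataset and the `BACKEND` flag are illustrative. The two `RandomForestClassifier` implementations have different defaults (cuML caps tree depth, for example), so their results are not expected to match exactly.

```python
import numpy as np
from sklearn.datasets import make_classification

try:
    # GPU implementation with a scikit-learn-compatible API.
    from cuml.ensemble import RandomForestClassifier
    BACKEND = "cuml"
except ImportError:
    # CPU fallback: same class name, same fit/predict interface.
    from sklearn.ensemble import RandomForestClassifier
    BACKEND = "sklearn"

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
# float32 features and int32 labels are accepted by both libraries.
X = X.astype(np.float32)
y = y.astype(np.int32)

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X, y)
preds = clf.predict(X)
print(BACKEND, "training accuracy:", float((preds == y).mean()))
```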
  9. Accelerated NetworkX

    nx-cugraph: the zero-code-change GPU backend for NetworkX
    • Zero-code-change GPU acceleration for NetworkX code
    • Accelerates algorithms up to 600x, depending on algorithm and graph size
    • Supports 60 popular graph algorithms, and growing
    • Falls back to CPU NetworkX for unsupported algorithms

    Install:
      pip install nx-cugraph-cu12 --extra-index-url https://pypi.nvidia.com
      conda install -c rapidsai -c conda-forge -c nvidia nx-cugraph

    Benchmark setup: NetworkX 3.2, CPU: Intel(R) Xeon(R) Platinum 8480CL 2TB, GPU: NVIDIA H100 80GB
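NetworkX's dispatchable functions also accept a `backend=` keyword, so a single call can be routed to nx-cugraph explicitly when it is installed. A minimal sketch, assuming NetworkX 3.2 or later; the helper `betweenness_prefer_gpu` is hypothetical. The zero-code-change route relies instead on NetworkX's backend configuration, leaving call sites untouched.

```python
import importlib.util

import networkx as nx


def betweenness_prefer_gpu(G: nx.Graph) -> dict:
    """Betweenness centrality, dispatched to nx-cugraph when it is installed."""
    kwargs = {}
    if importlib.util.find_spec("nx_cugraph") is not None:
        # Route this call to the cugraph backend explicitly.
        kwargs["backend"] = "cugraph"
    return nx.betweenness_centrality(G, **kwargs)


G = nx.karate_club_graph()
scores = betweenness_prefer_gpu(G)
top = max(scores, key=scores.get)
print(f"most central node: {top} ({scores[top]:.3f})")
```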