What is RAPIDS?

Jacob Tomlinson

July 14, 2021

Presented at the Cyber Colombia HPC Summer School.

An overview of RAPIDS including cuDF, cuML, CuPy and Dask.

Transcript

1 Open GPU Data Science: Jacob Tomlinson, Senior Software Engineer, RAPIDS Engineering
2 Jacob Tomlinson
3 What is RAPIDS?
4 RAPIDS https://github.com/rapidsai
5 Data Processing Evolution: Faster Data Access, Less Data Movement
▸ Hadoop Processing, Reading from Disk: HDFS Read -> Query -> HDFS Write -> HDFS Read -> ETL -> HDFS Write -> HDFS Read -> ML Train
▸ Spark In-Memory Processing: HDFS Read -> Query -> ETL -> ML Train (25-100x Improvement, Less Code, Language Flexible, Primarily In-Memory)
▸ Traditional GPU Processing: HDFS Read -> GPU Read -> Query -> CPU Write -> GPU Read -> ETL -> CPU Write -> GPU Read -> ML Train (5-10x Improvement, More Code, Language Rigid, Substantially on GPU)
▸ RAPIDS: Arrow Read -> Query -> ETL -> ML Train (50-100x Improvement, Same Code, Language Flexible, Primarily on GPU)
6 Jake VanderPlas - PyCon 2017
7 Open Source Data Science Ecosystem: Familiar Python APIs
Stages: Data Preparation, Model Training, Visualization. Shared layers: Dask, CPU Memory.
▸ Pandas: Analytics
▸ Scikit-Learn: Machine Learning
▸ NetworkX: Graph Analytics
▸ PyTorch, TensorFlow, MxNet: Deep Learning
▸ Matplotlib: Visualization
8 RAPIDS: End-to-End Accelerated GPU Data Science
Stages: Data Preparation, Model Training, Visualization. Shared layers: Dask, GPU Memory.
▸ cuDF, cuIO: Analytics
▸ cuML: Machine Learning
▸ cuGraph: Graph Analytics
▸ PyTorch, TensorFlow, MxNet: Deep Learning
▸ cuxfilter, pyViz, plotly: Visualization
9 Ecosystem Partners: Open Source, Contributors, Adopters
10 Faster Speeds, Real World Benefits: Faster Data Access, Less Data Movement
[Chart panels: cuIO/cuDF (Load and Data Preparation), XGBoost (Machine Learning), End-to-End. Axis: time in seconds (shorter is better). Series: cuIO/cuDF (Load and Data Prep), Data Conversion, XGBoost.]
▸ Benchmark: 200GB CSV dataset; data prep includes joins, variable transformations
▸ CPU Cluster Configuration: CPU nodes (61 GiB memory, 8 vCPUs, 64-bit platform), Apache Spark
▸ A100 Cluster Configuration: 16 A100 GPUs (40GB each)
▸ RAPIDS Version: RAPIDS 0.17
11 Technologies
12 RAPIDS: End-to-End Accelerated GPU Data Science
Stages: Data Preparation, Model Training, Visualization. Shared layers: Dask, GPU Memory.
▸ cuDF, cuIO: Analytics
▸ cuML: Machine Learning
▸ cuGraph: Graph Analytics
▸ PyTorch, TensorFlow, MxNet: Deep Learning
▸ cuxfilter, pyViz, plotly: Visualization
13 cuDF
14 ETL - the Backbone of Data Science: cuDF is…
PYTHON LIBRARY
▸ A Python library for manipulating GPU DataFrames following the Pandas API
▸ Python interface to CUDA C++ library with additional functionality
▸ Creating GPU DataFrames from Numpy arrays, Pandas DataFrames, and PyArrow Tables
▸ JIT compilation of User-Defined Functions (UDFs) using Numba
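The Pandas-style API described on this slide can be sketched as follows. This is a minimal illustration rather than code from the talk: the column names and values are made up, and the snippet falls back to pandas when cuDF is not installed so that it also runs on a machine without a GPU.

```python
try:
    import cudf as xdf  # GPU DataFrames; needs an NVIDIA GPU and a RAPIDS install
except ImportError:
    import pandas as xdf  # CPU fallback; this snippet only uses the shared API

# Build a small DataFrame from host data (hypothetical columns).
df = xdf.DataFrame({"key": [0, 1, 0, 1], "value": [10, 20, 30, 40]})

# Pandas-style groupby aggregation; with cuDF this runs on the GPU.
totals = df.groupby("key")["value"].sum().sort_index()

# cuDF results live in GPU memory; to_pandas() copies them back to the host.
if hasattr(totals, "to_pandas"):
    totals = totals.to_pandas()

result = {int(k): int(v) for k, v in totals.items()}
print(result)
```

Only the import line differs between the CPU and GPU versions, which is the point the slide makes about following the Pandas API.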
15 Benchmarks: Single-GPU Speedup vs. Pandas (cuDF v0.13, Pandas 0.25.3)
▸ Running on NVIDIA DGX-1:
  ▸ GPU: NVIDIA Tesla V100 32GB
  ▸ CPU: Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz
▸ Benchmark Setup:
  ▸ RMM Pool Allocator Enabled
  ▸ DataFrames: 2x int32 key columns, 3x int32 value columns
  ▸ Merge: inner; GroupBy: count, sum, min, max calculated for each value column
[Chart: GPU Speedup Over CPU for Merge, Sort and GroupBy at 10M and 100M; bar labels in extraction order: 970, 500, 370, 350, 330, 320]
16 Extraction is the Cornerstone: cuIO for Faster Data Loading
▸ Follow Pandas APIs and provide >10x speedup
▸ Multiple supported formats, including:
  ▸ CSV Reader, CSV Writer
  ▸ Parquet Reader, Parquet Writer
  ▸ ORC Reader, ORC Writer
  ▸ JSON Reader
  ▸ Avro Reader
▸ GPU Direct Storage integration in progress for bypassing PCIe bottlenecks!
▸ Key is GPU-accelerating both parsing and decompression
▸ Benchmark:
  ▸ Dataset: NY Taxi dataset (Jan 2015)
  ▸ GPU: Single 32GB V100
  ▸ RAPIDS Version: 0.17
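As a rough sketch of what "follow Pandas APIs" means for the readers listed above, the snippet below parses a tiny in-memory CSV with `read_csv`. The data and column names are invented for illustration and are not the NY Taxi benchmark; pandas is used as a stand-in when cuDF is not installed.

```python
import io

try:
    import cudf as xdf  # GPU-accelerated readers; needs an NVIDIA GPU
except ImportError:
    import pandas as xdf  # CPU fallback with the same read_csv call

# A made-up three-row CSV standing in for a real file on disk.
csv_text = "vendor_id,trip_distance\n1,1.5\n2,3.0\n1,2.5\n"

# read_csv accepts a path or a file-like object in both libraries.
trips = xdf.read_csv(io.StringIO(csv_text))

row_count = len(trips)
mean_distance = float(trips["trip_distance"].mean())
print(row_count, mean_distance)
```

In practice the argument would be a file path; the in-memory buffer is only there to keep the sketch self-contained.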
17 CuPy
18
19 Benchmark: Single-GPU CuPy vs NumPy
More details: https://blog.dask.org/2019/06/27/single-gpu-cupy-benchmarks
[Chart: GPU Speedup Over CPU by Operation at 800MB and 8MB; operations: Elementwise, FFT, Array Slicing, Stencil, Sum, Matrix Multiplication, SVD, Standard Deviation]
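A few of the benchmarked operations can be written once and run against either library, since CuPy mirrors much of the NumPy API. The sketch below is illustrative only and measures nothing: the array is tiny, and NumPy is used when CuPy is not installed.

```python
try:
    import cupy as xp  # arrays in GPU memory; needs an NVIDIA GPU
except ImportError:
    import numpy as xp  # CPU fallback with the same calls

# A 2x3 array: [[0, 1, 2], [3, 4, 5]]
a = xp.arange(6, dtype=xp.float64).reshape(2, 3)

total = float(a.sum())            # Sum
product = (a @ a.T).tolist()      # Matrix Multiplication, copied back as lists
spread = float(a.std())           # Standard Deviation

print(total, product, spread)
```

The alias `xp` is a common convention for code meant to run on both NumPy and CuPy.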
20 SVD Benchmark: Dask and CuPy Doing Complex Workflows
21 cuML
22 Algorithms: GPU-accelerated Scikit-Learn
▸ Classification / Regression: Decision Trees / Random Forests; Linear/Lasso/Ridge/LARS/ElasticNet Regression; Logistic Regression; K-Nearest Neighbors (exact or approximate); Support Vector Machine Classification and Regression; Naive Bayes
▸ Inference: Random Forest / GBDT Inference (FIL)
▸ Preprocessing: Text vectorization (TF-IDF / Count); Target Encoding; Cross-validation / splitting
▸ Clustering: K-Means; DBSCAN; Spectral Clustering
▸ Decomposition & Dimensionality Reduction: Principal Components (including iPCA); Singular Value Decomposition; UMAP; Spectral Embedding; T-SNE
▸ Time Series: Holt-Winters; Seasonal ARIMA / Auto ARIMA
▸ Hyper-parameter Tuning
▸ Cross Validation
More to come!
23 RAPIDS Matches Common Python APIs: CPU-based Clustering
    from sklearn.datasets import make_moons
    import pandas
    X, y = make_moons(n_samples=int(1e2), noise=0.05, random_state=0)
    X = pandas.DataFrame({'fea%d'%i: X[:, i] for i in range(X.shape[1])})
    from sklearn.cluster import DBSCAN
    dbscan = DBSCAN(eps = 0.3, min_samples = 5)
    y_hat = dbscan.fit_predict(X)
24 RAPIDS Matches Common Python APIs: GPU-accelerated Clustering
    from sklearn.datasets import make_moons
    import cudf
    X, y = make_moons(n_samples=int(1e2), noise=0.05, random_state=0)
    X = cudf.DataFrame({'fea%d'%i: X[:, i] for i in range(X.shape[1])})
    from cuml import DBSCAN
    dbscan = DBSCAN(eps = 0.3, min_samples = 5)
    y_hat = dbscan.fit_predict(X)
25 Benchmarks: Single-GPU cuML vs Scikit-learn. 1x V100 vs. 2x 20 Core CPUs (DGX-1, RAPIDS 0.15)
26 Dask
27 RAPIDS: Scaling RAPIDS with Dask
Stages: Data Preparation, Model Training, Visualization. Shared layers: Dask, GPU Memory.
▸ cuDF, cuIO: Analytics
▸ cuML: Machine Learning
▸ cuGraph: Graph Analytics
▸ PyTorch, Chainer, MxNet: Deep Learning
▸ cuXfilter <> pyViz: Visualization
28 Why Dask?
EASY SCALABILITY
▸ Easy to install and use on a laptop
▸ Scales out to thousand-node clusters
▸ Modularly built for acceleration
DEPLOYABLE
▸ HPC: SLURM, PBS, LSF, SGE
▸ Cloud: Kubernetes
▸ Hadoop/Spark: Yarn
PYDATA NATIVE
▸ Easy Migration: Built on top of NumPy, Pandas, Scikit-Learn, etc.
▸ Easy Training: With the same API
POPULAR
▸ Most common parallelism framework today in the PyData and SciPy community
▸ Millions of monthly downloads and dozens of integrations
Scale Out / Parallelize
▸ PYDATA (NumPy, Pandas, Scikit-Learn, Numba and many more): Single CPU core, in-memory data
▸ DASK (multi-core and distributed PyData): NumPy -> Dask Array, Pandas -> Dask DataFrame, Scikit-Learn -> Dask-ML, … -> Dask Futures
29 Why Dask? Dask scales arrays, dataframes and ML APIs
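The "NumPy -> Dask Array" step can be sketched as below. It is an illustrative example, not from the talk: the array size and chunk size are arbitrary choices, and the snippet computes the same result with plain NumPy if Dask is not installed.

```python
import numpy as np

# One million integers held in host memory.
data = np.arange(1_000_000, dtype=np.int64)

try:
    import dask.array as da

    # Split the array into 4 chunks that Dask can process in parallel.
    x = da.from_array(data, chunks=250_000)
    n_chunks = x.numblocks[0]

    # Operations build a lazy task graph; compute() executes it.
    total = int((x * 2).sum().compute())
except ImportError:
    # Fallback without Dask: a single in-memory NumPy computation.
    n_chunks = 1
    total = int((data * 2).sum())

print(n_chunks, total)
```

The expression `(x * 2).sum()` reads the same as its NumPy counterpart; the chunking and the explicit `compute()` call are the only additions.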
30 Scale Up with RAPIDS
Scale Up / Accelerate
▸ PYDATA (NumPy, Pandas, Scikit-Learn, NetworkX, Numba and many more): Single CPU core, in-memory data
▸ RAPIDS AND OTHERS (accelerated on single GPU): NumPy -> CuPy/PyTorch/.., Pandas -> cuDF, Scikit-Learn -> cuML, NetworkX -> cuGraph, Numba -> Numba
31 Scale Out with RAPIDS + Dask with OpenUCX
▸ PYDATA (NumPy, Pandas, Scikit-Learn, Numba and many more): Single CPU core, in-memory data
▸ Scale Up / Accelerate: RAPIDS AND OTHERS (accelerated on single GPU): NumPy -> CuPy/PyTorch/.., Pandas -> cuDF, Scikit-Learn -> cuML, NetworkX -> cuGraph, Numba -> Numba
▸ Scale Out / Parallelize: DASK (multi-core and distributed PyData): NumPy -> Dask Array, Pandas -> Dask DataFrame, Scikit-Learn -> Dask-ML, … -> Dask Futures
▸ Both: RAPIDS + DASK WITH OPENUCX: Multi-GPU, on single node (DGX) or across a cluster
32 and so much more...
33 Even more RAPIDS libraries and ecosystem packages: A Bigger, Better, Stronger Ecosystem for All
▸ cuGraph: Graph analytics; compatible with NetworkX, SciPy and CuPy
▸ cuSpatial: Spatial analytics; point-in-polygon and distance calculations
▸ cuSignal: Signal processing
▸ NVTabular: ETL library for recommender systems
▸ CLX/cyBERT: Cyber log acceleration; utilizes NLP and transformer architectures for cybersecurity tasks
▸ Data visualization: Cuxfilter and Plotly Dash; part of the pyViz community
▸ BlazingSQL: GPU accelerated SQL engine built on top of RAPIDS
▸ Streamz: Distributed stream processing
34 Interoperability for the Win
▸ Real-world workflows often need to share data between libraries
▸ RAPIDS supports device memory sharing between many popular data science and deep learning libraries
▸ Keeps data on the GPU, avoiding costly copying back and forth to host memory
▸ Any library that supports DLPack or __cuda_array_interface__ will allow for sharing of memory buffers between RAPIDS and supported libraries
[Logos include mpi4py]
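Buffer sharing through DLPack can be sketched with a single `from_dlpack` call. This is a minimal illustration, not code from the talk, and it assumes a recent CuPy or NumPy (NumPy 1.23 or later) that provides `from_dlpack`. With CuPy the buffer lives in GPU memory; the NumPy fallback shows the same zero-copy behaviour in host memory.

```python
try:
    import cupy as xp  # device memory; needs an NVIDIA GPU
except ImportError:
    import numpy as xp  # host-memory stand-in with the same DLPack entry point

source = xp.arange(5)

# Import the array through the DLPack protocol; no data is copied.
shared = xp.from_dlpack(source)

# A write through the original array is visible through the imported one,
# because both refer to the same underlying buffer.
source[0] = 99
first = int(shared[0])
print(first)
```

In a real workflow the producer and consumer would be different libraries, for example a deep learning framework handing a tensor to CuPy, rather than the same library on both sides.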
35 RAPIDS Everywhere: The Next Phase of RAPIDS
Exactly as it sounds: our goal is to make RAPIDS as usable and performant as possible wherever data science is done. We will continue to work with more open source projects to further democratize acceleration and efficiency in data science.
36 Getting started
37 RAPIDS Docs https://docs.rapids.ai
38 Easy Installation: Interactive Installation Guide
39 Deploy RAPIDS Everywhere: Focused on Robust Functionality, Deployment, and User Experience
▸ Integration with major cloud providers | Both containers and cloud specific machine instances
▸ Support for Enterprise and HPC Orchestration Layers
[Logos include Cloud Dataproc, Azure Machine Learning]
40 Join the Movement: Everyone Can Help!
Integrations, feedback, documentation support, pull requests, new issues, or code donations welcomed!
▸ APACHE ARROW: https://arrow.apache.org/ @ApacheArrow
▸ GPU OPEN ANALYTICS INITIATIVE: http://gpuopenanalytics.com/ @GPUOAI
▸ RAPIDS: https://rapids.ai @RAPIDSai
▸ DASK: https://dask.org @Dask_dev
THANK YOU Jacob Tomlinson @_jacobtomlinson [email protected]