What is RAPIDS?

Presented at the Cyber Colombia HPC Summer School.

An overview of RAPIDS including cuDF, cuML, CuPy and Dask.

Jacob Tomlinson

July 14, 2021

Transcript

  1. Jacob Tomlinson
    Senior Software Engineer, RAPIDS Engineering
    Open GPU Data Science

  2. 2
    Jacob Tomlinson

  3. 3
    What is RAPIDS?

  4. 4
    RAPIDS
    https://github.com/rapidsai

  5. 5
    Data Processing Evolution
    Faster Data Access, Less Data Movement
    ▸ Hadoop Processing, Reading from Disk:
      HDFS Read -> Query -> HDFS Write -> HDFS Read -> ETL -> HDFS Write -> HDFS Read -> ML Train
    ▸ Spark In-Memory Processing:
      HDFS Read -> Query -> ETL -> ML Train
      25-100x Improvement, Less Code, Language Flexible, Primarily In-Memory
    ▸ Traditional GPU Processing:
      HDFS Read -> GPU Read -> Query -> CPU Write -> GPU Read -> ETL -> CPU Write -> GPU Read -> ML Train
      5-10x Improvement, More Code, Language Rigid, Substantially on GPU
    ▸ RAPIDS:
      Arrow Read -> Query -> ETL -> ML Train
      50-100x Improvement, Same Code, Language Flexible, Primarily on GPU

  6. 6
    Jake VanderPlas - PyCon 2017

  7. 7
    Open Source Data Science Ecosystem
    Familiar Python APIs
    Workflow: Data Preparation -> Model Training -> Visualization
    ▸ Analytics: Pandas
    ▸ Machine Learning: Scikit-Learn
    ▸ Graph Analytics: NetworkX
    ▸ Deep Learning: PyTorch, TensorFlow, MxNet
    ▸ Visualization: Matplotlib
    ▸ Spanning all stages: Dask
    ▸ Data held in: CPU Memory

  8. 8
    RAPIDS
    End-to-End Accelerated GPU Data Science
    Workflow: Data Preparation -> Model Training -> Visualization
    ▸ Analytics: cuDF, cuIO
    ▸ Machine Learning: cuML
    ▸ Graph Analytics: cuGraph
    ▸ Deep Learning: PyTorch, TensorFlow, MxNet
    ▸ Visualization: cuxfilter, pyViz, plotly
    ▸ Spanning all stages: Dask
    ▸ Data held in: GPU Memory

  9. 9
    Ecosystem Partners
    CONTRIBUTORS
    ADOPTERS
    OPEN SOURCE

  10. 10
    Faster Speeds, Real World Benefits
    Faster Data Access, Less Data Movement
    [Charts: time in seconds (shorter is better) for cuIO/cuDF Load and Data Preparation, XGBoost Machine Learning, and End-to-End; legend: cuIO/cuDF (Load and Data Prep), Data Conversion, XGBoost]
    ▸ Benchmark: 200GB CSV dataset; data prep includes joins, variable transformations
    ▸ CPU Cluster Configuration: CPU nodes (61 GiB memory, 8 vCPUs, 64-bit platform), Apache Spark
    ▸ A100 Cluster Configuration: 16 A100 GPUs (40GB each)
    ▸ RAPIDS Version: RAPIDS 0.17

  11. 11
    Technologies

  12. 12
    RAPIDS
    End-to-End Accelerated GPU Data Science
    Workflow: Data Preparation -> Model Training -> Visualization
    ▸ Analytics: cuDF, cuIO
    ▸ Machine Learning: cuML
    ▸ Graph Analytics: cuGraph
    ▸ Deep Learning: PyTorch, TensorFlow, MxNet
    ▸ Visualization: cuxfilter, pyViz, plotly
    ▸ Spanning all stages: Dask
    ▸ Data held in: GPU Memory

  13. 13
    cuDF

  14. 14
    ETL - the Backbone of Data Science
    cuDF is…
    PYTHON LIBRARY
    ▸ A Python library for manipulating GPU DataFrames following the Pandas API
    ▸ Python interface to CUDA C++ library with additional functionality
    ▸ Creating GPU DataFrames from NumPy arrays, Pandas DataFrames, and PyArrow Tables
    ▸ JIT compilation of User-Defined Functions (UDFs) using Numba
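
The "following the Pandas API" point on this slide can be illustrated with a small sketch. It is not from the talk: the `load_backend` helper and the `key`/`value` columns are made up for illustration, and the code assumes that falling back to pandas (or skipping the example entirely) is acceptable when cuDF or a GPU is not available.

```python
import importlib


def load_backend(names=("cudf", "pandas")):
    """Return the first importable DataFrame library, or None (hypothetical helper)."""
    for name in names:
        try:
            return importlib.import_module(name)
        except Exception:  # cuDF may be missing, or may fail to import without a GPU
            continue
    return None


xdf = load_backend()
means = None
if xdf is not None:
    # These lines are identical for cuDF (GPU) and pandas (CPU).
    df = xdf.DataFrame({"key": [1, 1, 2], "value": [10.0, 20.0, 30.0]})
    means = df.groupby("key")["value"].mean()
    print(xdf.__name__, float(means.loc[1]), float(means.loc[2]))
```

Only the import changes between the two libraries here; the DataFrame construction and the groupby are written once.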

  15. 15
    Benchmarks: Single-GPU Speedup vs. Pandas
    cuDF v0.13, Pandas 0.25.3
    ▸ Running on NVIDIA DGX-1:
      ▸ GPU: NVIDIA Tesla V100 32GB
      ▸ CPU: Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz
    ▸ Benchmark Setup:
      ▸ RMM Pool Allocator Enabled
      ▸ DataFrames: 2x int32 key columns, 3x int32 value columns
      ▸ Merge: inner; GroupBy: count, sum, min, max calculated for each value column
    [Chart: GPU Speedup Over CPU for Merge, Sort and GroupBy at 10M and 100M; bar labels as extracted (pairing with bars not recoverable): 970, 500, 370, 350, 330, 320]
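
The "GPU Speedup Over CPU" axis is a ratio of run times. The sketch below shows one way such a ratio could be computed. The helper names and the timings passed to `speedup` are illustrative assumptions, not the benchmark harness or the measurements behind the slide.

```python
import time


def best_time(fn, repeats=5):
    """Return the fastest wall-clock time of fn() over several runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def speedup(baseline_seconds, accelerated_seconds):
    """Speedup factor: how many times faster the accelerated run is."""
    return baseline_seconds / accelerated_seconds


# Hypothetical timings in seconds, not measurements from the slide.
print(speedup(30.0, 0.25))

# Timing a real callable (here a trivial CPU workload as a placeholder).
print(best_time(lambda: sum(range(1000))) >= 0.0)
```

When timing GPU libraries, the device usually has to be synchronized before the timer is stopped, because kernels are launched asynchronously. Otherwise the measured time can be too short.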

  16. 16
    Extraction is the Cornerstone
    cuIO for Faster Data Loading
    ▸ Follow Pandas APIs and provide >10x speedup
    ▸ Multiple supported formats, including:
      ▸ CSV Reader, CSV Writer
      ▸ Parquet Reader, Parquet Writer
      ▸ ORC Reader, ORC Writer
      ▸ JSON Reader
      ▸ Avro Reader
    ▸ GPU Direct Storage integration in progress for bypassing PCIe bottlenecks!
    ▸ Key is GPU-accelerating both parsing and decompression
    ▸ Benchmark:
      ▸ Dataset: NY Taxi dataset (Jan 2015)
      ▸ GPU: Single 32GB V100
      ▸ RAPIDS Version: 0.17
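
Because the readers follow the Pandas API, loading a CSV looks the same on CPU and GPU. The sketch below writes a tiny file and reads it back with whichever library is importable. The file name and the two columns are made up (loosely styled on taxi data), and falling back to pandas is an assumption made so the example runs without a GPU.

```python
import csv
import importlib
import os
import tempfile

# Pick cuDF if it imports cleanly, otherwise pandas, otherwise skip the read.
xdf = None
for name in ("cudf", "pandas"):
    try:
        xdf = importlib.import_module(name)
        break
    except Exception:
        continue

n_rows = None
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "trips.csv")  # hypothetical file name
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["passenger_count", "fare_amount"])
        writer.writerows([[1, 7.5], [2, 12.0], [1, 5.5]])

    if xdf is not None:
        # read_csv has the same call shape in cuDF and pandas.
        trips = xdf.read_csv(path)
        n_rows = len(trips)
        print(xdf.__name__, list(trips.columns), n_rows)
```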

  17. 17
    CuPy

  18. 18

  19. 19
    Benchmark: Single-GPU CuPy vs NumPy
    [Chart: GPU Speedup Over CPU by Operation; legend: 800MB, 8MB]
    ▸ Operations: Elementwise, FFT, Array Slicing, Stencil, Sum, Matrix Multiplication, SVD, Standard Deviation
    ▸ Bar labels as extracted (pairing with operations not recoverable): 150, 270, 5.3, 210, 3.6, 190, 5.1, 150, 8.3, 66, 18, 11, 1.5, 17, 1.1, 3.5
    More details: https://blog.dask.org/2019/06/27/single-gpu-cupy-benchmarks

  20. 20
    SVD Benchmark
    Dask and CuPy Doing Complex Workflows

  21. 21
    cuML

  22. 22
    Algorithms
    GPU-accelerated Scikit-Learn
    ▸ Classification / Regression: Decision Trees / Random Forests; Linear/Lasso/Ridge/LARS/ElasticNet Regression; Logistic Regression; K-Nearest Neighbors (exact or approximate); Support Vector Machine Classification and Regression; Naive Bayes
    ▸ Inference: Random Forest / GBDT Inference (FIL)
    ▸ Preprocessing: Text vectorization (TF-IDF / Count); Target Encoding; Cross-validation / splitting
    ▸ Clustering: K-Means; DBSCAN; Spectral Clustering
    ▸ Decomposition & Dimensionality Reduction: Principal Components (including iPCA); Singular Value Decomposition; UMAP; Spectral Embedding; T-SNE
    ▸ Time Series: Holt-Winters; Seasonal ARIMA / Auto ARIMA
    ▸ Also labeled on the slide: Cross Validation, Hyper-parameter Tuning
    ▸ More to come!

  23. 23
    RAPIDS Matches Common Python APIs
    CPU-based Clustering
    from sklearn.datasets import make_moons
    import pandas
    # Generate a small two-moons dataset and wrap it in a pandas DataFrame
    X, y = make_moons(n_samples=int(1e2),
                      noise=0.05, random_state=0)
    X = pandas.DataFrame({'fea%d' % i: X[:, i]
                          for i in range(X.shape[1])})
    # Cluster on the CPU with scikit-learn
    from sklearn.cluster import DBSCAN
    dbscan = DBSCAN(eps=0.3, min_samples=5)
    y_hat = dbscan.fit_predict(X)

  24. 24
    RAPIDS Matches Common Python APIs
    GPU-accelerated Clustering
    from sklearn.datasets import make_moons
    import cudf
    # Generate the same two-moons dataset and move it into a cuDF (GPU) DataFrame
    X, y = make_moons(n_samples=int(1e2),
                      noise=0.05, random_state=0)
    X = cudf.DataFrame({'fea%d' % i: X[:, i]
                        for i in range(X.shape[1])})
    # Cluster on the GPU with cuML; only the imports and DataFrame type changed
    from cuml import DBSCAN
    dbscan = DBSCAN(eps=0.3, min_samples=5)
    y_hat = dbscan.fit_predict(X)

  25. 25
    Benchmarks: Single-GPU cuML vs Scikit-learn
    1x V100 vs. 2x 20 Core CPUs (DGX-1, RAPIDS 0.15)

  26. 26
    Dask

  27. 27
    RAPIDS
    Scaling RAPIDS with Dask
    Workflow: Data Preparation -> Model Training -> Visualization
    ▸ Analytics: cuDF, cuIO
    ▸ Machine Learning: cuML
    ▸ Graph Analytics: cuGraph
    ▸ Deep Learning: PyTorch, Chainer, MxNet
    ▸ Visualization: cuXfilter <> pyViz
    ▸ Spanning all stages: Dask
    ▸ Data held in: GPU Memory

  28. 28
    Why Dask?
    EASY SCALABILITY
    ▸ Easy to install and use on a laptop
    ▸ Scales out to thousand-node clusters
    ▸ Modularly built for acceleration
    DEPLOYABLE
    ▸ HPC: SLURM, PBS, LSF, SGE
    ▸ Cloud: Kubernetes
    ▸ Hadoop/Spark: Yarn
    PYDATA NATIVE
    ▸ Easy Migration: Built on top of NumPy, Pandas, Scikit-Learn, etc.
    ▸ Easy Training: With the same API
    POPULAR
    ▸ Most common parallelism framework today in the PyData and SciPy community
    ▸ Millions of monthly downloads and dozens of integrations
    Scale Out / Parallelize
    PYDATA: NumPy, Pandas, Scikit-Learn, Numba and many more (single CPU core, in-memory data)
    DASK: Multi-core and distributed PyData
    ▸ NumPy -> Dask Array
    ▸ Pandas -> Dask DataFrame
    ▸ Scikit-Learn -> Dask-ML
    ▸ … -> Dask Futures
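
The "Scale Out / Parallelize" idea (split the data into chunks, process chunks as independent tasks, combine the partial results) can be sketched with the standard library alone. This is a stand-in using `concurrent.futures`, not Dask's implementation. The function names and chunk size are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(values, chunk_size):
    """Split a list into consecutive chunks, like the blocks of a chunked array."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def chunked_sum(values, chunk_size=4, workers=2):
    """Sum each chunk as an independent task, then combine the partial sums."""
    chunks = split_into_chunks(values, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, chunks))
    return sum(partial_sums)


data = list(range(10))
total = chunked_sum(data)
print(total)
```

With Dask itself, the same reduction over an array would be written roughly as `dask.array.from_array(x, chunks=4).sum().compute()`, and Dask would schedule the per-chunk tasks across cores or a cluster.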

  29. 29
    Why Dask?
    Dask scales arrays, dataframes and ML APIs

  30. 30
    Scale Up with RAPIDS
    Scale Up / Accelerate
    PYDATA: NumPy, Pandas, Scikit-Learn, NetworkX, Numba and many more (single CPU core, in-memory data)
    RAPIDS AND OTHERS: Accelerated on single GPU
    ▸ NumPy -> CuPy/PyTorch/..
    ▸ Pandas -> cuDF
    ▸ Scikit-Learn -> cuML
    ▸ NetworkX -> cuGraph
    ▸ Numba -> Numba
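
The NumPy -> CuPy mapping works because CuPy mirrors much of the NumPy API, so a function can be written against "an array module" and handed either one. The sketch below is illustrative only: the `normalize` function and the module-selection loop are made up, and falling back to NumPy (or skipping the example) when no GPU is usable is an assumption.

```python
import importlib

# Pick CuPy if it imports and can actually allocate on a GPU, otherwise NumPy.
xp = None
for name in ("cupy", "numpy"):
    try:
        module = importlib.import_module(name)
        module.zeros(1)  # probe: CuPy may import without a GPU but fail on first use
        xp = module
        break
    except Exception:
        continue


def normalize(x, xp):
    """Scale an array to zero mean and unit variance using the module passed in."""
    return (x - xp.mean(x)) / xp.std(x)


result = None
if xp is not None:
    x = xp.arange(5, dtype="float64")
    result = [round(v, 6) for v in normalize(x, xp).tolist()]
    print(xp.__name__, result)
```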

  31. 31
    Scale Out with RAPIDS + Dask with OpenUCX
    PYDATA: NumPy, Pandas, Scikit-Learn, Numba and many more (single CPU core, in-memory data)
    Scale Up / Accelerate
    RAPIDS AND OTHERS: Accelerated on single GPU
    ▸ NumPy -> CuPy/PyTorch/..
    ▸ Pandas -> cuDF
    ▸ Scikit-Learn -> cuML
    ▸ NetworkX -> cuGraph
    ▸ Numba -> Numba
    Scale Out / Parallelize
    DASK: Multi-core and distributed PyData
    ▸ NumPy -> Dask Array
    ▸ Pandas -> Dask DataFrame
    ▸ Scikit-Learn -> Dask-ML
    ▸ … -> Dask Futures
    RAPIDS + DASK WITH OPENUCX: Multi-GPU, on single node (DGX) or across a cluster

  32. 32
    and so much more...

  33. 33
    A Bigger, Better, Stronger Ecosystem for All
    Even more RAPIDS libraries and ecosystem packages
    cuGraph
    ▸ Graph analytics
    ▸ Compatible with NetworkX, SciPy and CuPy
    cuSpatial
    ▸ Spatial analytics
    ▸ Point-in-polygon and distance calculations
    cuSignal
    ▸ Signal processing
    NVTabular
    ▸ ETL library for recommender systems
    CLX/cyBERT
    ▸ Cyber log acceleration
    ▸ Utilizes NLP and transformer architectures for cybersecurity tasks
    Data visualization
    ▸ Cuxfilter and Plotly Dash
    ▸ Part of the pyViz community
    BlazingSQL
    ▸ GPU accelerated SQL engine built on top of RAPIDS
    Streamz
    ▸ Distributed stream processing

  34. 34
    Interoperability for the Win
    mpi4py
    ▸ Real-world workflows often need to share data between libraries
    ▸ RAPIDS supports device memory sharing between many popular data science and deep learning libraries
    ▸ Keeps data on the GPU, avoiding costly copying back and forth to host memory
    ▸ Any library that supports DLPack or __cuda_array_interface__ will allow for sharing of memory buffers between RAPIDS and supported libraries
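
Both sharing mechanisms named above are visible from Python as attributes on the array object: `__cuda_array_interface__` is a dictionary describing a device buffer, and DLPack support is exposed through a `__dlpack__` method. The sketch below checks for them on a stand-in object. `sharing_protocols` and `FakeDeviceArray` are hypothetical names, and the pointer value is a placeholder, not a real GPU allocation.

```python
def sharing_protocols(obj):
    """List the zero-copy GPU sharing protocols an object advertises."""
    found = []
    if hasattr(obj, "__cuda_array_interface__"):
        found.append("__cuda_array_interface__")
    if hasattr(obj, "__dlpack__"):
        found.append("DLPack")
    return found


class FakeDeviceArray:
    """Stand-in for a GPU array; it owns no device memory."""

    @property
    def __cuda_array_interface__(self):
        return {
            "shape": (4,),       # array dimensions
            "typestr": "<f4",    # little-endian 32-bit float
            "data": (0, False),  # (device pointer, read-only flag); 0 is a placeholder
            "version": 3,        # interface version
        }


print(sharing_protocols(FakeDeviceArray()))
print(sharing_protocols([1, 2, 3]))
```

A consuming library reads the pointer, shape and dtype from this description and wraps the same device memory instead of copying it through the host.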

  35. 35
    RAPIDS Everywhere
    The Next Phase of RAPIDS
    Exactly as it sounds: our goal is to make RAPIDS as usable and performant as possible wherever data science is done.
    We will continue to work with more open source projects to further democratize acceleration and efficiency in data science.

  36. 36
    Getting started

  37. 37
    RAPIDS Docs
    https://docs.rapids.ai

  38. 38
    Easy Installation
    Interactive Installation Guide

  39. 39
    Deploy RAPIDS Everywhere
    Focused on Robust Functionality, Deployment, and User Experience
    ▸ Integration with major cloud providers
    ▸ Both containers and cloud specific machine instances
    ▸ Support for Enterprise and HPC Orchestration Layers
    ▸ Cloud Dataproc, Azure Machine Learning

  40. 40
    Join the Movement
    Everyone Can Help!
    Integrations, feedback, documentation support, pull requests, new issues, or code donations welcomed!
    ▸ APACHE ARROW: https://arrow.apache.org/ (@ApacheArrow)
    ▸ GPU OPEN ANALYTICS INITIATIVE: http://gpuopenanalytics.com/ (@GPUOAI)
    ▸ RAPIDS: https://rapids.ai (@RAPIDSai)
    ▸ DASK: https://dask.org (@Dask_dev)

  41. THANK YOU
    Jacob Tomlinson @_jacobtomlinson
    [email protected]
