

High-Performance Data Science at Scale with RAPIDS, Dask, and GPUs

As the community of data science engineers takes on problems of increasing volume, there is a pressing concern over the timeliness of their solutions. Python has become the lingua franca for constructing simple case studies that communicate domain-specific intuition, and it codifies a familiar procedure: (1) build a model that apparently works on a small subset of data, (2) use conventional methods to scale that solution to a large cluster of variable size, (3) realize that the subset wasn't representative, requiring that (4) a new model be built, and back to (1) it goes, repeating until a satisfactory result is reached. This procedure standardizes missteps and friction, while instilling in the community the notion that Python is not performant enough to address the great many problems ahead.

Enter RAPIDS, a platform for accelerating integrated data science. By binding efficient low-level implementations in CUDA C/C++ to Python, and by using Dask's elastic scaling model, a data scientist may now employ a two-step procedure that is many times faster than conventional methods: (1) construct a prototypical solution based on a small subset of data, and (2) deploy the same code on a large cluster of variable size, repeating until the right features are engineered.
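
To make that two-step loop concrete, here is a minimal sketch (not taken from the talk): prototype a groupby aggregation on a small sample with cuDF on one GPU, then run the same logic across a cluster with Dask-cuDF. The file paths, column names, and scheduler address are hypothetical.

    import cudf
    import dask_cudf
    from dask.distributed import Client

    # Step 1: prototype on a small sample with cuDF (single GPU).
    sample = cudf.read_csv("trips_sample.csv")                  # hypothetical file
    print(sample.groupby("cab_type")["fare_amount"].mean())     # hypothetical columns

    # Step 2: point the same logic at a cluster of variable size with Dask-cuDF.
    client = Client("scheduler-address:8786")                   # hypothetical scheduler
    trips = dask_cudf.read_csv("trips_*.csv")                   # many files, same API
    print(trips.groupby("cab_type")["fare_amount"].mean().compute())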

RAPIDS is a collection of open-source libraries fostered by NVIDIA and built on years of accelerated-analytics experience. RAPIDS leverages low-level implementations in CUDA, optimizing for massive parallelism and high memory bandwidth while maintaining a focused, user-friendly Python interface. Chiefly, we concern ourselves with API parity with respect to Pandas and Scikit-Learn; a data scientist who knows Pandas and Scikit-Learn will have an easy time getting up to speed with RAPIDS. RAPIDS maintains and contributes to many libraries, including cuDF, a GPU DataFrame library with Pandas parity; cuML, a GPU machine learning library with Scikit-Learn parity; cuGraph, a GPU graph library with NetworkX parity; cuxfilter, a browser-based GPU cross-filtering solution for visualizing feature data in memory; and Dask-cuDF, a library for distributed CUDA DataFrame objects. RAPIDS also contributes to libraries for elastic compute and machine learning, such as Dask and XGBoost, with many more to come.

By accelerating the entire ecosystem with CUDA, RAPIDS delivers dramatic speedups over state-of-the-art CPU implementations and conventional methods. Even better, RAPIDS is committed to the community through its API-parity approach and its Apache Arrow compliance, which eliminates inefficient glue code and makes it easier for the RAPIDS ecosystem to interoperate with external libraries.


Keith Kraus

November 05, 2019



Transcript

  1. 2 Data Processing Evolution Faster data access, less data movement

    Hadoop Processing, Reading from disk: HDFS Read → Query → HDFS Write → HDFS Read → ETL → HDFS Write → HDFS Read → ML Train
  2. 3 Data Processing Evolution Faster data access, less data movement

    Hadoop Processing, Reading from disk: HDFS Read → Query → HDFS Write → HDFS Read → ETL → HDFS Write → HDFS Read → ML Train. Spark In-Memory Processing: HDFS Read → Query → ETL → ML Train (25-100x improvement, less code, language flexible, primarily in-memory).
  3. 4 Spark is not Enough Basic workloads are bottlenecked by

    the CPU • In a simple benchmark consisting of aggregating data, the CPU is the bottleneck • This is after the data is parsed and cached into memory which is another common bottleneck • The CPU bottleneck is even worse in more complex workloads! SELECT cab_type, count(*) FROM trips_orc GROUP BY cab_type; Source: Mark Litwintschik’s blog: 1.1 Billion Taxi Rides: EC2 versus EMR
  4. 5 Data Processing Evolution Faster data access, less data movement

    Hadoop Processing, Reading from disk: HDFS Read → Query → HDFS Write → HDFS Read → ETL → HDFS Write → HDFS Read → ML Train. Spark In-Memory Processing: HDFS Read → Query → ETL → ML Train (25-100x improvement, less code, language flexible, primarily in-memory). Traditional GPU Processing: HDFS Read → GPU Read → Query → CPU Write → GPU Read → ETL → CPU Write → GPU Read → ML Train (5-10x improvement, more code, language rigid, substantially on GPU).
  5. 6 Why GPUs? Numerous hardware advantages • Thousands of cores

    with up to ~15 TeraFlops of general purpose compute performance • Up to 1 TB/s of memory bandwidth • Hardware interconnects for up to 300 GB/s bidirectional GPU <--> GPU bandwidth • Can scale up to 16x GPUs in a single node Almost never run out of compute relative to memory bandwidth!
  6. 7 Data Movement and Transformation: the bane of productivity and performance

    (Diagram: APP A and APP B each hold data on the GPU, but every hand-off between them goes through the CPU with repeated Copy & Convert, Read Data, and Load Data steps.)
  7. 8 Data Movement and Transformation: what if we could keep data on the GPU?

    (Diagram: the same APP A / APP B pipeline, with data kept in GPU memory so the CPU-side Copy & Convert hops are no longer needed.)
  8. 9 Learning from Apache Arrow From Apache Arrow Home Page

    - https://arrow.apache.org/ Without a shared standard: each system has its own internal memory format; 70-80% of computation is wasted on serialization and deserialization; similar functionality is implemented in multiple projects. With Arrow: all systems utilize the same memory format; there is no overhead for cross-system communication; projects can share functionality (e.g., a Parquet-to-Arrow reader).
  9. 10 Data Processing Evolution Faster data access, less data movement

    Hadoop Processing, Reading from disk: HDFS Read → Query → HDFS Write → HDFS Read → ETL → HDFS Write → HDFS Read → ML Train. Spark In-Memory Processing: HDFS Read → Query → ETL → ML Train (25-100x improvement, less code, language flexible, primarily in-memory). Traditional GPU Processing: HDFS Read → GPU Read → Query → CPU Write → GPU Read → ETL → CPU Write → GPU Read → ML Train (5-10x improvement, more code, language rigid, substantially on GPU). RAPIDS: Arrow Read → ETL → ML Train (50-100x improvement, same code, language flexible, primarily on GPU).
  10. 11 Faster Speeds, Real-World Benefits cuIO/cuDF – Load and Data

    Preparation, XGBoost Machine Learning. Time in seconds (shorter is better): cuIO/cuDF (Load and Data Prep), Data Conversion, XGBoost. Benchmark: 200GB CSV dataset; data prep includes joins, variable transformations. CPU Cluster Configuration: CPU nodes (61 GiB memory, 8 vCPUs, 64-bit platform), Apache Spark. DGX Cluster Configuration: 5x DGX-1 on InfiniBand network. (End-to-end chart values, in seconds: 8762, 6148, 3925, 3221, 322, 213.)
  11. 12 Faster Speeds, Real-World Benefits cuIO/cuDF – Load and Data

    Preparation, XGBoost Machine Learning. Time in seconds (shorter is better): cuIO/cuDF (Load and Data Prep), Data Conversion, XGBoost. Benchmark: 200GB CSV dataset; data prep includes joins, variable transformations. CPU Cluster Configuration: CPU nodes (61 GiB memory, 8 vCPUs, 64-bit platform), Apache Spark. DGX Cluster Configuration: 5x DGX-1 on InfiniBand network. End-to-End: Improving Over Time.
  12. 15 Open Source Data Science Ecosystem: Familiar Python APIs

    (Stack diagram: Data Preparation, Model Training, and Visualization stages on CPU memory; Pandas for analytics, Scikit-Learn for machine learning, NetworkX for graph analytics, PyTorch/Chainer/MxNet for deep learning, Matplotlib/Seaborn for visualization, with Dask for scaling.)
  13. 16 RAPIDS: End-to-End Accelerated GPU Data Science

    (RAPIDS stack diagram: Data Preparation, Model Training, and Visualization stages on GPU memory; cuDF and cuIO for analytics, cuML for machine learning, cuGraph for graph analytics, PyTorch/Chainer/MxNet for deep learning, cuxfilter <> pyViz for visualization, with Dask for scaling.)
  14. 18 Scaling RAPIDS with Dask

    (Same RAPIDS stack diagram as above, with Dask as the scaling layer.)
  15. 19 Why Dask? • Easy Migration: Built on top of

    NumPy, Pandas Scikit-Learn, etc. • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop • Scales out to thousand-node clusters Easy Scalability • Most common parallelism framework today in the PyData and SciPy community Popular • HPC: SLURM, PBS, LSF, SGE • Cloud: Kubernetes • Hadoop/Spark: Yarn Deployable
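
    A minimal sketch of what deployment can look like in practice (not from the slides): dask-cuda's LocalCUDACluster starts one worker per GPU on a single machine, and the same client code can instead point at a SLURM-, Kubernetes-, or YARN-managed scheduler. The file pattern and column name below are hypothetical.

        from dask.distributed import Client
        from dask_cuda import LocalCUDACluster
        import dask_cudf

        # One Dask worker per visible GPU on this machine.
        cluster = LocalCUDACluster()
        client = Client(cluster)

        # Pandas-like API, partitioned across the GPU workers.
        trips = dask_cudf.read_csv("trips_*.csv")       # hypothetical files
        print(trips["fare_amount"].mean().compute())    # hypothetical column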
  16. 20 Why OpenUCX? • TCP sockets are slow! • UCX

    provides uniform access to transports (TCP, InfiniBand, shared memory, NVLink) • Alpha Python bindings for UCX (ucx-py) https://github.com/rapidsai/ucx-py • Will provide best communication performance, to Dask based on available hardware on nodes/cluster Bringing hardware accelerated communications to Dask
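
    As a rough sketch, recent dask-cuda releases let you request UCX as the communication protocol when starting a cluster; the flags below come from those later releases rather than the alpha ucx-py bindings mentioned here, so treat them as an assumption.

        from dask.distributed import Client
        from dask_cuda import LocalCUDACluster

        # Workers communicate over UCX, so NVLink / InfiniBand are used when
        # available instead of plain TCP sockets.
        cluster = LocalCUDACluster(protocol="ucx",
                                   enable_nvlink=True,
                                   enable_infiniband=True)
        client = Client(cluster)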
  17. 21 Scale up with RAPIDS

    PyData (single CPU core, in-memory data): NumPy, Pandas, Scikit-Learn, Numba and many more. Scale Up / Accelerate: RAPIDS and Others, accelerated on a single GPU: NumPy -> CuPy/PyTorch/.., Pandas -> cuDF, Scikit-Learn -> cuML, Numba -> Numba.
  18. 22 Scale out with RAPIDS + Dask with OpenUCX

    PyData (single CPU core, in-memory data): NumPy, Pandas, Scikit-Learn, Numba and many more. Scale Up / Accelerate: RAPIDS and Others, accelerated on a single GPU (NumPy -> CuPy/PyTorch/.., Pandas -> cuDF, Scikit-Learn -> cuML, Numba -> Numba). Scale out / Parallelize: Multi-core and Distributed PyData (NumPy -> Dask Array, Pandas -> Dask DataFrame, Scikit-Learn -> Dask-ML, ... -> Dask Futures); RAPIDS + Dask with OpenUCX: multi-GPU on a single node (DGX) or across a cluster.
  19. 24 RAPIDS: GPU accelerated data wrangling and feature engineering

    (RAPIDS stack diagram, as above.)
  20. 25 GPU-Accelerated ETL The average data scientist spends 90+% of

    their time in ETL as opposed to training models
  21. 26 ETL Technology Stack

    Python: Dask cuDF, cuDF, Pandas. Cython bindings. cuDF C++ (with Thrust, Cub, Jitify). CUDA Libraries. CUDA.
  22. 27 ETL - the Backbone of Data Science libcuDF is…

    CUDA C++ Library • Low level library containing function implementations and C/C++ API • Importing/exporting Apache Arrow in GPU memory using CUDA IPC • CUDA kernels to perform element-wise math operations on GPU DataFrame columns • CUDA sort, join, groupby, reduction, etc. operations on GPU DataFrames
  23. 28 ETL - the Backbone of Data Science cuDF is…

    Python Library • A Python library for manipulating GPU DataFrames following the Pandas API • Python interface to CUDA C++ library with additional functionality • Creating GPU DataFrames from Numpy arrays, Pandas DataFrames, and PyArrow Tables • JIT compilation of User-Defined Functions (UDFs) using Numba
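
    A small, self-contained sketch of those last two points (column names and the arithmetic are made up): build a GPU DataFrame from a Pandas DataFrame, then run a row-wise UDF that cuDF JIT-compiles with Numba via apply_rows.

        import numpy as np
        import pandas as pd
        import cudf

        pdf = pd.DataFrame({"a": np.arange(10, dtype=np.float64),
                            "b": np.ones(10)})
        gdf = cudf.from_pandas(pdf)          # copy the data into GPU memory

        # Row-wise UDF: cuDF JIT-compiles this with Numba and runs it on the GPU.
        def scaled_product(a, b, out):
            for i, (x, y) in enumerate(zip(a, b)):
                out[i] = x * y * 2.0

        result = gdf.apply_rows(scaled_product,
                                incols=["a", "b"],
                                outcols={"out": np.float64},
                                kwargs={})
        print(result.head())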
  24. 29 cuDF v0.10, Pandas 0.24.2 Running on NVIDIA DGX-1: GPU:

    NVIDIA Tesla V100 32GB; CPU: Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz. Benchmark setup: DataFrames with 2x int32 key columns and 3x int32 value columns; Merge: inner; GroupBy: count, sum, min, max calculated for each value column. Benchmarks: single-GPU speedup vs. Pandas.
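
    A rough sketch of that setup follows; the sizes and random keys are illustrative, not the actual benchmark code.

        import numpy as np
        import cudf

        n = 100_000
        gdf = cudf.DataFrame({
            "key1": np.random.randint(0, 1000, n).astype("int32"),
            "key2": np.random.randint(0, 1000, n).astype("int32"),
            "val1": np.random.randint(0, 100, n).astype("int32"),
            "val2": np.random.randint(0, 100, n).astype("int32"),
            "val3": np.random.randint(0, 100, n).astype("int32"),
        })
        other = gdf.copy()

        # Inner merge on the key columns, then grouped aggregations per value column.
        merged = gdf.merge(other, on=["key1", "key2"], how="inner")
        grouped = gdf.groupby(["key1", "key2"]).agg(["count", "sum", "min", "max"])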
  25. 30 ETL - the Backbone of Data Science: cuDF is not the end of the story

    (RAPIDS stack diagram, as above.)
  26. 31 ETL - the Backbone of Data Science String Support

    Current v0.10 String Support: • Regular Expressions • Element-wise operations • Split, Find, Extract, Cat, Typecasting, etc… • String GroupBys, Joins • Categorical columns fully on GPU • cuStrings repo merged into cuDF repo. Future v0.11+ String Support: • Native string columns in libcudf • Extensive performance optimization • More Pandas String API compatibility • JIT-compiled String UDFs
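
    For a flavor of the element-wise string operations, a tiny sketch with cuDF's .str accessor (the example strings are made up):

        import cudf

        s = cudf.Series(["RAPIDS is fast", "Dask scales out", None])
        print(s.str.lower())            # element-wise lowercase
        print(s.str.contains("fast"))   # per-element substring / regex match
        print(s.str.len())              # per-element string length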
  27. 32 • Follow Pandas APIs and provide >10x speedup •

    CSV Reader - v0.2, CSV Writer v0.8 • Parquet Reader – v0.7, Parquet Writer v0.11 • ORC Reader – v0.7, ORC Writer v0.10 • JSON Reader - v0.8 • Avro Reader - v0.9 • GPU Direct Storage integration in progress for bypassing PCIe bottlenecks! • Key is GPU-accelerating both parsing and decompression wherever possible Extraction is the Cornerstone cuIO for Faster Data Loading
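
    A quick illustration of the reader and writer APIs listed above (file names are hypothetical; the Parquet writer arrives in v0.11 per the slide):

        import cudf

        # Parsing and decompression run on the GPU inside the readers.
        gdf = cudf.read_csv("trips.csv")
        gdf.to_parquet("trips.parquet")
        back = cudf.read_parquet("trips.parquet")
        orc_gdf = cudf.read_orc("trips.orc")
        json_gdf = cudf.read_json("trips.json", lines=True)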
  28. 34 RAPIDS: Building bridges into the array ecosystem

    (RAPIDS stack diagram, as above.)
  29. 37 ETL – Arrays and DataFrames Dask and CUDA Python

    arrays • Scales NumPy to distributed clusters • Used in climate science, imaging, HPC analysis up to 100TB size • Now seamlessly accelerated with GPUs
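
    A minimal sketch of such an array, whose chunks are CuPy (GPU) arrays instead of NumPy arrays; the sizes are illustrative only.

        import cupy
        import dask.array as da

        # Ask Dask to build chunks with CuPy's random generator, so every chunk
        # lives in GPU memory.
        rs = da.random.RandomState(RandomState=cupy.random.RandomState)
        x = rs.normal(10, 1, size=(20_000, 20_000), chunks=(5_000, 5_000))

        result = (x + 1)[::2, ::2].sum().compute()
        print(result)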
  30. 40 Architecture vs. Time

    Single CPU Core: 2hr 39min. Forty CPU Cores: 11min 30s. One GPU: 1min 37s. Eight GPUs: 19s. Also…Achievement Unlocked: Petabyte Scale Data Analytics with Dask and CuPy. Cluster configuration: 20x GCP instances; each instance has CPU: 1 VM socket (Intel Xeon CPU @ 2.30GHz), 2-core, 2 threads/core, 132GB mem, GbE ethernet, 950 GB disk; GPU: 4x NVIDIA Tesla P100-16GB-PCIe (total GPU DRAM across nodes 1.22 TB); Software: Ubuntu 18.04, RAPIDS 0.5.1, Dask=1.1.1, Dask-Distributed=1.1.1, CuPY=5.2.0, CUDA 10.0.130. https://blog.dask.org/2019/01/03/dask-array-gpus-first-steps
  31. 41 ETL – Arrays and DataFrames More Dask Awesomeness from

    RAPIDS https://youtu.be/gV0cykgsTPM https://youtu.be/R5CiXti_MWo
  32. 43 Machine Learning: More models more problems

    (RAPIDS stack diagram, as above.)
  33. 44 Problem Data sizes continue to grow Histograms / Distributions

    Dimension Reduction, Feature Selection, Remove Outliers, Sampling: stages applied to a massive dataset, with time increasing from hours to days. Better to start with as much data as possible and explore / preprocess to scale to performance needs. Iterate. Cross Validate & Grid Search. Iterate some more. Meet a reasonable speed vs. accuracy tradeoff.
  34. 45 ML Technology Stack

    Python: Dask cuML, Dask cuDF, cuDF, Numpy. Cython. cuML Algorithms. cuML Prims. CUDA Libraries: Thrust, Cub, cuSolver, nvGraph, CUTLASS, cuSparse, cuRand, cuBlas. CUDA.
  35. 46 Algorithms: GPU-accelerated Scikit-Learn

    Classification / Regression: Decision Trees / Random Forests, Linear Regression, Logistic Regression, K-Nearest Neighbors, Support Vector Machine Classification. Inference: Random forest / GBDT inference. Clustering: K-Means, DBSCAN, Spectral Clustering. Decomposition & Dimensionality Reduction: Principal Components, Singular Value Decomposition, UMAP, Spectral Embedding, T-SNE. Time Series: Holt-Winters, Kalman Filtering. Also: Cross Validation, Hyper-parameter Tuning. More to come! Key: • Preexisting • NEW for 0.10
  36. 47 RAPIDS matches common Python APIs: CPU-Based Clustering

    from sklearn.datasets import make_moons
    import pandas
    X, y = make_moons(n_samples=int(1e2), noise=0.05, random_state=0)
    X = pandas.DataFrame({'fea%d' % i: X[:, i] for i in range(X.shape[1])})

    # Find Clusters
    from sklearn.cluster import DBSCAN
    dbscan = DBSCAN(eps=0.3, min_samples=5)
    y_hat = dbscan.fit_predict(X)
  37. 48 RAPIDS matches common Python APIs: GPU-Accelerated Clustering

    from sklearn.datasets import make_moons
    import cudf
    X, y = make_moons(n_samples=int(1e2), noise=0.05, random_state=0)
    X = cudf.DataFrame({'fea%d' % i: X[:, i] for i in range(X.shape[1])})

    # Find Clusters
    from cuml import DBSCAN
    dbscan = DBSCAN(eps=0.3, min_samples=5)
    y_hat = dbscan.fit_predict(X)
  38. 50 cuML’s Forest Inference Library accelerates prediction (inference) for random

    forests and boosted decision trees: • Works with existing saved models (XGBoost and LightGBM today, scikit-learn RF and cuML RF soon) • Lightweight Python API • Single V100 GPU can infer up to 34x faster than an XGBoost dual-CPU node • Over 100 million forest inferences per sec (with 1000 trees) on a DGX-1. Forest Inference: Taking models from training to production. (Chart speedups: 23x, 36x, 34x, 23x.)
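
    A hedged sketch of what using FIL looks like; the model file, feature matrix, and keyword arguments are assumptions based on cuML's ForestInference API rather than code from the talk.

        import cudf
        from cuml import ForestInference

        # Load a pre-trained, saved XGBoost model and run inference on the GPU.
        fil_model = ForestInference.load("xgboost_model.bst",   # hypothetical path
                                         model_type="xgboost",
                                         output_class=True)

        X_test = cudf.read_csv("features.csv")                  # hypothetical features
        preds = fil_model.predict(X_test)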
  39. 51 Road to 1.0 October 2019 - RAPIDS 0.10 cuML

    Single-GPU Multi-GPU Multi-Node-Multi-GPU Gradient Boosted Decision Trees (GBDT) GLM Logistic Regression Random Forest K-Means K-NN DBSCAN UMAP Holt-Winters Kalman Filter t-SNE Principal Components Singular Value Decomposition SVM
  40. 52 Road to 1.0 March 2020 - RAPIDS 0.13 cuML

    Single-GPU Multi-Node-Multi-GPU Gradient Boosted Decision Trees (GBDT) GLM Logistic Regression Random Forest K-Means K-NN DBSCAN UMAP ARIMA & Holt-Winters Kalman Filter t-SNE Principal Components Singular Value Decomposition SVM
  41. 54 Graph Analytics: More connections more insights

    (RAPIDS stack diagram, as above.)
  42. 55 GOALS AND BENEFITS OF CUGRAPH Focus on Features and

    User Experience. Seamless Integration with cuDF and cuML: • Property Graph support via DataFrames. Breakthrough Performance: • Up to 500 million edges on a single 32GB GPU • Multi-GPU support for scaling into the billions of edges. Multiple APIs: • Python: Familiar NetworkX-like API • C/C++: lower-level granular control for application developers. Growing Functionality: • Extensive collection of algorithm, primitive, and utility functions
  43. 56 Graph Technology Stack

    Python: Dask cuGraph, Dask cuDF, cuDF, Numpy. Cython. cuGraph Algorithms. Prims. CUDA Libraries: thrust, cub, cuSolver, cuSparse, cuRand, Gunrock*, cuGraphBLAS, cuHornet. CUDA. nvGRAPH has been open sourced and integrated into cuGraph; a legacy version is available in a RAPIDS GitHub repo. * Gunrock is from UC Davis
  44. 57 Algorithms: GPU-accelerated NetworkX

    Categories: Community, Components, Link Analysis, Link Prediction, Traversal, Structure, Utilities, Multi-GPU. Algorithms: Spectral Clustering (Balanced-Cut, Modularity Maximization), Louvain, Subgraph Extraction, KCore, Jaccard, Weighted Jaccard, Overlap Coefficient, Single Source Shortest Path (SSSP), Breadth First Search (BFS), Triangle Counting, COO-to-CSR (Multi-GPU), Transpose, Weakly Connected Components, Strongly Connected Components, Page Rank (Multi-GPU), Personal Page Rank, Katz, Query Language, Renumbering. More to come!
  45. 58 Louvain Single Run

    Dataset (Nodes / Edges): preferentialAttachment 100,000 / 999,970; caidaRouterLevel 192,244 / 1,218,132; coAuthorsDBLP 299,067 / 299,067; dblp-2010 326,186 / 1,615,400; citationCiteseer 268,495 / 2,313,294; coPapersDBLP 540,486 / 30,491,458; coPapersCiteseer 434,102 / 32,073,440; as-Skitter 1,696,415 / 22,190,596.

    Louvain returns a cudf.DataFrame with two named columns: louvain["vertex"] (the vertex id) and louvain["partition"] (the assigned partition).

    G = cugraph.Graph()
    G.add_edge_list(gdf["src_0"], gdf["dst_0"], gdf["data"])
    df, mod = cugraph.nvLouvain(G)
  46. 59 Multi-GPU PageRank Performance: PageRank portion of the HiBench benchmark suite

    HiBench Scale | Vertices | Edges | CSV File (GB) | # of GPUs | # of CPU Threads | PageRank for 3 Iterations (secs)
    Huge | 5,000,000 | 198,000,000 | 3 | 1 | - | 1.1
    BigData | 50,000,000 | 1,980,000,000 | 34 | 3 | - | 5.1
    BigData x2 | 100,000,000 | 4,000,000,000 | 69 | 6 | - | 9.0
    BigData x4 | 200,000,000 | 8,000,000,000 | 146 | 12 | - | 18.2
    BigData x8 | 400,000,000 | 16,000,000,000 | 300 | 16 | - | 31.8
    BigData x8 | 400,000,000 | 16,000,000,000 | 300 | - | 800* | 5760*
    *BigData x8, 100x 8-vCPU nodes, Apache Spark GraphX ⇒ 96 mins!
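
    For reference, a single-GPU PageRank call looks roughly like the sketch below; the edge-list file and column names are hypothetical, and from_cudf_edgelist is the current cuGraph API (the 0.10-era equivalent is add_edge_list, as shown on the Louvain slide above).

        import cudf
        import cugraph

        edges = cudf.read_csv("edges.csv", names=["src", "dst"],
                              dtype=["int32", "int32"])
        G = cugraph.Graph()
        G.from_cudf_edgelist(edges, source="src", destination="dst")

        # Returns a cudf.DataFrame with 'vertex' and 'pagerank' columns.
        pr = cugraph.pagerank(G, alpha=0.85)
        print(pr.sort_values("pagerank", ascending=False).head())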
  47. 60 Road to 1.0 October 2019 - RAPIDS 0.10 cuGraph

    Single-GPU Multi-GPU Multi-Node-Multi-GPU Jaccard and Weighted Jaccard Page Rank Personal Page Rank SSSP BFS Triangle Counting Subgraph Extraction Katz Centrality Betweenness Centrality Connected Components (Weak and Strong) Louvain Spectral Clustering K-Cores
  48. 61 Road to 1.0 March 2020 - RAPIDS 0.13 cuGraph

    Single-GPU Multi-Node-Multi-GPU Jaccard and Weighted Jaccard Page Rank Personal Page Rank SSSP BFS Triangle Counting Subgraph Extraction Katz Centrality Betweenness Centrality Connected Components (Weak and Strong) Louvain Spectral Clustering K-Cores
  49. 64 cuSpatial 0.10 • cuDF for data loading, cuGraph for

    routing optimization, and cuML for clustering are just a few examples (Seamless Integration into RAPIDS). Growing Functionality: • Extensive collection of algorithm, primitive, and utility functions for spatial analytics. Breakthrough Performance & Ease of Use: • Up to 1000x faster than CPU spatial libraries • Python and C++ APIs for maximum usability and integration
  50. 65 cuSpatial 0.10 and Beyond

    Layer | 0.10/0.11 Functionality | Functionality Roadmap (2020)
    High-level Analytics | C++ library w. Python bindings enabling distance, speed, trajectory similarity, trajectory clustering | C++ library w. Python bindings for additional spatio-temporal trajectory clustering, acceleration, dwell-time, salient locations, trajectory anomaly detection, origin destination, etc.
    Graph layer | cuGraph | Map matching, Dijkstra algorithm, Routing
    Query layer | Spatial Window | Nearest Neighbor, KNN, Spatiotemporal range search and joins
    Index layer | | Grid, Quad Tree, R-Tree, Geohash, Voronoi Tessellation
    Geo-operations | Point in polygon (PIP), Haversine distance, Hausdorff distance, lat-lon to xy transformation | Line intersecting polygon, Other distance functions, Polygon intersection, union
    Geo-representation | Shape primitives, points, polylines, polygons | Additional shape primitives
  51. 66 cuSpatial 0.10: Performance at a Glance

    cuSpatial Operation | Input data | cuSpatial Runtime | Reference Runtime | Speedup
    Point-in-Polygon Test | 1.3+ million vehicle point locations and 27 Regions of Interest | 1.11 ms (C++), 1.50 ms (Python) [NVIDIA Titan V] | 334 ms (C++, optimized serial), 130468.2 ms (Python Shapely API, serial) [Intel i7-7800X] | 301X (C++), 86,978X (Python)
    Haversine Distance Computation | 13+ million monthly NYC taxi trip pickup and drop-off locations | 7.61 ms (Python) [NVIDIA T4] | 416.9 ms (Numba) [NVIDIA T4] | 54.7X (Python)
    Hausdorff Distance Computation (for clustering) | 10,700 trajectories with 1.3+ million points | 13.5 s [Quadro V100] | 19227.5 s (Python SciPy API, serial) [Intel i7-6700K] | 1,400X (Python)
  52. 69 Building on top of RAPIDS A bigger, better, stronger

    ecosystem for all: Streamz, distributed stream processing using RAPIDS and Dask; a high-performance serverless event and data processing engine that utilizes RAPIDS for GPU acceleration; and BlazingSQL, a GPU-accelerated SQL engine built on top of RAPIDS.
  53. 70 BlazingSQL

    (Stack diagram: Apache Arrow on GPU underpinning SQL Queries (BlazingSQL), Data Preparation (cuDF), Machine Learning (cuML), Graph Analytics (cuGraph), and AI.) https://blog.blazingdb.com/querying-600m-rows-on-blazingsql-43dfa7bfbf3c
  54. 71 BlazingSQL

    (Pipeline: YOUR DATA (CSV, GDF, ORC, Parquet, JSON) > BlazingSQL > cuDF > ETL / Feature Engineering > cuML > MACHINE LEARNING)

    from blazingsql import BlazingContext
    import cudf
    bc = BlazingContext()
    bc.s3('bsql', bucket_name='bsql',
          access_key_id='<access_key>', secret_key='<secret_key>')
    bc.create_table('orders', 's3://bsql/orders/')
    gdf = bc.sql('select * from orders').get()
  55. 73 Deploy RAPIDS Everywhere Focused on robust functionality, deployment, and

    user experience. • Integration with major cloud providers • Both containers and cloud-specific machine instances • Support for Enterprise and HPC orchestration layers. (Shown: Cloud Dataproc, Azure Machine Learning.)
  56. 78 Explore: RAPIDS Code and Blogs Check out our code

    and how we use it https://github.com/rapidsai https://medium.com/rapids-ai
  57. 79 Explore: Notebooks Contrib Notebooks Contrib Repo has tutorials and

    examples, and various E2E demos. RAPIDS Youtube channel has explanations, code walkthroughs and use cases.
  58. 82 Join the Movement Everyone can help! Integrations, feedback, documentation

    support, pull requests, new issues, or code donations welcomed! Apache Arrow: https://arrow.apache.org/ @ApacheArrow. GPU Open Analytics Initiative: http://gpuopenanalytics.com/ @GPUOAI. RAPIDS: https://rapids.ai @RAPIDSAI. Dask: https://dask.org @Dask_dev