Dask Demo Day - How RAPIDS users use Dask

Jacob Tomlinson

October 19, 2023


  1. Common RAPIDS use cases (slide 4)

    Where do we see people using RAPIDS?

    ▸ Workloads
      ◦ Dask-cuDF + XGBoost risk assessment on >1TB datasets
      ◦ LLM text preprocessing on >10TB datasets
      ◦ Apache Beam pipelines
      ◦ Graph neural networks
    ▸ Sectors
      ◦ Retail
      ◦ Financial services
      ◦ Cyber security
      ◦ Telecoms
      ◦ Automotive
    ▸ Market size
      ◦ Some users spending 6-7 figures per month on GPU Dask clusters
      ◦ Clusters with up to 100 GPU workers
    ▸ Platforms
      ◦ Google Cloud
      ◦ AWS
      ◦ Azure
      ◦ Oracle
      ◦ On-prem (often SLURM)
    ▸ Dask often paired with
      ◦ XGBoost
      ◦ Optuna
      ◦ Spark
      ◦ Apache Beam
      ◦ Numba
      ◦ PyTorch
      ◦ TensorFlow
  2. Open Source Community (slide 10)

    Why don’t we see RAPIDS discussed more in OSS Dask?

    ▸ NVIDIA targets large enterprises with RAPIDS.
    ▸ Large enterprise users are less likely to open GitHub issues than academic or SME users.
    ▸ RAPIDS users typically work for companies with direct links to NVIDIA for support.
    ▸ Dask issues are often discussed in high-level projects like dask-cudf, nvtabular, NeMo, etc.

    Maybe we could do better at communicating back to the Dask community…