Dask Demo Day - How RAPIDS users use Dask

Jacob Tomlinson

October 19, 2023

Transcript

  1. How do folks use Dask + RAPIDS?
    Dask Demo Day Oct 2023

  2. 25% of the Fortune 100 use RAPIDS

  3. Growing community

  4. Common RAPIDS use cases
    ▸ Workloads
    ○ Dask-cuDF + XGBoost risk assessment on >1TB datasets
    ○ LLM text preprocessing on >10TB datasets
    ○ Apache Beam pipelines
    ○ Graph neural networks
    ▸ Sectors
    ○ Retail
    ○ Financial services
    ○ Cybersecurity
    ○ Telecoms
    ○ Automotive
    ▸ Market size
    ○ Some users spending 6-7 figures per month on GPU Dask clusters
    ○ Clusters with up to 100 GPU workers
    Where do we see people using RAPIDS?
    ▸ Platforms
    ○ Google Cloud
    ○ AWS
    ○ Azure
    ○ Oracle
    ○ On-prem (often Slurm)
    ▸ Dask often paired with
    ○ XGBoost
    ○ Optuna
    ○ Spark
    ○ Apache Beam
    ○ Numba
    ○ PyTorch
    ○ TensorFlow

  5. Model evaluation in recommender systems
    https://www.nvidia.com/en-us/on-demand/session/gtcspring21-s32017/

  6. Feature processing in recommender systems
    https://www.nvidia.com/en-us/on-demand/session/gtcspring21-s32017/

  7. Signal processing in autonomous vehicles
    https://www.nvidia.com/en-us/on-demand/session/gtcspring23-s51336/

  8. LLM data preprocessing (10s of TBs)
    https://developer.nvidia.com/blog/curating-trillion-token-datasets-introducing-nemo-data-curator/

  9. Apache Beam pipelines
    https://www.youtube.com/watch?v=uGEQkws1Low

  10. Open source community
    ▸ NVIDIA targets large enterprises with RAPIDS.
    ▸ Large enterprise users are less likely to open GitHub issues than academic or SME users.
    ▸ RAPIDS users typically work for companies with direct links to NVIDIA for support.
    ▸ Dask issues are often discussed in higher-level projects like dask-cudf, NVTabular, NeMo, etc.
    Maybe we could do better at communicating back to the Dask community…
    Why don’t we see RAPIDS discussed more in OSS Dask?
