Getting science done with accelerated Python computing platforms

Jacob Tomlinson

November 16, 2024

Transcript

  1. “The amount of data and the demand for instant access will continue to increase.” — Grace Hopper, 1982
     New Mission Critical Software, 1970s-1980s
  2. The Next Decade of Data Expectations
     Internet-scale data | Massive models | Real-time performance
     [Chart: Data Volume in Zettabytes. Source: IDC, Revelations in the Global DataSphere, US49346123, July 2023]
     Use cases: Recommenders, Fraud Detection, LLMs, Genomic Analysis, Forecasting, Cybersecurity
  3. Accelerated Computing Swim Lanes
     RAPIDS makes accelerated computing more seamless while enabling specialization for maximum performance.
  4. Bringing NVIDIA Accelerated Computing to Polars
     Polars GPU Engine, powered by RAPIDS cuDF
     https://developer.nvidia.com/blog/polars-gpu-engine-powered-by-rapids-cudf-now-available-in-open-beta/
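     A minimal sketch of selecting the GPU engine from Python, assuming Polars is installed with GPU support (pip install polars[gpu] --extra-index-url=https://pypi.nvidia.com); the file and column names here are hypothetical:

         # Minimal sketch: running a Polars lazy query on the GPU engine.
         # "sales.parquet" and its columns are hypothetical.
         import polars as pl

         query = (
             pl.scan_parquet("sales.parquet")
             .group_by("region")
             .agg(pl.col("amount").sum().alias("total"))
         )

         # engine="gpu" asks Polars to run the query on the cuDF-backed GPU
         # engine; unsupported queries fall back to the default CPU engine.
         result = query.collect(engine="gpu")
         print(result)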
  5. Accelerated Apache Spark
     Zero-code-change acceleration for Spark DataFrames and SQL

     CPU Spark:
         spark.sql("""select count(*) as order_count from orders""")

     GPU Spark:
         spark.conf.set("spark.plugins", "com.nvidia.spark.SQLPlugin")
         spark.sql("""select count(*) as order_count from orders""")

     Average speed-ups: >5x (NVIDIA Decision Support Benchmark, 3 TB, public cloud: Amazon EMR, Google Cloud Dataproc)
     • Operates as a software plugin to the popular Apache Spark platform
     • Automatically accelerates supported operations (with CPU fallback if needed)
     • Requires no code changes
     • Works with Spark standalone, YARN clusters, and Kubernetes clusters
     • Deploy on: Apache Spark 3.4.1, RAPIDS Spark release 24.04
     See GTC session S62257 for details.
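     A self-contained sketch of enabling the plugin when building a PySpark session; it assumes the rapids-4-spark jar is already available to the cluster, and the "orders" table is hypothetical:

         # Minimal sketch: enabling the RAPIDS Accelerator for Apache Spark.
         # Assumes the rapids-4-spark jar is on the cluster's classpath;
         # the "orders" table is hypothetical.
         from pyspark.sql import SparkSession

         spark = (
             SparkSession.builder
             .appName("rapids-accelerated-sql")
             # The only change versus CPU Spark: load the RAPIDS plugin.
             .config("spark.plugins", "com.nvidia.spark.SQLPlugin")
             .getOrCreate()
         )

         # Supported operations run on the GPU; the rest fall back to CPU.
         spark.sql("select count(*) as order_count from orders").show()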
  6. Accelerated Dask
     Just set “cudf” as the backend and use Dask-CUDA workers
     • Configurable backend and GPU-aware workers
     • Memory spilling (GPU -> CPU -> disk)
     • Optimized memory management
     • Accelerated RDMA and networking (UCX)
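     A minimal sketch of both pieces together, assuming dask, dask-cuda, and cudf are installed on a machine with NVIDIA GPUs; the dataset path and column names are hypothetical:

         # Minimal sketch: cuDF-backed Dask DataFrames on Dask-CUDA workers.
         # "data/*.parquet" and its columns are hypothetical.
         import dask
         import dask.dataframe as dd
         from dask_cuda import LocalCUDACluster
         from distributed import Client

         # One worker per visible GPU, with GPU -> CPU memory spilling.
         cluster = LocalCUDACluster()
         client = Client(cluster)

         # Build cuDF (GPU) DataFrames instead of pandas ones.
         dask.config.set({"dataframe.backend": "cudf"})

         ddf = dd.read_parquet("data/*.parquet")
         print(ddf.groupby("key")["value"].mean().compute())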
  7. cuML: Accelerated machine learning with a scikit-learn API

     CPU (scikit-learn):
         >>> from sklearn.ensemble import RandomForestClassifier
         >>> clf = RandomForestClassifier()
         >>> clf.fit(x, y)

     GPU (cuML):
         >>> from cuml.ensemble import RandomForestClassifier
         >>> clf = RandomForestClassifier()
         >>> clf.fit(x, y)

     50+ GPU-accelerated algorithms: time series, preprocessing, classification, tree models, cross validation, clustering, explainability, dimensionality reduction, regression
     Benchmark setup: A100 GPU vs. AMD EPYC 7642 (96 logical cores); cuML 23.04, scikit-learn 1.2.2, umap-learn 0.5.3
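     A self-contained version of that import swap, assuming cuml is installed on a machine with an NVIDIA GPU; the synthetic dataset is purely illustrative:

         # Minimal sketch: swapping scikit-learn for cuML is one import.
         # The synthetic dataset is purely illustrative.
         from sklearn.datasets import make_classification

         # from sklearn.ensemble import RandomForestClassifier  # CPU
         from cuml.ensemble import RandomForestClassifier       # GPU

         X, y = make_classification(n_samples=10_000, n_features=20,
                                    random_state=0)
         X = X.astype("float32")  # cuML works best with 32-bit floats
         y = y.astype("int32")

         clf = RandomForestClassifier()
         clf.fit(X, y)
         print(clf.predict(X[:5]))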
  8. Accelerated XGBoost
     “XGBoost is All You Need” – Bojan Tunguz, 4x Kaggle Grandmaster

     CPU:
         >>> from xgboost import XGBClassifier
         >>> clf = XGBClassifier()
         >>> clf.fit(x, y)

     GPU:
         >>> from xgboost import XGBClassifier
         >>> clf = XGBClassifier(device="cuda")
         >>> clf.fit(x, y)

     Up to 20x speed-ups
     • One line of code change to unlock up to 20x speedups with GPUs
     • Scalable to the world's largest datasets with Dask and PySpark
     • Built-in SHAP support for model explainability
     • Deployable with Triton for lightning-fast inference in production
     • RAPIDS helps maintain the XGBoost project
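     A runnable version of that one-line change, assuming xgboost >= 2.0 and an NVIDIA GPU; the synthetic dataset is purely illustrative:

         # Minimal sketch: GPU training in XGBoost via the device parameter
         # (XGBoost >= 2.0). The synthetic dataset is purely illustrative.
         from sklearn.datasets import make_classification
         from xgboost import XGBClassifier

         X, y = make_classification(n_samples=10_000, n_features=20,
                                    random_state=0)

         clf = XGBClassifier(device="cuda")  # device="cpu" for CPU training
         clf.fit(X, y)
         print(clf.predict(X[:5]))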
  9. Accelerated NetworkX
     nx-cugraph: the zero-code-change GPU backend for NetworkX
     • Zero-code-change GPU acceleration of NetworkX code
     • Accelerates algorithms up to 600x, depending on the algorithm and graph size
     • Supports 60 popular graph algorithms, and growing
     • Falls back to CPU NetworkX for unsupported algorithms

     Install:
         pip install nx-cugraph-cu12 --extra-index-url https://pypi.nvidia.com
         conda install -c rapidsai -c conda-forge -c nvidia nx-cugraph

     Benchmark setup: NetworkX 3.2; CPU: Intel(R) Xeon(R) Platinum 8480CL, 2 TB RAM; GPU: NVIDIA H100 80GB
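     A minimal sketch of dispatching a single call to the GPU backend, assuming nx-cugraph is installed; the built-in example graph is purely illustrative:

         # Minimal sketch: routing a NetworkX algorithm to nx-cugraph.
         # Assumes nx-cugraph is installed; the graph is illustrative.
         import networkx as nx

         G = nx.karate_club_graph()

         # Per-call dispatch to the GPU backend; omitting backend= keeps
         # the default CPU code path.
         bc = nx.betweenness_centrality(G, backend="cugraph")
         print(max(bc, key=bc.get))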
  10. RAPIDS Deployment Models
      Scales from sharing GPUs to leveraging many GPUs at once
      • Single node: scale up interactive data science sessions with NVIDIA-accelerated tools like cudf.pandas
      • Multi node: scale out processing and training by leveraging GPU acceleration in distributed frameworks like Dask and Spark
      • Shared node: scale out AI/ML APIs and model serving with NVIDIA Triton Inference Server and the Forest Inference Library
  11. RAPIDS on Managed Notebook Platforms
      Serverless Jupyter in the cloud (example screenshot from the Vertex AI documentation)
      https://docs.rapids.ai/deployment/stable/cloud/gcp/vertex-ai/
  12. RAPIDS on Compute Pipelines
      Data processing services (example from the AWS EMR documentation)
      https://docs.nvidia.com/spark-rapids/user-guide/latest/getting-started/aws-emr.html
  13. RAPIDS on Virtual Machines
      Servers and workstations in the cloud (example from the Azure Virtual Machine documentation)
      https://docs.rapids.ai/deployment/stable/cloud/azure/azure-vm/
  14. RAPIDS on Kubernetes
      Unified cloud deployments with the NVIDIA GPU Operator
      [Diagram: a Kubernetes cluster exposing many GPUs through the GPU Operator]
  15. RAPIDS runs your workloads faster. How do you want to spend those gains?
      • Reduce cost: run servers for less time; beneficial for reducing cloud costs.
      • Do more work: run more workloads for the same time/cost; process things that were not possible before.
      • Performance boost: get work done faster; may give a competitive advantage or reduce pressure on SLAs.
      • Environmental impact: reduce the power needed to perform the same calculation; using less power produces less CO2.
      • Reduce context switching: cut the time people wait for calculations to complete, which helps them avoid switching to a different task.
      • Improve accuracy: acceleration can allow more iterations, or more data to be processed, leading to improved model accuracy.
  16. Use Case: Sharing Resources with Multi-Tenancy
      Smoothing out demand peaks while reducing context switching
      Using Kubernetes, we created an autoscaling cluster for interactive Jupyter sessions. Users only consume GPUs while they are running computations, and the cluster keeps some reserved GPU capacity so that user computations are fulfilled quickly. With an overhead of 30%, 60% of user computations started within 2 seconds and 90% within 60 seconds. This can be tuned to suit your needs: more overhead capacity results in shorter wait times. Whatever your preference, your cost is always correlated with your compute demand.
      https://docs.rapids.ai/deployment/stable/examples/rapids-autoscaling-multi-tenant-kubernetes/notebook/