Offline Batch Inference: Comparing Ray, Apache Spark, SageMaker

As more companies use large-scale machine learning (ML) models, offline batch inference has become an essential workload. It comes with a number of challenges: managing compute infrastructure, utilizing all of a cluster's heterogeneous resources, and transferring data from storage to hardware accelerators. Ray addresses these challenges by coordinating clusters of diverse resources and matching each stage of the workload to the resources it needs, and as a result performs significantly better.

In this talk we will:
* Examine the challenges and limitations of offline batch inference
* Compare three solutions: AWS SageMaker Batch Transform, Apache Spark, and Ray Data
* Share performance numbers showing Ray Data as the best solution for offline batch inference at scale

Anyscale

June 22, 2023

Transcript

  1. Offline Batch Inference:
    Comparing Ray, Apache Spark,
    and SageMaker
    Amog Kamsetty
    Email: [email protected]
    Twitter: @AmogKamsetty

  2. Talk Overview
    - What is batch inference and why does it matter?
    - Challenges with batch inference
    - Exploring the current solution space
    - Comparing three solutions:
      - SageMaker Batch Transform
      - Apache Spark
      - Ray Data

  3. What is Offline Batch Inference?
    Running a trained model over a large, fixed dataset to produce
    predictions ahead of time; throughput, not request latency, is the goal.

  4. Challenges
    1. Managing heterogeneous compute infrastructure
    2. Utilizing all resources in the cluster
    3. Efficient data transfer: storage -> CPU RAM -> GPU RAM
    4. Developer experience

  5. What does the industry recommend?

  6. Approach 1: Batch Services
    AWS Batch, GCP Batch, Azure Batch
    Partially handle Challenge #1 (managed infra), but no heterogeneous clusters
    Don't handle Challenges #2, #3, or #4
    What about Modal Labs?

  7. Approach 2: Online Inference Solutions
    BentoML, Ray Serve, SageMaker Batch Transform
    ● Abstract away infra complexities
    ● Abstractions for model packaging
    ● Framework integrations
    But they add unnecessary complexity for offline inference:
    starting an HTTP server, sending requests over the network…
    Hard to saturate GPUs
    BentoML integrates with Spark for offline inference

  8. Approach 3: Distributed Data Systems
    Spark, Ray Data
    Designed to handle map operations on large datasets, with native support for:
    ● Scaling across clusters
    ● Data partitioning and batching
    ● An I/O layer that connects to data sources
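
    For example, a minimal PySpark sketch of batch scoring as a vectorized
    map (the paths, column names, and stand-in model below are illustrative,
    not from the talk):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import pandas_udf
    import pandas as pd

    spark = SparkSession.builder.getOrCreate()

    # Spark's I/O layer reads and partitions the dataset across the cluster.
    df = spark.read.parquet("s3://my-bucket/features/")  # illustrative path

    # A vectorized (batched) map: Spark applies the UDF to record
    # batches of each partition in parallel.
    @pandas_udf("double")
    def score(feature: pd.Series) -> pd.Series:
        return feature * 2.0  # stand-in for model.predict

    df.withColumn("pred", score("feature")) \
      .write.parquet("s3://my-bucket/predictions/")  # illustrative path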

  9. Image Classification:
    - SageMaker Batch Transform
    - Apache Spark
    - Ray Data

  10. Benchmark
    Pretrained ResNet-50 model on ImageNet data:
    1. Read images from S3
    2. Simple CPU preprocessing (resizing, cropping, normalization)
    3. Model inference on GPU
    Dataset sizes: 10 GB and 300 GB
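
    A minimal Ray Data sketch of this pipeline, assuming Ray 2.x (the S3
    paths, batch size, and actor-pool size are illustrative, and exact
    map_batches arguments vary across Ray versions):

    import numpy as np
    import ray
    import torch
    from torchvision import transforms
    from torchvision.models import resnet50

    # 1. Read images from S3 (illustrative path).
    ds = ray.data.read_images("s3://my-bucket/imagenet/")

    # 2. CPU preprocessing: resize, crop, normalize.
    preprocess = transforms.Compose([
        transforms.ToTensor(),
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

    def preprocess_batch(batch):
        batch["image"] = np.stack(
            [preprocess(img).numpy() for img in batch["image"]])
        return batch

    ds = ds.map_batches(preprocess_batch, batch_format="numpy")

    # 3. GPU inference with stateful actors, so the model is loaded
    #    once per actor rather than once per batch.
    class Predictor:
        def __init__(self):
            self.model = resnet50(pretrained=True).cuda().eval()

        def __call__(self, batch):
            images = torch.as_tensor(batch["image"]).cuda()
            with torch.no_grad():
                batch["pred"] = (
                    self.model(images).argmax(dim=1).cpu().numpy())
            return batch

    preds = ds.map_batches(
        Predictor,
        batch_size=128,                              # illustrative
        num_gpus=1,                                  # one GPU per actor
        compute=ray.data.ActorPoolStrategy(size=4),  # illustrative pool
        batch_format="numpy",
    )
    preds.write_parquet("s3://my-bucket/predictions/")  # illustrative path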

  11. 10 GB

  12. SageMaker Batch Transform
    Partially addresses Challenge #1: abstracts away infra management
    But SageMaker Batch Transform reuses an online-serving architecture:
    ● Starts an HTTP server and deploys the model as an endpoint
    ● Each image is sent as a request to the server
    ● Cannot batch across multiple files
    ● Max payload size is 100 MB -> cannot saturate GPUs!
    Poor developer UX, difficult debugging
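
    For reference, a sketch of launching such a job with the SageMaker
    Python SDK (the model artifact, IAM role, instance types, and paths
    are illustrative, not from the talk):

    from sagemaker.pytorch import PyTorchModel

    model = PyTorchModel(
        model_data="s3://my-bucket/model.tar.gz",      # illustrative
        role="arn:aws:iam::123456789012:role/SMRole",  # illustrative
        entry_point="inference.py",
        framework_version="1.13",
        py_version="py39",
    )

    transformer = model.transformer(
        instance_count=4,
        instance_type="ml.g4dn.xlarge",
        strategy="SingleRecord",  # one record (image) per request
        max_payload=100,          # hard cap: 100 MB per request
    )

    # Batch Transform stands up an HTTP endpoint internally and turns
    # the input files into requests against it.
    transformer.transform(
        data="s3://my-bucket/imagenet/",
        content_type="application/x-image",
        split_type=None,          # an image cannot be split across requests
    )
    transformer.wait()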

  13. Comparing Ray Data and Spark
    Challenge #2: Utilizing all resources in the cluster for CPU+GPU workloads
    Challenge #3: Efficient data transfer for multi-dimensional tensors
    - NumPy + PyArrow, no Pandas overhead
    - No JVM <-> PyArrow overhead
    Challenge #4: Developer experience
    - Ray is Python-first
    - Easier debugging, better stack traces
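
    A small sketch of that tensor path in Ray Data, assuming Ray 2.x
    (the shape and the "data" column name produced by range_tensor are
    illustrative and version-dependent):

    import ray

    # Tensors are stored as Arrow tensor-extension arrays, not as
    # Pandas object columns.
    ds = ray.data.range_tensor(1000, shape=(224, 224, 3))

    # Batches arrive as dicts of NumPy arrays backed by the Arrow
    # buffers, avoiding a Pandas conversion step.
    def scale(batch):
        batch["data"] = batch["data"] / 255.0
        return batch

    ds = ds.map_batches(scale, batch_format="numpy")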

  14. 300 GB
    Let’s see this live!

  15. Does Ray Data Scale to 10 TB?
    40-GPU cluster
    Throughput: 11,580.958 img/sec
    90%+ GPU utilization

  16. Summary
    Ray Data outperforms SageMaker and Spark for offline batch inference.
    Ray Data meets all 4 challenges:
    1. Abstracts away compute infrastructure management and supports
       heterogeneous clusters.
    2. Streams data from cloud storage -> CPU -> GPU, utilizing all
       cluster resources.
    3. Supports multi-dimensional tensors with zero-copy exchange.
    4. Python-native, making it easy to develop and debug.

  17. What’s Next?
    batchinference.io
    Get in touch with us
