Offline Batch Inference: Comparing Ray, Apache Spark, SageMaker

As more companies use large-scale machine learning (ML) models for training and evaluation, offline batch inference becomes an essential workload. It brings a number of challenges: managing compute infrastructure, utilizing all of the cluster's heterogeneous resources, and transferring data efficiently from storage to hardware accelerators. Ray performs significantly better on these challenges because it can coordinate clusters of diverse resources and schedule work against the specific resource requirements of the workload, yielding better utilization.

In this talk we will:
* Examine the challenges and limitations of offline batch inference
* Compare three solutions: AWS SageMaker Batch Transform, Apache Spark, and Ray Data
* Share our performance numbers showing Ray Data as the best solution for offline batch inference at scale

Anyscale

June 22, 2023

Transcript

  1. Talk Overview
     - What is batch inference and why does it matter?
     - Challenges with batch inference
     - Exploring the current solution space
     - Comparing three solutions: SageMaker Batch Transform, Apache Spark, and Ray Data
  2. Challenges
     1. Managing heterogeneous compute infrastructure
     2. Utilizing all resources in the cluster
     3. Efficient data transfer: storage -> CPU RAM -> GPU RAM
     4. Developer experience
  3. Approach 1: Batch Services (AWS Batch, GCP Batch, Azure Batch)
     - Partially handle Challenge #1 (managed infra), but no heterogeneous clusters
     - Don't handle Challenges #2, #3, #4 (see the job-submission sketch below)
     - What about Modal Labs?
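
To make concrete what a batch service does and does not manage, here is a minimal boto3 sketch of submitting a containerized inference job to AWS Batch. The job name, queue, job definition, script, and bucket path are all hypothetical placeholders; the point is that the service runs the container, while data loading, batching, and GPU pipelining inside it remain user code.

```python
import boto3

batch = boto3.client("batch")

# AWS Batch provisions an instance and runs the container, but
# everything inside it -- reading from S3, batching, feeding the
# GPU -- is left to user code (Challenges #2 and #3).
response = batch.submit_job(
    jobName="resnet50-offline-inference",      # placeholder
    jobQueue="example-gpu-queue",              # placeholder
    jobDefinition="example-inference-job:1",   # placeholder
    containerOverrides={
        "command": [
            "python", "run_inference.py",      # hypothetical script
            "--input", "s3://example-bucket/imagenet/",
        ],
        "resourceRequirements": [{"type": "GPU", "value": "1"}],
    },
)
print(response["jobId"])
```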
  4. Approach 2: Online Inference Solutions (BentoML, Ray Serve, SageMaker Batch Transform)
     - Abstract away infra complexities
     - Abstractions for model packaging
     - Framework integrations
     - But: unnecessary complexities for offline inference (starting an HTTP server, sending requests over the network, ...) make it hard to saturate GPUs
     - BentoML integrates with Spark for offline inference
  5. Approach 3: Distributed Data Systems (Spark, Ray Data)
     - Designed to handle map operations on large datasets
     - Native support for scaling across clusters, data partitioning and batching, and an I/O layer that connects to data sources (see the map sketch below)
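
As an illustration of the map-style programming model, here is a minimal Ray Data sketch; the synthetic dataset and the squaring transform are placeholders for a real source and a real preprocessing step (the "id" column name assumes a recent Ray release).

```python
import ray

# A small synthetic dataset; in practice this would come from the
# I/O layer, e.g. ray.data.read_parquet("s3://...") or read_images(...).
ds = ray.data.range(1_000)  # rows with a single "id" column

# A vectorized map over batches; the system handles partitioning,
# batching, and scheduling the function across the cluster.
def square(batch):
    batch["id_squared"] = batch["id"] ** 2
    return batch

ds = ds.map_batches(square, batch_format="numpy")
print(ds.take(3))
```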
  6. Benchmark: pretrained ResNet-50 model on ImageNet data (10 GB and 300 GB datasets)
     1. Reading images from S3
     2. Simple CPU preprocessing (resizing, cropping, normalization)
     3. Model inference on GPU
     (see the pipeline sketch below)
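
The deck does not show the benchmark code; the following is a hedged Ray Data sketch of the same three-stage pipeline. Bucket paths, the batch size, and the actor-pool size are illustrative assumptions, and the compute=ActorPoolStrategy(...) argument matches the Ray 2.x API from around the time of this talk (newer releases use a concurrency= argument instead).

```python
import numpy as np
import ray
import torch
from torchvision import models, transforms

# Stage 1: read raw images from S3 (path is a placeholder).
ds = ray.data.read_images("s3://example-bucket/imagenet/")

# Stage 2: simple CPU preprocessing - resize, crop, normalize.
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Resize(256, antialias=True),
    transforms.CenterCrop(224),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def preprocess_batch(batch):
    batch["image"] = np.stack(
        [preprocess(img).numpy() for img in batch["image"]]
    )
    return batch

ds = ds.map_batches(preprocess_batch, batch_format="numpy")

# Stage 3: GPU inference with a stateful callable so the model is
# loaded once per actor, not once per batch.
class ResNetPredictor:
    def __init__(self):
        self.model = models.resnet50(weights="IMAGENET1K_V2").cuda().eval()

    def __call__(self, batch):
        with torch.inference_mode():
            images = torch.as_tensor(batch["image"], device="cuda")
            batch["logits"] = self.model(images).cpu().numpy()
        return batch

ds = ds.map_batches(
    ResNetPredictor,
    batch_size=128,                               # illustrative
    num_gpus=1,                                   # one GPU per actor
    compute=ray.data.ActorPoolStrategy(size=4),   # illustrative pool size
    batch_format="numpy",
)

# Placeholder output location.
ds.write_parquet("s3://example-bucket/predictions/")
```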
  7. SageMaker Batch Transform
     - Addresses Challenge #1 partially: abstracts away infra management
     - But SageMaker Batch Transform reuses the online-serving architecture (see the sketch below):
       • Starts an HTTP server and deploys the model as an endpoint
       • Each image is sent as a request to the server
       • Cannot batch across multiple files
       • Max payload size is 100 MB -> cannot saturate GPUs!
     - Poor developer UX, difficult debugging
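
For reference, launching a Batch Transform job through the SageMaker Python SDK looks roughly like this. The model artifact, IAM role, entry point, and instance settings are hypothetical, and framework versions vary; the relevant part is that the job stands up a model server and streams records to it as requests, with the 100 MB cap surfaced via max_payload.

```python
from sagemaker.pytorch import PyTorchModel

# Wrap a trained model artifact stored in S3 (all names are placeholders).
model = PyTorchModel(
    model_data="s3://example-bucket/models/resnet50/model.tar.gz",
    role="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    entry_point="inference.py",   # user-supplied request handler
    framework_version="1.13",
    py_version="py39",
)

# Batch Transform reuses the online-serving stack: it deploys the
# model behind an HTTP server and sends records to it as requests.
transformer = model.transformer(
    instance_count=4,
    instance_type="ml.g4dn.xlarge",
    strategy="MultiRecord",   # batch records within a single request
    max_payload=100,          # hard cap: 100 MB per request
)

transformer.transform(
    data="s3://example-bucket/imagenet/",
    content_type="application/x-image",
    split_type=None,   # images cannot be split or batched across files
)
transformer.wait()
```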
  8. Comparing Ray Data and Spark
     - Challenge #2: utilizing all resources in the cluster for CPU+GPU workloads
     - Challenge #3: efficient data transfer for multi-dimensional tensors
       • NumPy + PyArrow, no Pandas overhead
       • No JVM <-> PyArrow overhead
     - Challenge #4: developer experience
       • Ray is Python-first
       • Easier debugging, better stack traces
     (see the Spark UDF sketch below)
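
To make the data-transfer point concrete, here is a hedged sketch of the GPU inference stage written as a Spark pandas UDF: each batch crosses the JVM boundary through Arrow and is materialized as Pandas objects before the model can see it. The paths, the flattened image layout, and the column names are assumptions, and the model runs on CPU here because GPU scheduling in Spark requires additional cluster configuration.

```python
import numpy as np
import pandas as pd
import torch
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import ArrayType, FloatType

spark = SparkSession.builder.appName("offline-batch-inference").getOrCreate()

@pandas_udf(ArrayType(FloatType()))
def predict(images: pd.Series) -> pd.Series:
    # Runs on executors: each Arrow batch from the JVM is converted
    # into a Pandas Series of Python lists before we can touch it.
    from torchvision import models

    # NOTE: loading the model on every batch is wasteful; shown for
    # brevity (in practice you would cache or broadcast it).
    model = models.resnet50(weights="IMAGENET1K_V2").eval()

    # Assumes each row holds a flattened 3x224x224 float image.
    batch = torch.as_tensor(
        np.stack(images.to_list()), dtype=torch.float32
    ).reshape(-1, 3, 224, 224)
    with torch.inference_mode():
        logits = model(batch).numpy()
    return pd.Series(list(logits))

# Input/output paths and the "image" column layout are assumptions.
df = spark.read.parquet("s3://example-bucket/imagenet-preprocessed/")
df.withColumn("logits", predict("image")).write.parquet(
    "s3://example-bucket/predictions/"
)
```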
  9. Does Ray Data Scale to 10 TB?
     - 40 GPU cluster
     - Throughput: 11,580.958 img/sec
     - 90%+ GPU utilization
  10. Summary: Ray Data outperforms SageMaker and Spark for offline batch inference. Ray Data meets all 4 challenges:
      1. Abstracts away compute infrastructure management; supports heterogeneous clusters.
      2. Streams data from cloud storage -> CPU -> GPU, utilizing all cluster resources.
      3. Supports multi-dimensional tensors with zero-copy exchange.
      4. Python native, making it easy to develop and debug.