Slide 1

Offline Batch Inference: Comparing Ray, Apache Spark, and SageMaker
Amog Kamsetty
Email: [email protected]
Twitter: @AmogKamsetty

Slides 2–6

Talk Overview
- What is batch inference and why does it matter?
- Challenges with batch inference
- Exploring the current solution space
- Comparing three solutions:
  - SageMaker Batch Transform
  - Apache Spark
  - Ray Datasets

Slide 7

What is Offline Batch Inference?

Slides 8–12

Challenges
1. Managing heterogeneous compute infrastructure
2. Utilizing all resources in the cluster
3. Efficient data transfer: storage -> CPU RAM -> GPU RAM
4. Developer experience

Slides 13–21

What does the industry recommend?

Slides 22–23

Approach 1: Batch Services
AWS Batch, GCP Batch, Azure Batch
- Partially handle Challenge #1 (managed infra), but no heterogeneous clusters
- Don’t handle Challenges #2, #3, #4
What about Modal Labs?

Slides 24–25

Approach 2: Online Inference Solutions
BentoML, Ray Serve, SageMaker Batch Transform
● Abstracts away infra complexities
● Abstractions for model packaging
● Framework integrations
Unnecessary complexities for offline inference:
● Starting an HTTP server, sending requests over the network…
● Hard to saturate GPUs
BentoML integrates with Spark for offline inference

Slide 26

Approach 3: Distributed Data Systems
Spark, Ray Data
Designed to handle map operations on large datasets (see the sketch below)
Native support for:
● Scaling across clusters
● Data partitioning and batching
● I/O layer, connectors to data sources
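
To make the map-style pattern concrete, here is a minimal sketch of distributed batch inference with PySpark pandas UDFs (Spark 3.0+ for the iterator-of-Series form). The model artifact "model.pkl", the S3 paths, and the "features" column are hypothetical stand-ins, not code from the talk.

```python
# Minimal sketch: batch inference as a map over a distributed dataset.
from typing import Iterator

import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf

spark = SparkSession.builder.appName("offline-batch-inference").getOrCreate()

@pandas_udf("double")
def predict(batches: Iterator[pd.Series]) -> Iterator[pd.Series]:
    import pickle
    # Load the (hypothetical) model once per task, reuse it for every batch.
    with open("model.pkl", "rb") as f:
        model = pickle.load(f)
    for batch in batches:
        yield pd.Series(model.predict(list(batch)))

df = spark.read.parquet("s3://my-bucket/features/")           # hypothetical input
df = df.withColumn("prediction", predict(df["features"]))
df.write.mode("overwrite").parquet("s3://my-bucket/predictions/")
```

The iterator form exists so the model is loaded once per task rather than once per batch, which is the usual idiom for model inference in Spark.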

Slide 27

Image Classification:
- SageMaker Batch Transform
- Apache Spark
- Ray Data

Slide 28

Benchmark
Pretrained ResNet-50 model on ImageNet data:
1. Reading images from S3
2. Simple CPU preprocessing (resizing, cropping, normalization)
3. Model inference on GPU
Dataset sizes: 10 GB, 300 GB
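
A minimal sketch of this three-step pipeline in Ray Data follows. It assumes a Ray 2.x API (argument names such as compute vs. concurrency have shifted across Ray versions), torchvision for the model and transforms, and hypothetical S3 paths; it is an illustration, not the talk's benchmark code.

```python
import numpy as np
import ray
import torch
from torchvision import models, transforms

# 1. Read images from S3 into a distributed dataset.
ds = ray.data.read_images("s3://my-bucket/imagenet/")

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# 2. Simple CPU preprocessing: resize, crop, normalize.
def preprocess_batch(batch: dict) -> dict:
    batch["image"] = np.stack([preprocess(img).numpy() for img in batch["image"]])
    return batch

# 3. GPU inference; the model is loaded once per actor and reused.
class ResNetPredictor:
    def __init__(self):
        self.model = models.resnet50(pretrained=True).eval().cuda()

    def __call__(self, batch: dict) -> dict:
        with torch.inference_mode():
            images = torch.as_tensor(batch["image"]).cuda()
            batch["label"] = self.model(images).argmax(dim=1).cpu().numpy()
        return batch

predictions = (
    ds.map_batches(preprocess_batch, batch_format="numpy")
      .map_batches(ResNetPredictor, batch_size=128, num_gpus=1,
                   compute=ray.data.ActorPoolStrategy(size=4))
)
predictions.write_parquet("s3://my-bucket/predictions/")
```

Note how the CPU and GPU stages are separate map_batches calls, which lets the CPU preprocessing and GPU inference scale independently across a heterogeneous cluster.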

Slide 29

10 GB

Slide 30

SageMaker Batch Transform
Addresses Challenge #1 partially: abstracts away infra management
But SageMaker Batch Transform reuses an architecture built for online serving:
● Starts an HTTP server, deploys the model as an endpoint
● Each image is sent as a request to the server
● Cannot batch across multiple files
● Max payload size is 100 MB -> cannot saturate GPUs!
Poor developer UX, difficult debugging
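
For reference, a hedged sketch of launching such a job with the sagemaker Python SDK; the model name, instance choices, and S3 paths are hypothetical, and a model must already be registered in SageMaker.

```python
# Sketch of a SageMaker Batch Transform job (hypothetical names and paths).
from sagemaker.transformer import Transformer

transformer = Transformer(
    model_name="resnet50-model",       # a model already registered in SageMaker
    instance_count=4,
    instance_type="ml.g4dn.xlarge",
    strategy="SingleRecord",           # each image becomes its own HTTP request
    max_payload=100,                   # hard cap: 100 MB per request
    output_path="s3://my-bucket/predictions/",
)

transformer.transform(
    data="s3://my-bucket/imagenet/",
    content_type="application/x-image",
    split_type=None,                   # images cannot be batched across files
)
transformer.wait()
```

The request-per-record shape of this API is exactly the online-serving architecture the slide describes, which is why GPU saturation is hard to reach.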

Slides 31–35

Comparing Ray Data and Spark
Challenge #2: Utilizing all resources in the cluster for CPU+GPU workloads
Challenge #3: Efficient data transfer for multi-dimensional tensors
- NumPy + PyArrow, no pandas overhead
- No JVM <-> PyArrow overhead
Challenge #4: Developer experience
- Ray is Python-first
- Easier debugging, better stack traces
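
To illustrate the data-transfer point, here is a small sketch of Ray's zero-copy handling of NumPy arrays: tasks read arrays directly out of the shared-memory object store instead of deserializing copies. This demonstrates a general Ray property and is not benchmark code from the talk.

```python
import numpy as np
import ray

ray.init()

@ray.remote
def mean_of(arr: np.ndarray) -> float:
    # `arr` arrives as a read-only view backed by shared memory, not a copy.
    return float(arr.mean())

big = np.random.rand(10_000_000)      # ~80 MB of float64
ref = ray.put(big)                    # stored once in the object store
print(ray.get(mean_of.remote(ref)))   # same-node workers read it zero-copy
```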

Slide 36

300 GB
Let’s see this live!

Slide 37

Does Ray Data Scale to 10 TB?
40 GPU cluster
Throughput: 11,580.958 img/sec
90%+ GPU utilization

Slides 38–42

Summary
Ray Data outperforms SageMaker and Spark for offline batch inference.
Ray Data meets all 4 challenges:
1. Abstracts away compute infrastructure management, supports heterogeneous clusters.
2. Streams data from cloud storage -> CPU -> GPU, utilizing all cluster resources.
3. Supports multi-dimensional tensors with zero-copy exchange.
4. Python-native, making it easy to develop and debug.

Slide 43

What’s Next?
batchinference.io
Get in touch with us