Slide 1

Slide 1 text

Ray AIR: A Scalable Toolkit for End-to-end ML Applications
Richard Liaw, Anyscale
Xiaowei Jiang, Anyscale
SF Bay ACM Meetup, 10/24/2022

Slide 2

Slide 2 text

Who we are: Original creators of Ray
What we do: Unified compute platform to develop, deploy, and manage scalable AI & Python applications with Ray
Why do it: Scaling is a necessity, and scaling is hard; we want to make distributed computing easy and simple for everyone

Slide 3

Slide 3 text

What is Ray
→ A simple, general-purpose library for distributed computing
→ A unified Python toolkit, Ray AI Runtime (for scaling ML and more)
→ Runs on a laptop, public cloud, K8s, or on-premise
A layered cake of functionality and capabilities for scaling ML workloads

Slide 4

Slide 4 text

A Layered Cake and Ecosystem
Library + app ecosystem
Ray Core: general-purpose framework for distributed computing
Run anywhere

Slide 5

Slide 5 text

An anatomy of a Ray cluster
Head Node: Driver, Worker, Global Control Store (GCS), Raylet (Scheduler, Object Store)
Worker Node #1 … Worker Node #N: Workers, Raylet (Scheduler, Object Store)
Diagram callout: "Unique to Ray"

Slide 6

Slide 6 text

Python → Ray APIs

Task (function → distributed function):
    def f(x):
        # do something with x
        y = …
        return y

    @ray.remote
    def f(x):
        # do something with x
        y = …
        return y

    for i in range(10000):
        f.remote(i)

Actor (class → distributed class):
    class Cls():
        def __init__(self, x): …
        def f(self, a): …
        def g(self, a): …

    @ray.remote(num_cpus=2, num_gpus=4)
    class Cls():
        def __init__(self, x): …
        def f(self, a): …
        def g(self, a): …

    cls = Cls.remote(x)
    cls.f.remote(a)
    del cls

Distributed immutable object:
    import numpy as np
    a = np.arange(1, 10e6)
    b = a * 2

    import numpy as np
    import ray
    a = np.arange(1, 10e6)
    obj_a = ray.put(a)
    b = ray.get(obj_a) * 2
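The three primitives on this slide (tasks, actors, and immutable objects) can be mimicked on a single machine to show the shape of the programming model. The sketch below is a stdlib-only analogy, assuming `concurrent.futures` threads as stand-ins for Ray workers; `square`, `Counter`, `ObjectStore`, and `run_demo` are hypothetical names, and none of this reflects how Ray actually schedules work or stores objects.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


def square(x):
    # Plays the role of a Ray task: a stateless function.
    return x * x


class Counter:
    # Plays the role of a Ray actor: an object that keeps state between calls.
    def __init__(self, start):
        self.value = start

    def add(self, amount):
        self.value += amount
        return self.value


class ObjectStore:
    # Plays the role of ray.put / ray.get: a value is stored once and
    # fetched later through an opaque reference (hypothetical stand-in).
    def __init__(self):
        self._objects = {}

    def put(self, value):
        ref = uuid.uuid4().hex
        self._objects[ref] = value
        return ref

    def get(self, ref):
        return self._objects[ref]


def run_demo():
    # Tasks: submit() returns a future right away, like f.remote(i);
    # result() blocks until the value is ready, like ray.get().
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(square, i) for i in range(10)]
        squares = [f.result() for f in futures]

    # Actor: a single-threaded executor runs method calls one at a time,
    # in submission order, so the counter's state is never raced on.
    counter = Counter(10)
    with ThreadPoolExecutor(max_workers=1) as actor:
        calls = [actor.submit(counter.add, 1) for _ in range(5)]
        actor_results = [c.result() for c in calls]

    # Object: a tuple (immutable) is stored and then fetched by reference.
    store = ObjectStore()
    ref = store.put(tuple(range(5)))
    return squares, actor_results, store.get(ref)
```

A single-worker executor is used for the actor because a default Ray actor processes method calls one at a time; the task pool can use any number of workers because tasks share no state.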

Slide 7

Slide 7 text

Ray AI Runtime (AIR) is a scalable runtime for end-to-end ML applications

Slide 8

Slide 8 text

Project Overview
The Ray team has worked with ML users and infra groups at companies such as Uber, Ant, Shopify, Cruise, and OpenAI for several years. AIR is an effort to synthesize the lessons learned into a simple toolkit for the community.
- Built on Ray's existing scalable libraries
- Unified APIs for end-to-end ML
- Simplified ML infrastructure

Slide 9

Slide 9 text

Challenges we hear from users about ML Infrastructure

Slide 10

Slide 10 text

It's still not easy to go from dev to prod at scale.
Scripts: preprocess.py, train.py, eval.py, run_workflow.py

Slide 11

Slide 11 text

What happens when your ML infra gets out of date?
Scripts: preprocess.py, train.py, eval.py, run_workflow.py

Slide 12

Slide 12 text

Key problems of existing ML infrastructure
- Scaling is hard, especially for data scientists.
- Platform solutions can limit flexibility.
- But custom distributed apps are too hard to build.

Slide 13

Slide 13 text

Analogy from a simpler time
"Single sklearn script"
Filesystem

Slide 14

Slide 14 text

What AIR Provides
"Single AIR app": Preprocessing, Training, Scoring, Serving, ...
Storage and Tracking

Slide 15

Slide 15 text

Introducing Ray AI Runtime (AIR)

Slide 16

Slide 16 text

Ray AI Runtime (AIR) is a scalable toolkit for end-to-end ML applications

Slide 17

Slide 17 text

Ray AI Runtime (AIR) is a scalable toolkit for end-to-end ML applications
Built on Ray Core for open and flexible ML compute end-to-end.

Slide 18

Slide 18 text

Ray AI Runtime (AIR) is a scalable toolkit for end-to-end ML applications
Built on Ray Core for open and flexible ML compute end-to-end.
Since Ray focuses on compute, AIR leverages integrations for storage and tracking.

Slide 19

Slide 19 text

Ray AI Runtime (AIR) is a scalable runtime for end-to-end ML applications
High-level libraries that make scaling easy for both data scientists and ML engineers.

Slide 20

Slide 20 text

Importance of scalable library layer
With non-scalable libraries:
● Non-distributed systems / libraries
● Opinionated distributed systems / libraries
Data science team (development environment) → Eng team (production environment): high friction
With scalable libraries:
Data science team (development environment) → Eng team (production environment): seamless handoff
Easy to scale with Ray AIR and libraries of their choice; more performant, robust, and scalable

Slide 21

Slide 21 text

Ray AI Runtime (AIR) is a scalable toolkit for end-to-end ML applications
Scalable integrations with best-of-breed libraries/MLOps tools

Slide 22

Slide 22 text

AIR Integrations
● Built-in integrations
● Integrations API to easily add integrations
● Custom scalable components can be built on Ray Core

Slide 23

Slide 23 text

AIR simplifies scalable ML infrastructure
● Integrations with best-of-breed libraries/MLOps tools
● A unified end-to-end ML runtime
● Make scaling easy for both data scientists and ML engineers

Slide 24

Slide 24 text

When would you use Ray AIR?
● Scale a single type of workload
● Scale end-to-end ML applications
● Run ecosystem libraries using a unified API
● Build a custom ML platform

Slide 25

Slide 25 text

AIR is for the entire ML org
● Scale a single type of workload
● Scale end-to-end ML applications
● Run ecosystem libraries using a unified API
● Build a custom ML platform
A scalable, unified toolkit for both data scientists and software engineers.

Slide 26

Slide 26 text

Ray AIR vs Ray Core
Who should use...
  Ray AIR: Data Scientists & ML Engineers
  Ray Core: Advanced Infra & ML Groups
If you want...
  Ray AIR: Ease of getting started and ecosystem integrations
  Ray Core: Customizability and control

Slide 27

Slide 27 text

What comes out of the box with AIR?
● Training
● Tuning
● Batch Prediction
● Data Preprocessing
● Serving

Slide 28

Slide 28 text

Scalable Data Prep and Loading with Ray Data
• Dataset library built for ML tasks
• Seamlessly load distributed data from MB to TB scale
• Preprocessors for unified training<>inference
Diagram: Dataset → Trainer.fit → Trainer workers
    import ray
    from ray.data.preprocessors import MinMaxScaler
    from ray.train.torch import TorchTrainer

    dataset = ray.data.read_csv("...")
    preprocessor = MinMaxScaler(["value"])
    trainer = TorchTrainer(
        ...,
        preprocessor=preprocessor,
        datasets={"train": dataset},
    )
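The "unified training<>inference" point is that a preprocessor is fitted once on training data and the same fitted statistics are reused at prediction time. A minimal stdlib sketch of that contract, assuming rows are plain dicts; `MinMaxScalerSketch` is a hypothetical class, not Ray's `MinMaxScaler`:

```python
class MinMaxScalerSketch:
    """Hypothetical fitted preprocessor: learns min/max once, reuses them later."""

    def __init__(self, column):
        self.column = column
        self.min = None
        self.max = None

    def fit(self, rows):
        # Statistics come from the training rows only.
        values = [row[self.column] for row in rows]
        self.min, self.max = min(values), max(values)
        return self

    def transform(self, rows):
        span = self.max - self.min
        out = []
        for row in rows:
            scaled = dict(row)  # copy so the caller's rows are not mutated
            # A constant column (span == 0) is mapped to 0.0 in this sketch.
            scaled[self.column] = (row[self.column] - self.min) / span if span else 0.0
            out.append(scaled)
        return out


train_rows = [{"value": 0.0}, {"value": 5.0}, {"value": 10.0}]
scaler = MinMaxScalerSketch("value").fit(train_rows)
print(scaler.transform(train_rows))        # training-time transform
print(scaler.transform([{"value": 20.0}]))  # inference-time transform, same stats
```

The inference row scales to 2.0, outside [0, 1], because the sketch deliberately reuses the training min and max instead of refitting on new data.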

Slide 29

Slide 29 text

Scalable Model Training with Ray Train
• Single API to run the most popular ML training frameworks
• Seamless integration with other AIR libraries
Diagram: Datasets, Trainer, Checkpoint, Tuner
    from ray.air.config import ScalingConfig
    from ray.train.torch import TorchTrainer

    trainer = TorchTrainer(
        train_loop,
        scaling_config=ScalingConfig(num_workers=100, use_gpu=True),
        preprocessor=preprocessor,
        datasets={"train": dataset},
    )
    result = trainer.fit()
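With `num_workers=100`, training runs data-parallel: each worker computes gradients on its own shard of the data, and the gradients are averaged before a shared update (the general technique behind PyTorch DDP). The sketch below illustrates only that technique in plain Python, assuming a one-variable linear model and synchronous averaging; all names are hypothetical, and the shards are processed one after another rather than on separate workers.

```python
def shard(rows, num_workers):
    # Round-robin split: worker i gets rows i, i + num_workers, ...
    return [rows[i::num_workers] for i in range(num_workers)]


def local_gradient(w, b, rows):
    # Mean-squared-error gradient computed only on this worker's shard.
    gw = gb = 0.0
    for x, y in rows:
        err = (w * x + b) - y
        gw += 2 * err * x
        gb += 2 * err
    return gw / len(rows), gb / len(rows)


def train_data_parallel(rows, num_workers=4, lr=0.3, epochs=1000):
    # Drop empty shards so there is never a division by zero.
    shards = [s for s in shard(rows, num_workers) if s]
    w = b = 0.0
    for _ in range(epochs):
        # On a cluster, each shard's gradient would come from a separate worker.
        grads = [local_gradient(w, b, s) for s in shards]
        # "All-reduce" step: average the gradients so every model replica
        # applies the same update and the copies stay identical.
        gw = sum(g[0] for g in grads) / len(grads)
        gb = sum(g[1] for g in grads) / len(grads)
        w -= lr * gw
        b -= lr * gb
    return w, b


# Hypothetical data on the line y = 3x + 1, split across 4 "workers".
data = [(i / 20, 3 * (i / 20) + 1) for i in range(20)]
print(train_data_parallel(data))
```

With 20 rows and 4 workers every shard has 5 rows, so the averaged gradient equals the full-batch gradient; with unequal shards, a weighted average would be needed to keep that property.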

Slide 30

Slide 30 text

Scalable Hyperparameter Tuning with Ray Tune
• Run multiple concurrent training jobs
• Cutting-edge optimization algorithms
• Fault tolerance at scale
Diagram: Tuner.fit launches Trials, each running a Trainer with its own Workers
    from ray import tune
    from ray.tune import Tuner
    from ray.train.torch import TorchTrainer

    trainer = TorchTrainer(...)
    tuner = Tuner(
        trainer,
        param_space={
            "train_loop_config": {"batch_size": tune.grid_search([1, 2, 3])}
        },
    )
    results = tuner.fit()
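`tune.grid_search([1, 2, 3])` expands into one trial per value, and several `grid_search` entries combine as a cross product. A minimal sequential sketch of that expansion and of picking the best trial by a metric; `expand_grid`, `run_grid_search`, and `fake_train` are hypothetical names, and unlike Ray Tune this runs trials one at a time:

```python
import itertools


def expand_grid(param_space):
    # One config per point in the Cartesian product of all grid axes.
    names = sorted(param_space)
    for values in itertools.product(*(param_space[n] for n in names)):
        yield dict(zip(names, values))


def run_grid_search(train_fn, param_space, metric, mode="min"):
    # Each config is one "trial"; train_fn returns a dict of metrics.
    results = [(config, train_fn(config)[metric]) for config in expand_grid(param_space)]
    choose = min if mode == "min" else max
    return choose(results, key=lambda item: item[1])


def fake_train(config):
    # Hypothetical objective whose best batch size is 2.
    return {"loss": (config["batch_size"] - 2) ** 2}


print(run_grid_search(fake_train, {"batch_size": [1, 2, 3]}, metric="loss"))
```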

Slide 31

Slide 31 text

Scalable Batch Prediction with AIR's BatchPredictor
• Execute inference on distributed data using CPUs and GPUs
• Bring your own model or load existing checkpoints from Train
Diagram: BatchPredictor runs the model on CPU and GPU workers, each scoring shards of a Ray Dataset
    from ray.train.batch_predictor import BatchPredictor
    from ray.train.xgboost import XGBoostPredictor

    predictor = BatchPredictor.from_checkpoint(checkpoint, XGBoostPredictor)
    results = predictor.predict(dataset)
    results.write_parquet("s3://...")
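Conceptually, batch prediction splits a dataset into shards, scores each shard on a separate worker, and reassembles the outputs in input order. A stdlib sketch of that pattern, assuming a thread pool in place of Ray workers and a plain callable as the "model"; `split_contiguous` and `batch_predict` are hypothetical helpers, not `BatchPredictor` internals:

```python
from concurrent.futures import ThreadPoolExecutor


def split_contiguous(rows, num_shards):
    # Ceiling division so every row lands in exactly one shard.
    size = -(-len(rows) // num_shards)
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def batch_predict(model, rows, num_shards=3):
    if not rows:
        return []
    shards = split_contiguous(rows, num_shards)
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        # Each worker scores one shard; map() yields results in shard order.
        per_shard = list(pool.map(lambda part: [model(x) for x in part], shards))
    # Concatenate shard outputs so predictions line up with the input rows.
    return [pred for part in per_shard for pred in part]


print(batch_predict(lambda x: 2 * x, list(range(10))))
```

Contiguous shards are used here so that concatenation alone restores the original order; strided shards would need an explicit re-interleave step.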

Slide 32

Slide 32 text

Scalable Online Inference with Ray Serve
• Deploy single models as highly available (HA) inference services in Ray
• Build multi-model pipelines with custom business logic
Diagram: prediction requests → PredictorDeployment → Model
    from ray.serve import PredictorDeployment

    deployment = PredictorDeployment.options(name="XGBoostService")
    deployment.deploy(XGBoostPredictor, checkpoint, ...)
    print(deployment.url)
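"Highly available" here means a deployment runs several replicas of the model, so one failed replica does not take the service down. The toy router below illustrates that idea, assuming a simple round-robin policy that skips replicas marked unhealthy; `Replica` and `ToyDeployment` are hypothetical, and Ray Serve's actual routing and health-checking may work differently.

```python
import itertools


class Replica:
    # One copy of the model; a real deployment would run each in its own process.
    def __init__(self, replica_id, model):
        self.replica_id = replica_id
        self.model = model

    def handle(self, request):
        return {"replica": self.replica_id, "prediction": self.model(request)}


class ToyDeployment:
    # Hypothetical router: round-robin over replicas, skipping unhealthy ones.
    def __init__(self, model, num_replicas=2):
        self.replicas = [Replica(i, model) for i in range(num_replicas)]
        self.healthy = set(range(num_replicas))
        self._order = itertools.cycle(self.replicas)

    def mark_unhealthy(self, replica_id):
        self.healthy.discard(replica_id)

    def route(self, request):
        # Try each replica at most once per request.
        for _ in range(len(self.replicas)):
            replica = next(self._order)
            if replica.replica_id in self.healthy:
                return replica.handle(request)
        raise RuntimeError("no healthy replicas available")


service = ToyDeployment(lambda x: x + 1, num_replicas=2)
print(service.route(1), service.route(1))
service.mark_unhealthy(0)
print(service.route(5))  # replica 0 is skipped, replica 1 answers
```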

Slide 33

Slide 33 text

How to use AIR?

Slide 34

Slide 34 text

When would you use Ray AIR?
● Scale a single type of workload
● Scale end-to-end ML applications
● Run ecosystem libraries using a unified API
● Build a custom ML platform

Slide 35

Slide 35 text

Using Ray AIR for a single workload
Orchestrator: Data loading → Preprocess → Training → Batch Prediction on Ray → Prediction Results
Running on Kubernetes/Cloud

Slide 36

Slide 36 text

Scalable Batch Prediction on Ray AIR
Here's an example of using Ray for one part of your ML pipeline.
Diagram: Batch Prediction on Ray AIR → Prediction results
    from torchvision import models

    data_url = "s3://YOUR_IMAGE_DATA"
    model = models.resnet18(pretrained=True)

Slide 37

Slide 37 text

Scalable Batch Prediction on Ray AIR
Here's an example of using Ray for one part of your ML pipeline.
Diagram: Batch Prediction on Ray AIR → Prediction results
    import ray
    from ray.data.datasource import ImageFolderDatasource
    from torchvision import models

    data_url = "s3://YOUR_IMAGE_DATA"
    model = models.resnet18(pretrained=True)
    dataset = ray.data.read_datasource(ImageFolderDatasource(), paths=[data_url])

Slide 38

Slide 38 text

Scalable Batch Prediction on Ray AIR
Here's an example of using Ray for one part of your ML pipeline.
Diagram: Batch Prediction on Ray AIR → Prediction results
    import ray
    from ray.data.datasource import ImageFolderDatasource
    from ray.train.batch_predictor import BatchPredictor
    from ray.train.torch import TorchCheckpoint, TorchPredictor
    from torchvision import models

    data_url = "s3://YOUR_IMAGE_DATA"
    model = models.resnet18(pretrained=True)
    dataset = ray.data.read_datasource(ImageFolderDatasource(), paths=[data_url])
    ckpt = TorchCheckpoint.from_model(model)
    predictor = BatchPredictor.from_checkpoint(ckpt, TorchPredictor)
    outputs = predictor.predict(dataset, feature_columns=["image"])

Slide 39

Slide 39 text

Scalable Batch Prediction on Ray AIR
Here's an example of using Ray for one part of your ML pipeline.
Diagram: Batch Prediction on Ray AIR → Prediction results
    import ray
    from ray.data.datasource import ImageFolderDatasource
    from ray.train.batch_predictor import BatchPredictor
    from ray.train.torch import TorchCheckpoint, TorchPredictor
    from torchvision import models

    data_url = "s3://YOUR_IMAGE_DATA"
    model = models.resnet18(pretrained=True)
    dataset = ray.data.read_datasource(ImageFolderDatasource(), paths=[data_url])
    ckpt = TorchCheckpoint.from_model(model)
    predictor = BatchPredictor.from_checkpoint(ckpt, TorchPredictor)
    outputs = predictor.predict(dataset, feature_columns=["image"])
    outputs.write_parquet(...)

Slide 40

Slide 40 text

Ray AIR vs SageMaker Batch Inference
Compared to SageMaker Batch Inference…
→ Create Airflow DAG
→ Create Docker image
→ Test locally
→ Push Docker image to ECR
→ Decide how many machines
→ Partition work across all machines
→ Copy files from S3 to local
→ Read all results from machines
→ Collate results
→ Tear it all down
Ray AIR in 3 steps
→ Start a Ray cluster
→ Submit your Python script
→ [Maybe] Shut down your Ray cluster

Slide 41

Slide 41 text

When would you use Ray AIR?
● Scale a single type of workload
● Scale end-to-end ML applications
● Run ecosystem libraries using a unified API
● Build a custom ML platform

Slide 42

Slide 42 text

Using Ray AIR for E2E ML Workflows
Data Processing on Ray → Distributed Training on Ray → Hyperparameter Tuning on Ray → Batch Prediction on Ray

Slide 43

Slide 43 text

Using Ray AIR to scale E2E ML Workflows
Scalable Data Processing (Ray Data)
    dataset = ray.data.read_csv(...)
    train_ds, valid_ds = train_test_split(dataset, test_size=0.3)
    test_ds = valid_ds.drop_columns(["target"])
    preprocessor = StandardScaler(columns=["mean radius"])
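The data-processing step does three things: hold out a validation split, drop the label column to form a test set, and standardize a feature using statistics from the training split only. A plain-Python sketch of those steps, assuming rows are dicts; `train_test_split_rows`, `drop_columns`, and `StandardScalerSketch` are hypothetical stand-ins, and the population standard deviation used here is an assumption rather than a statement about Ray's `StandardScaler`:

```python
import random
import statistics


def train_test_split_rows(rows, test_size=0.3, shuffle=False, seed=None):
    # Hypothetical helper: optionally shuffle a copy, then hold out the tail.
    rows = list(rows)
    if shuffle:
        random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_size)
    return rows[:len(rows) - n_test], rows[len(rows) - n_test:]


def drop_columns(rows, columns):
    return [{k: v for k, v in row.items() if k not in columns} for row in rows]


class StandardScalerSketch:
    # Fits per-column mean and population std on training rows only.
    def __init__(self, columns):
        self.columns = columns
        self.stats = {}

    def fit(self, rows):
        for col in self.columns:
            values = [row[col] for row in rows]
            self.stats[col] = (statistics.fmean(values), statistics.pstdev(values))
        return self

    def transform(self, rows):
        out = []
        for row in rows:
            scaled = dict(row)
            for col, (mean, std) in self.stats.items():
                # A constant column would divide by zero, so map it to 0.0.
                scaled[col] = (row[col] - mean) / std if std else 0.0
            out.append(scaled)
        return out


rows = [{"mean radius": float(i), "target": i % 2} for i in range(10)]
train_rows, valid_rows = train_test_split_rows(rows, test_size=0.3)
test_rows = drop_columns(valid_rows, ["target"])
scaler = StandardScalerSketch(["mean radius"]).fit(train_rows)
print(scaler.transform(test_rows))
```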

Slide 44

Slide 44 text

Using Ray AIR to scale E2E ML Workflows
Scalable Data Processing (Ray Data)
    dataset = ray.data.read_csv(...)
    train_ds, valid_ds = train_test_split(dataset, test_size=0.3)
    test_ds = valid_ds.drop_columns(["target"])
    preprocessor = StandardScaler(columns=["mean radius"])
Scalable Model Training (Ray Train)
    trainer = ray.train.xgboost.XGBoostTrainer(
        scaling_config=ScalingConfig(num_workers=128),
        label_column="target",
        datasets=dict(train=train_ds, valid=valid_ds),
        preprocessor=preprocessor,
    )
    result = trainer.fit()

Slide 45

Slide 45 text

Using Ray AIR to scale E2E ML Workflows
Scalable Data Processing (Ray Data)
    dataset = ray.data.read_csv(...)
    train_ds, valid_ds = train_test_split(dataset, test_size=0.3)
    test_ds = valid_ds.drop_columns(["target"])
    preprocessor = StandardScaler(columns=["mean radius"])
Scalable Model Training (Ray Train)
    trainer = ray.train.xgboost.XGBoostTrainer(
        scaling_config=ScalingConfig(num_workers=128),
        label_column="target",
        datasets=dict(train=train_ds, valid=valid_ds),
        preprocessor=preprocessor,
    )
    result = trainer.fit()
Scalable Model Tuning (Ray Tune)
    tuner = ray.tune.Tuner(
        trainer,
        param_space={"params": {"max_depth": tune.randint(1, 9)}},
        tune_config=TuneConfig(num_samples=5, metric="logloss", mode="min"),
    )
    checkpoint = tuner.fit().get_best_result().checkpoint

Slide 46

Slide 46 text

Using Ray AIR to scale E2E ML Workflows
Scalable Data Processing (Ray Data)
    dataset = ray.data.read_csv(...)
    train_ds, valid_ds = train_test_split(dataset, test_size=0.3)
    test_ds = valid_ds.drop_columns(["target"])
    preprocessor = StandardScaler(columns=["mean radius"])
Scalable Model Training (Ray Train)
    trainer = ray.train.xgboost.XGBoostTrainer(
        scaling_config=ScalingConfig(num_workers=128),
        label_column="target",
        datasets=dict(train=train_ds, valid=valid_ds),
        preprocessor=preprocessor,
    )
    result = trainer.fit()
Scalable Model Tuning (Ray Tune)
    tuner = ray.tune.Tuner(
        trainer,
        param_space={"params": {"max_depth": tune.randint(1, 9)}},
        tune_config=TuneConfig(num_samples=5, metric="logloss", mode="min"),
    )
    checkpoint = tuner.fit().get_best_result().checkpoint
Scalable Batch Prediction (Predictors)
    batch_predictor = BatchPredictor.from_checkpoint(checkpoint, XGBoostPredictor)
    predicted_probabilities = batch_predictor.predict(test_ds)
    predicted_probabilities.show()

Slide 47

Slide 47 text

Using Ray AIR to scale E2E ML Workflows
Scalable Data Processing (Ray Data)
    dataset = ray.data.read_csv(...)
    train_ds, valid_ds = train_test_split(dataset, test_size=0.3)
    test_ds = valid_ds.drop_columns(["target"])
    preprocessor = StandardScaler(columns=["mean radius"])
Scalable Model Training (Ray Train)
    trainer = ray.train.xgboost.XGBoostTrainer(
        scaling_config=ScalingConfig(num_workers=128),
        label_column="target",
        datasets=dict(train=train_ds, valid=valid_ds),
        preprocessor=preprocessor,
    )
    result = trainer.fit()
Scalable Model Tuning (Ray Tune)
    tuner = ray.tune.Tuner(
        trainer,
        param_space={"params": {"max_depth": tune.randint(1, 9)}},
        tune_config=TuneConfig(num_samples=5, metric="logloss", mode="min"),
    )
    checkpoint = tuner.fit().get_best_result().checkpoint
Scalable Batch Prediction (Predictors)
    batch_predictor = BatchPredictor.from_checkpoint(checkpoint, XGBoostPredictor)
    predicted_probabilities = batch_predictor.predict(test_ds)
    predicted_probabilities.show()
Scale out to a cluster with a 1-line change.

Slide 48

Slide 48 text

When would you use Ray AIR?
● Scale a single type of workload
● Scale end-to-end ML applications
● Run ecosystem libraries using a unified API
● Build a custom ML platform

Slide 49

Slide 49 text

Integrations with:
● Data Ecosystem
● ML frameworks
● Optimization Libraries
● Model Monitoring
● Model Serving

Slide 50

Slide 50 text

Custom Integrations with Ray AIR
Scalable Data Preprocessing (Ray Data) · Scalable Model Training (Ray Train) · Scalable Model Tuning (Ray Tune) · Scalable Batch Prediction (Predictors)

Custom trainer (Ray Train):
    from ray.train import DataParallelTrainer

    class JaxTrainer(DataParallelTrainer):
        # define custom training logic
        ...

    trainer = JaxTrainer(datasets={...})
    trainer.fit()

Custom tracking callback (Ray Tune):
    from ray.air.callbacks import Callback

    class CustomMLflowTracker(Callback):
        # define custom tracking logic
        ...

    tuner = Tuner(trainer, run_config=RunConfig(callbacks=[CustomMLflowTracker()]))
    tuner.fit()

Custom data source (Ray Data):
    from ray.data.datasource import Datasource

    class DeltaLakeDatasource(Datasource):
        # define custom data source
        ...

    ds = ray.data.read_datasource(DeltaLakeDatasource(...))
    ds.write_datasource(DeltaLakeDatasource(...))
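The callback integration works by having the tuning driver forward every reported result to each registered callback, so a tracker only has to implement a hook. A stdlib sketch of that hook pattern, assuming a single hypothetical `on_trial_result(trial_id, result)` hook; `TrialCallback`, `InMemoryTracker`, and `MiniTuner` are illustrative names, and Ray Tune's own `Callback` hooks receive different arguments (for example, the trial object and the list of all trials), so this is not a drop-in implementation.

```python
class TrialCallback:
    # Hypothetical hook interface; every hook is a no-op unless overridden.
    def on_trial_result(self, trial_id, result):
        pass


class InMemoryTracker(TrialCallback):
    # Stands in for an experiment tracker such as MLflow: it only records results.
    def __init__(self):
        self.history = {}

    def on_trial_result(self, trial_id, result):
        self.history.setdefault(trial_id, []).append(dict(result))


class MiniTuner:
    # Hypothetical driver that fans every reported result out to all callbacks.
    def __init__(self, train_fn, configs, callbacks=()):
        self.train_fn = train_fn
        self.configs = list(configs)
        self.callbacks = list(callbacks)

    def fit(self):
        for trial_id, config in enumerate(self.configs):
            # train_fn yields one metrics dict per training iteration.
            for result in self.train_fn(config):
                for callback in self.callbacks:
                    callback.on_trial_result(trial_id, result)


def train_fn(config):
    for step in range(3):
        yield {"step": step, "loss": config["lr"] * step}


tracker = InMemoryTracker()
MiniTuner(train_fn, [{"lr": 0.1}, {"lr": 0.2}], callbacks=[tracker]).fit()
print(tracker.history)
```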

Slide 51

Slide 51 text

When would you use Ray AIR?
● Scale a single type of workload
● Scale end-to-end ML applications
● Run ecosystem libraries using a unified API
● Build a custom ML platform

Slide 52

Slide 52 text

Case study: Merlin at Shopify

Slide 53

Slide 53 text

Using Ray AIR as the compute core for your ML platform
Ray AIR program

Slide 54

Slide 54 text

Using Ray AIR as the compute core for your ML platform
Ray AIR programs (×3) running on KubeRay / Anyscale

Slide 55

Slide 55 text

Using Ray AIR as the compute core for your ML platform
Ray AIR programs (×3) running on KubeRay / Anyscale
Platform components: Model Registry, Monitoring, Experiment Tracking, Feature Store, Lakehouse, Notebook Service, Job Scheduler

Slide 56

Slide 56 text

Current status with AIR

Slide 57

Slide 57 text

Ray AIR Roadmap
In Ray 2.0, Ray AIR is released as beta.
In future releases, we plan to add:
● Improved integrations with data sources, feature stores, and model monitoring services
● Improved scalability and performance benchmarks for data-intensive use cases

Slide 58

Slide 58 text

Users are excited about Ray AIR!
“I’d say the productivity at least doubled if not more… I can’t wait until AIR is released. I’m using the nightly builds and it’s already a massive productivity boost.”
- Data Scientist from a large music streaming company
“Ray AIR has greatly improved my developer experience … through intuitive abstractions like Ray Datasets and Train. In two weeks I was able to recreate and outperform a data ingest pipeline I built by hand over the course of 6 months.”
- ML Engineer at a telematics startup

Slide 59

Slide 59 text

How to get involved?
- Chat with the developers on the Ray Slack (#air-dogfooding channel!)
- Come talk afterwards; maybe we can form a recurring meetup in Seattle!

Slide 60

Slide 60 text

Thank you. Contact: rliaw @ anyscale.com (@richliaw)