Ray AIR: A Scalable Toolkit for End-to-end ML Applications

Richard Liaw, Anyscale
Xiaowei Jiang, Anyscale
SF Bay ACM Meetup, 10/24/2022

Who we are: Original creators of Ray
What we do: Unified compute platform to develop, deploy, and manage scalable AI & Python applications with Ray
Why do it: Scaling is a necessity, and scaling is hard; make distributed computing easy and simple for everyone

What is Ray
→ A simple/general-purpose library for distributed computing
→ A unified Python toolkit, Ray AI Runtime (for scaling ML and more)
→ Runs on laptop, public cloud, K8s, on-premise
A layered cake of functionality and capabilities for scaling ML workloads

A Layered Cake and Ecosystem
[Diagram labels: Library + app ecosystem; Ray core, a general-purpose framework for distributed computing; Run anywhere]

An anatomy of a Ray cluster
[Diagram: the Head Node holds the Driver, Workers, the Global Control Store (GCS), and a Raylet (Scheduler + Object Store). Worker Node #1 through Worker Node #N each hold Workers and a Raylet (Scheduler + Object Store). Some components are marked "Unique to Ray".]

Python → Ray APIs

Function → Task (distributed across nodes)

    def f(x):
        # do something with x:
        y = ...
        return y

becomes

    import ray

    @ray.remote
    def f(x):
        # do something with x:
        y = ...
        return y

    # Launch 10,000 tasks; each call returns a future right away.
    futures = [f.remote(i) for i in range(10000)]

Class → Actor (distributed across nodes)

    class Cls():
        def __init__(self, x): ...
        def f(self, a): ...
        def g(self, a): ...

becomes

    @ray.remote(num_cpus=2, num_gpus=4)
    class Cls():
        def __init__(self, x): ...
        def f(self, a): ...
        def g(self, a): ...

    cls = Cls.remote(x)   # start the actor on some node
    cls.f.remote(a)       # call a method asynchronously
    del cls               # drop the handle to the actor

Object → Distributed immutable object

    import numpy as np
    a = np.arange(1, 10e6)
    b = a * 2

becomes

    import numpy as np
    a = np.arange(1, 10e6)
    obj_a = ray.put(a)        # place a in the distributed object store
    b = ray.get(obj_a) * 2    # fetch it back before using it
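The task pattern above (submit work, get a future back immediately, fetch the value later) can be tried without a cluster. The sketch below is not Ray code: it uses the standard library's `concurrent.futures` as a single-machine stand-in, and `run_tasks` is a hypothetical helper name chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def f(x):
    # Stand-in for "do something with x".
    return x * x


def run_tasks(inputs, max_workers=4):
    # submit() returns a future immediately, loosely analogous to f.remote(x);
    # result() blocks until the value is ready, loosely analogous to ray.get().
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(f, x) for x in inputs]
        return [future.result() for future in futures]


if __name__ == "__main__":
    print(run_tasks(range(5)))  # prints [0, 1, 4, 9, 16]
```

The main difference is where the work runs: a thread pool stays inside one process, whereas Ray tasks can be scheduled on any node in the cluster.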

Ray AI Runtime (AIR) is a scalable runtime for end-to-end ML applications

Project Overview
The Ray team has worked with ML users and infra groups (e.g., at Uber, Ant, Shopify, Cruise, OpenAI) for several years. AIR is an effort to synthesize lessons learned into a simple toolkit for the community.
- Built on Ray's existing scalable libraries
- Unified APIs for e2e ML
- Simplify ML Infra

Challenges we hear from users about ML Infrastructure

Still not easy to go from dev to prod at scale.
[Diagram: preprocess.py, train.py, eval.py, run_workflow.py]

What happens when your ML infra gets out of date?
[Diagram: preprocess.py, train.py, eval.py, run_workflow.py]

Key Problems of existing ML infrastructure
• Scaling is hard, especially for data scientists.
• Platform solutions can limit flexibility.
• But custom distributed apps are too hard.

Analogy from a simpler time
[Diagram: a "single sklearn script" and a Filesystem]

What AIR Provides
[Diagram: a "single AIR app" covering Preprocessing, Training, Scoring, and Serving, alongside Storage and Tracking]

Introducing Ray AI Runtime (AIR)

Ray AI Runtime (AIR) is a scalable toolkit for end-to-end ML applications

Built on Ray Core for open and flexible ML compute end-to-end.

Since Ray focuses on compute, AIR leverages integrations for storage and tracking.

High-level libraries that make scaling easy for both data scientists and ML engineers.

Importance of scalable library layer

With non-scalable libraries
• Non-distributed systems / libraries
• Opinionated distributed systems / libraries
High friction between the data science team (development environment) and the eng team (production environment).

With scalable libraries
Easy to scale with Ray AIR and libraries of their choice. Seamless handoff from the data science team (development environment) to the eng team (production environment). More performant, robust, and scalable.

Scalable integrations with best-of-breed libraries/MLOps tools

AIR Integrations
• Built-in integrations
• Integrations API to easily add integrations
• Custom scalable components can be built on Ray Core

AIR simplifies scalable ML infrastructure
• A unified end-to-end ML runtime
• Make scaling easy for both data scientists and ML engineers
• Integrations with best-of-breed libraries/MLOps tools

When would you use Ray AIR?
• Scale a single type of workload
• Scale end-to-end ML applications
• Run ecosystem libraries using a unified API
• Build a custom ML platform

AIR is for the entire ML org
A scalable, unified toolkit for both data scientists and software engineers.

Ray AIR vs Ray Core

Who should use...
  Ray AIR: Data Scientists & ML Engineers
  Ray Core: Advanced Infra & ML Groups
If you want...
  Ray AIR: Easy to get started and ecosystem integrations
  Ray Core: Customizability and control

What comes out of the box with AIR?
• Data Preprocessing
• Training
• Tuning
• Batch Prediction
• Serving

Scalable Data Prep and Loading with Ray Data
• Dataset library built for ML tasks
• Seamlessly load distributed data from MB to TB scale
• Preprocessors for unified training<>inference

    dataset = ray.data.read_csv("...")
    preprocessor = ray.data.preprocessors.MinMaxScaler(["value"])
    trainer = ray.train.torch.TorchTrainer(
        ...,
        preprocessor=preprocessor,
        datasets={"train": dataset})

[Diagram: Trainer.fit feeds the Dataset to the Trainer's Workers]
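To make "preprocessors for unified training<>inference" concrete, here is a minimal plain-Python sketch of the fit/transform idea behind a min-max scaler. `SimpleMinMaxScaler` is a hypothetical stand-in written for this example, not Ray's `MinMaxScaler`; the point is only that statistics are learned once from training data and reused unchanged at inference time.

```python
class SimpleMinMaxScaler:
    """Hypothetical stand-in for a fit/transform preprocessor (not Ray's class)."""

    def __init__(self, column):
        self.column = column
        self.min_ = None
        self.max_ = None

    def fit(self, rows):
        # Learn the column's range from the training rows only.
        values = [row[self.column] for row in rows]
        self.min_, self.max_ = min(values), max(values)
        return self

    def transform(self, rows):
        span = self.max_ - self.min_
        out = []
        for row in rows:
            scaled = dict(row)
            # Guard against a constant column to avoid dividing by zero.
            scaled[self.column] = (row[self.column] - self.min_) / span if span else 0.0
            out.append(scaled)
        return out


train_rows = [{"value": 10.0}, {"value": 20.0}, {"value": 30.0}]
scaler = SimpleMinMaxScaler("value").fit(train_rows)

# Training and inference both reuse the statistics learned from the training data.
train_scaled = scaler.transform(train_rows)
inference_scaled = scaler.transform([{"value": 25.0}])
```

Here the training values map to 0.0, 0.5, and 1.0, and the unseen value 25.0 maps to 0.75 because it is scaled with the training range rather than its own.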

Scalable Model Training with Ray Train
• Single API to run the most popular ML training frameworks
• Seamless integration with other AIR libraries

    trainer = ray.train.torch.TorchTrainer(
        train_loop,
        scaling_config=ScalingConfig(num_workers=100, use_gpu=True),
        preprocessor=preprocessor,
        datasets={"train": dataset})
    result = trainer.fit()

[Diagram: Trainer connected to Checkpoint, Datasets, and Tuner]

Scalable Hyperparameter Tuning with Ray Tune
• Run multiple concurrent training jobs
• Cutting-edge optimization algorithms
• Fault tolerance at scale

    trainer = TorchTrainer(...)
    tuner = Tuner(
        trainer,
        param_space={"batch_size": tune.grid_search([1, 2, 3])})
    results = tuner.fit()

[Diagram: Tuner.fit launches Trials; each Trial runs a Trainer with its own Workers]
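As a rough illustration of what a grid search over a parameter space involves, the sketch below expands every combination into a trial and evaluates the trials concurrently. It is plain Python rather than Ray Tune; `expand_grid`, `run_grid`, `toy_objective`, and the extra `lr` parameter are all hypothetical names introduced for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def expand_grid(param_space):
    # Every combination of the listed values becomes one trial configuration.
    keys = sorted(param_space)
    return [dict(zip(keys, combo)) for combo in product(*(param_space[k] for k in keys))]


def toy_objective(config):
    # Hypothetical loss; a real trial would run a full training job instead.
    return (config["batch_size"] - 2) ** 2 + config["lr"]


def run_grid(param_space, objective, max_concurrent=3):
    configs = expand_grid(param_space)
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # map() keeps results in the same order as configs.
        losses = list(pool.map(objective, configs))
    best_index = min(range(len(configs)), key=losses.__getitem__)
    return configs[best_index], losses[best_index]


if __name__ == "__main__":
    space = {"batch_size": [1, 2, 3], "lr": [0.1, 0.01]}
    print(run_grid(space, toy_objective))
```

With three batch sizes and two learning rates the grid holds six trials, which is why grid search grows quickly as parameters are added.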

Scalable Batch Prediction with AIR's BatchPredictor
• Execute inference on distributed data using CPUs and GPUs
• Bring your own model or load existing checkpoints from Train

    predictor = BatchPredictor.from_checkpoint(checkpoint, XGBoostPredictor)
    results = predictor.predict(dataset)
    results.write_parquet("s3://...")

[Diagram: the Batch Predictor loads the Model onto Workers (including GPU Workers); each Worker runs predict on one Shard of the Ray Dataset]
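The shard-per-worker idea in the diagram can be sketched in plain Python: split the data, score each shard independently, and concatenate the results in order. This is an illustrative single-machine analogy, not the `BatchPredictor` implementation; `make_shards`, `toy_model`, `predict_shard`, and `batch_predict` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def make_shards(rows, num_shards):
    # Split rows into contiguous shards of near-equal size.
    size, extra = divmod(len(rows), num_shards)
    shards, start = [], 0
    for i in range(num_shards):
        end = start + size + (1 if i < extra else 0)
        shards.append(rows[start:end])
        start = end
    return shards


def toy_model(x):
    # Hypothetical model; a real predictor would be restored from a checkpoint.
    return 2 * x + 1


def predict_shard(shard):
    return [toy_model(x) for x in shard]


def batch_predict(rows, num_workers=3):
    shards = make_shards(rows, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # map() preserves shard order, so outputs line up with inputs.
        results = list(pool.map(predict_shard, shards))
    return [y for shard_result in results for y in shard_result]


if __name__ == "__main__":
    print(batch_predict(list(range(7))))  # prints [1, 3, 5, 7, 9, 11, 13]
```

Because each shard is scored independently, adding workers only changes how the rows are partitioned, not the predictions themselves.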

Scalable Online Inference with Ray Serve
• Deploy single models as highly available (HA) inference services in Ray
• Build multi-model pipelines with custom business logic

    deployment = PredictorDeployment.options(name="XGBoostService")
    deployment.deploy(XGBoostPredictor, checkpoint, ...)
    print(deployment.url)

[Diagram: Prediction requests go to the Predictor Deployment, which wraps the Model]

How to use AIR?

Using Ray AIR for a single workload
[Diagram: an Orchestrator on Kubernetes/Cloud runs Data loading, Preprocess, Training, and Batch Prediction on Ray, producing Prediction Results]

Scalable Batch Prediction on Ray AIR
Here’s an example of using Ray for one part of your ML pipeline.

    # Import paths assume Ray 2.0; adjust for your version.
    import ray
    from ray.data.datasource import ImageFolderDatasource
    from ray.train.batch_predictor import BatchPredictor
    from ray.train.torch import TorchCheckpoint, TorchPredictor
    from torchvision import models

    data_url = "s3://YOUR_IMAGE_DATA"
    model = models.resnet18(pretrained=True)
    dataset = ray.data.read_datasource(
        ImageFolderDatasource(), paths=[data_url])
    ckpt = TorchCheckpoint.from_model(model)
    predictor = BatchPredictor.from_checkpoint(ckpt, TorchPredictor)
    outputs = predictor.predict(dataset, feature_columns=["image"])
    outputs.write_parquet(...)  # write the results to S3

[Diagram: Batch Prediction on Ray AIR produces Prediction results]

Ray AIR vs SageMaker Batch Inference

Compared to SageMaker Batch Inference…
→ Create Airflow DAG
→ Create Docker image
→ Test locally
→ Push Docker image to ECR
→ Decide how many machines
→ Partition work across all machines
→ Copy files from S3 to local
→ Read all results from machines
→ Collate results
→ Tear all down

Ray AIR in 3 steps
→ Start a Ray cluster
→ Submit your Python script
→ [Maybe] Shut down your Ray cluster

Using Ray AIR for E2E ML Workflows
[Diagram: Data Processing on Ray, Distributed Training on Ray, Hyperparameter Tuning on Ray, Batch Prediction on Ray]

Using Ray AIR to scale E2E ML Workflows

Scalable Data Processing (Ray Data)

    # Import paths assume Ray 2.0; adjust for your version.
    import ray
    from ray import tune
    from ray.air.config import ScalingConfig
    from ray.data.preprocessors import StandardScaler
    from ray.train.batch_predictor import BatchPredictor
    from ray.train.xgboost import XGBoostPredictor, XGBoostTrainer
    from ray.tune import TuneConfig, Tuner

    dataset = ray.data.read_csv(...)
    train_ds, valid_ds = dataset.train_test_split(test_size=0.3)
    test_ds = valid_ds.drop_columns(["target"])
    preprocessor = StandardScaler(columns=["mean radius"])

Scalable Model Training (Ray Train)

    trainer = XGBoostTrainer(
        scaling_config=ScalingConfig(num_workers=128),
        label_column="target",
        datasets={"train": train_ds, "valid": valid_ds},
        preprocessor=preprocessor)
    result = trainer.fit()

Scalable Model Tuning (Ray Tune)

    tuner = Tuner(
        trainer,
        param_space={"params": {"max_depth": tune.randint(1, 9)}},
        tune_config=TuneConfig(num_samples=5, metric="logloss", mode="min"))
    checkpoint = tuner.fit().get_best_result().checkpoint

Scalable Batch Prediction (Predictors)

    batch_predictor = BatchPredictor.from_checkpoint(
        checkpoint, XGBoostPredictor)
    predicted_probabilities = batch_predictor.predict(test_ds)
    predicted_probabilities.show()

Scale out to a cluster with 1 line change.
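The tuning step above combines a sampled search space, a fixed number of samples, and a metric to minimize. The following plain-Python sketch mimics that selection logic under stated assumptions: `random.randrange(1, 9)` stands in for `tune.randint(1, 9)` (assuming, as the Tune documentation describes, that the upper bound is exclusive), and `toy_logloss` is a made-up metric rather than a real XGBoost result.

```python
import random


def sample_max_depth(rng):
    # randrange(1, 9) draws integers from 1 to 8; the upper bound is exclusive.
    return rng.randrange(1, 9)


def toy_logloss(max_depth):
    # Hypothetical validation loss that is lowest at max_depth == 4.
    return 0.1 + 0.02 * abs(max_depth - 4)


def random_search(num_samples=5, seed=0):
    rng = random.Random(seed)
    trials = []
    for _ in range(num_samples):
        depth = sample_max_depth(rng)
        trials.append({"max_depth": depth, "logloss": toy_logloss(depth)})
    # mode="min": the best trial is the one with the smallest metric value.
    best = min(trials, key=lambda t: t["logloss"])
    return best, trials


if __name__ == "__main__":
    best, trials = random_search()
    print(best)
```

Unlike a grid search, the number of trials here is set by `num_samples` rather than by the size of the search space, so coverage depends on how many samples are drawn.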

Integrations with:
• Data Ecosystem
• ML frameworks
• Optimization Libraries
• Model Monitoring
• Model Serving

Custom Integrations with Ray AIR
[Diagram: Scalable Data Preprocessing (Ray Data), Scalable Model Training (Ray Train), Scalable Model Tuning (Ray Tune), Scalable Batch Prediction (Predictors)]

Custom trainer (Ray Train)

    from ray.train.data_parallel_trainer import DataParallelTrainer

    class JaxTrainer(DataParallelTrainer):
        ...  # define custom training logic

    trainer = JaxTrainer(datasets={...})
    trainer.fit()

Custom callback (Ray Tune)

    from ray.tune import Callback

    class CustomMLflowTracker(Callback):
        ...  # define custom tracking logic

    tuner = Tuner(
        trainer,
        run_config=RunConfig(callbacks=[CustomMLflowTracker()]))
    tuner.fit()

Custom data source (Ray Data)

    from ray.data.datasource import Datasource

    class DeltaLakeDatasource(Datasource):
        ...  # define custom data source

    ds = ray.data.read_datasource(DeltaLakeDatasource(...))
    ds.write_datasource(DeltaLakeDatasource(...))
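The callback-style integration shown for experiment tracking follows a common hook pattern: the runner calls a method on each registered callback whenever a trial reports a result. The sketch below shows that pattern in isolation. The `Callback` base class, its `on_trial_result` signature, `InMemoryTracker`, and `run_trials` are all hypothetical and simplified for this example; they are not Ray's classes or signatures.

```python
class Callback:
    """Hypothetical hook interface for this sketch (not Ray's Callback class)."""

    def on_trial_result(self, trial_id, result):
        pass


class InMemoryTracker(Callback):
    # Collects every reported result; a real tracker might forward them to MLflow.
    def __init__(self):
        self.records = []

    def on_trial_result(self, trial_id, result):
        self.records.append((trial_id, dict(result)))


def run_trials(configs, callbacks):
    for trial_id, config in enumerate(configs):
        # Toy "training": the metric is derived directly from the config.
        result = {"max_depth": config["max_depth"], "score": 1.0 / config["max_depth"]}
        for callback in callbacks:
            callback.on_trial_result(trial_id, result)


tracker = InMemoryTracker()
run_trials([{"max_depth": 2}, {"max_depth": 4}], callbacks=[tracker])
```

Keeping the tracker behind a small hook interface means the training loop never needs to know which tracking service, if any, is attached.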

Case study: Merlin at Shopify

Using Ray AIR as the compute core for your ML platform
[Diagram: Ray AIR programs run on KubeRay / Anyscale, surrounded by platform components: Model Registry, Monitoring, Experiment Tracking, Feature Store, Lakehouse, Notebook Service, Job Scheduler]

Current status with AIR

Ray AIR Roadmap
In Ray 2.0, Ray AIR is released as beta.
In future releases, we plan to add:
• Improved integrations with data sources, feature stores, and model monitoring services
• Improved scalability and performance benchmarks for data-intensive use cases

Users are excited about Ray AIR!

“I’d say the productivity at least doubled if not more… I can’t wait until AIR is released. I’m using the nightly builds and it’s already a massive productivity boost.”
- Data Scientist from large music streaming company

“Ray AIR has greatly improved my developer experience … through intuitive abstractions like Ray Datasets and Train. In two weeks I was able to recreate and outperform a data ingest pipeline I built by hand over the course of 6 months.”
- ML Engineer at telematics startup

How to get involved?
- Chat with the developers on the Ray Slack (#air-dogfooding channel!)
- Come talk afterwards -- maybe we can form a recurring meetup in Seattle!

Thank you. Contact: rliaw @ anyscale.com (@richliaw)
