Introduction_to_Ray_Serve.pdf

Ray Serve is Ray’s model serving library. Traditionally, model serving requires configuring a web server or a cloud-hosted solution. These approaches either lack scalability or hinder development through framework-specific tooling, vendor lock-in, and general inflexibility. Ray Serve overcomes these limitations. It offers a developer-friendly and framework-agnostic interface that provides scalable, production-ready model serving.

Ray Serve is:
- Scalable: It provides fine-grained resource management and scaling using Ray.
- Framework-agnostic: It works with any Python code, regardless of framework.
- Production-ready: It comes with a web server out of the box and handles routing, testing, and scaling logic for deployments.
- Developer-friendly: It offers a decorator-based API that converts existing applications into Ray Serve deployments with minimal refactoring.
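The decorator idea in the last bullet can be sketched in plain Python. This is a hypothetical mini-registry for illustration only, not Ray Serve's actual API: a decorator registers an existing function under a route, and the application code itself stays untouched.

```python
# Illustrative sketch (NOT Ray Serve's real API): a decorator-based serving
# interface that wraps plain Python callables with minimal refactoring.
_registry = {}

def deployment(route):
    """Hypothetical decorator that registers a callable under an HTTP route."""
    def wrap(fn):
        _registry[route] = fn
        return fn
    return wrap

@deployment("/classify")
def classify(text):
    # Existing application code stays unchanged.
    return "positive" if "good" in text else "negative"

def handle_request(route, payload):
    # A real serving framework would add routing, scaling, and batching here.
    return _registry[route](payload)

print(handle_request("/classify", "a good movie"))  # -> positive
```

In Ray Serve the analogous decorator additionally handles replica placement and HTTP routing; the point of the sketch is only that the wrapped function's body does not change.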

This presentation introduces Ray Serve, including its use cases and its features. It walks through Ray Serve setup and integration with existing machine learning models.


Anyscale

March 23, 2022

Transcript

  1. Introduction to Ray Serve. Shreyas Krishnaswamy @ Anyscale, Simon Mo @ Anyscale
  2. What is Ray? A framework that offers an API for distributed applications, provides fine-grained control over system behavior, and supports many native libraries that scale ML applications.
  3. What is Ray Serve? A scalable and programmable serving framework on Ray: framework-agnostic, Python-first, and easy to use. It helps you scale in production.
  4. This talk: Ray Serve background, a walk-through demo, and common ML patterns in production.
  5. Ray Serve as a Web Framework: simple to deploy web services on Ray.
  6. Ray Serve for Model Serving: specialized for ML model serving, with support for GPUs, batching, scale-out, and model composition.
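The batching feature mentioned on this slide can be illustrated without the framework. This is a framework-free sketch of the idea (not Ray Serve's batching API): individual requests are grouped so the model runs once per batch, which is what makes hardware such as GPUs pay off.

```python
# Sketch of the dynamic-batching idea: group incoming requests and invoke
# the model once per batch instead of once per request.
def batched_model(inputs):
    # Stand-in for one vectorized model call over the whole batch.
    return [x * 2 for x in inputs]

def serve_requests(requests, max_batch_size=4):
    results = []
    for i in range(0, len(requests), max_batch_size):
        batch = requests[i:i + max_batch_size]
        results.extend(batched_model(batch))  # one model call per batch
    return results

print(serve_requests([1, 2, 3, 4, 5]))  # -> [2, 4, 6, 8, 10]
```

A real serving system batches across concurrent requests with a timeout, rather than slicing a pre-collected list; the sketch only shows the per-batch model call.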
  7. Ray: Ecosystem. Native libraries and 3rd-party libraries: the most comprehensive set of distributed libraries, and a universal framework for distributed computing.
  8. Building an ML service means trading off readiness for production against ease of development. Web frameworks can't achieve high performance or low cost; custom tooling is hard to develop, deploy, and manage; specialized systems lose flexibility and ease of use.
  9. Demo

  10. What makes Serve Different?

  11. What makes Serve Different? Many tools run 1 model well, but running 1+ copies of the model is often impossible, or means complex YAML, scalability issues, and high cost ($$$).
  12. Reality: new models are developed over time, a single model must scale out, and multiple models are composed together for real-world use cases.
  13. Patterns of ML Models in Production: Pipeline, Ensemble, Business Logic, Online Learning.
  14. Patterns: Pipeline, Ensemble, Business Logic, Online Learning.
  15. A Typical Computer Vision Pipeline (diagram: an image of a cat classified as “Standing Cat”).

  16. Pipeline: break tasks into steps. Scikit-Learn pipeline: Pipeline([('scaler', StandardScaler()), ('svc', SVC())]). Recommendation systems: [EmbeddingLookup(), FeatureInteraction(), NearestNeighbors(), Ranking()]. Common preprocessing: [HeavyWeightMLMegaModel(), DecisionTree()/BoostingModel()].
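The pipeline pattern on this slide can be sketched in a few lines of framework-free Python: each stage is a callable, and the pipeline applies them in order, mirroring scikit-learn's Pipeline. The stage functions here are hypothetical stand-ins for the slide's EmbeddingLookup(), Ranking(), etc.

```python
# Framework-free sketch of the pipeline pattern: compose stages in order.
def make_pipeline(*stages):
    def run(x):
        for stage in stages:
            x = stage(x)   # output of one stage feeds the next
        return x
    return run

# Hypothetical stages standing in for real preprocessing and models.
def normalize(v):
    m = max(v)
    return [x / m for x in v]

def score(v):
    return sum(v) / len(v)

pipeline = make_pipeline(normalize, score)
print(round(pipeline([1.0, 2.0, 4.0]), 3))  # -> 0.583
```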
  17. Pipeline Implementation today: wrap all models in one web server (simple but not performant), or run many specialized microservices (complex and hard to operate).
  18. Pipeline in Ray Serve: deployments call other deployments with handles.

  19. Ray Serve: Architecture

  20. Ray Serve: Architecture

  21. Ray Serve: Architecture

  22. Ray Serve: Architecture

  23. Ray Serve Enables Seamless Model Composition: Pythonic API, high-performance calls (no HTTP), and 1 line to scale to 100 machines.
  24. Patterns: Pipeline, Ensemble, Business Logic, Online Learning.
  25. Ensemble: mixing the output from 1+ models.

  26. Ensemble Use Cases: model update, aggregation, dynamic selection.

  27. Ensemble Deployment: wrap the models in the same handler, rather than managing many microservices.
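The ensemble pattern can be sketched without any serving framework. In this illustration (hypothetical stand-in models, not Ray Serve code), several models receive the same input inside one handler and their outputs are aggregated, instead of each model living behind its own microservice.

```python
# Framework-free sketch of the ensemble pattern: one handler, many models.
def model_a(x):
    return x + 1.0   # stand-in for a real model

def model_b(x):
    return x * 2.0   # stand-in for a real model

def ensemble_handler(x, models=(model_a, model_b)):
    preds = [m(x) for m in models]
    return sum(preds) / len(preds)   # aggregate by averaging

print(ensemble_handler(3.0))  # -> (4.0 + 6.0) / 2 = 5.0
```

Averaging is only one aggregation choice; the slide's "dynamic selection" use case would instead pick one model's output based on the input.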
  28. Patterns: Pipeline, Ensemble, Business Logic, Online Learning.
  29. Business Logic: database lookup, web API calls, feature store lookup, feature transformations.
  30. Business Logic in Action: network-bound, I/O-heavy work mixed together with compute-bound, memory-hungry work.
  31. Business Logic in Ray Serve: network-bound, I/O-heavy work stays just Python, while compute-heavy work is offloaded to another deployment, which is also just Python.
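The separation this slide describes can be sketched with the standard library. In this illustration, an async function does the network-bound lookup (a stand-in for a database or feature-store call), and the compute-heavy model call is handed off the event loop; a ThreadPoolExecutor stands in for a separate deployment.

```python
# Sketch of separating I/O-heavy and compute-heavy work (framework-free;
# a thread pool stands in for a separate Ray Serve deployment).
import asyncio
from concurrent.futures import ThreadPoolExecutor

def heavy_model(features):
    # Compute-bound work: kept off the event loop.
    return sum(f * f for f in features)

async def fetch_features(user_id):
    # Network-bound I/O: stand-in for a database or feature-store lookup.
    await asyncio.sleep(0.01)
    return [float(user_id), 2.0]

async def handle(user_id, pool):
    features = await fetch_features(user_id)   # I/O: just Python
    loop = asyncio.get_running_loop()
    # Offload the compute-heavy call to the "other deployment".
    return await loop.run_in_executor(pool, heavy_model, features)

def serve(user_id):
    with ThreadPoolExecutor() as pool:
        return asyncio.run(handle(user_id, pool))

print(serve(3))  # -> 13.0  (3.0**2 + 2.0**2)
```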
  32. Ray Serve: Ingress

  33. Ray Serve Enables Arbitrary Business Logic: separate I/O-heavy and compute-heavy work, native FastAPI ingress, and scale-out of web serving to replicas.
  34. Patterns: Pipeline, Ensemble, Business Logic, Online Learning.
  35. Online Learning: dynamically learn the model weights, personalized models, and reinforcement learning (RL), the state-of-the-art approach of “learning by interacting with the environment”.
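The "dynamically learn the model weights" point can be made concrete with a minimal example, independent of any serving framework: the weight of a one-parameter linear model is updated after every observation with a single SGD step, instead of retraining offline.

```python
# Sketch of online learning: one SGD step on squared error per observation.
def sgd_step(w, x, y, lr=0.1):
    pred = w * x
    grad = 2 * (pred - y) * x     # d/dw of (w*x - y)**2
    return w - lr * grad

w = 0.0
# A stream of interactions from data generated by y = 2*x.
for x, y in [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]:
    w = sgd_step(w, x, y)         # weights updated as requests arrive

print(round(w, 3))  # -> 2.256, already moving toward the true slope 2.0
```

In a serving context this update would run inside the deployment's request handler, so the model improves while it serves traffic.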
  36. Online Learning Example https://www.anyscale.com/blog/online-resource-allocation-with-ray-at-ant-group

  37. From patterns (Pipeline, Ensemble, Business Logic, Online Learning) to a framework.
  38. Ray Serve: A Framework for 1+ Models in Production. Deployment: scalable deployment, rolling upgrade. Handle: arbitrary composition, offload computation. Pythonic interface: just Python.
  39. Building an ML service, revisited: web frameworks can't achieve high performance or low cost; custom tooling is hard to develop, deploy, and manage; specialized systems lose flexibility and ease of use. Ray Serve offers both readiness for production and ease of development: Pythonic API, native FastAPI, high performance, and scalability.
  40. More about Ray: ray.io, rayserve.org, @raydistributed. Thank you!