Introduction_to_Ray_Serve.pdf

Anyscale
March 23, 2022

Ray Serve is Ray’s model serving library. Traditionally, model serving requires configuring a web server or a cloud-hosted solution. These approaches either lack scalability or hinder development through framework-specific tooling, vendor lock-in, and general inflexibility. Ray Serve overcomes these limitations. It offers a developer-friendly and framework-agnostic interface that provides scalable, production-ready model serving.

Ray Serve is:
- Scalable: It provides fine-grained resource management and scaling using Ray.
- Framework-agnostic: It works with any Python code, regardless of framework.
- Production-ready: It comes with a web server out of the box and handles routing, testing, and scaling logic for deployments.
- Developer-friendly: It offers a decorator-based API that converts existing applications into Ray Serve deployments with minimal refactoring.

This presentation introduces Ray Serve, including its use cases and its features. It walks through Ray Serve setup and integration with existing machine learning models.

Transcript

  1. What is Ray? Framework that offers an API for distributed applications; provides fine-grained control over system behavior; supports many native libraries that scale ML applications.
  2. What is Ray Serve? Scalable and programmable serving framework on Ray; framework-agnostic, Python-first, and easy to use; helps you scale in production.
  3. Ray Serve for Model Serving: specialized for ML model serving; GPUs; batching; scale-out; model composition.
  4. Ray: Ecosystem. Native libraries; 3rd-party libraries; most comprehensive set of distributed libraries; universal framework for distributed computing.
  5. Building ML Service: ready for production; ease of development. Web frameworks: can’t achieve • high performance • low cost. Custom tooling: hard to • develop • deploy • manage. Specialized systems: lost • flexibility • ease of use.
  6. What Makes Serve Different? Many tools run 1 model well, with 1+ copies of the model -> Impossible? -> Complex YAML -> Scalability issue -> $$$
  7. Reality: - New models are developed over time - Scale out a single model - Compose multiple models together for real-world use cases
  8. Pipeline • Break tasks into steps • Scikit-Learn Pipeline: Pipeline([('scaler', StandardScaler()), ('svc', SVC())]) • Recommendation systems: [EmbeddingLookup(), FeatureInteraction(), NearestNeighbors(), Ranking()] • Common preprocessing: [HeavyWeightMLMegaModel(), DecisionTree()/BoostingModel()]
  9. Pipeline Implementation. Wrap models in web server: simple but not performant. Many specialized microservices: complex and hard to operate.
  10. Ray Serve Enables Seamless Model Composition: Pythonic API; high-performance calls (no HTTP); 1 line to scale to 100 machines.
  11. Business Logic • Database lookup • Web API calls • Feature store lookup • Feature transformations
  12. Business Logic in Ray Serve: network bound; I/O heavy; offloaded to another deployment; just Python.
  13. Ray Serve Enables Arbitrary Business Logic: separate I/O and compute-heavy work; native FastAPI ingress; scale-out web serving to replicas.
  14. Online Learning • Dynamically learn the model weights • Personalized models • Reinforcement learning (RL) • State of the art “learn by interacting with the environment”
  15. Ray Serve: A Framework for 1+ Models in Production. Deployment; Handle; Pythonic interface; scalable deployment; rolling upgrade; arbitrary composition; offload computation; just Python.
  16. Building ML Service: ready for production; ease of development. Web frameworks: can’t achieve • high performance • low cost. Custom tooling: hard to • develop • deploy • manage. Specialized systems: lost • flexibility • ease of use. Ray Serve: Pythonic API; native FastAPI; high performance; scalability.
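
The "break tasks into steps" pattern from slide 8 can be sketched in plain Python. This is not Ray Serve code and not taken from the deck: `standardize`, `threshold_classifier`, and `make_pipeline` are hypothetical names, and the "model" is a toy rule. It only shows the shape of a pipeline in which each step consumes the previous step's output.

```python
from typing import Callable, List, Sequence

# A step maps a list of numbers to a list of numbers.
Step = Callable[[List[float]], List[float]]


def standardize(values: List[float]) -> List[float]:
    """Preprocessing step: shift to zero mean and scale to unit variance."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = variance ** 0.5 or 1.0  # fall back to 1.0 when all values are equal
    return [(v - mean) / std for v in values]


def threshold_classifier(values: List[float]) -> List[float]:
    """Toy 'model' step: label a value 1.0 if it is above zero, else 0.0."""
    return [1.0 if v > 0 else 0.0 for v in values]


def make_pipeline(steps: Sequence[Step]) -> Step:
    """Compose steps so the output of each one feeds the next."""

    def run(values: List[float]) -> List[float]:
        for step in steps:
            values = step(values)
        return values

    return run


pipeline = make_pipeline([standardize, threshold_classifier])
predictions = pipeline([1.0, 2.0, 3.0, 4.0])
print(predictions)  # [0.0, 0.0, 1.0, 1.0]
```

Here every step runs in one process. The slides suggest that in Ray Serve each step could instead be its own deployment, so that a heavy step can be scaled out independently of a light one.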