Introduction_to_Ray_Serve.pdf

Ray Serve is Ray’s model serving library. Traditionally, model serving requires configuring a web server or a cloud-hosted solution. These approaches either lack scalability or hinder development through framework-specific tooling, vendor lock-in, and general inflexibility. Ray Serve overcomes these limitations. It offers a developer-friendly and framework-agnostic interface that provides scalable, production-ready model serving.

Ray Serve is:
- Scalable: It provides fine-grained resource management and scaling using Ray.
- Framework-agnostic: It works with any Python code, regardless of framework.
- Production-ready: It comes with a web server out of the box and handles routing, testing, and scaling logic for deployments.
- Developer-friendly: It offers a decorator-based API that converts existing applications into Ray Serve deployments with minimal refactoring.
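The decorator idea in the last bullet can be sketched in plain Python. This is a hypothetical mini-registry for illustration only, not Ray Serve's actual API: a decorator registers an existing function under a route, and the application code itself stays untouched.

```python
# Illustrative sketch (NOT Ray Serve's real API): a decorator-based serving
# interface that wraps plain Python callables with minimal refactoring.
_registry = {}

def deployment(route):
    """Hypothetical decorator that registers a callable under an HTTP route."""
    def wrap(fn):
        _registry[route] = fn
        return fn
    return wrap

@deployment("/classify")
def classify(text):
    # Existing application code stays unchanged.
    return "positive" if "good" in text else "negative"

def handle_request(route, payload):
    # A real serving framework would add routing, scaling, and batching here.
    return _registry[route](payload)

print(handle_request("/classify", "a good movie"))  # -> positive
```

In Ray Serve the analogous decorator additionally handles replica placement and HTTP routing; the point of the sketch is only that the wrapped function's body does not change.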

This presentation introduces Ray Serve, including its use cases and its features. It walks through Ray Serve setup and integration with existing machine learning models.


Anyscale

March 23, 2022

Transcript

  1. Introduction to Ray Serve. Shreyas Krishnaswamy @ Anyscale, Simon Mo @ Anyscale
  2. What is Ray? A framework that offers an API for distributed applications, provides fine-grained control over system behavior, and supports many native libraries that scale ML applications.
  3. What is Ray Serve? A scalable and programmable serving framework on Ray: framework-agnostic, Python-first, and easy to use. It helps you scale in production.
  4. This talk: Ray Serve background, a walk-through demo, and common ML patterns in production.
  5. Ray Serve as a Web Framework: simple to deploy web services on Ray.
  6. Ray Serve for Model Serving: specialized for ML model serving, with support for GPUs, batching, scale-out, and model composition.
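The batching feature mentioned on this slide can be illustrated without the framework. This is a framework-free sketch of the idea (not Ray Serve's batching API): individual requests are grouped so the model runs once per batch, which is what makes hardware such as GPUs pay off.

```python
# Sketch of the dynamic-batching idea: group incoming requests and invoke
# the model once per batch instead of once per request.
def batched_model(inputs):
    # Stand-in for one vectorized model call over the whole batch.
    return [x * 2 for x in inputs]

def serve_requests(requests, max_batch_size=4):
    results = []
    for i in range(0, len(requests), max_batch_size):
        batch = requests[i:i + max_batch_size]
        results.extend(batched_model(batch))  # one model call per batch
    return results

print(serve_requests([1, 2, 3, 4, 5]))  # -> [2, 4, 6, 8, 10]
```

A real serving system batches across concurrent requests with a timeout, rather than slicing a pre-collected list; the sketch only shows the per-batch model call.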
  7. Ray: Ecosystem. Native libraries and 3rd-party libraries: the most comprehensive set of distributed libraries, and a universal framework for distributed computing.
  8. Building an ML service means trading off readiness for production against ease of development. Web frameworks can't achieve high performance or low cost; custom tooling is hard to develop, deploy, and manage; specialized systems lose flexibility and ease of use.
  9. Demo

  10. What makes Serve Different?

  11. What makes Serve Different? Many tools run 1 model well, but running 1+ copies of the model is often impossible, or means complex YAML, scalability issues, and high cost ($$$).
  12. Reality: new models are developed over time, a single model must scale out, and multiple models are composed together for real-world use cases.
  13. Patterns of ML Models in Production: Pipeline, Ensemble, Business Logic, Online Learning.
  14. Patterns: Pipeline, Ensemble, Business Logic, Online Learning.
  15. A Typical Computer Vision Pipeline (diagram: an image of a cat classified as “Standing Cat”).

  16. Pipeline: break tasks into steps. Scikit-Learn pipeline: Pipeline([('scaler', StandardScaler()), ('svc', SVC())]). Recommendation systems: [EmbeddingLookup(), FeatureInteraction(), NearestNeighbors(), Ranking()]. Common preprocessing: [HeavyWeightMLMegaModel(), DecisionTree()/BoostingModel()].
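The pipeline pattern on this slide can be sketched in a few lines of framework-free Python: each stage is a callable, and the pipeline applies them in order, mirroring scikit-learn's Pipeline. The stage functions here are hypothetical stand-ins for the slide's EmbeddingLookup(), Ranking(), etc.

```python
# Framework-free sketch of the pipeline pattern: compose stages in order.
def make_pipeline(*stages):
    def run(x):
        for stage in stages:
            x = stage(x)   # output of one stage feeds the next
        return x
    return run

# Hypothetical stages standing in for real preprocessing and models.
def normalize(v):
    m = max(v)
    return [x / m for x in v]

def score(v):
    return sum(v) / len(v)

pipeline = make_pipeline(normalize, score)
print(round(pipeline([1.0, 2.0, 4.0]), 3))  # -> 0.583
```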
  17. Pipeline Implementation today: wrap all models in one web server (simple but not performant), or run many specialized microservices (complex and hard to operate).
  18. Pipeline in Ray Serve: deployments call other deployments with handles.

  19. Ray Serve: Architecture

  20. Ray Serve: Architecture

  21. Ray Serve: Architecture

  22. Ray Serve: Architecture

  23. Ray Serve Enables Seamless Model Composition: Pythonic API, high-performance calls (no HTTP), and 1 line to scale to 100 machines.
  24. Patterns: Pipeline, Ensemble, Business Logic, Online Learning.
  25. Ensemble: mixing the output from 1+ models.

  26. Ensemble Use Cases: model update, aggregation, dynamic selection.

  27. Ensemble Deployment: wrap the models in the same handler, rather than managing many microservices.
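The ensemble pattern can be sketched without any serving framework. In this illustration (hypothetical stand-in models, not Ray Serve code), several models receive the same input inside one handler and their outputs are aggregated, instead of each model living behind its own microservice.

```python
# Framework-free sketch of the ensemble pattern: one handler, many models.
def model_a(x):
    return x + 1.0   # stand-in for a real model

def model_b(x):
    return x * 2.0   # stand-in for a real model

def ensemble_handler(x, models=(model_a, model_b)):
    preds = [m(x) for m in models]
    return sum(preds) / len(preds)   # aggregate by averaging

print(ensemble_handler(3.0))  # -> (4.0 + 6.0) / 2 = 5.0
```

Averaging is only one aggregation choice; the slide's "dynamic selection" use case would instead pick one model's output based on the input.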
  28. Patterns: Pipeline, Ensemble, Business Logic, Online Learning.
  29. Business Logic: database lookup, web API calls, feature store lookup, feature transformations.
  30. Business Logic in Action: network-bound, I/O-heavy work mixed together with compute-bound, memory-hungry work.
  31. Business Logic in Ray Serve: network-bound, I/O-heavy work stays just Python, while compute-heavy work is offloaded to another deployment, which is also just Python.
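The separation this slide describes can be sketched with the standard library. In this illustration, an async function does the network-bound lookup (a stand-in for a database or feature-store call), and the compute-heavy model call is handed off the event loop; a ThreadPoolExecutor stands in for a separate deployment.

```python
# Sketch of separating I/O-heavy and compute-heavy work (framework-free;
# a thread pool stands in for a separate Ray Serve deployment).
import asyncio
from concurrent.futures import ThreadPoolExecutor

def heavy_model(features):
    # Compute-bound work: kept off the event loop.
    return sum(f * f for f in features)

async def fetch_features(user_id):
    # Network-bound I/O: stand-in for a database or feature-store lookup.
    await asyncio.sleep(0.01)
    return [float(user_id), 2.0]

async def handle(user_id, pool):
    features = await fetch_features(user_id)   # I/O: just Python
    loop = asyncio.get_running_loop()
    # Offload the compute-heavy call to the "other deployment".
    return await loop.run_in_executor(pool, heavy_model, features)

def serve(user_id):
    with ThreadPoolExecutor() as pool:
        return asyncio.run(handle(user_id, pool))

print(serve(3))  # -> 13.0  (3.0**2 + 2.0**2)
```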
  32. Ray Serve: Ingress

  33. Ray Serve Enables Arbitrary Business Logic: separate I/O-heavy and compute-heavy work, native FastAPI ingress, and scale-out of web serving to replicas.
  34. Patterns: Pipeline, Ensemble, Business Logic, Online Learning.
  35. Online Learning: dynamically learn the model weights, personalized models, and reinforcement learning (RL), the state-of-the-art approach of “learning by interacting with the environment”.
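The "dynamically learn the model weights" point can be made concrete with a minimal example, independent of any serving framework: the weight of a one-parameter linear model is updated after every observation with a single SGD step, instead of retraining offline.

```python
# Sketch of online learning: one SGD step on squared error per observation.
def sgd_step(w, x, y, lr=0.1):
    pred = w * x
    grad = 2 * (pred - y) * x     # d/dw of (w*x - y)**2
    return w - lr * grad

w = 0.0
# A stream of interactions from data generated by y = 2*x.
for x, y in [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]:
    w = sgd_step(w, x, y)         # weights updated as requests arrive

print(round(w, 3))  # -> 2.256, already moving toward the true slope 2.0
```

In a serving context this update would run inside the deployment's request handler, so the model improves while it serves traffic.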
  36. Online Learning Example https://www.anyscale.com/blog/online-resource-allocation-with-ray-at-ant-group

  37. From patterns (Pipeline, Ensemble, Business Logic, Online Learning) to a framework.
  38. Ray Serve: A Framework for 1+ Models in Production. Deployment: scalable deployment, rolling upgrade. Handle: arbitrary composition, offload computation. Pythonic interface: just Python.
  39. Building an ML service, revisited: web frameworks can't achieve high performance or low cost; custom tooling is hard to develop, deploy, and manage; specialized systems lose flexibility and ease of use. Ray Serve offers both readiness for production and ease of development: Pythonic API, native FastAPI, high performance, and scalability.
  40. More about Ray: ray.io, rayserve.org, @raydistributed. Thank you!