
Ray Serve: Patterns of ML Models in Production (Simon Mo)


You trained an ML model; now what? The model needs to be deployed for online serving and offline processing. This talk walks through the journey of deploying your ML models in production. I will cover common deployment patterns backed by concrete use cases drawn from 100+ user interviews for Ray and Ray Serve. Lastly, I will cover how we built Ray Serve, a scalable model serving framework, from these learnings.

Anyscale

July 13, 2021

Transcript

  1. Who am I? Building Ray Serve @ Anyscale. Previously: Prediction Serving System @ Berkeley RISELab. Constantly talking to ML practitioners.
  2. Ray Serve: a scalable and programmable serving framework on Ray. Framework agnostic, Python first, and easy to use. Helps you scale in production.
  3. This talk is about: common patterns of ML in production, and Ray Serve, your go-to framework for deploying ML models.
  4. Ray: Ecosystem. A universal framework for distributed computing with the most comprehensive set of distributed libraries (native libraries and 3rd-party libraries).
  5. Ray provides primitives for distributed apps; Ray Serve is a framework for ML serving. By leveraging Ray, Ray Serve is built for scale.
  6. Building an ML service: a trade-off between being ready for production and ease of development. Web frameworks can't achieve high performance or low cost. Custom tooling is hard to develop, deploy, and manage. Specialized systems lose flexibility and ease of use.
  7. What makes Serve different? Many tools run 1 model well. With 1+ copies of the model: impossible? -> complex YAML -> scalability issues -> $$$.
  8. Reality: new models are developed over time; a single model must scale out; multiple models are composed together for real-world use cases.
  9. Pipeline: break tasks into steps. Scikit-Learn pipeline: Pipeline([('scaler', StandardScaler()), ('svc', SVC())]). Recommendation systems: [EmbeddingLookup(), FeatureInteraction(), NearestNeighbors(), Ranking()]. Common preprocessing: [HeavyWeightMLMegaModel(), DecisionTree()/BoostingModel()].
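The pipeline pattern above can be sketched in plain Python: each step is a callable, and the pipeline threads one step's output into the next. This is a minimal sketch; the stage names and toy logic are illustrative, not from the talk.

```python
# Minimal sketch of the pipeline pattern: each stage is a callable,
# and the pipeline feeds one stage's output into the next.

def make_pipeline(*stages):
    """Compose stages left to right into a single callable."""
    def run(x):
        for stage in stages:
            x = stage(x)
        return x
    return run

# Toy stages standing in for StandardScaler / SVC.
scale = lambda xs: [(v - 5.0) / 2.0 for v in xs]
classify = lambda xs: [1 if v > 0 else 0 for v in xs]

pipeline = make_pipeline(scale, classify)
print(pipeline([3.0, 7.0]))  # [0, 1]
```

In Ray Serve, each stage would typically become its own deployment so the heavy step (e.g. a large model) can scale independently of the cheap one.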
  10. Pipeline implementation today: wrap all models in one web server (simple but not performant), or split them into many specialized microservices (complex and hard to operate).
  11. Ray Serve enables seamless model composition: Pythonic API, high-performance calls (no HTTP), 1 line to scale to 100 machines.
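Composition in this style can be sketched with plain classes; in Ray Serve each class would be decorated with @serve.deployment and called through a handle, but the composition logic itself stays just Python. The class names and toy models below are illustrative placeholders.

```python
# Sketch of Pythonic model composition. In Ray Serve, Featurizer and
# Scorer would each be a deployment, and Composed would call them
# through handles instead of direct method calls.

class Featurizer:
    def __call__(self, text):
        return len(text)          # stand-in for real feature extraction

class Scorer:
    def __call__(self, features):
        return features * 0.1     # stand-in for a real model

class Composed:
    """Driver that composes the two 'deployments'."""
    def __init__(self, featurizer, scorer):
        self.featurizer = featurizer
        self.scorer = scorer

    def __call__(self, text):
        return self.scorer(self.featurizer(text))

app = Composed(Featurizer(), Scorer())
print(app("hello"))  # 0.5
```

Because the calls are in-process Python rather than HTTP hops between microservices, composing models this way avoids the serialization and network overhead the slides warn about.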
  12. Business logic: database lookups, web API calls, feature store lookups, feature transformations.
  13. Business logic in Ray Serve: network-bound, I/O-heavy work is offloaded to another deployment. The rest is just Python.
  14. Ray Serve enables arbitrary business logic: separate I/O-heavy and compute-heavy work, native FastAPI ingress, scale out web serving to replicas.
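The I/O-versus-compute split can be sketched with asyncio: the network-bound lookup is awaited so it does not block, while the compute step is ordinary Python. In Ray Serve the lookup would live in its own deployment so it scales independently; the function names below are illustrative.

```python
import asyncio

# Sketch of separating I/O-bound business logic (a feature lookup)
# from the compute-bound model call.

async def feature_lookup(user_id):
    await asyncio.sleep(0)        # stands in for a database / feature store call
    return {"user_id": user_id, "clicks": 3}

def predict(features):
    return 1 if features["clicks"] > 1 else 0   # stand-in model

async def handle_request(user_id):
    features = await feature_lookup(user_id)    # I/O-bound step
    return predict(features)                    # compute step

print(asyncio.run(handle_request(42)))  # 1
```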
  15. Online learning: • Dynamically learn the model weights (personalized models) • Dynamically learn parameters to orchestrate the models (model selection, contextual bandits) • Reinforcement learning (RL): state-of-the-art "learn by interacting with the environment" (AlphaGo).
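The "learn parameters to orchestrate the models" bullet can be sketched as a minimal epsilon-greedy bandit that learns online which of two models to route traffic to. This is a toy sketch under assumed names and rewards, not the talk's implementation.

```python
import random

# Epsilon-greedy bandit: online learning of which "model" (arm) to use.
class EpsilonGreedy:
    def __init__(self, n_arms, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = [0] * n_arms
        self.values = [0.0] * n_arms   # running mean reward per arm

    def select(self):
        for arm, c in enumerate(self.counts):
            if c == 0:
                return arm             # pull each arm once first
        if random.random() < self.epsilon:
            return random.randrange(len(self.counts))   # explore
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, arm, reward):
        # incremental mean update of the arm's estimated reward
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

random.seed(0)
bandit = EpsilonGreedy(n_arms=2)
for _ in range(200):
    arm = bandit.select()
    reward = 1.0 if arm == 1 else 0.2   # arm 1 is the better "model"
    bandit.update(arm, reward)
print(bandit.values[1] > bandit.values[0])  # True
```

In a serving context, select() would pick which deployment handles a request and update() would fold in feedback (clicks, conversions) as it arrives, which is exactly the kind of state a serving framework must keep alongside the models.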
  16. Wrap the models in the same handler, or run many microservices to manage? Ray Serve offers the best of both worlds.
  17. Ray Serve: a framework for 1+ models in production. Deployment: Pythonic interface, scalable deployment, rolling upgrades. Ingress: fully featured HTTP, FastAPI integration. Handle: arbitrary composition, offload computation, just Python.
  18. Building an ML service, revisited: web frameworks can't achieve high performance or low cost; custom tooling is hard to develop, deploy, and manage; specialized systems lose flexibility and ease of use. Ray Serve delivers both production readiness and ease of development: Pythonic API, native FastAPI, high performance, scalability.
  19. Ray Serve: production use cases. Leveraging the Possibilities of Ray Serve in Implementing a Scalable, Fully Automated Digital Authentication Service (Widas) [Thurs 12:25-12:55pm]. How Ray and Anyscale Make it Easy to do Massive-scale ML on Aerial Imagery (Dendra) [Wed 1:45-2:15pm]. Achieving Scalability and Interactivity with Ray Serve (Ikigai Labs) [Wed 1:45-2:15pm]. Building High Availability and Scalability Online Computing Applications on Ray (Ant Group) [Wed 1:45-2:15pm]. Ray and Anyscale: An Optimization Journey (OXW.io) [Wed 12:25-12:55pm].
  20. Takeaway: ML in production = many models. 4 patterns: pipeline, ensemble, business logic, online learning. Ray Serve is purpose-built for scalable deployment. More about Ray: ray.io, rayserve.org, @raydistributed. Careers: Anyscale is hiring (anyscale.com). Thank you!