Slide 1

Slide 1 text

Ray Serve: Patterns of ML Models in Production
Simon Mo @ Anyscale
Ray Summit 2021

Slide 2

Slide 2 text

Who am I?
● Building Ray Serve @ Anyscale
● Previously: Prediction Serving System @ Berkeley RISELab
● Constantly Talking to ML Practitioners

Slide 3

Slide 3 text

Ray Serve
● Scalable and Programmable Serving Framework on Ray
● Framework Agnostic, Python First, and Easy to Use
● Helps you Scale in Production

Slide 4

Slide 4 text

This talk is about
● Ray Serve
● Common patterns of ML in production
● Ray Serve: your go-to framework for deploying ML models

Slide 5

Slide 5 text

Ray Serve: Web Framework
Simple to Deploy Web Services on Ray

Slide 6

Slide 6 text

Ray Serve: Model Serving
Specialized for ML Model Serving
● GPUs
● Batching
● Scale-out
● Model Composition

Slide 7

Slide 7 text

Ray: Ecosystem
● Universal framework for distributed computing
● Most comprehensive set of distributed libraries: Native Libraries, 3rd Party Libraries

Slide 8

Slide 8 text

By leveraging Ray, Ray Serve is built for scale.
● Primitives for Distributed Apps
● Framework for ML Serving
[How Ray’s benefit -> Serve’s benefit] (Ion’s Slide? scalability, dev experience,)

Slide 9

Slide 9 text

Building ML Service
Axes: Ready for Production, Ease of development
● Web Frameworks: Can’t achieve high performance, low cost
● Custom Tooling: Hard to develop, deploy, manage
● Specialized Systems: Lost flexibility, ease of use

Slide 10

Slide 10 text

What makes Serve Different?

Slide 11

Slide 11 text

What makes Serve Different?
Many Tools Run 1 Model Well
With 1+ copies of the model
-> Impossible?
-> Complex YAML
-> Scalability issue
-> $$$

Slide 12

Slide 12 text

Reality:
- New models are developed over time
- Scale out a single model
- Compose multiple models together for real world use cases

Slide 13

Slide 13 text

Patterns of ML Models in Production
● Pipeline
● Ensemble
● Business Logic
● Online Learning

Slide 14

Slide 14 text

Patterns
● Pipeline
● Ensemble
● Business Logic
● Online Learning

Slide 15

Slide 15 text

A Typical Computer Vision Pipeline
[Figure: pipeline diagram; labels shown: "CAT", "Standing Cat"]

Slide 16

Slide 16 text

Pipeline
● Break Tasks into Steps
● Scikit-Learn Pipeline: Pipeline([('scaler', StandardScaler()), ('svc', SVC())])
● Recommendation Systems: [EmbeddingLookup(), FeatureInteraction(), NearestNeighbors(), Ranking()]
● Common Preprocessing: [HeavyWeightMLMegaModel(), DecisionTree()/BoostingModel()]
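The "break tasks into steps" idea can be sketched in plain Python. This is a minimal illustration, not the scikit-learn or Ray Serve API: `run_pipeline`, `scale`, and `classify` are hypothetical names, and the two steps are toy stand-ins for a preprocessing stage and a model.

```python
from typing import Any, Callable, List

Step = Callable[[Any], Any]


def run_pipeline(steps: List[Step], request: Any) -> Any:
    """Feed the request through each step in order; each output is the next input."""
    result = request
    for step in steps:
        result = step(result)
    return result


def scale(values: List[float]) -> List[float]:
    """Toy preprocessing step: divide by the maximum (assumes positive inputs)."""
    top = max(values)
    return [v / top for v in values]


def classify(values: List[float]) -> str:
    """Toy 'model': label the input by the mean of the scaled features."""
    mean = sum(values) / len(values)
    return "cat" if mean > 0.5 else "not cat"


prediction = run_pipeline([scale, classify], [2.0, 3.0, 4.0])
print(prediction)
```

Each step only needs to accept the previous step's output, which is what makes it possible to later run the steps as separately scaled services.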

Slide 17

Slide 17 text

Pipeline Implementation
● Wrap models in web server: Simple but not performant
● Many specialized microservices: Complex and hard to operate

Slide 18

Slide 18 text

Ray Serve: Handle
Allow Deployments to Call Other Deployments
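A rough way to picture "deployments calling other deployments" is an entry-point object that holds references to other components and awaits their results. The sketch below uses only the standard library; `LocalHandle`, `Preprocessor`, `Classifier`, and `Composed` are hypothetical names for illustration and are not the Ray Serve API, which would route such calls to replicas in a cluster.

```python
import asyncio
from typing import List


class LocalHandle:
    """Hypothetical stand-in for a handle: forwards a call to a target object."""

    def __init__(self, target):
        self._target = target

    async def remote(self, *args):
        # A real serving system would route this call to a replica,
        # possibly on another machine; here it is a direct in-process call.
        return await self._target(*args)


class Preprocessor:
    async def __call__(self, text: str) -> List[str]:
        return text.lower().split()


class Classifier:
    async def __call__(self, tokens: List[str]) -> str:
        return "animal" if "cat" in tokens else "other"


class Composed:
    """Entry point that holds handles to other components and chains them."""

    def __init__(self, preprocess: LocalHandle, classify: LocalHandle):
        self.preprocess = preprocess
        self.classify = classify

    async def __call__(self, text: str) -> str:
        tokens = await self.preprocess.remote(text)
        return await self.classify.remote(tokens)


app = Composed(LocalHandle(Preprocessor()), LocalHandle(Classifier()))
result = asyncio.run(app("A standing CAT"))
print(result)
```

The composition logic stays ordinary Python; only the handle decides where the call actually runs.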

Slide 19

Slide 19 text

Ray Serve: Architecture

Slide 20

Slide 20 text

Ray Serve: Architecture

Slide 21

Slide 21 text

Ray Serve: Architecture

Slide 22

Slide 22 text

Ray Serve: Architecture

Slide 23

Slide 23 text

Ray Serve Enables Seamless Model Composition
● Pythonic API
● High Performance Calls (No HTTP)
● 1 line to scale to 100 machines

Slide 24

Slide 24 text

Patterns
● Pipeline
● Ensemble
● Business Logic
● Online Learning

Slide 25

Slide 25 text

Ensemble
Mixing the output from 1+ models

Slide 26

Slide 26 text

Ensemble Use Cases
● Model Update
● Aggregate
● Dynamic Selection
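Two of these use cases, aggregation and dynamic selection, can be sketched with the standard library. The three threshold "models" and the routing rule below are hypothetical toys chosen for illustration; a real ensemble would call trained models, and the aggregation rule (majority vote here) is one option among many.

```python
import asyncio
from collections import Counter


# Hypothetical toy models that disagree near their thresholds.
async def model_a(x: float) -> str:
    return "cat" if x > 0.3 else "dog"


async def model_b(x: float) -> str:
    return "cat" if x > 0.5 else "dog"


async def model_c(x: float) -> str:
    return "cat" if x > 0.7 else "dog"


async def ensemble_vote(models, x: float) -> str:
    """Aggregate: query all models concurrently, then take a majority vote."""
    predictions = await asyncio.gather(*(m(x) for m in models))
    label, _count = Counter(predictions).most_common(1)[0]
    return label


async def dynamic_select(x: float) -> str:
    """Dynamic selection: route each request to one model based on the input."""
    model = model_a if x < 0.5 else model_c
    return await model(x)


models = [model_a, model_b, model_c]
voted = asyncio.run(ensemble_vote(models, 0.6))
selected = asyncio.run(dynamic_select(0.6))
print(voted, selected)
```

With an odd number of models and two labels the vote cannot tie; other setups would need an explicit tie-breaking rule.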

Slide 27

Slide 27 text

Ensemble Deployment
● Wrap the models in the same handler
● Many microservices to manage

Slide 28

Slide 28 text

Ensemble Example @ Ray Summit 2020

Slide 29

Slide 29 text

Patterns
● Pipeline
● Ensemble
● Business Logic
● Online Learning

Slide 30

Slide 30 text

Business Logic
● Database Lookup
● Web API Calls
● Feature Store Lookup
● Feature Transformations

Slide 31

Slide 31 text

Business Logic in Action
● Network Bound
● I/O Heavy
● Compute Bound
● Memory Hungry

Slide 32

Slide 32 text

Key Question: Where to Run the Business Logic?

Slide 33

Slide 33 text

Business Logic in Ray Serve
● Network Bound, I/O Heavy: Just Python
● Offloaded to Another Deployment: Just Python
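The split between I/O-heavy business logic and offloaded compute can be illustrated with `asyncio` and a thread pool. This is a sketch under stated assumptions: the thread pool stands in for "another deployment", `FEATURE_STORE` is a hypothetical in-memory dictionary rather than a real feature store, and the sleep simulates network latency.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Hypothetical in-memory stand-in for a feature store or database.
FEATURE_STORE: Dict[str, List[float]] = {"user-1": [1.0, 2.0, 3.0]}


async def lookup_features(user_id: str) -> List[float]:
    """Network-bound step, simulated with a short sleep instead of a real call."""
    await asyncio.sleep(0.01)
    return FEATURE_STORE.get(user_id, [0.0])


def heavy_model(features: List[float]) -> float:
    """Compute-bound step standing in for model inference."""
    return sum(f * f for f in features)


async def handle_request(user_id: str, pool: ThreadPoolExecutor) -> dict:
    # The I/O wait happens on the event loop, so other requests can proceed.
    features = await lookup_features(user_id)
    # The compute step is handed to a separate worker pool.
    loop = asyncio.get_running_loop()
    score = await loop.run_in_executor(pool, heavy_model, features)
    return {"user": user_id, "score": score}


async def main() -> list:
    with ThreadPoolExecutor(max_workers=2) as pool:
        return await asyncio.gather(
            handle_request("user-1", pool),
            handle_request("unknown", pool),
        )


responses = asyncio.run(main())
print(responses)
```

Keeping the two kinds of work apart lets each be scaled independently: many cheap I/O handlers in front, fewer expensive compute workers behind.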

Slide 34

Slide 34 text

Ray Serve: Ingress

Slide 35

Slide 35 text

Ray Serve Enables Arbitrary Business Logic
● Separate I/O and Compute Heavy Work
● Native FastAPI Ingress
● Scale-out Web Serving to Replicas

Slide 36

Slide 36 text

Patterns
● Pipeline
● Ensemble
● Business Logic
● Online Learning

Slide 37

Slide 37 text

Online Learning
• Dynamically learn the model weights
  • Personalized models
• Dynamically learn parameters to orchestrate the models
  • Model selections, Contextual Bandit
• Reinforcement learning (RL)
  • State of the art “learn by interacting with the environment”
  • AlphaGo
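Learning which model to serve from live feedback can be sketched with an epsilon-greedy bandit. Note the simplifications: this is a plain multi-armed bandit, not the contextual bandit named on the slide, and `EpsilonGreedySelector`, the model names, and the fixed rewards in `TRUE_REWARD` are hypothetical values made up for illustration.

```python
import random
from typing import Dict, List


class EpsilonGreedySelector:
    """Pick among models, mostly exploiting the best running average reward."""

    def __init__(self, model_names: List[str], epsilon: float = 0.1, seed: int = 0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts: Dict[str, int] = {name: 0 for name in model_names}
        self.values: Dict[str, float] = {name: 0.0 for name in model_names}

    def best(self) -> str:
        """Model with the highest estimated reward so far."""
        return max(self.values, key=self.values.get)

    def select(self) -> str:
        # Try every model once before trusting the estimates.
        untried = [name for name, count in self.counts.items() if count == 0]
        if untried:
            return untried[0]
        # Explore with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.counts))
        return self.best()

    def update(self, name: str, reward: float) -> None:
        """Incrementally update the running mean reward for one model."""
        self.counts[name] += 1
        self.values[name] += (reward - self.values[name]) / self.counts[name]


# Hypothetical feedback: model_b is always rewarded more than model_a.
TRUE_REWARD = {"model_a": 0.2, "model_b": 1.0}

selector = EpsilonGreedySelector(list(TRUE_REWARD))
for _ in range(200):
    chosen = selector.select()
    selector.update(chosen, TRUE_REWARD[chosen])

print(selector.best(), selector.counts)
```

In a serving setting, `select` would run on each request and `update` would run whenever feedback (a click, a conversion) arrives.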

Slide 38

Slide 38 text

Online Learning Example
https://www.anyscale.com/blog/online-resource-allocation-with-ray-at-ant-group

Slide 39

Slide 39 text

Patterns
● Pipeline
● Ensemble
● Business Logic
● Online Learning
Framework

Slide 40

Slide 40 text

● Wrap the models in the same handler
● Many microservices to manage
Ray Serve offers the best of both worlds
(Bring back the quad chart)

Slide 41

Slide 41 text

Ray Serve: A Framework for 1+ Models in Production
● Deployment: Pythonic Interface, Scalable Deployment, Rolling Upgrade
● Ingress: Fully Featured HTTP, FastAPI Integration
● Handle: Arbitrary Composition, Offload Computation, Just Python

Slide 42

Slide 42 text

Building ML Service
Axes: Ready for Production, Ease of development
● Web Frameworks: Can’t achieve high performance, low cost
● Custom Tooling: Hard to develop, deploy, manage
● Specialized Systems: Lost flexibility, ease of use
● Ray Serve: Pythonic API, Native FastAPI, High Performance, Scalability

Slide 43

Slide 43 text

Ray Serve: Production Use Cases
● Leveraging the Possibilities of Ray Serve in Implementing a Scalable, Fully Automated Digital Authentication Service (Widas) [Thurs 12:25-12:55pm]
● How Ray and Anyscale Make it Easy to do Massive-scale ML on Aerial Imagery (Dendra) [Wed 1:45-2:15pm]
● Achieving Scalability and Interactivity with Ray Serve (Ikigai Labs) [Wed 1:45-2:15pm]
● Building High Availability and Scalability Online Computing Applications on Ray (Ant Group) [Wed 1:45-2:15pm]
● Ray and Anyscale: An Optimization Journey (OXW.io) [Wed 12:25-12:55pm]

Slide 44

Slide 44 text

Takeaway:
- ML in Production = Many Models
- 4 Patterns: pipeline, ensemble, biz logic, online learning
- Ray Serve is purpose built for scalable deployment
More about Ray:
- ray.io, rayserve.org
- @raydistributed
Career: Anyscale is hiring (anyscale.com)
Thank you