Slide 1

Serve Deployment Graph
Jiao Dong @ Anyscale

Slide 2

Outline
● Motivation for multi-model inference graphs
● Ray and Ray Serve background
● Deployment Graph API walkthrough
● Live demo: content understanding!

Slide 3

Motivation
Machine learning inference graphs are getting longer, wider, and more dynamic.
Blog: Ray Serve - Patterns of ML Models in Production

Slide 4

Unique strengths of Ray
● Scalable
● Low-latency
● Part of the ML ecosystem
● Efficient

Slide 5

Serve deployment - single model

@serve.deployment MyModel
HTTP endpoint: http://localhost:8000/api?data=A
Python handle: python_handle.remote("A")

A Python deployment handle facilitates multi-model inference graph composition.
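A toy, dependency-free sketch of the single-model pattern above: a deployment wraps a model and exposes a handle whose `.remote()` schedules a call and returns a reference. The real API is `@serve.deployment` from `ray.serve`; the `Handle` class and names here are purely illustrative.

```python
# Illustrative sketch only; real Serve handles come from ray.serve, not this class.
from concurrent.futures import ThreadPoolExecutor

class Handle:
    """Mimics a Serve Python handle: .remote() schedules work, returns a future."""
    def __init__(self, instance, pool):
        self._instance, self._pool = instance, pool

    def remote(self, *args):
        # Like a Serve handle, this returns a reference, not the result itself.
        return self._pool.submit(self._instance, *args)

class MyModel:
    def __call__(self, data):
        return f"processed:{data}"

pool = ThreadPoolExecutor()
python_handle = Handle(MyModel(), pool)
result = python_handle.remote("A").result()
print(result)  # processed:A
```

Because the handle is an ordinary Python object, it can be passed to other models, which is what makes multi-model composition possible.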

Slide 6

Multi-model inference today

Image -> Pre-process -> Model_1 / Model_2 / Model_3 -> combine -> Post-process

● User needs to explicitly call models and get handles
● Dependencies are hidden
● Hard to write an efficient graph
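To make the problem concrete, here is a minimal sketch of today's manual composition (functions and names are hypothetical): every call is explicit, and the graph topology lives only inside the driver function, invisible to the serving system.

```python
# Hypothetical stand-ins for the slide's pipeline stages.
def preprocess(image):   return image.lower()
def model_1(x):          return x + ":m1"
def model_2(x):          return x + ":m2"
def model_3(x):          return x + ":m3"
def combine(a, b, c):    return [a, b, c]
def postprocess(outs):   return "|".join(outs)

def infer(image):
    x = preprocess(image)
    # The user wires every call by hand; nothing outside this function
    # knows that Model_1..3 are independent branches that could run in
    # parallel, so the dependency structure stays hidden.
    return postprocess(combine(model_1(x), model_2(x), model_3(x)))

print(infer("IMG"))  # img:m1|img:m2|img:m3
```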

Slide 7

Challenges with manual composition
● Deployment graph topology is hidden
● Hard to operate in production
● Hard to write an efficient graph

Solution: a graph building API!

Slide 8

Solution: Serve Deployment Graph API
● Fully Python-programmable graph without writing YAML
● Can be developed, instantiated, and tested locally
○ YAML can be auto-generated for production usage
● Each model can be scaled and configured individually
● Uses a unified graph API across the Ray ecosystem

Slide 9

Deployment Graph API in five steps

[Diagram: a graph with nodes Your Input, Preprocessor #1, Preprocessor #2, Model #1, Model #2, Combine, and Dynamic Aggregate]

Slide 10

Step 1/5: Use InputNode and preprocessors

[Diagram: Your Input -> preprocessor, avg_preprocessor]

● InputNode() - your input to the graph
● .bind() - graph-building API on a decorated body
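A toy re-implementation of the graph-building idea in plain Python: `bind()` records a node instead of executing, and `InputNode` stands in for the not-yet-known request. In the real API these come from `ray.serve` (e.g. `from ray.dag import InputNode` and `.bind()` on decorated deployments); everything below is an illustrative mock.

```python
# Mock of lazy graph building; not the real ray.serve implementation.
class Node:
    def __init__(self, fn, args):
        self.fn, self.args = fn, args

    def execute(self, value):
        # Resolve upstream nodes first, then apply this node's function.
        resolved = [a.execute(value) if isinstance(a, Node) else a
                    for a in self.args]
        return self.fn(*resolved)

class InputNode(Node):
    """Placeholder for the graph's input, filled in at execution time."""
    def __init__(self):
        pass

    def execute(self, value):
        return value

def bind(fn, *args):
    # Records a graph node instead of calling fn immediately.
    return Node(fn, args)

def preprocessor(x):      return x * 2
def avg_preprocessor(x):  return x / 2

user_input = InputNode()                  # your input to the graph
p1 = bind(preprocessor, user_input)       # two branches fan out from it
p2 = bind(avg_preprocessor, user_input)
print(p1.execute(10), p2.execute(10))  # 20 5.0
```

The key point mirrored here: binding builds a data structure describing the graph, so the system can inspect the topology before anything runs.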

Slide 11

Step 2/5: Model and combiner classes

[Diagram: Your Input -> Preprocessor #1, Preprocessor #2 -> Model #1, Model #2 -> Combine]
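A minimal sketch of step 2 (class names and weights are hypothetical): models are stateful classes, and a combiner fans in the outputs of both branches. In the real API each class would be a `@serve.deployment` wired with `.bind()`; here the calls are eager for clarity.

```python
# Hypothetical model class; in Serve this would carry real weights/state.
class Model:
    def __init__(self, weight):
        self.weight = weight

    def forward(self, x):
        return x * self.weight

def combine(a, b):
    # Fans in the two model branches into a single output.
    return a + b

m1, m2 = Model(2), Model(3)
x = 4
print(combine(m1.forward(x), m2.forward(x)))  # 20
```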

Slide 12

Step 3/5: Dynamic aggregation

[Diagram: Your Input -> Preprocessor #1, Preprocessor #2 -> Model #1, Model #2 -> Combine -> Dynamic Aggregate]
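"Dynamic" here means the aggregation decision is made at request time rather than being fixed in the graph. A toy sketch of that idea (the `mode` field and the two models are hypothetical):

```python
def model_1(x): return x + 1
def model_2(x): return x * 10

def dynamic_aggregate(x, mode):
    # The aggregator inspects the request and chooses how to combine
    # branch outputs at runtime, instead of a fixed combine step.
    if mode == "sum":
        return model_1(x) + model_2(x)
    return max(model_1(x), model_2(x))

print(dynamic_aggregate(3, "sum"))  # 34
print(dynamic_aggregate(3, "max"))  # 30
```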

Slide 13

Step 4/5: Driver for HTTP ingress

[Diagram: the full DAG (Your Input, Preprocessors, Models, Combine, Dynamic Aggregate) fronted by a Driver exposing an HTTP endpoint and a Python handle, with an input schema adapter]
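Conceptually, the driver sits in front of the graph, converting an HTTP-style request into the graph's input via a schema adapter and then executing the graph. The real ingredient is Serve's `DAGDriver` with an `http_adapter`; the adapter and graph below are simplified stand-ins.

```python
# Simplified mock of the driver + input schema adapter pattern.
from urllib.parse import parse_qs

def query_adapter(query_string):
    # Hypothetical schema adapter: extract `data` from the query string,
    # playing the role of DAGDriver's http_adapter.
    return parse_qs(query_string)["data"][0]

def graph(x):
    # Stands in for the fully built deployment graph.
    return f"result:{x}"

def driver(query_string):
    # Driver = adapter + graph execution behind one HTTP endpoint.
    return graph(query_adapter(query_string))

print(driver("data=A"))  # result:A
```

This mirrors the earlier single-model endpoint (`/api?data=A`), but now the thing behind the endpoint is the whole graph.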

Slide 14

Step 5/5: Running the deployment graph

Operator
● Consistent updates
● Many replicas
● YAML

Developer
● Quick updates
● Few replicas
● Python

Slide 15

Future improvements
● Improved operational story (see Shreyas' talk!)
● Automatic performance optimizations
● UX and visualization support

Slide 16

Bonus: Unified Ray DAG API

● The DAG will be a first-class API in Ray 2.0 across the libraries

Common DAG API (@ray.remote tasks and actors):
● Ray Core - eager execution
● Ray Serve - online serving pipelines
● Ray Workflows - durable execution as workflows
● Ray Datasets - batch inference pipelines

Slide 17

Conclusion

Problem: multi-model inference is increasingly important
● Hard to author and iterate on locally
● Performance is critical

Solution: Serve Deployment Graph API
● Enables local development and testing in Python
● Efficient and scalable in production

Slide 18

Please get in touch
● Join the community
○ discuss.ray.io
○ github.com/ray-project/ray
○ @raydistributed and @anyscalecompute
● Fill out our survey (QR code) for:
○ Feedback to help shape the future of Ray Serve
○ One-on-one sessions with developers
○ Updates about upcoming features

Slide 19

Demo - high level

Input: image_url, user_id: 5769

Classification (model_version: 1):
('hummingbird', 0.9991544485092163)
('bucket', 0.0001098369830287993)

Image caption: "a bird sitting on a table with a frisbee"

Image segmentation

Slide 20

Demo - details

[Diagram: Your Input -> Downloader -> Preprocessor -> Image Segmentation -> Dynamic Dispatch -> Image Classifier #1 / #2 / #3, plus Image Captioning -> Render output]

Input: url = "https://bird/image.jpeg", user_id = 5769
Outputs: image features; classification (hummingbird: 0.9991544, bucket: 0.000109, ...); object mask; description

Slide 21

Demo - end-to-end flow

1. Local graph building: run and iterate
2. Add DAGDriver -> [CLI] serve run -> HTTP endpoint -> test
3. [CLI] serve build -> configure HTTP
4. [CLI] serve deploy -> HTTP endpoint -> test and reconfigure
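The flow above can be sketched with the three CLI commands the slide names; `app:graph` is a hypothetical import path to the built deployment graph, and the output flag/filename are illustrative.

```shell
# Developer loop: run the graph locally and iterate quickly.
serve run app:graph

# Generate a production config (YAML) from the Python-built graph.
serve build app:graph -o config.yaml

# Operator: deploy/update the cluster from the generated config.
serve deploy config.yaml
```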