Developing and deploying scalable multi-model inference pipelines

This talk shows how Ray's programmable, general-purpose distributed computing lets you author, orchestrate, scale, and deploy complex serving pipelines as a DAG under one set of APIs, like a microservice. Learn how to program multiple models dynamically on your laptop as if you were writing a local Python script, deploy them to production at scale, and upgrade each one individually.



April 14, 2022


  1. Serve Deployment Graph Jiao Dong @ Anyscale

  2. Outline • Motivation for multi-model inference graphs • Ray and Ray Serve background • Deployment Graph API walkthrough • Real live demo: content understanding!
  3. Motivation: Machine learning inference graphs are getting longer, wider, and more dynamic. Blog: "Ray Serve: Patterns of ML Models in Production"
  4. Unique strengths of Ray • Scalable • Low-latency • Part of ML ecosystem • Efficient
  5. Serve deployment: single model. MyModel, decorated with @serve.deployment, is reachable through an HTTP endpoint (http://localhost:8000/api?data=A) and through a Python handle (python_handle.remote("A")). The Python deployment handle facilitates multi-model inference graph composition.
  6. Multi-model inference today. Diagram nodes: Image, Pre-process, Model_1, Model_2, Model_3, combine, Post-process. • User needs to explicitly get and call each handle • Dependency is hidden • Hard to write an efficient graph
  7. Challenges with manual composition • Deployment graph topology is hidden • Hard to operate in production • Hard to write an efficient graph. Solution: Graph building API!
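To make the "hard to write an efficient graph" point concrete, here is a minimal pure-Python sketch of manual composition. It does not use Ray: the three `model_*` coroutines and `run_pipeline` are hypothetical stand-ins for deployed models, and each "inference" is simulated with a short sleep. The topology lives only inside the function bodies, and running independent models concurrently is something the author has to notice and hand-code.

```python
import asyncio


# Hypothetical stand-ins for three independent deployed models.
async def model_1(x: int) -> int:
    await asyncio.sleep(0.01)  # simulated inference latency
    return x + 1


async def model_2(x: int) -> int:
    await asyncio.sleep(0.01)
    return x * 2


async def model_3(x: int) -> int:
    await asyncio.sleep(0.01)
    return x - 3


async def pipeline_sequential(x: int) -> int:
    # Naive manual composition: each call is awaited before the next starts,
    # even though the three models do not depend on each other.
    a = await model_1(x)
    b = await model_2(x)
    c = await model_3(x)
    return a + b + c


async def pipeline_concurrent(x: int) -> int:
    # Same result, but the author must know the models are independent
    # and fan out by hand.
    a, b, c = await asyncio.gather(model_1(x), model_2(x), model_3(x))
    return a + b + c


def run_pipeline(x: int, concurrent: bool = True) -> int:
    pipeline = pipeline_concurrent if concurrent else pipeline_sequential
    return asyncio.run(pipeline(x))
```

Both variants return the same value; only the concurrent one overlaps the three simulated inferences. Nothing outside the function body reveals which models feed the combine step, which is the hidden-topology problem the slide describes.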
  8. Solution: Serve Deployment Graph API • Fully Python-programmable graph without writing YAML • Can be developed, instantiated, and tested locally ◦ YAML can be auto-generated for production usage • Each model can be scaled and configured individually • Uses a unified graph API across the Ray ecosystem
  9. Deployment Graph API in Five Steps. Diagram nodes: Your Input, Preprocessor #1, Preprocessor #2, Model #1, Model #2, Combine, Dynamic Aggregate.
  10. Step 1/5: User InputNode and preprocessor. Diagram nodes: Your Input, preprocessor, avg_preprocessor. • InputNode(): your input to the graph • .bind(): graph building API on decorated body
  11. Step 2/5: Model and combiner class. Diagram nodes: Your Input, Preprocessor #1, Preprocessor #2, Model #1, Model #2, Combine.
  12. Step 3/5: Dynamic aggregation. Diagram nodes: Your Input, Preprocessor #1, Preprocessor #2, Model #1, Model #2, Combine, Dynamic Aggregate.
  13. Step 4/5: Driver for HTTP ingress. Diagram labels: DAG (Your Input, Preprocessor #1, Preprocessor #2, Model #1, Model #2, Combine, Dynamic Aggregate), Driver, HTTP endpoint, Python handle, Input Schema adapter.
  14. Step 5/5: Running the deployment graph. Operator: • Consistent updates • Many replicas • YAML. Developer: • Quick updates • Few replicas • Python
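The idea behind the walkthrough (an input placeholder, `.bind()`-style deferred calls, and a request-dependent aggregation) can be sketched in plain Python. This is a toy illustration, not Ray Serve's implementation or API: `ToyInput`, `ToyNode`, `bind`, and `execute` are hypothetical names, the arithmetic "models" are placeholders, and the wiring between nodes is an assumption based on the node names in the slides.

```python
from typing import Any, Callable, Dict, Optional


class ToyInput:
    """Placeholder for the request; resolved only when the graph is executed."""


class ToyNode:
    """A deferred call: records a function and its arguments without running it."""

    def __init__(self, fn: Callable[..., Any], *args: Any) -> None:
        self.fn = fn
        self.args = args


def bind(fn: Callable[..., Any], *args: Any) -> ToyNode:
    # Building the graph only records dependencies; nothing executes here.
    return ToyNode(fn, *args)


def execute(node: Any, request: Any, cache: Optional[Dict[int, Any]] = None) -> Any:
    """Resolve a node by recursively resolving its arguments first."""
    if cache is None:
        cache = {}
    if isinstance(node, ToyInput):
        return request
    if not isinstance(node, ToyNode):
        return node  # plain constant argument
    key = id(node)
    if key not in cache:  # each node runs at most once per request
        resolved = [execute(arg, request, cache) for arg in node.args]
        cache[key] = node.fn(*resolved)
    return cache[key]


# Placeholder stages (assumed behavior, for illustration only).
def get_value(request: dict) -> float:
    return request["value"]


def get_operation(request: dict) -> str:
    return request["operation"]


def preprocessor(x: float) -> float:
    return x * 2


def avg_preprocessor(x: float) -> float:
    return x / 2


def model_1(x: float) -> float:
    return x + 1


def model_2(x: float) -> float:
    return x + 10


def dynamic_aggregate(a: float, b: float, operation: str) -> float:
    # The aggregation is chosen per request, at execution time.
    return a + b if operation == "sum" else max(a, b)


# Graph building: the whole topology is visible in one place.
user_input = ToyInput()
value = bind(get_value, user_input)
operation = bind(get_operation, user_input)
out_1 = bind(model_1, bind(preprocessor, value))
out_2 = bind(model_2, bind(avg_preprocessor, value))
graph = bind(dynamic_aggregate, out_1, out_2, operation)
```

Calling `execute(graph, {"value": 4, "operation": "sum"})` walks the recorded dependencies and runs each stage once. Because the graph is ordinary Python data, it can be built and tested locally before any deployment step, which is the property the five-step walkthrough emphasizes.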
  15. Future Improvements • Improved operational story (see Shreyas’ talk!) • Automatic performance optimizations • UX and visualization support
  16. Bonus: Unified Ray DAG API • DAG will be a first-class API in Ray 2.0 across the libraries. Common DAG API (@ray.remote tasks and actors): • Ray Core: eager execution • Ray Workflows: durable execution as workflow • Ray Serve: online serving pipelines • Ray Datasets: batch inference pipelines
  17. Conclusion. Problem: Multi-model inference is increasingly important • Hard to author and iterate locally • Performance is critical. Solution: Serve Deployment Graph API • Enables Python local development and testing • Efficient and scalable in production
  18. Please get in touch • Join the community ◦ discuss.ray.io ◦ github.com/ray-project/ray ◦ @raydistributed and @anyscalecompute • Fill out our survey (QR code) for: ◦ Feedback to help shape the future of Ray Serve ◦ One-on-one sessions with developers ◦ Updates about upcoming features
  19. Demo - High level. Inputs: image_url, user_id: 5769. Outputs: • Classification (Model_version: 1): ('hummingbird', 0.9991544485092163), ('bucket', 0.0001098369830287993) • Image Caption: “a bird sitting on a table with a frisbee” • Image Segmentation
  20. Demo - Details. Diagram nodes: Your Input, Downloader, Preprocessor, Image Segmentation, Dynamic Dispatch, Image Classifier #1, Image Classifier #2, Image Classifier #3, Image Captioning, Render output. Diagram labels: url = "https://bird/image.jepg"; user_id = 5769; Image features; Hummingbird: 0.9991544, bucket: 0.000109 …; Object mask; Description.
  21. Demo - End to end flow. Diagram labels: Local graph building; Run and iterate; Add DAG Driver; [CLI] serve run; [CLI] serve build; [CLI] serve deploy; HTTP endpoint; Configure HTTP; Test; Reconfigure.