Slide 1

Overview and Roadmap
Edward Oakes
[email protected]

Slide 2

Outline
● Ray Serve Introduction
● Feedback from the Community
● Plans for Ray 2.0 and beyond
  ○ A preview of the other talks today!

Slide 3

Ray Ecosystem
[Diagram: Ray, a universal framework for distributed computing, with native libraries and 3rd-party libraries built on top]

Slide 4

Ray Serve TL;DR
Flexible, scalable compute for model serving:
1. Scalable
2. Low latency
3. Efficient
● First-class support for multi-model serving
● Python-native: mix business logic & ML
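A minimal sketch of what a deployment looks like (Ray 2.x-style API; the Translator class and its toy "model" are illustrative, not from the talk):

from ray import serve

# Each deployment is a Python class; replicas scale it horizontally.
@serve.deployment(num_replicas=2)
class Translator:
    def __init__(self):
        # Load the model once per replica (a placeholder "model" here).
        self.model = lambda text: text[::-1]

    async def __call__(self, request):
        text = (await request.json())["text"]
        return {"result": self.model(text)}

# Deploys the class behind an HTTP endpoint.
serve.run(Translator.bind())

Scaling up or down is then just a matter of changing num_replicas (or letting autoscaling adjust it).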

Slide 5

Multi-model Serving
Pattern: multiple models making up a single application
[Image: example pipeline output, "Standing Cat"]

Slide 6

Multi-model Serving
● Write a unified Python program
● Use your favorite tools & libraries
● Scale across CPUs and GPUs
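A hedged sketch of the pattern: two placeholder models composed with plain Python business logic in one Serve application (Detector, Classifier, and Pipeline are hypothetical names, not from the talk):

from ray import serve

@serve.deployment  # could add ray_actor_options={"num_gpus": 1} for a GPU model
class Detector:
    def predict(self, image: bytes) -> bool:
        return len(image) > 0  # placeholder for a real detection model

@serve.deployment
class Classifier:
    def predict(self, image: bytes) -> str:
        return "Standing Cat"  # placeholder for a real classification model

@serve.deployment
class Pipeline:
    def __init__(self, detector, classifier):
        # Bound deployments passed in here arrive as handles at runtime.
        self.detector = detector
        self.classifier = classifier

    async def __call__(self, request):
        image = await request.body()
        # Plain Python control flow around the model calls.
        # (Handle-call semantics vary by Ray version; newer versions
        # resolve with a single await as written here.)
        if await self.detector.predict.remote(image):
            return await self.classifier.predict.remote(image)
        return "nothing detected"

serve.run(Pipeline.bind(Detector.bind(), Classifier.bind()))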

Slide 7

Feedback from the Community
● Multi-model serving is a big need and a key strength
● 💸💸💸 ML inference is expensive! Efficiency is key.
● We need better support & documentation for CI/CD
  ○ Emerging pattern: continual learning

Slide 8

In-progress for Ray 2.0
● Double down on multi-model: Deployment Graph API
  ○ Hear from Jiao later today!
● REST API & improved Kubernetes support
  ○ Hear from Shreyas later today!
● Integrations with best-in-breed MLOps tooling
● Seamless interoperability with Ray AIR 🤩🤩🤩
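For flavor, a sketch of graph authoring in the Ray 2.0-era experimental Deployment Graph API (import paths and the DAGDriver ingress changed across releases, so treat the exact names as approximate; the two stages are placeholders):

from ray import serve
from ray.serve.drivers import DAGDriver
from ray.serve.http_adapters import json_request
from ray.serve.deployment_graph import InputNode

@serve.deployment
class Preprocessor:
    def transform(self, x: float) -> float:
        return x * 2  # placeholder preprocessing

@serve.deployment
class Model:
    def predict(self, x: float) -> float:
        return x + 1  # placeholder model

# Author the multi-model graph declaratively, then deploy it as one app.
with InputNode() as user_input:
    preprocessor = Preprocessor.bind()
    model = Model.bind()
    output = model.predict.bind(preprocessor.transform.bind(user_input))

# DAGDriver exposes the whole graph over a single HTTP endpoint.
serve.run(DAGDriver.bind(output, http_adapter=json_request))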

Slide 9

Extended Roadmap
● Scale-to-zero
● gRPC support
● Model multiplexing (100s-1000s of small models; see the sketch below)
● Shared memory for model weights
● …
We want to hear from you!
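Model multiplexing was only a roadmap item here; as it later landed in Ray Serve (around Ray 2.5), it looks roughly like this, with load_model_weights as a hypothetical stand-in for real weight loading:

from ray import serve

async def load_model_weights(model_id: str):
    # Hypothetical stand-in for fetching real weights (e.g., from storage).
    return lambda payload: {"model": model_id, "echo": payload}

@serve.deployment
class MultiplexedModel:
    # Each replica lazily loads and LRU-caches up to 8 models.
    @serve.multiplexed(max_num_models_per_replica=8)
    async def get_model(self, model_id: str):
        return await load_model_weights(model_id)

    async def __call__(self, request):
        # Requests are routed by the "serve_multiplexed_model_id" header.
        model_id = serve.get_multiplexed_model_id()
        model = await self.get_model(model_id)
        return model(await request.json())

serve.run(MultiplexedModel.bind())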

Slide 10

Please get in touch
● Join the community
  ○ discuss.ray.io
  ○ github.com/ray-project/ray
  ○ @raydistributed and @anyscalecompute
● Fill out our survey (QR code) for:
  ○ Feedback to help shape the future of Ray Serve
  ○ One-on-one sessions with developers
  ○ Updates about upcoming features