
Ray Serve: Overview and future roadmap

In this introductory session, we’ll discuss the motivation behind Ray Serve, who’s using Ray Serve and why, and recent features and updates, including a look at the future feature roadmap as we approach Ray 2.0.

Anyscale

April 14, 2022

Transcript

  1. Outline
     • Ray Serve Introduction
     • Feedback from the Community
     • Plans for Ray 2.0 and beyond
       ◦ Preview for other talks today!
  2. Ray Serve TL;DR: flexible, scalable compute for model serving
     1. Scalable
     2. Low latency
     3. Efficient
     • First-class support for multi-model serving
     • Python-native: mix business logic & ML
  3. Multi-model Serving
     • Write a unified Python program
     • Use your favorite tools & libraries
     • Scale across CPUs and GPUs
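The "unified Python program" idea can be sketched with a minimal deployment. This is a hedged sketch, not code from the talk: the names `SentimentService` and `classify` are illustrative, the `@serve.deployment` / `serve.run` calls follow the Ray 2.x-style API, and the import is guarded so the plain-Python business logic runs even where Ray is not installed.

```python
# Sketch of a unified serving program: plain-Python business logic plus a
# stand-in "model", scaled out with Ray Serve. All names are illustrative.
def classify(text: str) -> str:
    # Stand-in for real model inference, mixed freely with business logic.
    positive = {"good", "great", "love"}
    return "positive" if positive & set(text.lower().split()) else "negative"

try:
    from ray import serve  # guarded so the sketch still loads without Ray

    @serve.deployment(num_replicas=2)  # scale across replicas (CPUs/GPUs)
    class SentimentService:
        async def __call__(self, request):
            data = await request.json()
            return {"label": classify(data["text"])}

    if __name__ == "__main__":
        serve.run(SentimentService.bind())  # serve the deployment over HTTP
except ImportError:
    pass  # Ray not installed; classify() above still works standalone
```

The point of the pattern is that `classify` is ordinary Python: the same function can be unit-tested locally and called from inside a scaled deployment.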
  4. Feedback from the Community
     • Multi-model serving is a big need and key strength
     • 💸 ML inference is expensive! Efficiency is key.
     • We need better support & documentation for CI/CD
     • Emerging pattern: continual learning
  5. In-progress for Ray 2.0
     • Double down on multi-model: the Deployment Graph API
     1. REST API & improved Kubernetes support
     2. Integrations with best-in-breed MLOps tooling
     • Seamless interoperability with Ray AIR
     Hear from Jiao and Shreyas later today! 🤩
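Conceptually, a deployment graph composes independent models and Python steps into one served DAG. Here is a framework-free sketch of the shape such a graph expresses (all function names are hypothetical and this is not the Ray Serve Deployment Graph API itself, which the talk defers to Jiao's session):

```python
# Framework-free sketch of a multi-model graph: preprocess -> two models
# -> combine. In a deployment graph, each node would be a separately
# scaled deployment; here they are plain callables to show the dataflow.
def preprocess(text: str) -> list[str]:
    return text.lower().split()

def model_a(tokens: list[str]) -> float:
    return len(tokens) / 10            # toy "score": longer input, higher score

def model_b(tokens: list[str]) -> float:
    return float(any(len(t) > 5 for t in tokens))  # toy binary signal

def combine(a: float, b: float) -> float:
    return (a + b) / 2                 # simple ensemble of the two models

def graph(text: str) -> float:
    tokens = preprocess(text)
    return combine(model_a(tokens), model_b(tokens))

print(graph("deployment graphs compose models"))  # -> 0.7
```

Because each node is independent, a serving framework can replicate the expensive model nodes without touching the cheap preprocessing step.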
  6. Extended Roadmap
     • Scale-to-zero
     • gRPC support
     • Model multiplexing (100s-1000s of small models)
     • Shared memory for model weights
     • …
     We want to hear from you!
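Model multiplexing here means serving hundreds of small models behind one endpoint without keeping them all in memory. A framework-free sketch of the idea (the `ModelMultiplexer` class and its LRU policy are hypothetical, not Ray Serve's eventual API):

```python
from collections import OrderedDict

class ModelMultiplexer:
    """Keep at most `capacity` models loaded, evicting the least recently used."""

    def __init__(self, loader, capacity=2):
        self.loader = loader          # callable: model_id -> loaded model
        self.capacity = capacity
        self.cache = OrderedDict()    # model_id -> loaded model, in LRU order

    def get(self, model_id):
        if model_id in self.cache:
            self.cache.move_to_end(model_id)      # mark as recently used
        else:
            if len(self.cache) >= self.capacity:
                self.cache.popitem(last=False)    # evict least-recently-used
            self.cache[model_id] = self.loader(model_id)
        return self.cache[model_id]

# Usage: each "model" is just a function of the input here.
mux = ModelMultiplexer(loader=lambda mid: (lambda x: f"{mid}:{x}"), capacity=2)
print(mux.get("a")("query"))  # loads model "a" on first use
```

The roadmap items pair naturally: shared memory for model weights would lower the cost of each slot in such a cache, and scale-to-zero is the `capacity == 0` limit of the same idea.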
  7. Please get in touch
     • Join the community
       ◦ discuss.ray.io
       ◦ github.com/ray-project/ray
       ◦ @raydistributed and @anyscalecompute
     • Fill out our survey (QR code) for:
       ◦ Feedback to help shape the future of Ray Serve
       ◦ One-on-one sessions with developers
       ◦ Updates about upcoming features