[Ray Summit 2022] Achieving Scalability and Interactivity with Ray Serve

Creating a scalable and highly interactive data pipelining platform is a challenging task. Ikigai Labs is tackling this problem by enabling users to inject custom Python code into a traditional data pipelining platform. Such a flexible platform creates non-traditional problems, and Ray and Ray Serve enable solutions at scale without hurting data interactivity. We’ll explore the challenges we faced and how Ray and Ray Serve provided excellent flexibility in resolving them. We will conclude with the potential of Ray Client to push the bar even further in the future.

Jae Sim

June 12, 2024

Transcript

  1. Who We Are
     Vinayak Ramesh, CEO/Co-founder • Devavrat Shah, CTO/Co-founder • Jaehyun Sim, Engineering Manager • Robbie Jung, Software Engineer • James Oh, Product Manager • Aldreen Venzon, Customer Success • Amar Shah, Cloud Architect • Robert Xin, Data Engineer • Saša Mršić, UI/UX Engineer
  2. The Ikigai Platform: AI-Charged Spreadsheets
     Ikigai Labs offers a next-generation data processing and analytics tool that empowers data operators with AI and BI.
     Connectors • Datasets • Flows • Dashboards • Automation • Human in the Loop
  3. Outline
     • What Is The Ikigai Platform Trying To Achieve?
     • Why Ray?
     • How Does Ray Serve Resolve The Challenges We Faced?
     • How Do We Scale Ray Serve?
     • How Can Ray Client Take Us Further?
  4. Mission-Critical Data Pipelining
     “We want to collect all attached PDF files from emails we received today and use them as source data.”
  5. Introducing Ray Serve
     Ray Serve is an “easy-to-use” and “scalable” model serving library built on Ray.
     • Framework-agnostic: can serve not only DL models but also “arbitrary Python business logic”
     • Python-first: configure model serving with “pure Python code”
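For orientation, a minimal deployment might look like the sketch below, written against the Ray Serve 1.x API that was current around this talk; the deployment name `greet` and its query parameter are illustrative, not from the talk.

```python
import ray
from ray import serve

ray.init()
serve.start()

# Any Python function or class can become a deployment -- not just DL models.
@serve.deployment
def greet(request):
    # Over HTTP, Ray Serve 1.x passes a Starlette Request object.
    name = request.query_params.get("name", "world")
    return f"Hello, {name}!"

greet.deploy()
# The deployment is now served over HTTP at its route prefix:
#   curl "http://127.0.0.1:8000/greet?name=Ray"
```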
  6. Challenge: Conflicting Python Libraries
     Environment 1: numpy, pandas, nltk, boto3==1.1.0
     Environment 2: numpy, pandas, beautifulsoup4, boto3==1.16.0
     Environment 3: numpy, pandas, scipy, boto3==1.7.0
  7. Dependency Management with Ray Serve
     Each dependency set gets its own isolated environment:
     • numpy, pandas, nltk, boto3==1.1.0
     • numpy, pandas, beautifulsoup4, boto3==1.16.0
     • numpy, pandas, scipy, boto3==1.7.0
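One way to get this isolation is Ray's per-deployment `runtime_env`, sketched below under the assumption that each pipeline's scripts are wrapped in their own deployment; the class names are hypothetical. A `runtime_env` also accepts a `conda` field, which matches the Conda-based setup mentioned later in the deck.

```python
from ray import serve

# Each deployment pins its own dependency set via runtime_env, so
# boto3==1.1.0 and boto3==1.7.0 can coexist in the same cluster.
@serve.deployment(ray_actor_options={
    "runtime_env": {"pip": ["numpy", "pandas", "nltk", "boto3==1.1.0"]}
})
class PipelineA:
    def __call__(self, request):
        import boto3  # resolves inside this deployment's environment
        return boto3.__version__

@serve.deployment(ray_actor_options={
    "runtime_env": {"pip": ["numpy", "pandas", "scipy", "boto3==1.7.0"]}
})
class PipelineC:
    def __call__(self, request):
        import boto3
        return boto3.__version__
```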
  8. Challenge: Task Overhead for ‘Peek’
     The Ikigai platform supports two different types of data pipelining:
     1. Scalable Pipeline Run - to run large volumes of data at scale
     2. ‘Peek’ Run - to run samples of data quickly for sub-second data interfacing
     Problem: task submission with Ray usually takes about 10 seconds. If custom Python scripts are on Ray Serve, how can we maintain sub-second ‘peek’ times?
  9. Dynamic Function Invocation with Ray Serve
     • Scalable Pipeline Run: invoked from inside the Ray cluster
     • ‘Peek’ Run: invoked from outside the Ray cluster - no overhead from task submission
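A sketch of the two invocation paths, assuming a hypothetical deployment named `user_script` that accepts both call styles:

```python
import ray
import requests
from ray import serve

# 'Peek' run, invoked from outside the Ray cluster: the user's script
# already lives in a long-running Serve replica, so sampling costs one
# HTTP round trip instead of a ~10-second task submission.
# (The deployment name and JSON schema are hypothetical.)
sample = requests.post(
    "http://127.0.0.1:8000/user_script",
    json={"mode": "peek", "rows": 100},
).json()

# Scalable pipeline run, invoked from inside the Ray cluster: call the
# same deployment through its handle and fan the full dataset out.
handle = serve.get_deployment("user_script").get_handle()
partitions = ["s3://bucket/part-0", "s3://bucket/part-1"]  # placeholder
results = ray.get([handle.remote(p) for p in partitions])
```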
  10. Ikigai with Ray Serve, on Kubernetes
      Ray Serve Manager:
      ➔ Creates/updates Ray Serve deployments
      ➔ Manages the Conda environments for all Ray worker pods
      ➔ Optimizes the number of Ray Serve instances based on dependency requirements
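The Ray Serve Manager is an Ikigai-internal component, but the deduplication idea behind the last point can be sketched roughly as below; every name here is hypothetical, not Ikigai's actual code.

```python
# Hypothetical sketch: reuse one Serve deployment per unique dependency set.
deployments: dict[str, str] = {}  # canonical key -> deployment name

def canonical_key(requirements: list[str]) -> str:
    """Order-insensitive key, e.g. 'boto3==1.7.0|numpy|pandas|scipy'."""
    return "|".join(sorted(requirements))

def deployment_for(requirements: list[str]) -> str:
    """Return an existing deployment for this set, or register a new one."""
    key = canonical_key(requirements)
    if key not in deployments:
        deployments[key] = f"env-{len(deployments)}"
        # ...create a Serve deployment whose runtime_env pins `requirements`
    return deployments[key]
```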
  11. Scaling Ray Serve on a Kubernetes Cluster
      Easy autoscaler deployment:
      ➔ `ray up cluster.yaml`
      ➔ Easy bootstrapping with the Ray Cluster Launcher
      ➔ The Ray Cluster Launcher automatically enables a load-based autoscaler
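A minimal `cluster.yaml` for the Cluster Launcher might look like the sketch below; the provider, instance types, and worker counts are placeholder assumptions, not Ikigai's actual configuration.

```yaml
# Illustrative cluster.yaml for `ray up` (all values are assumptions).
cluster_name: serve-cluster
max_workers: 10            # autoscaler upper bound; scaling is load-based
provider:
  type: aws
  region: us-east-1
available_node_types:
  head:
    node_config: {InstanceType: m5.xlarge}
  worker:
    min_workers: 0
    max_workers: 10
    node_config: {InstanceType: m5.2xlarge}
head_node_type: head
```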
  12. Explicit Deployment Versioning
      This will fail if the existing version in the system is not equal to `old_v`.
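The slide's code is not in the transcript, but the behavior it describes matches the `version`/`prev_version` arguments of the Ray Serve 1.x deployment API; a sketch, assuming a deployment named `user_script`:

```python
from ray import serve

# deploy() succeeds only if the currently running version is `old_v`,
# so two concurrent updaters cannot silently overwrite each other.
@serve.deployment(name="user_script", version="new_v", prev_version="old_v")
def pipeline(request):
    ...

pipeline.deploy()  # raises if the live version is not "old_v"
```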
  13. Even More Flexibility with Ray Client
      Ray now offers a remote client for connecting to a Ray cluster.
      • More natural for dynamic task definitions: no need to generate a Python file for each task definition
      • Full Ray API support: makes migration extremely easy
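Connecting through Ray Client is a one-line change to `ray.init`; a minimal sketch, with a placeholder head-node address (10001 is Ray Client's default port):

```python
import ray

# Connect to a remote cluster through Ray Client.
ray.init("ray://head-node.example.com:10001")

# Tasks can be defined dynamically in this local process and shipped to
# the cluster -- no need to generate a Python file per task definition.
@ray.remote
def transform(row):
    return row * 2

print(ray.get([transform.remote(i) for i in range(4)]))  # [0, 2, 4, 6]
```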
  14. Ikigai with Ray Serve and Client
      All microservices and Python workers have access to Ray Core and Ray Serve.