[Webinar] An introduction to Ray for scaling machine learning (ML) workloads

Slide 1

Slide 1 text

Introduction to Ray for scaling machine learning
Robert Nishihara, Co-founder, Anyscale and co-creator of Ray
Bill Chambers, Product lead, Anyscale

Slide 2

Slide 2 text

Why Ray?
- Machine learning is pervasive in every domain
- Distributed machine learning is becoming a necessity
- Distributed computing is notoriously hard

Slide 3

Slide 3 text

Why Ray?
- Machine learning is pervasive in every domain
- Distributed machine learning is becoming a necessity
- Distributed computing is notoriously hard

Slide 4

Slide 4 text

Apps increasingly incorporate AI/ML

Slide 5

Slide 5 text

Why Ray?
- Machine learning is pervasive in every domain
- Distributed machine learning is becoming a necessity
- Distributed computing is notoriously hard

Slide 6

Slide 6 text

Compute demand growing faster than supply
[Chart: compute demand growing 35x every 18 months, with GPT-3 (2020) marked, versus Moore’s Law (2x every 18 months) for CPU. Source: https://openai.com/blog/ai-and-compute/]

Slide 7

Slide 7 text

Specialized hardware is also not enough
[Chart: compute demand growing 35x every 18 months, with GPT-3 (2020) marked, versus Moore’s Law (2x every 18 months) for CPU, with GPU* and TPU* added. Source: https://openai.com/blog/ai-and-compute/]

Slide 8

Slide 8 text

Specialized hardware is also not enough
[Chart: compute demand growing 35x every 18 months, with GPT-3 (2020) marked, versus Moore’s Law (2x every 18 months) for CPU, with GPU* and TPU* added. Source: https://openai.com/blog/ai-and-compute/]
No way out but to distribute!
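The two growth rates on the chart can be compared with a little arithmetic. The sketch below takes the labels at face value (demand 35x every 18 months, Moore’s Law 2x every 18 months) and computes how far demand outruns single-chip improvement over time. The function names are hypothetical, and treating the ratio as "how much more hardware must run in parallel" is a simplifying assumption for illustration, not a claim from the slides.

```python
# Rates read off the slide; both are per 18-month period.
DEMAND_RATE = 35.0   # compute demand: 35x every 18 months
MOORE_RATE = 2.0     # Moore's Law: 2x every 18 months
PERIOD_MONTHS = 18.0


def growth_factor(rate_per_period: float, months: float) -> float:
    """Total growth after `months`, compounding `rate_per_period` every 18 months."""
    return rate_per_period ** (months / PERIOD_MONTHS)


def demand_supply_gap(months: float) -> float:
    """Ratio of demand growth to Moore's Law growth after `months`.

    Assumption: if each chip only improves at the Moore's Law rate, this ratio
    approximates how much more hardware must be used in parallel to keep up.
    """
    return growth_factor(DEMAND_RATE, months) / growth_factor(MOORE_RATE, months)


if __name__ == "__main__":
    for months in (18, 36, 54):
        print(f"after {months} months: gap = {demand_supply_gap(months):,.2f}x")
```

Under these assumptions the gap itself compounds at 35 / 2 = 17.5x per 18 months, which is the arithmetic behind "No way out but to distribute!".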

Slide 9

Slide 9 text

Why Ray?
- Machine learning is pervasive in every domain
- Distributed machine learning is becoming a necessity
- Distributed computing is notoriously hard

Slide 10

Slide 10 text

Existing solutions have many tradeoffs
[Chart axes: Generality, Ease of development]

Slide 11

Slide 11 text

Existing solutions have many tradeoffs
[Chart axes: Generality, Ease of development]

Slide 12

Slide 12 text

Existing solutions have many tradeoffs
[Chart axes: Generality, Ease of development]

Slide 13

Slide 13 text

Existing solutions have many tradeoffs
[Chart axes: Generality, Ease of development]

Slide 14

Slide 14 text

Why Ray?
- Machine learning is pervasive in every domain
- Distributed machine learning is becoming a necessity
- Distributed computing is notoriously hard
Ray’s vision: Make distributed computing accessible to every developer

Slide 15

Slide 15 text

The Ray Ecosystem

Slide 16

Slide 16 text

Rich ecosystem for scaling ML workloads
Native libraries - easily scale common bottlenecks in ML workflows
- Examples: Ray Tune for hyperparameter optimization (HPO), RLlib for reinforcement learning, Ray Serve for serving, etc.
Integrations - scale popular frameworks with Ray with minimal changes
- Examples: XGBoost, TF, JAX, PyTorch, etc.

Slide 17

Slide 17 text

Rich ecosystem for scaling ML workloads
[Diagram labels: Data Processing, Training, Hyper. Tuning, Reinforcement Learning, Model Serving, Serving, Ray Core / Datasets, Ray Core + Datasets]
** a small subset of the Ray ecosystem in ML

Slide 18

Slide 18 text

Slide 19

Slide 19 text

Rich ecosystem for scaling ML workloads
Challenges in scaling hyperparameter tuning?
[Diagram labels: Data Processing, Training, Hyper. Tuning, Reinforcement Learning, Model Serving, Serving, Ray Core / Datasets, Ray Core + Datasets]

Slide 20

Slide 20 text

Rich ecosystem for scaling ML workloads
[Diagram labels: Data Processing, Training, Hyper. Tuning, Reinforcement Learning, Model Serving, Serving, Ray Core / Datasets, Ray Core + Datasets]
Integrate Ray Tune! No need to adopt entire Ray framework.

Slide 21

Slide 21 text

Stitching together different frameworks to go end-to-end?
[Chart axes: Generality, Ease of development]

Slide 22

Slide 22 text

Rich ecosystem for scaling ML workloads
[Diagram labels: Data Processing, Training, Hyper. Tuning, Reinforcement Learning, Model Serving, Serving, Ray Core / Datasets, Ray Core + Datasets]
Unified, distributed toolkit to go end-to-end

Slide 23

Slide 23 text

Companies scaling ML with Ray

Slide 24

Slide 24 text

Companies scaling ML with Ray
[Diagram labels: Data Processing, Training, Hyper. Tuning, Reinforcement Learning, Model Serving, Serving, Ray Core / Datasets]

Slide 25

Slide 25 text

Scaling Ecosystem Restoration
Dendra Systems

Slide 26

Slide 26 text

Making Boats Fly with AI
McKinsey | QuantumBlack Australia

Slide 27

Slide 27 text

Large Scale ML Platforms
Uber, Shopify, Robinhood, and more

Slide 28

Slide 28 text

Demo

Slide 29

Slide 29 text

Start scaling your ML workloads
Getting Started:
- Documentation (docs.ray.io): quick start example, reference guides, etc.
- Forums (discuss.ray.io): learn / share with the broader Ray community, including the core team
- Ray Slack: connect with the Ray team and community

Slide 30

Slide 30 text

Thank you