
[Webinar] An introduction to Ray for scaling machine learning (ML) workloads

Modern machine learning (ML) workloads, such as deep learning and large-scale model training, are compute-intensive and require distributed execution. Ray was created in the UC Berkeley RISELab to make it easy for every engineer to scale their applications, without requiring any distributed systems expertise.

Join Robert Nishihara, co-creator of Ray, and Bill Chambers, product lead for Ray, for an introduction to Ray for scaling your ML workloads. Learn how Ray libraries (e.g., Ray Tune and Ray Serve) help you easily scale every step of your ML pipeline, from model training and hyperparameter search to production serving.

Highlights include:
* Ray overview & core concepts
* Library ecosystem and use cases
* Demo: Ray for scaling ML workflows
* Getting started resources


Anyscale

August 18, 2021

Transcript

  1. Introduction to Ray for scaling machine learning. Robert Nishihara, co-founder of Anyscale and co-creator of Ray; Bill Chambers, product lead, Anyscale
  2. Why Ray? - Machine learning is pervasive in every domain - Distributed machine learning is becoming a necessity - Distributed computing is notoriously hard
  3. Why Ray? - Machine learning is pervasive in every domain - Distributed machine learning is becoming a necessity - Distributed computing is notoriously hard
  4. Apps increasingly incorporate AI/ML

  5. Why Ray? - Machine learning is pervasive in every domain - Distributed machine learning is becoming a necessity - Distributed computing is notoriously hard
  6. Compute demand growing faster than supply. 35x every 18 months (GPT-3, 2020) vs. Moore's Law (2x every 18 months, CPU). https://openai.com/blog/ai-and-compute/
  7. Specialized hardware is also not enough. 35x every 18 months (GPT-3, 2020) vs. Moore's Law (2x every 18 months); CPU, GPU*, TPU*. https://openai.com/blog/ai-and-compute/
  8. Specialized hardware is also not enough. 35x every 18 months (GPT-3, 2020) vs. Moore's Law (2x every 18 months); CPU, GPU*, TPU*. https://openai.com/blog/ai-and-compute/ No way out but to distribute!
  9. Why Ray? - Machine learning is pervasive in every domain - Distributed machine learning is becoming a necessity - Distributed computing is notoriously hard
  10. Existing solutions have many tradeoffs: generality vs. ease of development

  11. Existing solutions have many tradeoffs: generality vs. ease of development

  12. Existing solutions have many tradeoffs: generality vs. ease of development

  13. Existing solutions have many tradeoffs: generality vs. ease of development

  14. Why Ray? - Machine learning is pervasive in every domain - Distributed machine learning is becoming a necessity - Distributed computing is notoriously hard. Ray's vision: make distributed computing accessible to every developer
  15. The Ray Ecosystem

  16. Rich ecosystem for scaling ML workloads. Native libraries: easily scale common bottlenecks in ML workflows (examples: Ray Tune for hyperparameter optimization, RLlib for reinforcement learning, Ray Serve for serving, etc.). Integrations: scale popular frameworks with Ray with minimal changes (examples: XGBoost, TensorFlow, JAX, PyTorch, etc.)
  17. Rich ecosystem for scaling ML workloads. [Diagram: data processing, training, hyperparameter tuning, model serving, and reinforcement learning libraries built on Ray Core + Datasets.] * A small subset of the Ray ecosystem in ML
  18. Rich ecosystem for scaling ML workloads. [Diagram: Ray ML libraries built on Ray Core + Datasets.] Integrate Ray only based on your needs!
  19. Challenges in scaling hyperparameter tuning? [Diagram: Ray ML libraries built on Ray Core + Datasets.]
  20. Rich ecosystem for scaling ML workloads. [Diagram: Ray ML libraries built on Ray Core + Datasets.] Integrate Ray Tune! No need to adopt the entire Ray framework.
  21. Generality vs. ease of development: stitching together different frameworks to go end-to-end?
  22. Rich ecosystem for scaling ML workloads. [Diagram: Ray ML libraries built on Ray Core + Datasets.] A unified, distributed toolkit to go end-to-end
  23. Companies scaling ML with Ray

  24. Companies scaling ML with Ray. [Diagram: Ray ML libraries built on Ray Core + Datasets.]
  25. Scaling Ecosystem Restoration Dendra Systems

  26. Making Boats Fly with AI McKinsey | QuantumBlack Australia

  27. Large Scale ML Platforms Uber, Shopify, Robinhood, and more

  28. Demo

  29. Start scaling your ML workloads. Getting started: Documentation (docs.ray.io): quick start examples, reference guides, etc. Forums (discuss.ray.io): learn and share with the broader Ray community, including the core team. Ray Slack: connect with the Ray team and community
  30. Thank you