Ray Community Meetup Jan 25, 2023: Ray OOM Monitor

Anyscale
January 27, 2023

We are delighted to kick off the New Year with our first Ray Meetup of 2023, featuring talks from Ray community users and committers. Join us to hear from the Ray team at Anyscale and from Shopify about Ray and how it is used.

Agenda:
5:00 p.m. Welcome remarks, Year 2022: Ray in Review & upcoming announcements - Jules Damji, Anyscale
Talk 1 (35-40 mins): Monitor & prevent out-of-memory problems with Ray OOM monitor - Clarence Ng, Anyscale
Q & A (10 mins)
Talk 2 (35-40 mins): How Shopify used Ray<>TensorFlow to build a Product Hierarchical Categorization model to auto-classify billions of products using NLP and Computer Vision - Kshetrajna Raghavan, Shopify

Transcript

  1. Agenda
    • Introduction
    • Problem with the existing solution
    • How does the Ray memory monitor work
    • Deep dive into preemption policy
    • Demo
  2. OS (Linux) OOM killer
    → Triggers when the system runs out of free memory pages
    → Kills the most memory-hungry task
    → Frequency is capped: stalls other processes
    → Ray: sets process priority for the tasks
  3. Ray memory monitor
    [Diagram. Labels: Raylet with embedded memory monitor; operating system; worker processes (tasks and actors); "Get resource usage"; "Process stats".]
  4. Ray memory monitor
    [Diagram. Labels: Raylet with embedded memory monitor; operating system; worker processes (tasks and actors); "Using too much memory?".]
  5. Preemption policy
    Requirements:
    • If the application cannot complete, we should surface that information to simplify debugging and the path to resolution
    • The application will finish even when it tries to overload the cluster
      → It should finish in a reasonable amount of time
      → The workload shouldn't hang
  6. Preemption policy (Ray 2.2)
    → Prefer killing retriable tasks
    → Prefer killing the newest task
    → Limited retries: could deadlock otherwise
  7. Newest executed task
    [Diagram: two tasks, one with start time of execution 14:38 and one with start time of execution 14:22; label "OOM kill".]
  8. Preemption policy (Ray 2.3)
    → Group tasks that have the same parent if it is retriable
    → Preempt retriable groups
    → Preempt the largest group
    → Preempt the newest task within the group
    → Always retry the task unless it is the last member of the group
  9. Summary
    • Ray memory monitor improves cluster stability
    • Latest release of Ray (2.2)
      ◦ Preemptively kills tasks to prevent the node from failing
      ◦ Improved observability for debugging memory issues
    • Next release of Ray (2.3)
      ◦ Detects when a workload gets stuck and reports the error
      ◦ Fairness across tasks to avoid starvation
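
The two preemption policies on slides 6 and 8 can be compared with a small sketch. This is plain Python, not Ray's implementation: the `Task` record, the function names, and the tie-breaking rule between equally sized groups are all hypothetical, and the sketch assumes that each non-retriable task forms a group of its own, which the slides do not state.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Task:
    """Hypothetical record of a running task; not a Ray type."""
    name: str
    parent: str        # id of the task or driver that submitted it
    start_time: float  # when execution started, in seconds
    retriable: bool


def pick_victim_v22(tasks: List[Task]) -> Optional[Task]:
    """Slide 6 (Ray 2.2): prefer retriable tasks, then the newest task."""
    if not tasks:
        return None
    # Tuples compare left to right: retriable (True > False) wins first,
    # then the latest start time.
    return max(tasks, key=lambda t: (t.retriable, t.start_time))


def pick_victim_v23(tasks: List[Task]) -> Optional[Tuple[Task, bool]]:
    """Slide 8 (Ray 2.3): return (victim, should_retry), or None if idle."""
    if not tasks:
        return None
    groups: Dict[Tuple[str, str], List[Task]] = defaultdict(list)
    for t in tasks:
        # Retriable tasks are grouped with siblings of the same parent.
        # Assumption: a non-retriable task is a group of size one.
        key = ("parent", t.parent) if t.retriable else ("solo", t.name)
        groups[key].append(t)

    def group_rank(members: List[Task]) -> Tuple[bool, int, float]:
        # Retriable groups first, then the largest group. The final
        # tie-break on the newest member is an assumption of this sketch.
        newest = max(m.start_time for m in members)
        return (members[0].retriable, len(members), newest)

    chosen = max(groups.values(), key=group_rank)
    victim = max(chosen, key=lambda t: t.start_time)
    # Retry unless the victim is the last member of its group.
    should_retry = len(chosen) > 1
    return victim, should_retry


tasks = [
    Task("a1", parent="A", start_time=10.0, retriable=True),
    Task("a2", parent="A", start_time=12.0, retriable=True),
    Task("b1", parent="B", start_time=15.0, retriable=True),
    Task("c1", parent="C", start_time=20.0, retriable=False),
]

print(pick_victim_v22(tasks).name)   # b1: newest retriable task overall
victim, retry = pick_victim_v23(tasks)
print(victim.name, retry)            # a2 True: newest task of the largest group
```

In this example the 2.2-style rule picks `b1`, the only task submitted by parent `B`, while the 2.3-style rule picks `a2` from the larger group and leaves `B` untouched, which is one way to read the "fairness across tasks to avoid starvation" point in the summary.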