Users running offline machine learning workloads and users running online serving leverage Ray in radically different ways, which makes it challenging to provide a common architecture for inspecting and debugging performance problems.
In this talk, we discuss Ray 2.x's current observability architecture, including how to view metrics and logs and how to inspect the state of tasks, actors, and other resources in Ray. We cover features such as the newly revamped Ray Dashboard and the added Ray metrics, and we present a roadmap for the future of Ray observability built on a unified observability data model.
This talk is for you if you're interested in the following:
* What’s new in Ray metrics and the revamped Ray Dashboard
* How to debug Ray programs using the CLI, Dashboard, and logs
* How to implement observability in a general-purpose distributed system such as Ray