Existing production machine learning systems often suffer from problems that make them hard to use: data scientists and ML practitioners spend most of their time fighting YAML configs and refactoring code to push models to production.
To address this, the Ray community has built Ray AI Runtime (AIR), an open-source toolkit for building large-scale, end-to-end ML applications. By leveraging Ray’s distributed compute substrate and library ecosystem, AIR brings scalability and programmability to ML platforms.
Ray AI Runtime focuses on providing the compute layer for Python-based ML workloads, and it is designed to interoperate with other systems for storage and metadata needs.
In this session, we’ll explore and discuss the following:
* How AIR is different from existing ML platform tools like TFX, SageMaker, and Kubeflow
* How AIR allows you to program and scale your machine learning workloads easily
* Interoperability and easy integration points with other systems for storage and metadata needs
* AIR’s cutting-edge features for accelerating the machine learning lifecycle, such as data preprocessing, last-mile data ingestion, tuning and training, and serving at scale (see the sketch after this list)
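To make that last point concrete, here is a minimal sketch of preprocessing plus distributed training with AIR, assuming the Ray 2.x AIR APIs (`ray.data`, `ScalingConfig`, `XGBoostTrainer`); the dataset path, column names, and hyperparameters are illustrative placeholders modeled on the public Ray examples, not a prescribed setup:

```python
import ray
from ray.data.preprocessors import StandardScaler
from ray.air.config import ScalingConfig
from ray.train.xgboost import XGBoostTrainer

# Load a dataset with Ray Data (path is a placeholder from Ray's examples).
dataset = ray.data.read_csv("s3://anonymous@air-example-data/breast_cancer.csv")
train_ds, valid_ds = dataset.train_test_split(test_size=0.3)

# Last-mile preprocessing: fit on the training data, then re-applied
# consistently at training and inference time.
preprocessor = StandardScaler(columns=["mean radius", "mean texture"])

# Distributed training: scaling out is a config change, not a code rewrite.
trainer = XGBoostTrainer(
    scaling_config=ScalingConfig(num_workers=2, use_gpu=False),
    label_column="target",
    params={"objective": "binary:logistic", "eval_metric": ["logloss"]},
    datasets={"train": train_ds, "valid": valid_ds},
    preprocessor=preprocessor,
)
result = trainer.fit()
print(result.metrics)
```

Going from a laptop to a multi-node cluster is then a matter of adjusting `ScalingConfig`, while the rest of the training code stays the same.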
Key takeaways for attendees are:
* Understand how Ray AI Runtime can be used to implement scalable, programmable machine learning workflows.
* Learn how to pass and share data across distributed trainers and Ray native libraries (Tune, Serve, Train, RLlib, etc.), as sketched below.
* Learn how to scale Python-based workloads across supported public clouds.
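As one hedged illustration of the second takeaway: the trainer from the sketch above, along with the Ray Datasets it references, can be handed directly to Ray Tune, with data shared through Ray’s object store rather than copied per trial. The search space and metric name below are assumptions for illustration (XGBoost reports metrics in a `<dataset>-<metric>` form):

```python
from ray import tune
from ray.tune import Tuner

# Wrap the existing AIR trainer in a Tuner; each trial reuses the shared
# datasets instead of re-loading them. Hyperparameters are illustrative.
tuner = Tuner(
    trainer,
    param_space={"params": {"max_depth": tune.randint(2, 8)}},
    tune_config=tune.TuneConfig(
        num_samples=8, metric="valid-logloss", mode="min"
    ),
)
best = tuner.fit().get_best_result()
print(best.config, best.checkpoint)
```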