[Figure: ML system architecture spanning a Computing Platform and a Data Platform. Labeled components: Model Training; Model Fine-Tuning; Model Serving; Application Server; Model Registry; Metadata; Model Artifacts; Feature Store (Offline Feature Store, Online Feature Store); Data Producers; Data Products; Feature Generation; Feature Extraction loop; Pipelines.]
[Figure: Timeline of community groups, 2021 to 2024.]
• Special Interest Groups (SIG Autoscaling, etc.)
• Working Groups (WG Batch, etc.): WG Batch (with Device Management), WG Device Management, WG Serving
[Figure: Layered scheduling stack, built up over several slides. Working-group labels shown on the diagram: WG Serving, WG Batch.]
• Traffic Scheduling
• ML Model / cache Placement-Aware Routing
• LLM token-size based Load Balancing
• Pods (Group model): Pod Scheduling; Container Lifecycle Management
• Workloads α, β, γ: Workload Orchestration; Workload Scheduling; Quota Management
• Quotas A, B, C
• Device Management (DRA)