• pytorch:2.12.0-ubuntu24.04-cuda13-py312
• pytorch:2.12.0-ubuntu22.04-atom-py313
• pytorch:2.12.0-ubuntu22.04-...

Current Status
• Manual client-side selection (exact, partial, ...?)
• No standardized tag naming across vendors
• Tags become too long for images compatible with multiple features
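To make the "manual client-side selection" problem concrete, here is a minimal sketch of what a client ends up doing with such tags today. It assumes a `name:version-feature-feature-...` layout, which is only a convention read off the example tags above; `ImageTag`, `parse_tag`, and `select` are hypothetical names for illustration, not part of any registry or vendor API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageTag:
    name: str
    version: str
    features: frozenset[str]


def parse_tag(ref: str) -> ImageTag:
    """Split 'name:version-feat1-feat2-...' into parts (assumed layout, not a standard)."""
    name, _, tag = ref.partition(":")
    version, *features = tag.split("-")
    return ImageTag(name, version, frozenset(features))


def select(refs: list[str], wanted: set[str]) -> list[str]:
    """Partial match: keep refs whose feature tokens include every wanted token."""
    return [r for r in refs if wanted <= parse_tag(r).features]


refs = [
    "pytorch:2.12.0-ubuntu24.04-cuda13-py312",
    "pytorch:2.12.0-ubuntu22.04-atom-py313",
]

cuda_images = select(refs, {"cuda13"})
atom_images = select(refs, {"ubuntu22.04", "atom"})
# A token that differs only slightly from the vendor's spelling matches nothing.
loose_images = select(refs, {"cuda"})
```

The last call illustrates the fragility: the match depends on the client knowing each vendor's exact token spelling, and every additional supported feature makes the tag longer.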
List [stable in v1.35] & DRAListTypeAttributes [alpha in v1.36]
• Consumed by CEL expressions to generalize matching conditions
• Specifies alternative combinations of device properties
• Updated: DRAListTypeAttributes introduces typed values (bools, ints, strings, versions)

Current Status
• Looks promising, but the complexity is still left to users (manual CEL expression writing...)
given workload
• Need to support multi-node jobs (aka gang scheduling)
• Solution: full automation with no user intervention, at cluster scale
  Per-node variant provider + per-workload variant labels + variant matcher within the scheduler

Backend.AI Sokovan Scheduler
• Workload's variant spec is populated from container image labels (scanned & cached in advance)

https://github.com/lablup/backend.ai