llm-d-kv-cache-manager overview

[Diagram: KVCache Manager scoring and index-update flow]

Components: KVCache Manager, Pod availability index, Redis, KVCache Engine (LMCache), vLLM Node, Connector API

Calls shown:
Score(prompt, ModelName, relevantPods) -> {Pod to Scores map}
FindLongestTokenizedPrefix(prompt, ModelName) -> tokens
DigestPromptAsync
GetPodsForKeys(tokens) -> {KVBlock keys to Pods} availability map
Redis MGet(blockkeys) -> {KVBlock keys to Pods}
UpdateIndex(KVBlock keys, IP)
Route

ref: https://github.com/llm-d/llm-d-kv-cache-manager
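
The Score path on this slide (tokens -> KVBlock keys -> pods holding those blocks -> per-pod score) can be sketched roughly as below. This is an illustrative sketch, not the project's implementation. The block size, the chained SHA-256 key scheme, the `InMemoryIndex` class (a stand-in for the Redis-backed index), and the "count of consecutive cached prefix blocks" scoring rule are all assumptions made for the example. Tokenization (FindLongestTokenizedPrefix) is skipped, so the function takes token IDs directly.

```python
import hashlib
from typing import Dict, Iterable, List, Set

BLOCK_SIZE = 4  # hypothetical number of tokens per KV block


def block_keys(tokens: List[int], model_name: str,
               block_size: int = BLOCK_SIZE) -> List[str]:
    """Chain-hash full blocks so each key depends on the entire prefix before it.

    Trailing tokens that do not fill a whole block are ignored.
    """
    keys: List[str] = []
    parent = model_name  # seed the chain with the model name
    for start in range(0, len(tokens) - block_size + 1, block_size):
        chunk = tokens[start:start + block_size]
        digest = hashlib.sha256(f"{parent}|{chunk}".encode()).hexdigest()[:16]
        keys.append(digest)
        parent = digest
    return keys


class InMemoryIndex:
    """Hypothetical stand-in for the Redis index: block key -> set of pod IPs."""

    def __init__(self) -> None:
        self._pods_by_key: Dict[str, Set[str]] = {}

    def update_index(self, keys: Iterable[str], pod_ip: str) -> None:
        # Mirrors UpdateIndex(KVBlock keys, IP) from the diagram.
        for key in keys:
            self._pods_by_key.setdefault(key, set()).add(pod_ip)

    def get_pods_for_keys(self, keys: Iterable[str]) -> Dict[str, Set[str]]:
        # Mirrors GetPodsForKeys; a Redis-backed version could batch this with MGET.
        return {key: set(self._pods_by_key.get(key, set())) for key in keys}


def score(tokens: List[int], model_name: str, relevant_pods: List[str],
          index: InMemoryIndex) -> Dict[str, int]:
    """Score each pod by how many consecutive prefix blocks it already holds."""
    keys = block_keys(tokens, model_name)
    availability = index.get_pods_for_keys(keys)
    scores = {pod: 0 for pod in relevant_pods}
    active = set(relevant_pods)  # pods that still hold every block seen so far
    for key in keys:
        active &= availability[key]
        if not active:
            break
        for pod in active:
            scores[pod] += 1
    return scores
```

A router could then pick the pod with the highest score and fall back to its usual load-balancing policy when every score is zero. Chained hashing is used here so that a block key only matches when the whole preceding prefix also matches, which is what makes counting consecutive hits meaningful.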