CacheRoute is an LLM scheduling scheme that enables flexible KV cache reuse across LLM systems, with the aim of improving task performance and system efficiency.
Updated Jul 2, 2026 - Python
Multimodal LLM inference gateway with KV-cache-aware routing and LMCache offload. OpenAI-compatible, benchmarked on GPUs.
Benchmarking LMCache under simulated round-trip time (RTT).