arXiv.org (https://arxiv.org/abs/2602.21548)
DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference (https://arxiv.org/abs/2602.21548)
The performance of multi-turn, agentic LLM inference is increasingly dominated by KV-Cache storage I/O rather than computation. In prevalent disaggregated architectures, loading the massive KV-Cache from external storage creates a fundamental...