SageMaker HyperPod now supports Managed Tiered KV Cache and Intelligent Routing

Amazon SageMaker HyperPod Updates
Amazon SageMaker HyperPod now supports Managed Tiered KV Cache and Intelligent Routing for large language model (LLM) inference, optimizing performance for long-context prompts and multi-turn conversations.
Managed Tiered KV Cache stores and reuses previously computed attention key-value tensors, while Intelligent Routing directs each request to the instance most likely to already hold its cached context, delivering:
- Up to 40% lower latency
- Up to 25% higher throughput
- Up to 25% cost savings
Managed Tiered KV Cache uses a two-tier architecture combining local CPU memory (L1) with disaggregated cluster-wide storage (L2). Intelligent Routing maximizes cache utilization through:
- Prefix-aware routing for common prompt patterns
- KV-aware routing for maximum cache efficiency
- Round-robin routing for stateless workloads
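To make the two ideas concrete, here is a minimal conceptual sketch, not the HyperPod implementation or API: an LRU "L1" dict stands in for local CPU memory, a plain dict stands in for the disaggregated "L2" store, and a prefix-aware router picks the worker holding the longest cached prompt prefix, falling back to round-robin. All class and method names (`TieredKVCache`, `PrefixRouter`, etc.) are illustrative inventions.

```python
from collections import OrderedDict
from itertools import cycle

class TieredKVCache:
    """Illustrative two-tier cache: small LRU 'L1' (local CPU memory
    stand-in) backed by a larger 'L2' dict (cluster-wide storage stand-in)."""
    def __init__(self, l1_capacity):
        self.l1_capacity = l1_capacity
        self.l1 = OrderedDict()  # ordered: least-recently-used entry first
        self.l2 = {}

    def put(self, prefix, kv):
        self.l1[prefix] = kv
        self.l1.move_to_end(prefix)
        while len(self.l1) > self.l1_capacity:
            old_prefix, old_kv = self.l1.popitem(last=False)
            self.l2[old_prefix] = old_kv  # demote coldest entry to L2

    def get(self, prefix):
        if prefix in self.l1:             # L1 hit: cheapest path
            self.l1.move_to_end(prefix)
            return self.l1[prefix]
        if prefix in self.l2:             # L2 hit: promote back into L1
            kv = self.l2.pop(prefix)
            self.put(prefix, kv)
            return kv
        return None                       # miss: caller must recompute

class PrefixRouter:
    """Send a prompt to the worker holding the longest cached prefix;
    fall back to round-robin when no worker has a useful prefix."""
    def __init__(self, workers):
        self.cached = {w: set() for w in workers}
        self._rr = cycle(workers)

    def route(self, prompt):
        best_worker, best_len = None, 0
        for worker, prefixes in self.cached.items():
            for p in prefixes:
                if prompt.startswith(p) and len(p) > best_len:
                    best_worker, best_len = worker, len(p)
        worker = best_worker or next(self._rr)
        self.cached[worker].add(prompt)  # this worker now caches the prompt's KV
        return worker
```

In this toy model, a multi-turn conversation keeps landing on the same worker because each follow-up prompt extends the cached prefix, which is the intuition behind prefix-aware routing.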
What to do
- Enable these features through InferenceEndpointConfig or SageMaker JumpStart
- Deploy models via the HyperPod Inference Operator on EKS-orchestrated clusters
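As a hedged sketch of the first step: the resource kind `InferenceEndpointConfig` comes from the announcement, but every field name, value, and the API group/version below are assumed placeholders, not the documented schema; consult the HyperPod Inference Operator reference before use.

```yaml
# Hypothetical sketch only -- field names and apiVersion are assumptions.
apiVersion: inference.sagemaker.aws.amazon.com/v1   # assumed API group/version
kind: InferenceEndpointConfig
metadata:
  name: demo-llm-endpoint            # placeholder endpoint name
spec:
  kvCache:                           # assumed: enables Managed Tiered KV Cache
    enabled: true
  routing:                           # assumed: selects an Intelligent Routing policy
    strategy: prefix-aware           # e.g. prefix-aware, kv-aware, round-robin
```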
Source: AWS release notes
If you need further guidance on AWS, our experts are available at AWS@westloop.io. You may also reach us by submitting the Contact Us form.
