AWS adds support for NIXL with EFA to accelerate LLM inference at scale

AWS now supports the NVIDIA Inference Xfer Library (NIXL) with Elastic Fabric Adapter (EFA) to accelerate disaggregated large language model (LLM) inference on Amazon EC2. This integration enhances performance through:
- Increased KV-cache throughput
- Reduced inter-token latency
- Optimized KV-cache memory utilization
NIXL with EFA enables high-throughput, low-latency KV-cache transfer between nodes and efficient KV-cache movement between storage layers. It is interoperable with all EFA-enabled EC2 instances and integrates with frameworks such as NVIDIA Dynamo, SGLang, and vLLM.
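As one illustration of the framework integrations, vLLM's disaggregated-prefill path can route KV-cache transfers through a NIXL-based connector. The sketch below is a hypothetical two-node setup: the `--kv-transfer-config` flag, the `NixlConnector` name, and the `kv_role` values are assumptions that may differ across vLLM versions, and the model name is only an example; consult your vLLM release's documentation before using it.

```shell
# Hypothetical disaggregated-serving sketch (flag names, connector name, and
# role values are assumptions; verify against your vLLM version's docs).

# Prefill node: computes the KV cache and ships it to decode nodes over NIXL/EFA.
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_producer"}'

# Decode node: receives the transferred KV cache and generates tokens.
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_consumer"}'
```

Separating prefill from decode lets each node pool scale independently, which is where fast inter-node KV-cache transfer matters most.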
What to do
- Upgrade to NIXL version 1.0.0 or higher
- Use EFA installer version 1.47.0 or higher
- Deploy on EFA-enabled EC2 instances; the capability is available in all AWS Regions where those instances are offered
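The version requirements above can be checked with a small script before deploying. The sketch below is a minimal example: the comparison helper uses GNU `sort -V` version ordering, and the two version variables are placeholders you would populate from however your AMI exposes the installed NIXL and EFA installer versions.

```shell
# Minimum versions required for NIXL-with-EFA support (from the release notes).
MIN_NIXL="1.0.0"
MIN_EFA="1.47.0"

# version_ge A B -> succeeds if A >= B (relies on GNU sort's -V version ordering).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Placeholder values: substitute the versions actually installed on your instance.
nixl_version="1.1.0"
efa_version="1.47.1"

version_ge "$nixl_version" "$MIN_NIXL" && echo "NIXL OK" || echo "upgrade NIXL"
version_ge "$efa_version" "$MIN_EFA" && echo "EFA installer OK" || echo "upgrade EFA installer"
# prints "NIXL OK" and "EFA installer OK" for the placeholder values above
```

`sort -V` handles multi-digit components correctly (e.g. it orders 1.9.0 before 1.10.0), which a plain string comparison would get wrong.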
Source: AWS release notes
If you need further guidance on AWS, our experts are available at AWS@westloop.io, or you can reach us through the Contact Us form.



