AWS adds support for NIXL with EFA to accelerate LLM inference at scale

AWS now supports the NVIDIA Inference Xfer Library (NIXL) with Elastic Fabric Adapter (EFA) to accelerate disaggregated large language model (LLM) inference on Amazon EC2. This integration enhances performance through:
- Increased KV-cache throughput
- Reduced inter-token latency
- Optimized KV-cache memory utilization
NIXL with EFA enables high-throughput, low-latency KV-cache transfer between nodes and efficient KV-cache movement between storage layers. It is interoperable with all EFA-enabled EC2 instances and integrates with frameworks such as NVIDIA Dynamo, SGLang, and vLLM.
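As one illustration of the framework integrations, vLLM's disaggregated-prefill path can route KV-cache transfers through a NIXL-based connector. The sketch below is a hypothetical two-node setup: the `--kv-transfer-config` flag, the `NixlConnector` name, and the `kv_role` values are assumptions that may differ across vLLM versions, and the model name is only an example; consult your vLLM release's documentation before using it.

```shell
# Hypothetical disaggregated-serving sketch (flag names, connector name, and
# role values are assumptions; verify against your vLLM version's docs).

# Prefill node: computes the KV cache and ships it to decode nodes over NIXL/EFA.
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_producer"}'

# Decode node: receives the transferred KV cache and generates tokens.
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_consumer"}'
```

Separating prefill from decode lets each node pool scale independently, which is where fast inter-node KV-cache transfer matters most.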
What to do
- Upgrade to NIXL version 1.0.0 or higher
- Use EFA installer version 1.47.0 or higher
- Deploy on EFA-enabled EC2 instances; the capability is available in all AWS Regions where those instances are offered
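The version requirements above can be checked with a small script before deploying. The sketch below is a minimal example: the comparison helper uses GNU `sort -V` version ordering, and the two version variables are placeholders you would populate from however your AMI exposes the installed NIXL and EFA installer versions.

```shell
# Minimum versions required for NIXL-with-EFA support (from the release notes).
MIN_NIXL="1.0.0"
MIN_EFA="1.47.0"

# version_ge A B -> succeeds if A >= B (relies on GNU sort's -V version ordering).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Placeholder values: substitute the versions actually installed on your instance.
nixl_version="1.1.0"
efa_version="1.47.1"

version_ge "$nixl_version" "$MIN_NIXL" && echo "NIXL OK" || echo "upgrade NIXL"
version_ge "$efa_version" "$MIN_EFA" && echo "EFA installer OK" || echo "upgrade EFA installer"
# prints "NIXL OK" and "EFA installer OK" for the placeholder values above
```

`sort -V` handles multi-digit components correctly (e.g. it orders 1.9.0 before 1.10.0), which a plain string comparison would get wrong.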
Source: AWS release notes
If you need further guidance on AWS, our experts are available at AWS@westloop.io, or you can reach us through the Contact Us form.



