Amazon SageMaker HyperPod now supports EFA-only network interfaces

Amazon SageMaker HyperPod EFA-only Network Interfaces
Amazon SageMaker HyperPod now supports EFA-only network interfaces for cluster instance groups. This lets you configure dedicated Elastic Fabric Adapter (EFA) devices without the traditional Elastic Network Adapter (ENA) used for IP networking. Because EFA-only interfaces do not consume IP addresses, you can scale AI/ML clusters further without exhausting the IP address space in your VPC.
When running large-scale distributed training workloads, inter-node communication bandwidth is critical to training performance. With EFA-only interfaces, you can maximize the number of EFA interfaces dedicated to low-latency, high-throughput inter-node communication. Because these interfaces do not require IP addresses, adding them does not contribute to IP exhaustion.
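To make the IP-exhaustion point concrete, here is a back-of-the-envelope sketch in Python. It assumes a simplified model that is not an official AWS formula: each node uses one private IP for its primary ENA interface, each additional interface of type "efa" (EFA with ENA) uses one more IP, and "efa-only" interfaces use none. The only AWS-specific constant is the 5 addresses AWS reserves in every subnet. The subnet size and the count of 32 EFA interfaces per node are hypothetical illustration values, not figures for a specific instance type.

```python
"""Rough estimate of how many cluster nodes fit in one IPv4 subnet.

Simplified model (assumption, not an AWS formula): each node uses one
private IP for its primary ENA interface, plus one IP per additional
interface of type "efa" (EFA with ENA). Interfaces of type "efa-only"
are modeled as consuming no IP addresses.
"""

AWS_RESERVED_IPS_PER_SUBNET = 5  # AWS reserves 5 addresses in every subnet


def usable_ips(prefix_length: int) -> int:
    """Usable IPv4 addresses in a subnet with the given CIDR prefix length."""
    return 2 ** (32 - prefix_length) - AWS_RESERVED_IPS_PER_SUBNET


def ips_per_node(efa_interfaces: int, efa_only: bool) -> int:
    """IPs one node consumes under the simplified model above."""
    primary_eni = 1  # primary ENA interface always needs an IP
    return primary_eni if efa_only else primary_eni + efa_interfaces


def max_nodes(prefix_length: int, efa_interfaces: int, efa_only: bool) -> int:
    """Largest node count whose IP usage fits in the subnet."""
    return usable_ips(prefix_length) // ips_per_node(efa_interfaces, efa_only)


if __name__ == "__main__":
    # Hypothetical example: a /24 subnet and 32 EFA interfaces per node.
    for efa_only in (False, True):
        n = max_nodes(24, 32, efa_only)
        print(f"efa_only={efa_only}: up to {n} nodes")
```

Under these assumptions, a /24 subnet (251 usable addresses) holds only 7 nodes when every EFA interface also carries an ENA IP address, versus 251 nodes when the EFA interfaces are EFA-only. Actual per-node IP consumption depends on the instance type and on how the cluster's network interfaces are configured.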
What to do
- Specify efa-only in the ClusterNetworkInterface configuration when creating or updating your HyperPod cluster via the CreateCluster/UpdateCluster API.
Source: AWS release notes
If you need further guidance on AWS, our experts are available at AWS@westloop.io. You may also reach us by submitting the Contact Us form.



