Amazon ECS Managed Instances now supports NVIDIA GPU metrics

Amazon ECS GPU Metrics Update
Amazon Elastic Container Service (Amazon ECS) now offers NVIDIA GPU metrics for containerized workloads running on Amazon ECS Managed Instances. These metrics are available through Amazon CloudWatch Container Insights with enhanced observability, providing visibility into GPU health and performance to help troubleshoot and optimize GPU-accelerated workloads on Amazon ECS.
New Features
- Monitor GPU capacity, utilization, memory, hardware health, and thermal conditions directly in CloudWatch.
- View these metrics at a granular level, down to the individual GPU device.
- Track GPU operational and hardware health across the Amazon ECS Managed Instances fleet.
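One way to see which GPU metrics actually reach CloudWatch is to list the metrics in the Container Insights namespace and filter by name. The sketch below is an illustration, not an official procedure. It assumes the GPU metrics are published under the ECS/ContainerInsights namespace and that their names contain "gpu"; the release note confirms neither. The helper name gpu_metric_names is hypothetical.

```python
from typing import Any, Dict, Iterable, List

# Assumption: the namespace Container Insights uses for Amazon ECS metrics.
NAMESPACE = "ECS/ContainerInsights"


def gpu_metric_names(pages: Iterable[Dict[str, Any]], needle: str = "gpu") -> List[str]:
    """Return the sorted, de-duplicated metric names that contain `needle`.

    `pages` is an iterable of CloudWatch ListMetrics response pages, each a
    dict with a "Metrics" list whose entries carry a "MetricName" key.
    The match is case-insensitive. Assumption: GPU metric names contain "gpu".
    """
    names = set()
    for page in pages:
        for metric in page.get("Metrics", []):
            name = metric.get("MetricName", "")
            if needle.lower() in name.lower():
                names.add(name)
    return sorted(names)
```

With boto3 installed and credentials configured, the pages could come from `boto3.client("cloudwatch").get_paginator("list_metrics").paginate(Namespace=NAMESPACE)`. An empty result would suggest either that enhanced observability is not enabled or that the naming assumption does not hold.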
What to do
- Enable Container Insights with enhanced observability on your Amazon ECS cluster.
- Launch GPU-accelerated Amazon EC2 instance types through an Amazon ECS Managed Instances capacity provider.
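The first step above can be scripted. The following is a minimal sketch, assuming the cluster-level containerInsights setting accepts the value "enhanced" through the ECS UpdateClusterSettings API (exposed in boto3 as update_cluster_settings). The helper names are hypothetical and not part of any AWS SDK.

```python
from typing import Any, Dict


def container_insights_settings(cluster: str, mode: str = "enhanced") -> Dict[str, Any]:
    """Build keyword arguments for an ECS UpdateClusterSettings request.

    Assumption: "enhanced" selects Container Insights with enhanced
    observability; "enabled" and "disabled" are the other accepted values.
    """
    allowed = {"enhanced", "enabled", "disabled"}
    if mode not in allowed:
        raise ValueError(f"mode must be one of {sorted(allowed)}, got {mode!r}")
    return {
        "cluster": cluster,
        "settings": [{"name": "containerInsights", "value": mode}],
    }


def enable_enhanced_insights(ecs_client: Any, cluster: str) -> Dict[str, Any]:
    """Apply the setting using any client that exposes update_cluster_settings."""
    return ecs_client.update_cluster_settings(**container_insights_settings(cluster))
```

With boto3 installed, a call might look like `enable_enhanced_insights(boto3.client("ecs"), "my-gpu-cluster")`, where the cluster name is a placeholder. Keeping the payload builder separate from the API call makes the request easy to inspect before it is sent.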
Source: AWS release notes


