Amazon SageMaker AI Announces New Observability Capability for Inference Endpoints

Published

June 18, 2026

Amazon SageMaker AI Observability Capability

Amazon SageMaker AI's new observability capability provides comprehensive visibility into token performance, GPU health, inference component placement, and autoscaling behavior for production generative AI inference workloads. It tracks real-time metrics, including time to first token (TTFT), inter-token latency, queue depth, and tokens per second, and surfaces them alongside infrastructure health so that customers can identify and resolve issues quickly.
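
For readers unfamiliar with the token-level metrics named above, the following is a minimal sketch of how they are commonly derived from per-token arrival timestamps for a single streamed response. It is an illustration only, not a description of how SageMaker AI computes its metrics: the `TokenMetrics` class, the `compute_token_metrics` function, and the sample timestamps are hypothetical, and the tokens-per-second definition used here (tokens divided by total request time) is one of several conventions in use.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TokenMetrics:
    """Hypothetical container for per-request token metrics (seconds)."""
    time_to_first_token_s: float
    mean_inter_token_latency_s: float
    tokens_per_second: float


def compute_token_metrics(request_start_s: float,
                          token_times_s: List[float]) -> TokenMetrics:
    """Derive token metrics from one request's token arrival times.

    Assumption: token_times_s is sorted and uses the same clock as
    request_start_s. These are common definitions, not the service's own.
    """
    if not token_times_s:
        raise ValueError("at least one token timestamp is required")

    # Time to first token: delay between sending the request and token 1.
    ttft = token_times_s[0] - request_start_s

    # Inter-token latency: mean gap between consecutive tokens.
    gaps = [later - earlier
            for earlier, later in zip(token_times_s, token_times_s[1:])]
    mean_itl = sum(gaps) / len(gaps) if gaps else 0.0

    # Tokens per second over the whole request, including the TTFT wait.
    total_s = token_times_s[-1] - request_start_s
    tps = len(token_times_s) / total_s if total_s > 0 else 0.0

    return TokenMetrics(ttft, mean_itl, tps)


# Made-up timestamps: five tokens arriving after a 0.5 s initial wait.
metrics = compute_token_metrics(0.0, [0.5, 0.75, 1.0, 1.25, 1.5])
print(metrics)
```

A high time to first token with a low inter-token latency typically points to queuing or prefill delay rather than slow decoding, which is why it helps to view these metrics side by side with queue depth.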

The new pre-built SageMaker AI Insights dashboard in Amazon CloudWatch shows token latency, GPU utilization, inference component copy counts, scaling events, and cold start breakdowns in a single view. Customers who use observability tools such as Grafana can connect directly through the regional PromQL endpoint and import a pre-configured dashboard template.