Amazon SageMaker AI cuts generative AI inference scale-out time by up to half with automatic container image caching

Published

June 30, 2026

Amazon SageMaker Inference Enhancements

Amazon SageMaker Inference now supports container image caching, enabling up to 2x faster end-to-end scaling for generative AI models during scale-out events. The feature caches your container image in advance, so new instances can start serving traffic sooner instead of waiting for large container images to be pulled from Amazon Elastic Container Registry (Amazon ECR).

What to do

No changes are required from customers. The service automatically caches the image URI specified in your endpoint or inference component configuration.
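Because caching is keyed on the image URI you already supply, it can help to see where that URI sits in a request. The sketch below builds keyword arguments for the boto3 SageMaker client's `create_inference_component` call. The URI lives in `Specification.Container.Image`, and the sketch adds no caching-specific setting. The helper name `build_inference_component_request` and every example value (component name, endpoint, image URI, S3 path, resource sizes) are hypothetical placeholders, not values from the release note.

```python
from typing import Any, Dict


def build_inference_component_request(
    component_name: str,
    endpoint_name: str,
    variant_name: str,
    image_uri: str,
    artifact_url: str,
    accelerators: int = 1,
    min_memory_mb: int = 1024,
    copies: int = 1,
) -> Dict[str, Any]:
    """Return kwargs for boto3's sagemaker create_inference_component (hypothetical helper).

    The container image URI goes in Specification.Container.Image; per the
    release note, this is the URI the service caches automatically.
    """
    return {
        "InferenceComponentName": component_name,
        "EndpointName": endpoint_name,
        "VariantName": variant_name,
        "Specification": {
            "Container": {
                # ECR image URI; this sketch sets no extra flag for caching.
                "Image": image_uri,
                # Location of the model artifacts (placeholder value in usage).
                "ArtifactUrl": artifact_url,
            },
            "ComputeResourceRequirements": {
                "NumberOfAcceleratorDevicesRequired": accelerators,
                "MinMemoryRequiredInMb": min_memory_mb,
            },
        },
        # Number of copies of the component to run on the endpoint.
        "RuntimeConfig": {"CopyCount": copies},
    }
```

With AWS credentials configured, the result could be passed as `boto3.client("sagemaker").create_inference_component(**request)`. The helper only assembles a dictionary, so it runs without network access.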

New Features:
Container image caching for up to 2x faster scaling on new instances.
Comprehensive scaling optimization suite for generative AI.

Container image caching is available in all AWS commercial regions where SageMaker Inference is supported. To learn more, visit the launch blog.

Source: AWS release notes
