Removed requests-per-minute (RPM) quotas: Amazon Bedrock no longer enforces requests-per-minute (RPM) quotas on the bedrock-runtime or bedrock-mantle endpoints. Throttling is now governed by token-based quotas. RPM rows have been removed from the Amazon Be

Published
May 27, 2026
https://docs.aws.amazon.com/bedrock/latest/userguide/quotas.html

Quotas for Amazon Bedrock

Your AWS account has default quotas for Amazon Bedrock. To view service quotas, follow the steps at Viewing service quotas and select Amazon Bedrock as the service. Refer to the Amazon Bedrock service quotas in the AWS General Reference.

Model inference in Amazon Bedrock is controlled by quotas on token usage. Some models use tokens at a higher rate. For more information, see How tokens are counted in Amazon Bedrock.

Amazon Bedrock offers two inference endpoints – bedrock-runtime and bedrock-mantle – each with its own per-model quota allocations. Traffic to the two endpoints is tracked against separate quotas. For details, see Quotas for the bedrock-runtime endpoint and Quotas for the bedrock-mantle endpoint.

The default quotas assigned to an account might be updated depending on regional factors, payment history, fraudulent usage, and/or approval of a quota increase request.

What to do

  • Monitor your token usage by counting tokens before running inference.
  • Review quotas for the bedrock-runtime and bedrock-mantle endpoints.
  • Request an increase for Amazon Bedrock quotas if necessary.

Source: AWS release notes




If you need further guidance on AWS, our experts are available at AWS@westloop.io. You may also reach us by submitting the Contact Us form.

Follow our blog

Get the latest insights and advice on AWS services from our experts.

By clicking Sign Up you're confirming that you agree with our Terms and Conditions.
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.