Removed requests-per-minute (RPM) quotas: Amazon Bedrock no longer enforces requests-per-minute (RPM) quotas on the bedrock-runtime or bedrock-mantle endpoints. Throttling is now governed by token-based quotas. RPM rows have been removed from the Amazon Be

Quotas for Amazon Bedrock
Your AWS account has default quotas for Amazon Bedrock. To view service quotas, follow the steps at Viewing service quotas and select Amazon Bedrock as the service. Refer to the Amazon Bedrock service quotas in the AWS General Reference.
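As a programmatic alternative to the console steps, the same quotas can be listed through the Service Quotas API. The following is a minimal sketch, assuming a boto3 `service-quotas` client and the service code `bedrock`. The helper name `token_quotas` and the substring filter on "token" are illustrative choices and do not come from the AWS release notes.

```python
from typing import Any, Dict, List


def token_quotas(client: Any, service_code: str = "bedrock") -> List[Dict[str, Any]]:
    """Return the quotas whose name mentions tokens.

    `client` is expected to behave like boto3.client("service-quotas").
    Filtering on the substring "token" is an assumption made for illustration.
    """
    paginator = client.get_paginator("list_service_quotas")
    matches: List[Dict[str, Any]] = []
    for page in paginator.paginate(ServiceCode=service_code):
        for quota in page.get("Quotas", []):
            if "token" in quota.get("QuotaName", "").lower():
                matches.append(
                    {"name": quota["QuotaName"], "value": quota.get("Value")}
                )
    return matches


# Usage (requires AWS credentials and the boto3 package):
#   import boto3
#   for q in token_quotas(boto3.client("service-quotas")):
#       print(q["name"], q["value"])
```

Passing the client in as an argument keeps the helper easy to exercise with a stub, and leaves the choice of Region and credentials to the caller.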
Model inference in Amazon Bedrock is controlled by quotas on token usage. Some models consume tokens at a higher rate than others. For more information, see How tokens are counted in Amazon Bedrock.
Amazon Bedrock offers two inference endpoints – bedrock-runtime and bedrock-mantle – each with its own per-model quota allocations. Traffic to the two endpoints is tracked against separate quotas. For details, see Quotas for the bedrock-runtime endpoint and Quotas for the bedrock-mantle endpoint.
The default quotas assigned to an account may change based on regional factors, payment history, fraudulent usage, or approval of a quota increase request.
What to do
- Monitor your token usage by counting tokens before running inference.
- Review quotas for the bedrock-runtime and bedrock-mantle endpoints.
- Request an increase for Amazon Bedrock quotas if necessary.
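The first two steps can be approximated on the client side with a small token budget that is tracked separately per endpoint. This is a sketch under stated assumptions, not how Amazon Bedrock itself accounts for usage: the `TokenBudget` class, the 60-second sliding window, and the limit values are all hypothetical, and the token count for each request is assumed to have been obtained before inference.

```python
import time
from collections import deque
from typing import Deque, Dict, Optional, Tuple


class TokenBudget:
    """Client-side sliding-window tracker of tokens used per endpoint.

    Limits and window length are placeholders; real quota values come from
    your account, and server-side accounting may differ from this estimate.
    """

    def __init__(self, limits: Dict[str, int], window_seconds: float = 60.0) -> None:
        self.limits = limits
        self.window = window_seconds
        # One independent event queue per endpoint, since the two endpoints
        # are tracked against separate quotas.
        self._events: Dict[str, Deque[Tuple[float, int]]] = {
            name: deque() for name in limits
        }

    def used(self, endpoint: str, now: Optional[float] = None) -> int:
        """Tokens recorded for `endpoint` within the current window."""
        now = time.monotonic() if now is None else now
        events = self._events[endpoint]
        # Drop entries that have aged out of the window.
        while events and now - events[0][0] >= self.window:
            events.popleft()
        return sum(tokens for _, tokens in events)

    def try_reserve(self, endpoint: str, tokens: int, now: Optional[float] = None) -> bool:
        """Record `tokens` if they fit in the remaining budget, else return False."""
        now = time.monotonic() if now is None else now
        if self.used(endpoint, now) + tokens > self.limits[endpoint]:
            return False
        self._events[endpoint].append((now, tokens))
        return True


# Placeholder limits, not real quota values.
budget = TokenBudget({"bedrock-runtime": 1000, "bedrock-mantle": 500})
if budget.try_reserve("bedrock-runtime", 800):
    pass  # Safe to send the request under this local estimate.
```

When `try_reserve` returns False, the caller can wait, shrink the request, or treat repeated refusals as a signal that a quota increase is worth requesting.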
Source: AWS release notes
If you need further guidance on AWS, our experts are available at AWS@westloop.io. You may also reach us by submitting the Contact Us form.



