Converse API support for batch inference: You can now use the Converse API format for batch inference input data. When creating a batch inference job, set the model invocation type to Converse to use a consistent request format across models.

Published
February 28, 2026
https://docs.aws.amazon.com/bedrock/latest/userguide/batch-inference-create.html

Create a batch inference job

After setting up an Amazon S3 bucket with files for model inference, you can create a batch inference job. Ensure the files are formatted according to the instructions in Format and upload your batch inference data.
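As a sketch of what a correctly formatted input file can look like, the snippet below writes one JSON Lines record using the Converse request shape. The field names (`recordId`, `modelInput`, `messages`, `inferenceConfig`) follow the batch-input and Converse schemas as I understand them; the record ID, prompt text, and parameter values are placeholders to adapt.

```python
import json

# One batch-inference input record in the Converse request format.
# "recordId" identifies the record in the output; "modelInput" holds
# the Converse-style request body. Values here are illustrative only.
record = {
    "recordId": "RECORD-0001",
    "modelInput": {
        "messages": [
            {
                "role": "user",
                "content": [{"text": "Summarize the attached meeting notes."}],
            }
        ],
        "inferenceConfig": {"maxTokens": 512, "temperature": 0.5},
    },
}

# Batch input files are JSON Lines: one record object per line.
with open("batch_input.jsonl", "w") as f:
    f.write(json.dumps(record) + "\n")
```

The resulting `batch_input.jsonl` would then be uploaded to the S3 input location you select when creating the job.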

What to do

  • Sign in to the AWS Management Console with the appropriate IAM permissions.
  • Navigate to the Amazon Bedrock console and select Batch inference.
  • Choose Create job and fill in the required details:
    • Job name: Provide a name for the job.
    • Model: Select a model for the batch inference job.
    • Model invocation type: Choose the API format for your input data.
    • Input data: Select an S3 location for your batch inference job.
    • Output data: Choose an S3 location to store the output files.
    • Service access: Select an existing service role or create a new one.
  • Optionally, add tags to the job.
  • Choose Create batch inference job.

API Method

To create a batch inference job via API, send a CreateModelInvocationJob request with the required fields:

  • jobName: Specify a name for the job.
  • roleArn: Provide the ARN of the service role with permissions to create and manage the job.
  • modelId: Specify the ID or ARN of the model to use in inference.
  • inputDataConfig: Specify the S3 location containing the input data.
  • outputDataConfig: Specify the S3 location to write the model responses to.
  • Optional fields:
    • modelInvocationType: Specify the API format of the input data.
    • timeoutDurationInHours: Specify the duration in hours after which the job will time out.
    • tags: Specify any tags to associate with the job.
    • vpcConfig: Specify the VPC configuration to use to protect your data during the job.
    • clientRequestToken: Specify a unique token to ensure the API request completes only once.
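The request described above can be sketched with boto3. This is an illustrative assembly of the request parameters, not a definitive recipe: the ARNs, bucket names, and model ID are placeholders, and the real call (commented out) requires valid AWS credentials and permissions.

```python
# import boto3  # required only for the real call at the bottom

# CreateModelInvocationJob request parameters, matching the fields listed
# above. All ARNs, bucket names, and the model ID are placeholder values.
params = {
    "jobName": "my-batch-job",
    "roleArn": "arn:aws:iam::111122223333:role/MyBatchInferenceRole",
    "modelId": "anthropic.claude-3-5-sonnet-20240620-v1:0",
    "inputDataConfig": {
        "s3InputDataConfig": {"s3Uri": "s3://amzn-s3-demo-bucket/input/"}
    },
    "outputDataConfig": {
        "s3OutputDataConfig": {"s3Uri": "s3://amzn-s3-demo-bucket/output/"}
    },
    # Optional fields:
    "modelInvocationType": "Converse",  # Converse API format, per the release note
    "timeoutDurationInHours": 24,
}

def create_job(client, request):
    """Submit the batch inference job and return its ARN."""
    response = client.create_model_invocation_job(**request)
    return response["jobArn"]

# client = boto3.client("bedrock", region_name="us-east-1")
# job_arn = create_job(client, params)
```

Note that the job is submitted through the `bedrock` control-plane client, not the `bedrock-runtime` client used for synchronous invocation.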

The response returns a jobArn that you can use to refer to the job in other batch inference API calls.
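One common use of the returned jobArn is polling the job until it finishes. The sketch below assumes a Bedrock control-plane client exposing GetModelInvocationJob; the set of terminal statuses is my reading of the documented job lifecycle and may need adjusting.

```python
import time

def wait_for_job(client, job_arn, poll_seconds=60):
    """Poll a batch inference job by ARN until it reaches a terminal status.

    Assumes the client exposes get_model_invocation_job and that the
    listed statuses are terminal; both are assumptions to verify against
    the current API reference.
    """
    terminal = {"Completed", "PartiallyCompleted", "Failed", "Stopped", "Expired"}
    while True:
        status = client.get_model_invocation_job(jobIdentifier=job_arn)["status"]
        if status in terminal:
            return status
        time.sleep(poll_seconds)
```

For example, `wait_for_job(client, job_arn)` would block until the job completes, after which the model responses can be read from the configured S3 output location.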

Source: AWS release notes
