New feature: Amazon Bedrock now supports reinforcement fine-tuning, which improves foundation model performance through feedback signals.

Amazon Bedrock: Reinforcement Fine-Tuning
Reinforcement fine-tuning is a model customization technique in Amazon Bedrock that improves foundation model performance by teaching models what constitutes a "good" response through feedback signals called rewards. Models improve iteratively based on these reward signals, making advanced model customization more accessible and cost-effective.
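As a rough orientation only, the sketch below shows how such a customization job might be started with the AWS SDK for Python (boto3). create_model_customization_job is an existing Bedrock control-plane call, but the customizationType value, hyperparameter name, IAM role ARN, and S3 URIs used here are illustrative assumptions rather than documented settings for reinforcement fine-tuning.

```python
# Hypothetical sketch: starting a reinforcement fine-tuning job with boto3.
# create_model_customization_job is a real Bedrock API, but the
# customizationType value, hyperparameter names, role ARN, and S3 URIs below
# are placeholders, not documented values for this feature.
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")  # single-region support (see below)

response = bedrock.create_model_customization_job(
    jobName="rft-example-job",
    customModelName="my-rft-nova-lite",
    roleArn="arn:aws:iam::123456789012:role/BedrockCustomizationRole",  # placeholder
    baseModelIdentifier="amazon.nova-lite-v2:0:256k",
    customizationType="REINFORCEMENT_FINE_TUNING",  # assumed enum value
    trainingDataConfig={"s3Uri": "s3://my-bucket/rft/train.jsonl"},  # placeholder
    outputDataConfig={"s3Uri": "s3://my-bucket/rft/output/"},        # placeholder
    hyperParameters={"epochCount": "1"},                              # assumed name/value
)
print(response["jobArn"])
```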
Approaches
- Reinforcement Learning with Verifiable Rewards (RLVR) - Uses rule-based graders for objective tasks like code generation or math reasoning.
- Reinforcement Learning from AI Feedback (RLAIF) - Uses AI-based judges for subjective tasks like instruction following or content moderation. Both reward styles are sketched below.
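To make the distinction concrete, the minimal Python sketch below contrasts the two reward styles. The function names and scoring scheme are illustrative assumptions, not Bedrock APIs: the rule-based grader checks a verifiable answer exactly, while the AI-judge grader is a stub standing in for a call to a judge model.

```python
# Illustrative only: these functions are not Bedrock APIs, just a sketch of
# the two reward styles described above.

def verifiable_reward(model_answer: str, expected_answer: str) -> float:
    """RLVR-style rule-based grader: reward 1.0 for an exactly correct
    answer, 0.0 otherwise (objective tasks such as math or code checks)."""
    return 1.0 if model_answer.strip() == expected_answer.strip() else 0.0

def ai_judge_reward(prompt: str, response: str) -> float:
    """RLAIF-style grader: in practice a judge model would score the
    response (e.g., instruction-following quality on a 0-1 scale).
    A placeholder constant keeps this sketch runnable."""
    return 0.5  # placeholder; a real judge-model call would go here

if __name__ == "__main__":
    print(verifiable_reward("42", "42"))                          # 1.0
    print(ai_judge_reward("Summarize...", "A short summary."))    # 0.5
```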
Benefits
- Improved model performance - Enhances model accuracy over base models, enabling optimization for price and performance.
- Flexible training data - Amazon Bedrock automates much of the complexity, making reinforcement fine-tuning accessible to developers.
- Security and compliance - Your proprietary data never leaves AWS's secure, governed environment during the customization process.
Supported models for reinforcement fine-tuning
- Amazon Nova 2 Lite - Model ID: amazon.nova-lite-v2:0:256k; region support: us-east-1 (single region)
How reinforcement fine-tuning works
- Stage 1: Response generation - The actor model generates responses to prompts from your training dataset.
- Stage 2: Reward computation - The prompt-response pairs generated by the actor model are evaluated by your selected graders (rule-based graders or AI judges) to produce reward scores.
- Stage 3: Actor model training - Amazon Bedrock uses the scored prompt-response pairs to train the actor model through policy-based learning. A conceptual sketch of this loop follows.
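The toy loop below is a conceptual sketch of these three stages only; it is not Bedrock's internal implementation. The "actor" is a trivial weighted chooser over candidate responses, the reward is a rule-based check, and the update is a simple reward-weighted adjustment used purely to illustrate the generate, score, update cycle.

```python
# Conceptual sketch of the three-stage loop above; not Bedrock internals.
import random

# Toy "actor": picks one of two candidate responses per prompt,
# weighted by a learnable preference score.
candidates = {"2+2=": ["4", "5"]}
weights = {("2+2=", "4"): 1.0, ("2+2=", "5"): 1.0}

def generate(prompt: str) -> str:
    # Stage 1: response generation, sampled in proportion to current weights.
    options = candidates[prompt]
    w = [weights[(prompt, o)] for o in options]
    return random.choices(options, weights=w, k=1)[0]

def reward(prompt: str, response: str) -> float:
    # Stage 2: reward computation with a rule-based (verifiable) grader.
    return 1.0 if response == "4" else 0.0

for step in range(50):
    prompt = "2+2="
    response = generate(prompt)   # Stage 1
    r = reward(prompt, response)  # Stage 2
    # Stage 3: policy update; nudge the sampled response's weight by its reward.
    weights[(prompt, response)] += 0.1 * r

print(weights)  # the correct answer "4" ends up with the larger weight
```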
Source: AWS release notes
If you need further guidance on AWS, our experts are available at AWS@westloop.io. You may also reach us by submitting the Contact Us form.



