New feature: Amazon Bedrock now supports reinforcement fine-tuning, which improves foundation model performance through feedback signals.

Amazon Bedrock: Reinforcement Fine-Tuning
Reinforcement fine-tuning is a model customization technique in Amazon Bedrock that improves foundation model performance by teaching models what constitutes a "good" response through feedback signals called rewards. Models improve iteratively based on these reward signals, making advanced model customization more accessible and cost-effective.
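As a rough orientation only, the sketch below shows how such a customization job might be started with the AWS SDK for Python (boto3). create_model_customization_job is an existing Bedrock control-plane call, but the customizationType value, hyperparameter name, IAM role ARN, and S3 URIs used here are illustrative assumptions rather than documented settings for reinforcement fine-tuning.

```python
# Hypothetical sketch: starting a reinforcement fine-tuning job with boto3.
# create_model_customization_job is a real Bedrock API, but the
# customizationType value, hyperparameter names, role ARN, and S3 URIs below
# are placeholders, not documented values for this feature.
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")  # single-region support (see below)

response = bedrock.create_model_customization_job(
    jobName="rft-example-job",
    customModelName="my-rft-nova-lite",
    roleArn="arn:aws:iam::123456789012:role/BedrockCustomizationRole",  # placeholder
    baseModelIdentifier="amazon.nova-lite-v2:0:256k",
    customizationType="REINFORCEMENT_FINE_TUNING",  # assumed enum value
    trainingDataConfig={"s3Uri": "s3://my-bucket/rft/train.jsonl"},  # placeholder
    outputDataConfig={"s3Uri": "s3://my-bucket/rft/output/"},        # placeholder
    hyperParameters={"epochCount": "1"},                              # assumed name/value
)
print(response["jobArn"])
```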
Approaches
- Reinforcement Learning with Verifiable Rewards (RLVR) - Uses rule-based graders for objective tasks like code generation or math reasoning.
- Reinforcement Learning from AI Feedback (RLAIF) - Uses AI-based judges for subjective tasks like instruction following or content moderation. Both reward styles are sketched below.
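To make the distinction concrete, the minimal Python sketch below contrasts the two reward styles. The function names and scoring scheme are illustrative assumptions, not Bedrock APIs: the rule-based grader checks a verifiable answer exactly, while the AI-judge grader is a stub standing in for a call to a judge model.

```python
# Illustrative only: these functions are not Bedrock APIs, just a sketch of
# the two reward styles described above.

def verifiable_reward(model_answer: str, expected_answer: str) -> float:
    """RLVR-style rule-based grader: reward 1.0 for an exactly correct
    answer, 0.0 otherwise (objective tasks such as math or code checks)."""
    return 1.0 if model_answer.strip() == expected_answer.strip() else 0.0

def ai_judge_reward(prompt: str, response: str) -> float:
    """RLAIF-style grader: in practice a judge model would score the
    response (e.g., instruction-following quality on a 0-1 scale).
    A placeholder constant keeps this sketch runnable."""
    return 0.5  # placeholder; a real judge-model call would go here

if __name__ == "__main__":
    print(verifiable_reward("42", "42"))                          # 1.0
    print(ai_judge_reward("Summarize...", "A short summary."))    # 0.5
```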
Benefits
- Improved model performance - Enhances model accuracy over base models, enabling optimization for price and performance.
- Flexible training data - Amazon Bedrock automates much of the complexity, making reinforcement fine-tuning accessible to developers.
- Security and compliance - Your proprietary data never leaves AWS's secure, governed environment during the customization process.
Supported models for reinforcement fine-tuning
- Amazon Nova 2 Lite - Model ID: amazon.nova-lite-v2:0:256k; region support: us-east-1 (single region)
How reinforcement fine-tuning works
- Stage 1: Response generation - The actor model generates responses to prompts from your training dataset.
- Stage 2: Reward computation - The prompt-response pairs generated by the actor model are evaluated by your selected graders (rule-based graders or AI judges) to produce reward scores.
- Stage 3: Actor model training - Amazon Bedrock uses the scored prompt-response pairs to train the actor model through policy-based learning. A conceptual sketch of this loop follows.
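The toy loop below is a conceptual sketch of these three stages only; it is not Bedrock's internal implementation. The "actor" is a trivial weighted chooser over candidate responses, the reward is a rule-based check, and the update is a simple reward-weighted adjustment used purely to illustrate the generate, score, update cycle.

```python
# Conceptual sketch of the three-stage loop above; not Bedrock internals.
import random

# Toy "actor": picks one of two candidate responses per prompt,
# weighted by a learnable preference score.
candidates = {"2+2=": ["4", "5"]}
weights = {("2+2=", "4"): 1.0, ("2+2=", "5"): 1.0}

def generate(prompt: str) -> str:
    # Stage 1: response generation, sampled in proportion to current weights.
    options = candidates[prompt]
    w = [weights[(prompt, o)] for o in options]
    return random.choices(options, weights=w, k=1)[0]

def reward(prompt: str, response: str) -> float:
    # Stage 2: reward computation with a rule-based (verifiable) grader.
    return 1.0 if response == "4" else 0.0

for step in range(50):
    prompt = "2+2="
    response = generate(prompt)   # Stage 1
    r = reward(prompt, response)  # Stage 2
    # Stage 3: policy update; nudge the sampled response's weight by its reward.
    weights[(prompt, response)] += 0.1 * r

print(weights)  # the correct answer "4" ends up with the larger weight
```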
Source: AWS release notes
If you need further guidance on AWS, our experts are available at AWS@westloop.io. You may also reach us by submitting the Contact Us form.



