Amazon SageMaker AI launches multi-turn reinforcement learning for AI agent model customization

Published: June 3, 2026
https://aws.amazon.com/about-aws/whats-new/2026/06/multi-turn-reinforcement-learning-on-sagemaker-ai/

Amazon SageMaker AI Multi-turn Reinforcement Learning

Amazon SageMaker AI now offers multi-turn reinforcement learning (RL), a new serverless model customization technique for fine-tuning models on multi-step, agentic tasks. This feature allows you to adapt foundation models using techniques such as supervised fine-tuning, reinforcement learning from verifiable rewards (RLVR), and reinforcement learning from AI feedback (RLAIF). Multi-turn RL trains models against your own agent environment and rewards the full sequence of decisions an agent makes across a task, rather than a single response. This helps you specialize smaller, lower-cost models to match or exceed the task accuracy of larger general-purpose models on your target workload.
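To make "rewarding the full sequence of decisions" concrete, here is a minimal Python sketch of a multi-turn rollout scored once at the end of the task. It is an illustration only, not the SageMaker AI API: `ToyTicketEnv`, `rollout`, the action names, and both policies are hypothetical, and the reward rule (the order must be looked up before the refund is issued) is an assumed example of a verifiable reward.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Turn:
    observation: str
    action: str


@dataclass
class Trajectory:
    turns: List[Turn] = field(default_factory=list)
    reward: float = 0.0  # one score for the whole sequence of turns


class ToyTicketEnv:
    """Hypothetical agent environment: a refund request that succeeds
    only if the agent looks up the order before issuing the refund."""

    def reset(self) -> str:
        self.history: List[str] = []
        return "customer asks for a refund"

    def step(self, action: str) -> Tuple[str, bool]:
        self.history.append(action)
        # The episode ends on a refund or after four turns.
        done = action == "issue_refund" or len(self.history) >= 4
        return f"result of {action}", done

    def score(self) -> float:
        # Verifiable, trajectory-level reward: checks the order of actions.
        h = self.history
        ok = (
            "lookup_order" in h
            and "issue_refund" in h
            and h.index("lookup_order") < h.index("issue_refund")
        )
        return 1.0 if ok else 0.0


Policy = Callable[[str, List[Turn]], str]


def rollout(env: ToyTicketEnv, policy: Policy) -> Trajectory:
    """Run one multi-turn episode and attach the end-of-task reward."""
    traj = Trajectory()
    obs = env.reset()
    done = False
    while not done:
        action = policy(obs, traj.turns)
        traj.turns.append(Turn(obs, action))
        obs, done = env.step(action)
    traj.reward = env.score()
    return traj


def careful_policy(obs: str, turns: List[Turn]) -> str:
    return "lookup_order" if not turns else "issue_refund"


def hasty_policy(obs: str, turns: List[Turn]) -> str:
    return "issue_refund"
```

In this sketch, no individual turn is scored. The careful policy earns the reward because its two actions together complete the task, while the hasty policy's single action looks plausible in isolation but fails the end-of-task check.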

SageMaker's Multi-turn RL offering handles the full training loop, from rollout orchestration and trajectory collection to training and checkpoint management. It runs as a fully serverless capability, so you pay only for the tokens processed, with no infrastructure to provision or manage. You can connect an agent running on Amazon Bedrock AgentCore Runtime for fully managed hosting, or one running on Amazon EKS, Amazon EC2, AWS Fargate, or any other infrastructure, using the framework of your choice.
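The stages named above (rollout collection, training, checkpointing) can be pictured with a toy loop. The announcement does not say which RL algorithm the service uses, so this sketch assumes plain REINFORCE without a baseline on a one-parameter policy. All names (`collect_rollout`, `policy_gradient`, `train`) and all hyperparameters are hypothetical and exist only for illustration.

```python
import math
import random
from typing import List, Tuple

Rollout = Tuple[List[bool], float]  # (per-turn actions, trajectory reward)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def collect_rollout(theta: float, rng: random.Random, turns: int = 3) -> Rollout:
    """Sample one multi-turn episode. Each turn picks the 'good' action
    with probability sigmoid(theta); the task succeeds only if every
    turn was good, so the reward depends on the whole trajectory."""
    p = sigmoid(theta)
    actions = [rng.random() < p for _ in range(turns)]
    reward = 1.0 if all(actions) else 0.0
    return actions, reward


def policy_gradient(theta: float, batch: List[Rollout]) -> float:
    """REINFORCE estimate: reward times the summed gradient of the
    log-probability of each action taken, averaged over the batch."""
    p = sigmoid(theta)
    grad = 0.0
    for actions, reward in batch:
        score = sum((1.0 - p) if a else -p for a in actions)
        grad += reward * score
    return grad / len(batch)


def train(iterations: int = 20, batch_size: int = 8,
          lr: float = 0.5, seed: int = 0) -> Tuple[float, List[float]]:
    rng = random.Random(seed)
    theta = 0.0
    checkpoints: List[float] = []
    for step in range(iterations):
        # 1. Rollout orchestration and trajectory collection.
        batch = [collect_rollout(theta, rng) for _ in range(batch_size)]
        # 2. Training step on the collected trajectories.
        theta += lr * policy_gradient(theta, batch)
        # 3. Checkpoint management: save every fifth step.
        if (step + 1) % 5 == 0:
            checkpoints.append(theta)
    return theta, checkpoints
```

Calling `train()` returns the final parameter and four checkpoints. Because only fully successful trajectories carry a reward in this toy setup, each update can only raise the probability of the good action or leave it unchanged.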

What to do

Source: AWS release notes




If you need further guidance on AWS, our experts are available at AWS@westloop.io. You may also reach us by submitting the Contact Us form.
