Amazon SageMaker AI launches multi-turn reinforcement learning for AI agent model customization

Published June 3, 2026

Amazon SageMaker AI Multi-turn Reinforcement Learning

Amazon SageMaker AI now offers multi-turn reinforcement learning (RL), a new serverless model customization technique for fine-tuning models on multi-step, agentic tasks. It joins the existing techniques for adapting foundation models, such as supervised fine-tuning, reinforcement learning from verifiable rewards (RLVR), and reinforcement learning from AI feedback (RLAIF). Multi-turn RL trains models against your own agent environment and rewards the full sequence of decisions an agent makes across a task, not just a single response. This helps you specialize smaller, lower-cost models to match or exceed the task accuracy of larger general-purpose models on your target workload.
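To make "rewarding the full sequence of decisions" concrete, here is a minimal Python sketch. It is not SageMaker AI's API or its actual reward scheme. It assumes a hypothetical customer-support agent whose trajectory is a list of turns, a hypothetical verifiable reward that checks only the final answer, and a discount factor `gamma` that spreads that single outcome reward back over every earlier turn. The names `Turn`, `Trajectory`, `outcome_reward`, and `per_turn_credit` are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    observation: str  # what the agent saw, e.g. a user message or tool result
    action: str       # what the agent did, e.g. a tool call or final answer


@dataclass
class Trajectory:
    turns: list = field(default_factory=list)


def outcome_reward(traj: Trajectory, expected_answer: str) -> float:
    """Hypothetical verifiable reward: 1.0 only if the last action submits
    the expected answer, so the score reflects the whole episode's outcome."""
    if not traj.turns:
        return 0.0
    return 1.0 if traj.turns[-1].action == f"submit:{expected_answer}" else 0.0


def per_turn_credit(traj: Trajectory, reward: float, gamma: float = 1.0) -> list:
    """Spread one end-of-episode reward back over every turn.
    The last turn gets the full reward; each earlier turn is discounted by gamma."""
    n = len(traj.turns)
    return [reward * gamma ** (n - 1 - i) for i in range(n)]


# Hypothetical three-turn episode: two tool calls, then a final answer.
traj = Trajectory([
    Turn("Where is order 42?", "call:lookup_order(42)"),
    Turn("order 42: shipped", "call:get_tracking(42)"),
    Turn("tracking: delivered", "submit:delivered"),
])
reward = outcome_reward(traj, "delivered")
print(reward)                                    # 1.0
print(per_turn_credit(traj, reward))             # [1.0, 1.0, 1.0]
print(per_turn_credit(traj, reward, gamma=0.5))  # [0.25, 0.5, 1.0]
```

With `gamma=1.0`, every decision in the successful episode shares the full credit; a smaller `gamma` credits earlier turns less. The announcement does not say which credit-assignment scheme SageMaker AI uses.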

SageMaker AI's multi-turn RL handles the full training loop, from rollout orchestration and trajectory collection to training and checkpoint management. Because it runs as a fully serverless capability, you pay only for the tokens processed and have no infrastructure to provision or manage. Your agent can run on Amazon Bedrock AgentCore Runtime for fully managed hosting, or on Amazon EKS, Amazon EC2, AWS Fargate, or any other infrastructure, built with the agent framework of your choice.
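The three stages the service manages (rollouts and trajectory collection, training, and checkpointing) can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not SageMaker AI's implementation: it uses a hypothetical corridor environment in which the agent must move right three times within five turns, a tabular softmax policy in place of a foundation model, and plain REINFORCE with a mean-reward baseline as a stand-in policy-gradient method. The announcement does not say which RL algorithm the service uses.

```python
import math
import random

# Hypothetical toy environment (not a SageMaker API): the agent starts at
# position 0 and earns reward 1.0 only if it reaches GOAL within MAX_TURNS.
GOAL, MAX_TURNS = 3, 5
LEFT, RIGHT = 0, 1


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rollout(policy, rng):
    """Run one multi-turn episode; return the (state, action) steps and the reward."""
    pos, steps = 0, []
    for _ in range(MAX_TURNS):
        probs = softmax(policy[pos])
        action = LEFT if rng.random() < probs[LEFT] else RIGHT
        steps.append((pos, action))
        pos = max(0, pos - 1) if action == LEFT else pos + 1
        if pos == GOAL:
            return steps, 1.0  # reward depends on the whole sequence of turns
    return steps, 0.0


def train(iterations=40, batch_size=16, lr=0.5, checkpoint_every=10, seed=0):
    rng = random.Random(seed)
    policy = [[0.0, 0.0] for _ in range(GOAL)]  # one pair of logits per position
    checkpoints = {}
    for it in range(1, iterations + 1):
        # 1) Rollout orchestration and trajectory collection.
        batch = [rollout(policy, rng) for _ in range(batch_size)]
        baseline = sum(reward for _, reward in batch) / batch_size
        # 2) Training: REINFORCE, crediting every turn with the trajectory's advantage.
        grads = [[0.0, 0.0] for _ in range(GOAL)]
        for steps, reward in batch:
            advantage = reward - baseline
            for pos, action in steps:
                probs = softmax(policy[pos])
                for a in (LEFT, RIGHT):
                    indicator = 1.0 if a == action else 0.0
                    grads[pos][a] += advantage * (indicator - probs[a])
        for pos in range(GOAL):
            for a in (LEFT, RIGHT):
                policy[pos][a] += lr * grads[pos][a] / batch_size
        # 3) Checkpoint management: keep a snapshot every few iterations.
        if it % checkpoint_every == 0:
            checkpoints[it] = [row[:] for row in policy]
    return policy, checkpoints


policy, checkpoints = train()
print(sorted(checkpoints))  # [10, 20, 30, 40]
print(f"P(right | start) = {softmax(policy[0])[RIGHT]:.2f}")
```

In the managed service, rollouts run against your connected agent environment and the policy is a model rather than a lookup table; here both are simplified so the loop runs on the Python standard library alone.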

What to do

Visit the Amazon SageMaker AI documentation to get started with multi-turn reinforcement learning.

Source: AWS release notes

If you need further guidance on AWS, our experts are available at AWS@westloop.io. You may also reach us by submitting the Contact Us form.