NVIDIA NeMo Agent Toolkit Finetuning Harness for Reinforcement Learning#

Warning

Experimental Feature: The finetuning harness is experimental; future releases may introduce breaking changes without notice.

The NeMo Agent Toolkit provides a powerful finetuning harness designed for in-situ reinforcement learning of agentic LLM workflows. This enables iterative improvement of agents through experience, allowing models to learn from their interactions with environments, tools, and users.

Overview#

The finetuning harness is built on four foundational principles:

  • Decoupled Architecture: Training logic is separated from the training backend, allowing you to use any RL framework (OpenPipe ART, NeMo Aligner, or a custom implementation).

  • In-Situ Training: Train agents with the same workflow you run in production, without moving to a different development environment.

  • Flexible Targeting: Finetune specific functions or entire workflows, enabling targeted improvements in complex agentic systems.

  • Composable Components: Three pluggable components (TrajectoryBuilder, TrainerAdapter, Trainer) can be mixed, matched, and customized, as sketched below.
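
The composable design can be pictured as a small set of interfaces. The sketch below is illustrative only: the Trajectory fields and the method names (build, submit) are assumptions for this page, not the toolkit's actual API, which is covered in the Extending guide.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Trajectory:
    """Hypothetical container for one scored episode of the workflow."""
    messages: list[dict]  # prompt/response turns produced during the episode
    reward: float         # scalar reward assigned by the evaluator


class TrajectoryBuilder(Protocol):
    """Runs the workflow, collects episodes, and groups scored trajectories."""

    def build(self, dataset: list[dict]) -> list[list[Trajectory]]:
        """Return one group of trajectories per dataset example."""
        ...


class TrainerAdapter(Protocol):
    """Validates trajectory groups and submits them to a training backend."""

    def submit(self, groups: list[list[Trajectory]]) -> None:
        """Hand validated groups to the remote backend for a training step."""
        ...
```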

Architecture#

┌────────────────────────────────────────────────────────────────────────┐
│                              Trainer                                   │
│  (Orchestrates the finetuning loop across epochs)                      │
│                                                                        │
│  ┌───────────────────────┐         ┌───────────────────────────┐       │
│  │  TrajectoryBuilder    │         │    TrainerAdapter         │       │
│  │                       │         │                           │       │
│  │  - Runs evaluations   │ ──────► │  - Validates trajectories │       │
│  │  - Collects episodes  │         │  - Submits to backend     │       │
│  │  - Computes rewards   │         │  - Monitors training      │       │
│  │  - Groups trajectories│         │  - Reports status         │       │
│  └───────────────────────┘         └───────────────────────────┘       │
└────────────────────────────────────────────────────────────────────────┘
                                         │
                                         ▼
                            ┌─────────────────────────┐
                            │   Remote Training       │
                            │      Backend            │
                            └─────────────────────────┘
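
The control flow in the diagram can be summarized in a few lines. This is only a sketch of the orchestration, not the Trainer's actual implementation; the build and submit calls follow the hypothetical interfaces sketched above.

```python
def run_finetuning(trajectory_builder, trainer_adapter, dataset, epochs: int = 3):
    """Illustrative outline of the loop the Trainer orchestrates across epochs."""
    for epoch in range(epochs):
        # TrajectoryBuilder: run evaluations over the dataset, collect episodes,
        # compute rewards, and group the resulting trajectories per example.
        groups = trajectory_builder.build(dataset)

        # TrainerAdapter: validate the groups, submit them to the remote
        # training backend, and monitor the job until the step completes.
        trainer_adapter.submit(groups)
```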

Documentation#

  • Concepts: Core concepts, RL fundamentals, curriculum learning, and architecture details.

  • Extending: How to implement custom TrajectoryBuilders, TrainerAdapters, and Trainers.

  • OpenPipe ART: Using the OpenPipe ART backend for GRPO training.

Supported Backends#

  • OpenPipe ART (plugin package: nvidia-nat-openpipe-art): GRPO-based training with vLLM and TorchTune.

Key Features#

  • Curriculum Learning: Progressively introduce harder examples during training

  • Multi-Generation Trajectories: Collect multiple responses per example for GRPO optimization (see the sketch after this list)

  • Validation Monitoring: Periodic evaluation on held-out data to track generalization

  • Progress Visualization: Automatic reward plots and metrics logging

  • Flexible Targeting: Train specific functions or models in complex workflows
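
GRPO relies on the multi-generation trajectories mentioned above: every example's generations are scored, and each reward is normalized against its own group, so no separate value model is needed. A minimal, backend-agnostic sketch of that group-relative advantage computation:

```python
import statistics


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Compute GRPO-style advantages for one group of generations.

    Each reward is normalized against the mean and standard deviation of its
    own group, so better-than-average responses receive a positive advantage
    and worse-than-average ones a negative advantage.
    """
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0.0:
        # All generations scored identically; this group carries no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]


# Example: four generations collected for the same dataset example.
print(group_relative_advantages([0.2, 0.9, 0.4, 0.9]))
```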

Requirements#

  • Training backend (e.g., OpenPipe ART server with GPU)

  • LLM inference endpoint with log probability support

  • Training dataset in JSON/JSONL format

  • Custom evaluator for computing rewards (a sketch follows this list)
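
The dataset rows and the evaluator work together: each row provides the input and the reference that the evaluator scores a generation against. The field names and the exact-match metric below are purely illustrative; the toolkit defines the actual evaluator interface, and most tasks use a richer metric or an LLM-as-judge score.

```python
def _normalize(text: str) -> str:
    return " ".join(text.strip().lower().split())


def exact_match_reward(generated_answer: str, reference_answer: str) -> float:
    """Toy reward function: 1.0 for a normalized exact match, otherwise 0.0."""
    return 1.0 if _normalize(generated_answer) == _normalize(reference_answer) else 0.0


# Example row as it might appear in a JSONL training file
# (the "question"/"answer" field names are illustrative, not a required schema).
row = {"question": "What is 2 + 2?", "answer": "4"}
print(exact_match_reward("4 ", row["answer"]))  # 1.0
```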