# NVIDIA NeMo Agent Toolkit Finetuning Harness for Reinforcement Learning
> **Warning:** Experimental feature: the Finetuning Harness is experimental and may change in future releases. Future versions may introduce breaking changes without notice.
The NeMo Agent Toolkit provides a finetuning harness for in-situ reinforcement learning of agentic LLM workflows, enabling agents to improve iteratively by learning from their interactions with environments, tools, and users.
## Overview
The finetuning harness is built on four foundational principles:
| Principle | Description |
|---|---|
| Decoupled Architecture | Training logic is separated from backends, allowing you to use any RL framework (OpenPipe ART, NeMo Aligner, or custom implementations). |
| In-Situ Training | Train agents with the same workflow you run in production, without moving to a different development environment. |
| Flexible Targeting | Finetune specific functions or entire workflows, enabling targeted improvements in complex agentic systems. |
| Composable Components | Three pluggable components (TrajectoryBuilder, TrainerAdapter, Trainer) can be mixed, matched, and customized (sketched below). |
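
To make the composition concrete, the sketch below outlines the three components as minimal Python interfaces. This is an illustration only, assuming hypothetical class and method names (`collect`, `submit`, `wait_until_complete`, `run`); it is not the toolkit's actual API.

```python
from abc import ABC, abstractmethod


class TrajectoryBuilder(ABC):
    """Hypothetical interface: turns workflow runs into reward-scored trajectories."""

    @abstractmethod
    def collect(self, examples: list[dict]) -> list[dict]:
        """Run evaluations on the examples and return scored trajectories."""


class TrainerAdapter(ABC):
    """Hypothetical interface: bridges trajectories to one specific RL backend."""

    @abstractmethod
    def submit(self, trajectories: list[dict]) -> None:
        """Validate the trajectories and submit them to the training backend."""

    @abstractmethod
    def wait_until_complete(self) -> None:
        """Monitor the backend and block until the training step finishes."""


class Trainer(ABC):
    """Hypothetical interface: orchestrates the finetuning loop across epochs."""

    def __init__(self, builder: TrajectoryBuilder, adapter: TrainerAdapter) -> None:
        self.builder = builder
        self.adapter = adapter

    @abstractmethod
    def run(self, num_epochs: int) -> None:
        """Alternate trajectory collection and backend training for each epoch."""
```

Because each interface is independent, you can, for example, swap in a different TrainerAdapter to target another backend without touching trajectory collection.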
## Architecture

```
┌────────────────────────────────────────────────────────────────────────┐
│                                Trainer                                 │
│            (Orchestrates the finetuning loop across epochs)            │
│                                                                        │
│   ┌───────────────────────┐           ┌───────────────────────────┐    │
│   │   TrajectoryBuilder   │           │      TrainerAdapter       │    │
│   │                       │           │                           │    │
│   │ - Runs evaluations    │  ──────►  │ - Validates trajectories  │    │
│   │ - Collects episodes   │           │ - Submits to backend      │    │
│   │ - Computes rewards    │           │ - Monitors training       │    │
│   │ - Groups trajectories │           │ - Reports status          │    │
│   └───────────────────────┘           └───────────────────────────┘    │
└────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
                       ┌─────────────────────────┐
                       │     Remote Training     │
                       │         Backend         │
                       └─────────────────────────┘
```
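
The control flow implied by the diagram can be summarized in a few lines. This sketch reuses the hypothetical interfaces from the overview and is purely illustrative:

```python
def run_finetuning_loop(builder, adapter, train_examples, num_epochs=3):
    """Illustrative epoch loop, not the toolkit's real implementation."""
    for epoch in range(num_epochs):
        # TrajectoryBuilder: run evaluations, collect episodes, compute
        # rewards, and group the resulting trajectories.
        trajectories = builder.collect(train_examples)
        # TrainerAdapter: validate and submit trajectories to the remote
        # backend, then monitor the run until the training step completes.
        adapter.submit(trajectories)
        adapter.wait_until_complete()
```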
## Documentation

| Guide | Description |
|---|---|
|  | Core concepts, RL fundamentals, curriculum learning, and architecture details |
|  | How to implement custom TrajectoryBuilders, TrainerAdapters, and Trainers |
|  | Using the OpenPipe ART backend for GRPO training |
## Supported Backends

| Backend | Plugin Package | Description |
|---|---|---|
| OpenPipe ART |  | GRPO-based training with vLLM and TorchTune |
## Key Features

- **Curriculum Learning**: Progressively introduce harder examples during training
- **Multi-Generation Trajectories**: Collect multiple responses per example for GRPO optimization (see the advantage formula after this list)
- **Validation Monitoring**: Periodic evaluation on held-out data to track generalization
- **Progress Visualization**: Automatic reward plots and metrics logging
- **Flexible Targeting**: Train specific functions or models in complex workflows
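
Multiple generations per example are needed because GRPO normalizes rewards within each group of responses. In the standard GRPO formulation (shown here for reference), with $G$ sampled responses to the same example and rewards $r_1, \ldots, r_G$, the advantage of response $i$ is:

$$
\hat{A}_i = \frac{r_i - \operatorname{mean}(r_1, \ldots, r_G)}{\operatorname{std}(r_1, \ldots, r_G)}
$$

A single response per example would make these group statistics degenerate, which is why the harness collects several responses per prompt.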
## Requirements

- Training backend (for example, an OpenPipe ART server with a GPU)
- LLM inference endpoint with log-probability support
- Training dataset in JSON/JSONL format
- Custom evaluator for computing rewards (a rough sketch follows)
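
To make the last two requirements concrete, here is a rough sketch of what a dataset row and a reward evaluator could look like. The field names and the evaluator signature are placeholders; the real schema and evaluator interface depend on your workflow and the guides above.

```python
import json

# Hypothetical JSONL row: one training example with an input and a reference answer.
row = {"id": "ex-001", "question": "What is 17 * 24?", "reference_answer": "408"}
print(json.dumps(row))


def compute_reward(model_answer: str, reference_answer: str) -> float:
    """Placeholder evaluator returning a reward in [0, 1].

    Exact match is the simplest possible scheme; a real evaluator might use
    partial credit, task-specific checks, or an LLM judge.
    """
    return 1.0 if model_answer.strip() == reference_answer.strip() else 0.0
```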