
Training Tutorials


These hands-on tutorials cover the supported training frameworks and show how to train with NeMo Gym environments. To integrate another training framework, see the Training Framework Integration Guide.

See Training for a refresher on when to use GRPO, SFT, or DPO.

RL (GRPO)

NeMo RL

Tutorial series: GRPO training to improve multi-step tool calling on the Workplace Assistant environment, scaling from single-node to multi-node training.

nemo rl | grpo | 3-5 hours

OpenRLHF

Review the agent executor for using NeMo Gym environments with OpenRLHF.

openrlhf

Unsloth

Example GRPO training on instruction following and reasoning environments.

unsloth | single-gpu | 30 min

VeRL

Example DAPO training on math and agentic environments using VeRL, with single and multi-environment support.

verl | dapo | multi-node | 1 hour
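
For orientation before starting any of the GRPO tutorials above, the group-relative advantage that gives GRPO its name can be sketched in a few lines. This is a minimal illustration, not code from NeMo Gym or any of the listed frameworks: the function name `group_relative_advantages` and the `eps` parameter are assumptions, and real trainers may normalize differently (for example, with a sample rather than population standard deviation).

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of rollouts sampled from the same prompt.

    Hypothetical helper for illustration only. Each rollout's advantage is its
    reward minus the group mean, divided by the group standard deviation.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption of this sketch
    # eps avoids division by zero when every rollout in the group scores the same.
    return [(r - mean) / (std + eps) for r in rewards]


# Four rollouts for one prompt, scored by a verifier (made-up rewards).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Rollouts that beat the group average get a positive advantage and the rest get a negative one, so no separate value model is needed to provide a baseline. A group in which every rollout receives the same reward yields all-zero advantages and contributes no learning signal.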

Multi-Environment Training

Multi-Environment Training

Run multiple training environments simultaneously for rollout collection.

multi-environment | multi-verifier

SFT & DPO

Offline Training with Rollouts

Transform rollouts into training data for supervised fine-tuning (SFT) and direct preference optimization (DPO).

sft | dpo
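
As a rough idea of what transforming rollouts into SFT and DPO data can look like, here is a minimal sketch. The record fields (`prompt`, `response`, `reward`), the helper names `to_sft` and `to_dpo`, and the reward threshold are all assumptions made for illustration; they are not the NeMo Gym rollout schema or API, which the linked tutorial describes.

```python
from collections import defaultdict

# Hypothetical rollout records; field names are assumptions, not the NeMo Gym schema.
rollouts = [
    {"prompt": "2+2?", "response": "4", "reward": 1.0},
    {"prompt": "2+2?", "response": "5", "reward": 0.0},
    {"prompt": "Capital of France?", "response": "Paris", "reward": 1.0},
]


def to_sft(records, min_reward=1.0):
    """Keep only rollouts whose reward meets the threshold as SFT examples."""
    return [
        {"prompt": r["prompt"], "completion": r["response"]}
        for r in records
        if r["reward"] >= min_reward
    ]


def to_dpo(records):
    """Pair the best and worst response for each prompt as chosen/rejected."""
    by_prompt = defaultdict(list)
    for r in records:
        by_prompt[r["prompt"]].append(r)

    pairs = []
    for prompt, group in by_prompt.items():
        best = max(group, key=lambda r: r["reward"])
        worst = min(group, key=lambda r: r["reward"])
        # Skip prompts where all responses scored the same: no preference signal.
        if best["reward"] > worst["reward"]:
            pairs.append(
                {
                    "prompt": prompt,
                    "chosen": best["response"],
                    "rejected": worst["response"],
                }
            )
    return pairs


sft_data = to_sft(rollouts)
dpo_data = to_dpo(rollouts)
```

SFT needs only the good responses, while DPO needs at least two responses with different rewards for the same prompt, so prompts with a single rollout (like the second one here) produce an SFT example but no preference pair.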

Copyright © 2026, NVIDIA Corporation.
