Copyright © 2026, NVIDIA Corporation.

Training with VeRL

This guide covers how to set up and launch RL training on NeMo Gym environments using the nemo_gym recipe in verl. The setup was tested with vLLM 0.17 in the verlai/verl:vllm017.latest container.

Prerequisites

  • Container: verlai/verl:vllm017.latest (vLLM 0.17.0)
  • NeMo Gym: 0.2.1 or later, installed at job start with pip install nemo-gym (or pip install -e $NEMO_GYM_ROOT for a local checkout)
  • Slurm cluster with GPU nodes
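To confirm the NeMo Gym prerequisite inside the container before a long job starts, a minimal preflight sketch can compare the installed version against the 0.2.1 minimum. It assumes the distribution name nemo-gym (as in the pip command above) and GNU `sort -V` for version ordering; `version_ge` and `check_nemo_gym_version` are hypothetical helper names, not part of NeMo Gym.

```bash
# Return success if version $1 >= version $2, using version-aware sort (GNU sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Hypothetical preflight: check that nemo-gym is installed and at least the required version.
check_nemo_gym_version() {
  local required="${1:-0.2.1}" installed
  installed=$(pip show nemo-gym 2>/dev/null | awk '/^Version:/ {print $2}')
  if [ -z "$installed" ]; then
    echo "nemo-gym is not installed in this environment" >&2
    return 1
  fi
  if version_ge "$installed" "$required"; then
    echo "nemo-gym $installed OK (>= $required)"
  else
    echo "nemo-gym $installed is older than $required" >&2
    return 1
  fi
}
```

For example, call `check_nemo_gym_version` right after the pip install step in your job script; it returns non-zero if the package is missing or too old.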

Clone verl with its recipe submodule at the commit pinned in REQUIRED_VERL.txt:

```bash
git clone --recurse-submodules https://github.com/verl-project/verl.git
cd verl
git checkout 695ac0ebcb5d4e1ca7bcb88fd952b0214daf199f
git submodule update --init --recursive recipe
cd recipe && git checkout main && cd ..
```

If you already cloned verl without submodules:

```bash
git submodule update --init --recursive
```
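A quick way to confirm the checkout before training is to compare HEAD against the pinned hash and check that the recipe directory was populated. This is a sketch: `check_verl_checkout` is a hypothetical helper, and its default hash is copied from the clone step, so update it if REQUIRED_VERL.txt changes.

```bash
# Hypothetical check: verl HEAD matches the pinned commit and the nemo_gym recipe is present.
check_verl_checkout() {
  local repo="${1:-.}"
  local expected="${2:-695ac0ebcb5d4e1ca7bcb88fd952b0214daf199f}"
  local head
  head=$(git -C "$repo" rev-parse HEAD) || return 1
  if [ "$head" != "$expected" ]; then
    echo "verl HEAD is $head, expected $expected" >&2
    return 1
  fi
  if [ ! -d "$repo/recipe/nemo_gym" ]; then
    echo "recipe/nemo_gym not found; run: git submodule update --init --recursive" >&2
    return 1
  fi
  echo "verl checkout OK"
}
```

Run `check_verl_checkout` from the root of your verl clone, or pass the clone path as the first argument.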

1. Prepare training data

Using NeMo Gym, prepare the training dataset for your environment. Each row needs an agent_ref field so NeMo Gym can route it to the right agent:

```bash
cd $NEMO_GYM_ROOT
source .venv/bin/activate

config_paths="resources_servers/workplace_assistant/configs/workplace_assistant.yaml,\
responses_api_models/vllm_model/configs/vllm_model_for_training.yaml"

ng_prepare_data \
    "+config_paths=[${config_paths}]" \
    +output_dirpath=data/workplace_assistant \
    +mode=train_preparation \
    +should_download=true \
    +data_source=huggingface
```

This produces data/workplace_assistant/{train,validation}.jsonl ready for training.
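Because every row must carry an agent_ref field, it can be worth checking the prepared files before training. The sketch below is a plain-text grep, not a JSON parser: it only confirms that the key name appears on each line. `check_agent_ref` is a hypothetical helper name.

```bash
# Hypothetical check: count rows per JSONL file and flag rows without an "agent_ref" key.
check_agent_ref() {
  local file total missing status=0
  for file in "$@"; do
    if [ ! -f "$file" ]; then
      echo "missing file: $file" >&2
      status=1
      continue
    fi
    # grep -c exits 1 when the count is 0, so "|| true" keeps the printed count without failing.
    total=$(grep -c '' "$file" || true)
    missing=$(grep -vc '"agent_ref"' "$file" || true)
    echo "$file: $total rows, $missing without agent_ref"
    if [ "$missing" -ne 0 ]; then status=1; fi
  done
  return "$status"
}
```

For example, `check_agent_ref data/workplace_assistant/train.jsonl data/workplace_assistant/validation.jsonl` prints a row count per file and returns non-zero if any row lacks the key.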

2. Set environment variables

In your verl clone, copy the recipe’s config.env.example and fill in your paths:

```bash
cd "$VERL_ROOT"
cp recipe/nemo_gym/config.env.example config.env
```

Then fill in config.env:

```bash
# config.env
VERL_ROOT=/path/to/verl
NEMO_GYM_ROOT=/path/to/nemo-gym
HF_HOME=/path/to/hf_home
RESULTS_ROOT=/path/to/results
WANDB_USERNAME=your_username
WANDB_API_KEY=your_key
```
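If your own job scripts read these values, a small loader can export them and fail fast when one is left empty. This sketch assumes config.env is a plain KEY=value file, as in the example above; `load_config_env` is a hypothetical helper, and the recipe's own scripts may load the file differently.

```bash
# Hypothetical loader: export every KEY=value from config.env, then verify required keys are non-empty.
load_config_env() {
  local env_file="${1:-config.env}" name missing=0
  if [ ! -f "$env_file" ]; then
    echo "no such file: $env_file" >&2
    return 1
  fi
  set -a            # auto-export every variable assigned while sourcing
  . "$env_file"
  set +a
  for name in VERL_ROOT NEMO_GYM_ROOT HF_HOME RESULTS_ROOT WANDB_USERNAME WANDB_API_KEY; do
    # ${!name} is bash indirect expansion: the value of the variable whose name is in $name.
    if [ -z "${!name:-}" ]; then
      echo "config.env: $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Calling `load_config_env config.env` at the top of a job script makes the paths available to every command that follows.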

3. Point verl at NeMo Gym

Each training run needs a YAML listing the NeMo Gym servers to launch (see recipe/nemo_gym/configs/ for examples):

```yaml
# recipe/nemo_gym/configs/workplace.yaml
nemo_gym:
  nemo_gym_root: $NEMO_GYM_ROOT
  uses_reasoning_parser: false  # set true for reasoning models
  config_paths:
    - $NEMO_GYM_ROOT/responses_api_models/vllm_model/configs/vllm_model_for_training.yaml
    - $NEMO_GYM_ROOT/resources_servers/workplace_assistant/configs/workplace_assistant.yaml
```

The first config launches the model server, which tracks token IDs and log probs to prevent retokenization mismatches. Each additional resources server entry adds an environment.
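Since a typo in a config path may only surface when the servers start, you might check the referenced files first. The sketch below takes the paths relative to $NEMO_GYM_ROOT as arguments instead of parsing the YAML, so keep the list in sync with your config by hand; `check_gym_configs` is a hypothetical name.

```bash
# Hypothetical check: every config path (relative to the NeMo Gym root) exists as a file.
check_gym_configs() {
  if [ "$#" -lt 2 ]; then
    echo "usage: check_gym_configs NEMO_GYM_ROOT CONFIG_PATH..." >&2
    return 2
  fi
  local root="$1" rel status=0
  shift
  for rel in "$@"; do
    if [ -f "$root/$rel" ]; then
      echo "found:   $rel"
    else
      echo "missing: $rel" >&2
      status=1
    fi
  done
  return "$status"
}

# Example for the workplace config:
#   check_gym_configs "$NEMO_GYM_ROOT" \
#     responses_api_models/vllm_model/configs/vllm_model_for_training.yaml \
#     resources_servers/workplace_assistant/configs/workplace_assistant.yaml
```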

4. Use the recipe when launching verl training

In your verl training script, add the following Hydra overrides to swap in the NeMo Gym dataset loader and agent-loop manager:

```bash
+data.custom_cls.path=recipe/nemo_gym/dataset.py
+data.custom_cls.name=NeMoGymJSONLDataset
+actor_rollout_ref.rollout.agent.agent_loop_manager_class=recipe.nemo_gym.agent_loop.NeMoGymAgentLoopManager
+actor_rollout_ref.rollout.agent.agent_loop_config_path=${VERL_ROOT}/recipe/nemo_gym/configs/workplace.yaml
```
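One way to keep these overrides together is a bash array that you append to the command your script already runs. The `NEMO_GYM_CONFIG` and `YOUR_EXISTING_ARGS` names are hypothetical, and the `python3 -m verl.trainer.main_ppo` entry point in the comment is an assumption; substitute whatever your script already invokes. Because `recipe/nemo_gym/dataset.py` is a relative path, the sketch assumes you launch from $VERL_ROOT.

```bash
# Sketch: collect the NeMo Gym overrides in one array so switching environments means changing one variable.
VERL_ROOT="${VERL_ROOT:-/path/to/verl}"
NEMO_GYM_CONFIG="${NEMO_GYM_CONFIG:-${VERL_ROOT}/recipe/nemo_gym/configs/workplace.yaml}"  # hypothetical name

NEMO_GYM_OVERRIDES=(
  "+data.custom_cls.path=recipe/nemo_gym/dataset.py"
  "+data.custom_cls.name=NeMoGymJSONLDataset"
  "+actor_rollout_ref.rollout.agent.agent_loop_manager_class=recipe.nemo_gym.agent_loop.NeMoGymAgentLoopManager"
  "+actor_rollout_ref.rollout.agent.agent_loop_config_path=${NEMO_GYM_CONFIG}"
)

# Assumed launch line, run from $VERL_ROOT:
#   python3 -m verl.trainer.main_ppo "${YOUR_EXISTING_ARGS[@]}" "${NEMO_GYM_OVERRIDES[@]}"
printf '%s\n' "${NEMO_GYM_OVERRIDES[@]}"
```

Quoting each element keeps the `${...}` expansion intact as a single argument even if a path contains spaces.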

5. Launch

The recipe includes example Slurm job submission scripts (submit_math.sh, submit_workplace.sh, submit_multienv.sh). Update these with your Slurm-specific variables such as account and partition, then submit:

```bash
sbatch recipe/nemo_gym/submit_workplace.sh
```
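Rather than editing the account and partition into each script, you can pass them to sbatch directly; options given on the sbatch command line take precedence over the corresponding #SBATCH directives in the script. The helper below is a hypothetical sketch; set DRY_RUN=1 to print the command without submitting.

```bash
# Hypothetical wrapper: submit a recipe script with optional account/partition overrides.
submit_recipe() {
  local script="$1" account="${2:-}" partition="${3:-}" job_id
  local -a cmd=(sbatch --parsable)
  if [ -n "$account" ]; then cmd+=("--account=$account"); fi
  if [ -n "$partition" ]; then cmd+=("--partition=$partition"); fi
  cmd+=("$script")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"     # print the command instead of submitting
    return 0
  fi
  job_id=$("${cmd[@]}") || return 1
  job_id="${job_id%%;*}"          # --parsable prints "jobid" or "jobid;cluster"
  echo "submitted job $job_id (monitor with: squeue -j $job_id)"
}
```

For example, `submit_recipe recipe/nemo_gym/submit_workplace.sh my_account my_partition`.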

Multi-environment training

To train on multiple environments simultaneously, create a mixed dataset where each row has an agent_ref pointing to its environment, and include all environment config paths in the YAML:

```yaml
# recipe/nemo_gym/configs/multienv.yaml
nemo_gym:
  nemo_gym_root: $NEMO_GYM_ROOT
  config_paths:
    - $NEMO_GYM_ROOT/responses_api_models/vllm_model/configs/vllm_model_for_training.yaml
    - $NEMO_GYM_ROOT/resources_servers/math_with_judge/configs/math_with_judge.yaml
    - $NEMO_GYM_ROOT/resources_servers/workplace_assistant/configs/workplace_assistant.yaml
```

NeMo Gym routes each row to its environment via the agent_ref field. The data blend determines the sampling ratio between environments; if you need a precise blend or a curriculum, do not shuffle the dataset after creating it.
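As a concrete sketch of a fixed blend, the helper below takes a set number of rows from each environment's prepared file and writes them, unshuffled, into one mixed file, so taking 3000 and 1000 rows gives a 3:1 ratio. Each row keeps its own agent_ref, so routing is unaffected. The function name and the example paths are hypothetical. It takes the first N rows of each file rather than a random sample and writes the environments in blocks, so shuffle once at creation (for example with `shuf`) if you want them interleaved.

```bash
# Hypothetical blender: blend_jsonl OUT SRC:ROWS [SRC:ROWS ...]
# Appends the first ROWS lines of each SRC to OUT, in argument order.
blend_jsonl() {
  local out="$1" spec src count taken
  shift
  : > "$out"                          # create or truncate the output file
  for spec in "$@"; do
    src="${spec%:*}"                  # path before the last ":"
    count="${spec##*:}"               # row count after the last ":"
    if [ ! -f "$src" ]; then
      echo "missing source: $src" >&2
      return 1
    fi
    # Append up to $count rows and report how many were actually available.
    taken=$(head -n "$count" "$src" | tee -a "$out" | grep -c '')
    echo "$src: $taken rows"
  done
}
```

For example: `blend_jsonl data/multienv/train.jsonl data/math_with_judge/train.jsonl:3000 data/workplace_assistant/train.jsonl:1000`.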

Some NeMo Gym environments (e.g. SWE-RL) launch containers and may require additional setup such as Apptainer. See each environment’s README in the NeMo Gym repo for details.


For additional details, see recipe/nemo_gym/README.rst.