Training with VeRL
This guide covers how to set up and launch RL training on NeMo Gym environments using the nemo_gym recipe in verl, tested with vLLM 0.17 (verlai/verl:vllm017.latest).
Prerequisites
- Container: verlai/verl:vllm017.latest (vLLM 0.17.0)
- NeMo Gym: 0.2.1+, installed via pip install nemo-gym or pip install -e $NEMO_GYM_ROOT at job start
- Slurm cluster with GPU nodes
Clone verl with its recipe submodule:
If you already cloned verl without submodules:
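Both steps can be sketched with standard git commands (the upstream repository URL is an assumption; adjust it to your fork if needed):

```shell
# Clone verl together with its recipe submodule
# (repository URL assumed to be the upstream verl project)
git clone --recurse-submodules https://github.com/volcengine/verl.git
cd verl

# If you already cloned without submodules, fetch them in place:
git submodule update --init --recursive
```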
1. Prepare training data
Using NeMo Gym, prepare the training dataset for your environment. Each row needs an agent_ref field so NeMo Gym can route it to the right agent:
This produces data/workplace_assistant/{train,validation}.jsonl ready for training.
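A row of the resulting JSONL might look like the following sketch. Only the agent_ref field matters for routing; the other keys are illustrative and vary by environment:

```python
import json

# Illustrative training row: agent_ref tells NeMo Gym which agent/environment
# should handle this example. The remaining fields are hypothetical.
row = {
    "agent_ref": "workplace_assistant",  # routes the row to its environment
    "prompt": "Schedule a meeting with the design team for Tuesday.",
}

# Each row is serialized as one JSONL line.
line = json.dumps(row)
print(line)

# Reading it back, the router only needs agent_ref to dispatch the row.
parsed = json.loads(line)
print(parsed["agent_ref"])
```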
2. Set environment variables
In your verl clone, copy the recipe’s config.env.example and fill in your paths:
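A filled-in config.env might look like this sketch. Apart from NEMO_GYM_ROOT (referenced in the prerequisites above), the variable names are assumptions; copy config.env.example from the recipe for the authoritative list:

```shell
# config.env (illustrative; variable names other than NEMO_GYM_ROOT are assumed)
export NEMO_GYM_ROOT=/path/to/nemo-gym   # NeMo Gym checkout, installed at job start
export VERL_ROOT=/path/to/verl           # hypothetical: your verl clone
export DATA_DIR=/path/to/data            # hypothetical: prepared JSONL datasets
export CKPT_DIR=/path/to/checkpoints     # hypothetical: checkpoint output directory
```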
3. Point verl at NeMo Gym
Each training run needs a YAML listing the NeMo Gym servers to launch (see recipe/nemo_gym/configs/ for examples):
The first config launches the model server, which tracks token IDs and log probs to prevent retokenization mismatches. Each additional resources server entry adds an environment.
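A hedged sketch of such a YAML, mirroring the structure just described. The key names and paths here are assumptions; see recipe/nemo_gym/configs/ for the real schema:

```yaml
# Illustrative only: entry names and paths are assumed.
# First entry: the model server (tracks token IDs and log probs).
- config_path: configs/model_server.yaml
# Each additional entry is a resources server, i.e. one environment.
- config_path: configs/workplace_assistant.yaml
```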
4. Use the recipe when launching verl training
In your verl training script, swap in the NeMo Gym dataset loader and agent-loop manager:
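Conceptually, the swap looks like the sketch below. All import paths and class names here are placeholders standing in for the real symbols in recipe/nemo_gym/; consult the recipe README for the exact ones:

```python
# Pseudocode sketch only; names are hypothetical, not verl's actual API.
#
# Default verl training script (conceptually):
#   dataset = RLHFDataset(...)
#   manager = AgentLoopManager(...)
#
# With the NeMo Gym recipe, swap both for the recipe's variants so rows are
# loaded with their agent_ref and rollouts run through the NeMo Gym servers:
#   from recipe.nemo_gym import NemoGymDataset, NemoGymAgentLoopManager  # hypothetical
#   dataset = NemoGymDataset(data_files=..., servers_yaml=...)
#   manager = NemoGymAgentLoopManager(config=...)
```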
5. Launch
The recipe includes example Slurm job submission scripts (submit_math.sh, submit_workplace.sh, submit_multienv.sh). Update these with your Slurm-specific variables such as account and partition, then submit:
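Submission is then a standard sbatch call; the script path inside the recipe directory is an assumption:

```shell
# Edit account, partition, and other cluster-specific values in the script
# first, then submit (script location within the recipe is assumed):
sbatch recipe/nemo_gym/submit_workplace.sh
squeue -u "$USER"   # confirm the job is queued
```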
Multi-environment training
To train on multiple environments simultaneously, create a mixed dataset where each row has an agent_ref pointing to its environment, and include all environment config paths in the YAML:
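A multi-environment server list might look like this sketch: one model server plus one entry per environment. Key names and paths are assumptions; see recipe/nemo_gym/configs/ for real examples:

```yaml
# Illustrative only: entry names and paths are assumed.
- config_path: configs/model_server.yaml         # shared model server
- config_path: configs/workplace_assistant.yaml  # environment 1
- config_path: configs/math_env.yaml             # environment 2
```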
NeMo Gym routes each row to its environment via the agent_ref field. The proportion of rows from each environment in the blended dataset sets the sampling ratio, and row order controls the curriculum, so if you need precise blending or a curriculum, do not shuffle the dataset after creation.
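Because row order is preserved, a deterministic blend can be built by interleaving per-environment rows at the desired ratio. A sketch, with field names beyond agent_ref purely illustrative:

```python
# Two per-environment datasets; agent_ref routes each row to its environment.
math_rows = [{"agent_ref": "math", "prompt": f"math-{i}"} for i in range(4)]
wa_rows = [{"agent_ref": "workplace_assistant", "prompt": f"wa-{i}"} for i in range(2)]

def blend(a, b, ratio):
    """Interleave `ratio` rows of `a` for every 1 row of `b`, preserving order."""
    out, ia, ib = [], 0, 0
    while ia < len(a) or ib < len(b):
        out.extend(a[ia:ia + ratio]); ia += ratio
        out.extend(b[ib:ib + 1]); ib += 1
    return out

# 2:1 blend of math rows to workplace-assistant rows, no shuffling.
blended = blend(math_rows, wa_rows, ratio=2)
print([r["agent_ref"] for r in blended])
```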
Some NeMo Gym environments (e.g. SWE-RL) launch containers and may require additional setup such as Apptainer. See each environment’s README in the NeMo Gym repo for details.
For additional details, see recipe/nemo_gym/README.rst.