This guide covers how to set up and launch RL training on NeMo Gym environments using the nemo_gym recipe in verl, tested with vLLM 0.17 (verlai/verl:vllm017.latest).
verlai/verl:vllm017.latest (vLLM 0.17.0)
pip install nemo-gym or pip install -e $NEMO_GYM_ROOT at job start

Clone verl with its recipe submodule at the commit pinned in REQUIRED_VERL.txt:
If you already cloned verl without submodules, initialize them from inside the clone with git submodule update --init --recursive.
Using NeMo Gym, prepare the training dataset for your environment. Each row needs an agent_ref field so NeMo Gym can route it to the right agent:
This produces data/workplace_assistant/{train,validation}.jsonl ready for training.
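Before launching a run, it can help to confirm that every row actually carries an agent_ref. The following is a minimal sketch, not part of NeMo Gym or the recipe: the helper name rows_missing_agent_ref is hypothetical, and it only checks that the field is present and non-empty, since the expected shape of its value is defined by NeMo Gym.

```python
import json


def rows_missing_agent_ref(lines):
    """Return 1-based line numbers of JSONL rows without a non-empty agent_ref.

    `lines` is any iterable of strings, for example an open file handle.
    Blank lines are skipped but still counted in the line numbering.
    """
    missing = []
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        row = json.loads(line)
        # A row that is not a JSON object cannot carry the field at all.
        if not isinstance(row, dict) or not row.get("agent_ref"):
            missing.append(line_number)
    return missing


def check_file(path):
    """Convenience wrapper that opens a JSONL file and runs the check."""
    with open(path, encoding="utf-8") as handle:
        return rows_missing_agent_ref(handle)
```

For example, check_file("data/workplace_assistant/train.jsonl") returns an empty list when every row can be routed.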
In your verl clone, copy the recipe’s config.env.example and fill in your paths:
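A quick way to catch entries that were left blank after copying the example file is sketched below. It assumes the file uses shell-style KEY=VALUE lines with optional export prefixes and # comments. That format is an assumption, the helper name unfilled_keys is hypothetical, and the actual variable names come from config.env.example.

```python
def unfilled_keys(env_text):
    """Return the keys whose value is empty in shell-style KEY=VALUE text.

    Assumes one assignment per line; comment lines and lines without '='
    are ignored. A value followed by an inline comment is not detected
    as empty.
    """
    empty = []
    for raw in env_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        # Treat '', "" and whitespace-only values as unfilled.
        if not value.strip().strip("'\""):
            empty.append(key.strip())
    return empty
```

Reading the filled-in file with open(path).read() and passing the text to unfilled_keys gives the entries that still need a path.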
Each training run needs a YAML listing the NeMo Gym servers to launch (see recipe/nemo_gym/configs/ for examples):
The first config launches the model server, which tracks token IDs and log probs to prevent retokenization mismatches. Each additional resources server entry adds an environment.
In your verl training script, swap in the NeMo Gym dataset loader and agent-loop manager:
The recipe includes example Slurm job submission scripts (submit_math.sh, submit_workplace.sh, submit_multienv.sh). Update these with your Slurm-specific variables such as account and partition, then submit:
To train on multiple environments simultaneously, create a mixed dataset where each row has an agent_ref pointing to its environment, and include all environment config paths in the YAML:
NeMo Gym routes each row to its environment via the agent_ref field. The proportion of rows from each environment in the dataset determines the sampling ratio between environments. If you need precise blending or a curriculum, do not shuffle the dataset after creation, so the rows keep the order in which you wrote them.
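One way to build such a fixed-order blend is a deterministic weighted interleave. The sketch below is an illustration under assumptions, not the recipe's own tooling: blend_rows is a hypothetical helper, the environment names and weights are placeholders, and rows are assumed to already contain their agent_ref.

```python
def blend_rows(sources, weights, total):
    """Interleave rows so every prefix of the result tracks the weights.

    sources: dict mapping an environment name to its non-empty list of rows.
    weights: dict mapping the same names to positive relative weights.
    total:   number of rows to emit.

    A source with too few rows wraps around, so its rows repeat.
    """
    emitted = {name: 0 for name in sources}
    weight_sum = sum(weights[name] for name in sources)
    blended = []
    for step in range(1, total + 1):
        # Choose the environment furthest below its target share so far.
        def deficit(name):
            return step * weights[name] / weight_sum - emitted[name]

        name = max(sources, key=deficit)
        rows = sources[name]
        blended.append(rows[emitted[name] % len(rows)])
        emitted[name] += 1
    return blended


if __name__ == "__main__":
    # Placeholder rows; real agent_ref values come from NeMo Gym.
    demo = {
        "math": [{"agent_ref": "math", "id": i} for i in range(6)],
        "workplace": [{"agent_ref": "workplace", "id": i} for i in range(2)],
    }
    for row in blend_rows(demo, {"math": 3, "workplace": 1}, total=8):
        print(row)
```

Writing the returned rows to a JSONL file in that order, with no later shuffle, keeps the intended ratio within every stretch of the dataset rather than only on average.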
Some NeMo Gym environments (e.g. SWE-RL) launch containers and may require additional setup such as Apptainer. See each environment’s README in the NeMo Gym repo for details.
For additional details, see recipe/nemo_gym/README.rst.