Training with VeRL


This guide covers how to set up and launch RL training on NeMo Gym environments using the nemo_gym recipe in verl, tested with vLLM 0.17 (verlai/verl:vllm017.latest).

Prerequisites

  • Container: verlai/verl:vllm017.latest (vLLM 0.17.0)
  • NeMo Gym: 0.2.1+ — pip install nemo-gym or pip install -e $NEMO_GYM_ROOT at job start
  • Slurm cluster with GPU nodes

Clone verl with its recipe submodule:

$ cd $WORKSPACE
$ git clone --recurse-submodules https://github.com/verl-project/verl.git

If you already cloned verl without submodules:

$ cd $WORKSPACE/verl
$ git submodule update --init --recursive

1. Prepare training data

Using NeMo Gym, prepare the training dataset for your environment. Each row needs an agent_ref field so NeMo Gym can route it to the right agent:

$ cd $NEMO_GYM_ROOT
$ source .venv/bin/activate
$
$ config_paths="resources_servers/workplace_assistant/configs/workplace_assistant.yaml,\
> responses_api_models/vllm_model/configs/vllm_model_for_training.yaml"
$
$ ng_prepare_data \
> "+config_paths=[${config_paths}]" \
> +output_dirpath=data/workplace_assistant \
> +mode=train_preparation \
> +should_download=true \
> +data_source=huggingface

This produces data/workplace_assistant/{train,validation}.jsonl ready for training.
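Before launching a run, it can be worth a quick sanity check that every prepared row carries the agent_ref routing field. The sketch below is a minimal, hypothetical check: only agent_ref is a field this guide guarantees; the other keys in the sample rows are illustrative.

```python
import json

def missing_agent_refs(jsonl_lines):
    """Return 1-based indices of JSONL rows lacking the agent_ref routing field."""
    return [i for i, line in enumerate(jsonl_lines, 1)
            if "agent_ref" not in json.loads(line)]

# Illustrative rows; in practice, read the lines from train.jsonl.
rows = [
    '{"agent_ref": "workplace_assistant", "prompt": "..."}',
    '{"prompt": "missing routing field"}',
]
print(missing_agent_refs(rows))  # → [2]
```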

2. Set environment variables

In your verl clone, copy the recipe’s config.env.example and fill in your paths:

$ cd $VERL_ROOT
$ cp recipe/nemo_gym/config.env.example config.env

Then edit config.env:

# config.env
VERL_ROOT=/path/to/verl
NEMO_GYM_ROOT=/path/to/nemo-gym
HF_HOME=/path/to/hf_home
RESULTS_ROOT=/path/to/results
WANDB_USERNAME=your_username
WANDB_API_KEY=your_key

3. Point verl at NeMo Gym

Each training run needs a YAML listing the NeMo Gym servers to launch (see recipe/nemo_gym/configs/ for examples):

# recipe/nemo_gym/configs/workplace.yaml
nemo_gym:
  nemo_gym_root: $NEMO_GYM_ROOT
  uses_reasoning_parser: false  # set true for reasoning models
  config_paths:
    - $NEMO_GYM_ROOT/responses_api_models/vllm_model/configs/vllm_model_for_training.yaml
    - $NEMO_GYM_ROOT/resources_servers/workplace_assistant/configs/workplace_assistant.yaml

The first config launches the model server, which tracks token IDs and log probs to prevent retokenization mismatches. Each additional resources server entry adds an environment.
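To see why tracking token IDs matters, consider a toy greedy tokenizer (the vocabulary and functions below are made up purely for illustration): text decoded from sampled tokens can retokenize to different IDs, which would corrupt log-prob bookkeeping if the trainer re-tokenized generated text instead of reusing the IDs the model server recorded.

```python
# Toy vocabulary where "ab" is its own token (illustrative only).
VOCAB = {"a": 0, "b": 1, "ab": 2}

def tokenize(text):
    """Greedy longest-match tokenization over VOCAB."""
    ids, i = [], 0
    while i < len(text):
        for piece in sorted(VOCAB, key=len, reverse=True):
            if text.startswith(piece, i):
                ids.append(VOCAB[piece])
                i += len(piece)
                break
    return ids

def detokenize(ids):
    inv = {v: k for k, v in VOCAB.items()}
    return "".join(inv[i] for i in ids)

# Suppose the model sampled tokens "a" then "b" (ids [0, 1]).
sampled = [0, 1]
text = detokenize(sampled)    # "ab"
retokenized = tokenize(text)  # [2] -- not the ids that were actually sampled
print(sampled, retokenized)   # → [0, 1] [2]
```

Because the model server hands verl the original token IDs and log probs, training never depends on this round trip.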

4. Use the recipe when launching verl training

In your verl training script, swap in the NeMo Gym dataset loader and agent-loop manager:

+data.custom_cls.path=recipe/nemo_gym/dataset.py \
+data.custom_cls.name=NeMoGymJSONLDataset \
+actor_rollout_ref.rollout.agent.agent_loop_manager_class=recipe.nemo_gym.agent_loop.NeMoGymAgentLoopManager \
+actor_rollout_ref.rollout.agent.agent_loop_config_path=${VERL_ROOT}/recipe/nemo_gym/configs/workplace.yaml

5. Launch

The recipe includes example Slurm job submission scripts (submit_math.sh, submit_workplace.sh, submit_multienv.sh). Update these with your Slurm-specific variables such as account and partition, then submit:

$ cd $VERL_ROOT
$ sbatch recipe/nemo_gym/submit_workplace.sh

Multi-environment training

To train on multiple environments simultaneously, create a mixed dataset where each row has an agent_ref pointing to its environment, and include all environment config paths in the YAML:

# recipe/nemo_gym/configs/multienv.yaml
nemo_gym:
  nemo_gym_root: $NEMO_GYM_ROOT
  config_paths:
    - $NEMO_GYM_ROOT/responses_api_models/vllm_model/configs/vllm_model_for_training.yaml
    - $NEMO_GYM_ROOT/resources_servers/math_with_judge/configs/math_with_judge.yaml
    - $NEMO_GYM_ROOT/resources_servers/workplace_assistant/configs/workplace_assistant.yaml

NeMo Gym routes each row to its environment via the agent_ref field. The composition of the data blend determines the sampling ratio between environments; if you need a precise blend or a curriculum, do not shuffle the dataset after creating it.
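One way to build such a blend deterministically is to interleave the per-environment rows at a fixed ratio before writing the mixed JSONL. The helper below is an illustrative sketch; the function name and the 2:1 ratio are not part of the recipe.

```python
def blend(rows_a, rows_b, ratio=(2, 1)):
    """Deterministically interleave two row lists at a fixed ratio.

    ratio=(2, 1) emits two rows from rows_a for every one row from rows_b,
    preserving within-environment order (no shuffling).
    """
    out, ia, ib = [], 0, 0
    na, nb = ratio
    while ia < len(rows_a) or ib < len(rows_b):
        out.extend(rows_a[ia:ia + na]); ia += na
        out.extend(rows_b[ib:ib + nb]); ib += nb
    return out

math_rows = ["m1", "m2", "m3", "m4"]   # e.g. rows with agent_ref=math_with_judge
wa_rows = ["w1", "w2"]                 # e.g. rows with agent_ref=workplace_assistant
print(blend(math_rows, wa_rows))       # → ['m1', 'm2', 'w1', 'm3', 'm4', 'w2']
```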

Some NeMo Gym environments (e.g. SWE-RL) launch containers and may require additional setup such as Apptainer. See each environment’s README in the NeMo Gym repo for details.


For additional details, see recipe/nemo_gym/README.rst.