# Training with VeRL

> Run RL training on NeMo Gym environments using the nemo_gym recipe in verl.

This guide covers how to set up and launch RL training on NeMo Gym environments using the [`nemo_gym` recipe](https://github.com/verl-project/verl-recipe/tree/main/nemo_gym) in verl, tested with vLLM 0.17 (`verlai/verl:vllm017.latest`).

## Prerequisites

* **Container**: `verlai/verl:vllm017.latest` (vLLM 0.17.0)
* **NeMo Gym**: 0.2.1+ — `pip install nemo-gym` or `pip install -e $NEMO_GYM_ROOT` at job start
* **Slurm cluster** with GPU nodes

Clone verl with its recipe submodule:

```bash
cd $WORKSPACE
git clone --recurse-submodules https://github.com/verl-project/verl.git
```

If you already cloned verl without submodules:

```bash
cd $WORKSPACE/verl
git submodule update --init --recursive
```

## 1. Prepare training data

Using NeMo Gym, prepare the training dataset for your environment. Each row needs an `agent_ref` field so NeMo Gym can route it to the right agent:

```bash
cd $NEMO_GYM_ROOT
source .venv/bin/activate

config_paths="resources_servers/workplace_assistant/configs/workplace_assistant.yaml,\
responses_api_models/vllm_model/configs/vllm_model_for_training.yaml"

ng_prepare_data \
    "+config_paths=[${config_paths}]" \
    +output_dirpath=data/workplace_assistant \
    +mode=train_preparation \
    +should_download=true \
    +data_source=huggingface
```

This produces `data/workplace_assistant/{train,validation}.jsonl` ready for training.
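Before training, it can be worth sanity-checking that every prepared row carries the `agent_ref` routing field. The sketch below uses illustrative stand-in rows rather than the real NeMo Gym schema (only `agent_ref` is taken from this guide; the other keys are made up); in practice you would read `data/workplace_assistant/train.jsonl` instead:

```python
import io
import json

# Illustrative stand-ins for prepared rows. Only `agent_ref` is the field
# NeMo Gym requires for routing -- the other keys are hypothetical.
sample = "\n".join(json.dumps(r) for r in [
    {"agent_ref": "workplace_assistant", "prompt": "File my expense report"},
    {"agent_ref": "workplace_assistant", "prompt": "Schedule a 1:1 meeting"},
])

# In practice, open data/workplace_assistant/train.jsonl here instead.
for line_no, line in enumerate(io.StringIO(sample), start=1):
    row = json.loads(line)
    assert "agent_ref" in row, f"row {line_no} is missing agent_ref"
print("all rows routable")
```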

## 2. Set environment variables

In your verl clone, copy the recipe's `config.env.example` and fill in your paths:

```bash
cd $VERL_ROOT
cp recipe/nemo_gym/config.env.example config.env
```

```bash
# config.env
VERL_ROOT=/path/to/verl
NEMO_GYM_ROOT=/path/to/nemo-gym
HF_HOME=/path/to/hf_home
RESULTS_ROOT=/path/to/results
WANDB_USERNAME=your_username
WANDB_API_KEY=your_key
```
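A missing or empty value in `config.env` typically only surfaces once the Slurm job is already running. A quick pre-submit check is to parse the file and verify the keys; this is a hedged sketch with a naive `KEY=VALUE` reader (no quoting or `export` handling), validating a copy of the example file above:

```python
import tempfile

# Variables the recipe's example config.env expects to be filled in.
REQUIRED = ["VERL_ROOT", "NEMO_GYM_ROOT", "HF_HOME",
            "RESULTS_ROOT", "WANDB_USERNAME", "WANDB_API_KEY"]

def parse_env(path):
    # Naive KEY=VALUE parser; skips blanks, comments, and malformed lines.
    env = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip()
    return env

# Write an illustrative config.env, then validate it.
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as fh:
    fh.write("""# config.env
VERL_ROOT=/path/to/verl
NEMO_GYM_ROOT=/path/to/nemo-gym
HF_HOME=/path/to/hf_home
RESULTS_ROOT=/path/to/results
WANDB_USERNAME=your_username
WANDB_API_KEY=your_key
""")
    path = fh.name

env = parse_env(path)
missing = [k for k in REQUIRED if not env.get(k)]
assert not missing, f"config.env is missing: {missing}"
```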

## 3. Point verl at NeMo Gym

Each training run needs a YAML listing the NeMo Gym servers to launch (see `recipe/nemo_gym/configs/` for examples):

```yaml
# recipe/nemo_gym/configs/workplace.yaml
nemo_gym:
  nemo_gym_root: $NEMO_GYM_ROOT
  uses_reasoning_parser: false         # set true for reasoning models
  config_paths:
    - $NEMO_GYM_ROOT/responses_api_models/vllm_model/configs/vllm_model_for_training.yaml
    - $NEMO_GYM_ROOT/resources_servers/workplace_assistant/configs/workplace_assistant.yaml
```

The first config launches the model server, which tracks token IDs and log probs to prevent retokenization mismatches. Each additional resources server entry adds an environment.
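The retokenization problem the model server guards against can be seen with a toy greedy tokenizer (an illustrative simplification, not vLLM's actual tokenizer): decoding a rollout to text and re-encoding that text can produce a different token sequence than the one the model actually sampled, which would corrupt the log probs used in training.

```python
def encode(text, vocab):
    # Greedy longest-match encoding over a toy vocabulary.
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
    return tokens

vocab = {"Hel", "lo", "Hello", "H", "e", "l", "o"}
rollout_tokens = ["Hel", "lo"]      # what the model actually generated
text = "".join(rollout_tokens)      # detokenized transcript: "Hello"
retokenized = encode(text, vocab)   # greedy match picks ["Hello"] instead
print(rollout_tokens, retokenized)  # same text, different token sequences
```

Because the model server records the exact token IDs from the rollout, training never has to re-encode the transcript and risk this drift.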

## 4. Use the recipe when launching verl training

In your verl training script, swap in the NeMo Gym dataset loader and agent-loop manager:

```bash
+data.custom_cls.path=recipe/nemo_gym/dataset.py
+data.custom_cls.name=NeMoGymJSONLDataset
+actor_rollout_ref.rollout.agent.agent_loop_manager_class=recipe.nemo_gym.agent_loop.NeMoGymAgentLoopManager
+actor_rollout_ref.rollout.agent.agent_loop_config_path=${VERL_ROOT}/recipe/nemo_gym/configs/workplace.yaml
```

## 5. Launch

The recipe includes example Slurm job submission scripts (`submit_math.sh`, `submit_workplace.sh`, `submit_multienv.sh`). Update these with your Slurm-specific variables such as account and partition, then submit:

```bash
cd $VERL_ROOT
sbatch recipe/nemo_gym/submit_workplace.sh
```

## Multi-environment training

To train on multiple environments simultaneously, create a mixed dataset where each row has an `agent_ref` pointing to its environment, and include all environment config paths in the YAML:

```yaml
# recipe/nemo_gym/configs/multienv.yaml
nemo_gym:
  nemo_gym_root: $NEMO_GYM_ROOT
  config_paths:
    - $NEMO_GYM_ROOT/responses_api_models/vllm_model/configs/vllm_model_for_training.yaml
    - $NEMO_GYM_ROOT/resources_servers/math_with_judge/configs/math_with_judge.yaml
    - $NEMO_GYM_ROOT/resources_servers/workplace_assistant/configs/workplace_assistant.yaml
```

NeMo Gym routes each row to its environment via the `agent_ref` field. The composition of the mixed dataset determines the sampling ratio between environments; if you need a precise blend or a curriculum ordering, do not shuffle the dataset after creation.
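One deterministic way to build such a blend (a sketch with made-up row counts, not a NeMo Gym utility) is to interleave the per-environment datasets at a fixed ratio, so the mixture is preserved without any shuffling:

```python
# Hypothetical per-environment rows; agent_ref values match the YAML above.
math_rows = [{"agent_ref": "math_with_judge", "id": i} for i in range(4)]
work_rows = [{"agent_ref": "workplace_assistant", "id": i} for i in range(2)]

def blend(a, b, ratio_a, ratio_b):
    # Emit ratio_a rows of `a`, then ratio_b rows of `b`, repeating until
    # both datasets are exhausted -- a fixed, shuffle-free 2:1 mixture.
    out, ia, ib = [], 0, 0
    while ia < len(a) or ib < len(b):
        out.extend(a[ia:ia + ratio_a]); ia += ratio_a
        out.extend(b[ib:ib + ratio_b]); ib += ratio_b
    return out

mixed = blend(math_rows, work_rows, 2, 1)
print([r["agent_ref"] for r in mixed])
```

Writing `mixed` out as JSONL in this order keeps the 2:1 sampling ratio stable across the epoch.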

<Note>
  Some NeMo Gym environments (e.g. SWE-RL) launch containers and may require additional setup such as Apptainer. See each environment's README in the NeMo Gym repo for details.
</Note>

***

For additional details, see [`recipe/nemo_gym/README.rst`](https://github.com/verl-project/verl-recipe/blob/main/nemo_gym/README.rst).