# NeMo RL

This tutorial trains NVIDIA [Nemotron Nano 9B v2](https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-9B-v2) to improve its **multi-step tool-calling** capability using the **GRPO (Group Relative Policy Optimization)** algorithm on the **Workplace Assistant** environment.

Workplace Assistant is a realistic office simulation (calendar, email, project management, etc.) with complex multi-step tasks, providing a strong data distribution for training enterprise-ready tool-using assistants.
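
For readers new to GRPO, the "group relative" part can be illustrated with a few lines of Python. The sketch below shows only the commonly described advantage step: several rollouts are sampled for the same prompt, and each rollout's reward is normalized against the mean and standard deviation of its group. It is not NeMo RL's implementation; the function name, the `eps` stabilizer, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own group of rollouts.

    Hypothetical helper for illustration only; `eps` guards against
    division by zero when every rollout in the group scored the same.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four rollouts for one task: two solved it (reward 1.0), two did not.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Rollouts that beat their group's average receive a positive advantage and the rest a negative one, so no separate value model is needed to provide a baseline.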

<Info>
  **Goal**: Train a model for multi-step tool calling using GRPO on the Workplace Assistant environment.

  **Time**: \~3-5 hours (full series)

  **In this tutorial, you will**:

  1. Set up NeMo RL and NeMo Gym for **reinforcement learning** training
  2. Understand the Workplace Assistant environment and its multi-step tool-calling tasks
  3. Configure and run GRPO training on Nemotron Nano 9B v2
  4. Monitor training progress via Weights & Biases (W\&B)
</Info>

> **TL;DR:** Want to jump straight to running commands? Skip to [Setup](/latest/training-tutorials/nemo-rl-grpo/setup).

***

## Prerequisites

Make sure you have these prerequisites ready:

* ✅ **Hardware**: 1+ nodes with 8× NVIDIA GPUs (80 GB+ each, such as H100 or A100)
  * Single-node testing: 1 node with 8 GPUs
  * Multi-node production: 8+ nodes with 8 GPUs each recommended
  * RAM: 64 GB+ per node
* ✅ **Storage**: 100 GB+ free disk space on a shared filesystem
* ✅ **Software**: Linux, Python 3.12+, Git, and Slurm (for multi-node training)
* ✅ **Familiarity**: Python, LLM fine-tuning, basic RL concepts (in-depth RLVR/GRPO knowledge not required)
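
The checklist above can be partly verified from a shell on the training node. The following is a minimal sketch using only the Python standard library; the function name and the result keys are hypothetical, the thresholds are copied from the list above, and GPU count and per-GPU memory are not checked (only whether `nvidia-smi` is on the `PATH`).

```python
import os
import shutil
import sys

GB = 10 ** 9


def check_prerequisites(shared_path="."):
    """Return a pass/fail map for the prerequisites listed above.

    Hypothetical helper: `shared_path` should point at the shared
    filesystem that will hold checkpoints and data.
    """
    results = {
        "python>=3.12": sys.version_info >= (3, 12),
        "git": shutil.which("git") is not None,
        # Slurm is only needed for multi-node training.
        "slurm": shutil.which("sbatch") is not None,
        # Presence of the NVIDIA driver tooling, not a GPU count.
        "nvidia-smi": shutil.which("nvidia-smi") is not None,
        "disk>=100GB": shutil.disk_usage(shared_path).free >= 100 * GB,
    }
    try:
        # Total physical memory; these sysconf names are available on Linux.
        ram = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
        results["ram>=64GB"] = ram >= 64 * GB
    except (AttributeError, ValueError, OSError):
        results["ram>=64GB"] = False
    return results


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{'OK     ' if ok else 'MISSING'} {name}")
```

A failed `slurm` or `nvidia-smi` check is expected on a machine used only for single-node runs or for NeMo Gym alone.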

<Note>
  NeMo Gym does not require GPUs. GPUs are only necessary for GRPO training with NeMo RL.
</Note>

**Optional accounts**:

* **Weights & Biases (W\&B)**: For experiment tracking ([sign up](https://wandb.ai/signup), [get API key](https://wandb.ai/authorize)). Training proceeds without W\&B if not configured.
* **HuggingFace**: For downloading models ([create token](https://huggingface.co/settings/tokens)). Recommended to avoid rate limits.
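
Both accounts are usually supplied to tools through environment variables. The sketch below reports which ones are set, assuming the conventional names `WANDB_API_KEY` (read by the `wandb` client) and `HF_TOKEN` (read by `huggingface_hub`); whether this tutorial's scripts expect exactly these variables is an assumption, and the function name is hypothetical.

```python
import os


def optional_credentials_status(env=None):
    """Report which optional credentials are present, without printing them."""
    env = os.environ if env is None else env
    return {name: bool(env.get(name)) for name in ("WANDB_API_KEY", "HF_TOKEN")}


if __name__ == "__main__":
    for name, present in optional_credentials_status().items():
        print(f"{name}: {'set' if present else 'not set'}")
```

Neither variable is required, so a "not set" result only means that W&B logging may be skipped or that model downloads may be rate limited.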

**Total time estimate**: \~3-5 hours (including environment setup, data preparation, and training)

***

## Tutorial Steps

Follow these steps sequentially to complete the tutorial:

<Cards>
  <Card title="1. About the Workplace Assistant Training Environment" href="/latest/training-tutorials/nemo-rl-grpo/about-workplace-assistant">
    Understand the dataset you will train on and its multi-step tool-calling tasks.

    <Badge minimal outlined>
      background
    </Badge>
  </Card>

  <Card title="2. Gym Configuration" href="/latest/training-tutorials/nemo-rl-grpo/gym-configuration">
    Understand the Gym configuration component in the NeMo RL training config file.

    <Badge minimal outlined>
      configuration
    </Badge>
  </Card>

  <Card title="3. NeMo RL Configuration" href="/latest/training-tutorials/nemo-rl-grpo/nemo-rl-configuration">
    Understand the GRPO and NeMo RL configuration components in the training config file.

    <Badge minimal outlined>
      configuration
    </Badge>
  </Card>

  <Card title="4. Setup" href="/latest/training-tutorials/nemo-rl-grpo/setup">
    Clone repositories, install dependencies, and prepare the training data.

    <Badge intent="success" minimal outlined>
      prerequisite
    </Badge>
  </Card>

  <Card title="5. Single Node Training" href="/latest/training-tutorials/nemo-rl-grpo/single-node-training">
    Run GRPO training on a single node and check the run against the success criteria.

    <Badge intent="success" minimal outlined>
      training
    </Badge>
  </Card>

  <Card title="6. Multi-Node Training" href="/latest/training-tutorials/nemo-rl-grpo/multi-node-training">
    Scale to multi-node GRPO training for production.

    <Badge intent="success" minimal outlined>
      training
    </Badge>
  </Card>
</Cards>

***

## Next Steps

After completing this tutorial, explore these options:

<Cards>
  <Card title="Use Other Training Environments" href="https://github.com/NVIDIA-NeMo/Gym#-available-environments">
    Explore other environments available for training and evaluation.

    <Badge minimal outlined>
      github
    </Badge>

    <Badge minimal outlined>
      resources-servers
    </Badge>
  </Card>

  <Card title="Build a Custom Training Environment" href="/latest/environment-tutorials">
    Create your own resources server with custom tools and verification logic.

    <Badge minimal outlined>
      tutorial
    </Badge>

    <Badge minimal outlined>
      custom-tools
    </Badge>
  </Card>
</Cards>