> For clean Markdown of any page, append .md to the page URL.
> For a complete documentation index, see https://docs.nvidia.com/nemo/gym/llms.txt.
> For full documentation content, see https://docs.nvidia.com/nemo/gym/llms-full.txt.
> For AI client integration (Claude Code, Cursor, etc.), connect to the MCP server at https://docs.nvidia.com/nemo/gym/_mcp/server.

# NeMo RL

This tutorial trains NVIDIA [Nemotron Nano 9B v2](https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-9B-v2) to improve its **multi-step tool-calling** capability using the **GRPO (Group Relative Policy Optimization)** algorithm on the **Workplace Assistant** environment.

Workplace Assistant is a realistic office simulation (calendar, email, project management, etc.) with complex multi-step tasks, providing a strong data distribution for training enterprise-ready tool-using assistants.

**Goal**: Train a model for multi-step tool calling using GRPO on the Workplace Assistant environment.

**Time**: \~3-5 hours (full series)

**In this tutorial, you will**:

1. Set up NeMo RL and NeMo Gym for **reinforcement learning** training
2. Understand the Workplace Assistant environment and its multi-step tool calling tasks
3. Configure and run GRPO training on Nemotron Nano v2 9B
4. Monitor training progress via Weights & Biases (W\&B)

> **TL;DR:** Want to jump straight to running commands? Skip to [Setup](/training-tutorials/nemo-rl-grpo/setup).

***

## Prerequisites

Make sure you have these prerequisites ready:

* ✅ **Hardware**: 1+ nodes with 8× NVIDIA GPUs (80GB+ each, such as H100 or A100)
  * Single-node testing: 1 node with 8 GPUs
  * Multi-node production: 8+ nodes with 8 GPUs each recommended
  * RAM: 64 GB+ per node
* ✅ **Storage**: 100 GB+ free disk space on a shared filesystem
* ✅ **Software**: Linux, Python 3.12+, Git, Slurm for multi-node training
* ✅ **Familiarity**: Python, LLM fine-tuning, basic RL concepts (in-depth RLVR/GRPO knowledge not required)

NeMo Gym does not require GPUs. GPUs are only necessary for GRPO training with NeMo RL.

**Optional accounts**:

* **Weights & Biases (W\&B)**: For experiment tracking ([sign up](https://wandb.ai/signup), [get API key](https://wandb.ai/authorize)). Training proceeds without W\&B if not configured.
* **HuggingFace**: For downloading models ([create token](https://huggingface.co/settings/tokens)). Recommended to avoid rate limits.

**Total time estimate**: \~3-5 hours (including environment setup, data preparation, and training)

***

## Tutorial Steps

Follow these steps sequentially to complete the tutorial:

Understand the dataset you will train on and its multi-step tool calling tasks.

background

Understand the Gym configuration component in the NeMo RL training config file.

configuration

Understand the GRPO and NeMo RL configuration components in the training config file.

configuration

Clone repositories, install dependencies, and prepare the training data.

prerequisite

Perform a single node GRPO training run with success criteria.

training

Scale to multi-node GRPO training for production.

training

***

## Next Steps

After completing this tutorial, explore these options:

Explore other environments available for training and evaluation.

github

resources-servers

Create your own resources server with custom tools and verification logic.

tutorial

custom-tools