RL Training with NeMo RL using GRPO

This tutorial trains NVIDIA Nemotron Nano 9B v2 to improve its multi-step tool-calling capability using the GRPO (Group Relative Policy Optimization) algorithm on the Workplace Assistant environment.

Workplace Assistant is a realistic office simulation (calendar, email, project management, etc.) with complex multi-step tasks, providing a strong data distribution for training enterprise-ready tool-using assistants.

Goal: Train a model for multi-step tool calling using GRPO on the Workplace Assistant environment.
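For readers new to GRPO, the "group relative" part of the name can be illustrated with a minimal sketch: several responses are sampled for the same prompt, and each response's reward is normalized against the mean and standard deviation of its own group. The function name, the `eps` constant, and the reward values below are hypothetical, and the population standard deviation is an arbitrary choice here. This is not NeMo RL's implementation, only an illustration of the idea.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own group's mean and std.

    `rewards` holds one scalar reward per response sampled for the
    same prompt. `eps` (an assumed constant) avoids division by zero
    when every response in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations may differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four rollouts of one Workplace Assistant task:
# 1.0 means the task was completed, 0.0 means it was not.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Responses that score above their group's average receive a positive advantage and are reinforced; those below average receive a negative one. If every response in a group gets the same reward, all advantages are zero and that group contributes no learning signal.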

In this tutorial, you will:

- Set up NeMo RL and NeMo Gym for reinforcement learning training
- Understand the Workplace Assistant environment and its multi-step tool-calling tasks
- Configure and run GRPO training on Nemotron Nano 9B v2
- Monitor training progress via Weights & Biases (W&B)

TL;DR: Want to jump straight to running commands? Skip to Setup.

Before You Begin

Make sure you have these prerequisites ready:

✅ Hardware: 1+ nodes, each with 8× NVIDIA GPUs (80 GB+ of memory each, such as H100 or A100)
- Single-node testing: 1 node with 8 GPUs
- Multi-node production: 8+ nodes with 8 GPUs each (recommended)
- RAM: 64 GB+ per node
✅ Storage: 100 GB+ free disk space on a shared filesystem
✅ Software: Linux, Python 3.12+, Git, and Slurm (for multi-node training)
✅ Familiarity: Python, LLM fine-tuning, basic RL concepts (in-depth RLVR/GRPO knowledge not required)
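Some of the checklist above can be verified from a script before starting. The following is a minimal sketch, not part of NeMo RL or NeMo Gym: the function name and the thresholds are assumptions taken from the list above, and it only checks whether the `git`, `nvidia-smi`, and `sbatch` executables are on the `PATH`, not GPU count, GPU memory, or RAM.

```python
import shutil
import sys

MIN_PYTHON = (3, 12)   # from the Software prerequisite
MIN_FREE_GB = 100      # from the Storage prerequisite


def check_prerequisites(path="."):
    """Return a dict mapping each prerequisite name to True/False.

    `path` should point at the shared filesystem used for training
    (hypothetical default: the current directory).
    """
    free_gb = shutil.disk_usage(path).free / 1e9
    return {
        "python>=3.12": sys.version_info >= MIN_PYTHON,
        "git": shutil.which("git") is not None,
        "nvidia-smi": shutil.which("nvidia-smi") is not None,
        # Slurm is only needed for multi-node training.
        "sbatch (Slurm)": shutil.which("sbatch") is not None,
        "disk>=100GB": free_gb >= MIN_FREE_GB,
    }


results = check_prerequisites()
for name, ok in results.items():
    print(f"{'OK     ' if ok else 'MISSING'}  {name}")
```

A `MISSING` entry for `nvidia-smi` or `sbatch` is expected on a machine used only for NeMo Gym or for single-node testing, respectively.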

Note: NeMo Gym does not require GPUs. GPUs are needed only for GRPO training with NeMo RL.

Optional accounts:

- Weights & Biases (W&B): for experiment tracking (sign up and get an API key). Training proceeds without W&B if it is not configured.
- Hugging Face: for downloading models (create an access token). Recommended to avoid rate limits.
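Both credentials are commonly supplied through environment variables: the `wandb` library reads `WANDB_API_KEY` and the `huggingface_hub` library reads `HF_TOKEN`. The sketch below only reports which of the two are set in the current shell; the function name is hypothetical, and the tutorial's own setup steps may configure credentials differently.

```python
import os


def configured_accounts(env=os.environ):
    """Report which optional credentials are present in `env`.

    Both accounts are optional, so a False value is not an error.
    """
    return {
        "wandb": bool(env.get("WANDB_API_KEY")),
        "huggingface": bool(env.get("HF_TOKEN")),
    }


for account, is_set in configured_accounts().items():
    print(f"{account}: {'configured' if is_set else 'not configured (optional)'}")
```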

Total time estimate: ~3-5 hours (including environment setup, data preparation, and training)