NeMo RL

This tutorial trains NVIDIA Nemotron Nano 9B v2 to improve its multi-step tool-calling capability using the GRPO (Group Relative Policy Optimization) algorithm on the Workplace Assistant environment.

Workplace Assistant is a realistic office simulation (calendar, email, project management, etc.) with complex multi-step tasks, providing a strong data distribution for training enterprise-ready tool-using assistants.

Goal: Train a model for multi-step tool calling using GRPO on the Workplace Assistant environment.

Time: ~3-5 hours for the full series (including environment setup, data preparation, and training)

In this tutorial, you will:

  1. Set up NeMo RL and NeMo Gym for reinforcement learning training
  2. Understand the Workplace Assistant environment and its multi-step tool calling tasks
  3. Configure and run GRPO training on Nemotron Nano 9B v2
  4. Monitor training progress via Weights & Biases (W&B)

TL;DR: Want to jump straight to running commands? Skip to Setup.


Prerequisites

Make sure you have these prerequisites ready:

  • Hardware: 1+ nodes with 8× NVIDIA GPUs (80GB+ each, such as H100 or A100)
    • Single-node testing: 1 node with 8 GPUs
    • Multi-node production: 8+ nodes with 8 GPUs each recommended
    • RAM: 64 GB+ per node
  • Storage: 100 GB+ free disk space on a shared filesystem
  • Software: Linux, Python 3.12+, Git, Slurm for multi-node training
  • Familiarity: Python, LLM fine-tuning, basic RL concepts (in-depth RLVR/GRPO knowledge not required)
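You can script a quick preflight check for the software-side prerequisites. The thresholds below mirror the list above (Python 3.12+, 100 GB+ free disk); the function name and structure are illustrative, not part of NeMo RL.

```python
import shutil
import sys

def check_prereqs(py_version=sys.version_info, free_bytes=None, path="/"):
    """Return a list of prerequisite failures (empty list = ready)."""
    if free_bytes is None:
        free_bytes = shutil.disk_usage(path).free
    failures = []
    if tuple(py_version[:2]) < (3, 12):
        failures.append(
            f"Python 3.12+ required, found {py_version[0]}.{py_version[1]}"
        )
    if free_bytes < 100 * 1024**3:
        failures.append(
            f"Need 100 GB+ free disk, found {free_bytes / 1024**3:.0f} GB"
        )
    return failures

for problem in check_prereqs():
    print("MISSING:", problem)
```

GPU checks are deliberately left out: as noted below, NeMo Gym runs CPU-only, so you can complete the environment-side steps on a machine without GPUs.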

NeMo Gym does not require GPUs. GPUs are only necessary for GRPO training with NeMo RL.

Optional accounts:

  • Weights & Biases (W&B): For experiment tracking (sign up, get API key). Training proceeds without W&B if not configured.
  • HuggingFace: For downloading models (create token). Recommended to avoid rate limits.
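Both services pick up credentials from environment variables. A typical shell setup looks like the following; `WANDB_API_KEY` and `HF_TOKEN` are the variable names these libraries conventionally read, and the values are placeholders for your own keys.

```shell
# Optional: experiment tracking with Weights & Biases
export WANDB_API_KEY="your-wandb-api-key"

# Recommended: authenticated HuggingFace downloads, to avoid rate limits
export HF_TOKEN="your-huggingface-token"
```

Add these to your shell profile (or your Slurm job script for multi-node runs) so they are set in every training session.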



Tutorial Steps

Follow these steps sequentially to complete the tutorial:


Next Steps

After completing this tutorial, explore these options: