
Copyright © 2026, NVIDIA Corporation.


Single Node Training

With your environment set up and data prepared, you’re ready to run training. But before committing to a multi-hour, multi-node job, it’s important to verify everything works correctly on a single node first.

Goal: Run a single-node GRPO training session to validate your environment.

Time: ~45 minutes

In this section, you will:

  1. Return to the NeMo RL root directory
  2. Download the Nemotron Nano 9B v2 model
  3. Configure the model’s chat template
  4. Run a test training session with 3 steps

Prerequisites

Make sure you have:

  • ✅ Completed the Setup instructions
  • ✅ Access to a running container session with GPUs
  • ✅ (Optional) Weights & Biases API key for experiment tracking

0. Return to NeMo RL directory

Because this section runs RL training, run all of the following steps from the NeMo RL root directory, not the NeMo Gym directory.

✅ Success Check: You should see a yaml file at examples/nemo_gym/grpo_workplace_assistant_nemotron_nano_v2_9b.yaml and a Python file at examples/nemo_gym/run_grpo_nemo_gym.py.
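
The success check above can be scripted. The following is a minimal sketch, assuming a POSIX shell; `check_nemo_rl_root` is a hypothetical helper name, not part of NeMo RL, and it only tests for the two files named in the check.

```shell
# Sketch: confirm that a directory looks like the NeMo RL root.
# check_nemo_rl_root is a hypothetical helper, not part of NeMo RL.
check_nemo_rl_root() {
    root="${1:-$PWD}"
    missing=0
    for f in \
        examples/nemo_gym/grpo_workplace_assistant_nemotron_nano_v2_9b.yaml \
        examples/nemo_gym/run_grpo_nemo_gym.py
    do
        if [ -f "$root/$f" ]; then
            echo "found:   $f"
        else
            echo "missing: $f"
            missing=1
        fi
    done
    # Returns 0 only when both files exist.
    return "$missing"
}

# Usage: run from the directory you believe is the NeMo RL root.
check_nemo_rl_root "$PWD" || echo "This does not look like the NeMo RL root directory."
```

If either file is reported as missing, change to the NeMo RL checkout before continuing.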

1. Download the Model

Estimated time: ~5-10 minutes

Download NVIDIA Nemotron Nano 9B v2:

$ HF_HOME=$PWD/.cache/ \
> HF_TOKEN={your HF token} \
> hf download nvidia/NVIDIA-Nemotron-Nano-9B-v2

✅ Success Check: Model files are downloaded to .cache/hub/models--nvidia--NVIDIA-Nemotron-Nano-9B-v2/.
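
One way to script this check is sketched below, assuming a POSIX shell. `check_model_download` is a hypothetical helper name. It looks for the model directory under the given `HF_HOME` and for a `tokenizer_config.json` inside it, since the next step edits that file; it does not verify that every weight file finished downloading.

```shell
# Sketch: confirm the model directory exists in the Hugging Face cache.
# check_model_download is a hypothetical helper, not part of the hf CLI.
check_model_download() {
    # $1: the HF_HOME used for the download (defaults to $PWD/.cache)
    cache_dir="${1:-$PWD/.cache}/hub/models--nvidia--NVIDIA-Nemotron-Nano-9B-v2"
    if [ ! -d "$cache_dir" ]; then
        echo "missing: $cache_dir"
        return 1
    fi
    # tokenizer_config.json is the file the next step edits, so look for it.
    if [ -z "$(find "$cache_dir" -name tokenizer_config.json)" ]; then
        echo "no tokenizer_config.json under $cache_dir"
        return 1
    fi
    echo "ok: $cache_dir"
}

# Usage: run from the NeMo RL root after the download finishes.
check_model_download "$PWD/.cache" || echo "The download may be incomplete."
```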


2. Configure the Chat Template

Estimated time: ~1 minute

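The Nemotron Nano 9B v2 model uses a custom chat template that must be modified for RL training. The commands below edit the cached copy of the template in tokenizer_config.json: the first changes enable_thinking=true to enable_thinking=false, and the second deletes the template block that strips a trailing assistant message from the message list.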

$ tokenizer_config_path=$(find $PWD/.cache/hub/models--nvidia--NVIDIA-Nemotron-Nano-9B-v2 -name tokenizer_config.json)
$ sed -i 's/enable_thinking=true/enable_thinking=false/g' $tokenizer_config_path
$ sed -i 's/{%- if messages\[-1\]\['\''role'\''\] == '\''assistant'\'' -%}{%- set ns.last_turn_assistant_content = messages\[-1\]\['\''content'\''\].strip() -%}{%- set messages = messages\[:-1\] -%}{%- endif -%}//g' $tokenizer_config_path

✅ Success Check: The sed commands complete without errors.
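
Because sed exits successfully even when a pattern matches nothing, a clean exit does not prove the template changed. The sketch below, assuming a POSIX shell, checks the file contents instead. `check_chat_template` is a hypothetical helper name, and the second test assumes that `set messages = messages[:-1]` appears only inside the block the second sed command removes.

```shell
# Sketch: confirm that both chat template edits took effect.
# check_chat_template is a hypothetical helper, not part of NeMo RL.
check_chat_template() {
    config_path="$1"
    if [ ! -f "$config_path" ]; then
        echo "not a file: $config_path"
        return 1
    fi
    # The first sed command should leave no enable_thinking=true behind.
    if grep -q 'enable_thinking=true' "$config_path"; then
        echo "enable_thinking=true is still present"
        return 1
    fi
    # -F treats the pattern as a fixed string, so the brackets are literal.
    # Assumption: this statement occurs only in the removed block.
    if grep -qF 'set messages = messages[:-1]' "$config_path"; then
        echo "the trailing-assistant block is still present"
        return 1
    fi
    echo "ok: $config_path"
}

# Usage (after the sed commands, with a single path in the variable):
#   check_chat_template "$tokenizer_config_path"
```

If find returned more than one tokenizer_config.json, run the check once per path.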


3. Run Training

Estimated time: ~15-30 minutes

The command below sets ++grpo.max_num_steps=3, so it runs only 3 training steps as a small test in preparation for multi-node training. If you are using a single node for the full training run, remove this override. The full training run takes several hours.

$ # Set a date-stamped experiment name
$ EXP_NAME="$(date +%Y%m%d)/nemo_gym_grpo/nemotron_nano_v2_9b/workplace_assistant_001"
$ mkdir -p results/$EXP_NAME
$
$ # Configuration file path
$ CONFIG_PATH=examples/nemo_gym/grpo_workplace_assistant_nemotron_nano_v2_9b.yaml
$
$ # Launch training
$ # Fill in these placeholders before running:
$ #   WANDB_API_KEY: your Weights & Biases API key for logging
$ #   logger.wandb.project: replace {your username} with your username
$ TORCH_CUDA_ARCH_LIST="9.0 10.0" \
> HF_HOME=$PWD/.cache/ \
> WANDB_API_KEY={your W&B API key} \
> uv run python examples/nemo_gym/run_grpo_nemo_gym.py \
>     --config=$CONFIG_PATH \
>     ++logger.wandb.project="{your username}-nemo-gym-rl-integration" \
>     ++logger.wandb.name=$EXP_NAME \
>     ++logger.log_dir=results/$EXP_NAME \
>     ++policy.generation.vllm_cfg.tool_parser_plugin=$(find $PWD/.cache -name nemotron_toolcall_parser_no_streaming.py) \
>     ++grpo.max_num_steps=3 \
>     ++checkpointing.checkpoint_dir=results/$EXP_NAME &> results/$EXP_NAME/output.log &
$
$ # Watch the logs
$ tail -f results/$EXP_NAME/output.log

Single GPU Training: If you only have 1 GPU available, use these modifications:

$ CUDA_VISIBLE_DEVICES=0 \
> TORCH_CUDA_ARCH_LIST="9.0 10.0" \
> HF_HOME=$PWD/.cache/ \
> WANDB_API_KEY={your W&B API key} \
> uv run python examples/nemo_gym/run_grpo_nemo_gym.py \
>     --config=$CONFIG_PATH \
>     ++logger.wandb.project="{your username}-nemo-gym-rl-integration" \
>     ++logger.wandb.name=$EXP_NAME \
>     ++logger.log_dir=results/$EXP_NAME \
>     ++policy.generation.vllm_cfg.tool_parser_plugin=$(find $PWD/.cache -name nemotron_toolcall_parser_no_streaming.py) \
>     ++grpo.max_num_steps=3 \
>     ++checkpointing.checkpoint_dir=results/$EXP_NAME \
>     cluster.gpus_per_node=1 \
>     policy.megatron_cfg.tensor_model_parallel_size=1 \
>     &> results/$EXP_NAME/output.log &

Key differences:

  • Added CUDA_VISIBLE_DEVICES=0 to use only GPU 0
  • Set cluster.gpus_per_node=1
  • Set policy.megatron_cfg.tensor_model_parallel_size=1

The end of the command above does the following:

$ &> results/$EXP_NAME/output.log &
  1. &> results/$EXP_NAME/output.log: Redirects both standard output and standard error to the file results/$EXP_NAME/output.log, which you can view at any time.
  2. &: The final ampersand runs the job in the background, which frees your terminal for other work. List all background jobs with the jobs command. To stop the training run, bring the job to the foreground with the fg command, then press Ctrl+C.
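
The redirect-and-background pattern can be tried safely with a stand-in command before launching training. The sketch below is an illustration under stated assumptions: `sleep 30` stands in for the training command, the log path is a temporary file, and the job is stopped by PID with `kill` rather than with fg and Ctrl+C, which is useful in scripts where job control is not available. It uses `> file 2>&1`, the portable spelling of bash's `&> file`.

```shell
# Sketch: redirect stdout and stderr to a log and run in the background.
# The sh -c command below is a stand-in for the training command.
log_file="$(mktemp)"

# "> file 2>&1" sends both stdout and stderr to the log file.
# The trailing "&" runs the command in the background.
sh -c 'echo "to stdout"; echo "to stderr" >&2; exec sleep 30' > "$log_file" 2>&1 &
job_pid=$!    # $! holds the PID of the most recent background command

sleep 1
cat "$log_file"    # shows that both streams were captured

# Stop the background job by PID, then reap it.
kill "$job_pid"
wait "$job_pid" 2>/dev/null || true
```

Recording `$!` right after launching makes it possible to stop the run later from the same shell, even after other background jobs have started.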

✅ Success Check: Training completes 3 steps on a single node. Check the log for errors and verify that the training steps progress.
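
A first pass over the log can be automated. The sketch below assumes a POSIX shell; `scan_training_log` is a hypothetical helper name, and the search terms are a guess at common failure markers, not the documented log format of NeMo RL. Benign lines that contain the word "error" will also match, so treat hits as a prompt to read the log, not as proof of failure.

```shell
# Sketch: flag lines in a training log that may indicate a failure.
# scan_training_log is a hypothetical helper; the patterns are assumptions.
scan_training_log() {
    log_path="$1"
    # -s is true only when the file exists and is not empty.
    if [ ! -s "$log_path" ]; then
        echo "log is missing or empty: $log_path"
        return 1
    fi
    # -i ignores case, -n prints line numbers, -E enables alternation.
    if grep -inE 'traceback|error|exception' "$log_path"; then
        echo "possible problems found in $log_path"
        return 1
    fi
    echo "no obvious errors in $log_path"
}

# Usage (with EXP_NAME set as in the training command):
#   scan_training_log "results/$EXP_NAME/output.log"
```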


Next Steps

Your single-node run validated the environment. Scale to multiple nodes for production training:

Continue to Multi-Node Training →