Single Node Training
With your environment set up and data prepared, you’re ready to run training. But before committing to a multi-hour, multi-node job, it’s important to verify everything works correctly on a single node first.
Goal: Run a single-node GRPO training session to validate your environment.
Time: ~45 minutes
In this section, you will:
- Download the Nemotron Nano 9B v2 model
- Configure the model’s chat template
- Clean up existing processes
- Run a test training session with 3 steps
Prerequisites
Make sure you have:
- ✅ Completed the Setup instructions
- ✅ Access to a running container session with GPUs
- ✅ (Optional) Weights & Biases API key for experiment tracking
0. Return to NeMo RL directory
Since we are running RL training, the following steps are all run from the NeMo RL root directory, rather than the NeMo Gym directory.
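For example, assuming NeMo RL was cloned into your home directory during setup (adjust the path to wherever your checkout lives):

```shell
# Path is an assumption from setup; change it to your NeMo RL checkout.
cd ~/NeMo-RL

# Verify the GRPO example files for this tutorial are present.
ls examples/nemo_gym/grpo_workplace_assistant_nemotron_nano_v2_9b.yaml \
   examples/nemo_gym/run_grpo_nemo_gym.py
```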
✅ Success Check: You should see a yaml file at examples/nemo_gym/grpo_workplace_assistant_nemotron_nano_v2_9b.yaml and a Python file at examples/nemo_gym/run_grpo_nemo_gym.py.
1. Download the Model
Estimated time: ~5-10 minutes
Download NVIDIA Nemotron Nano 9B v2:
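One way to do this is with the Hugging Face CLI. Setting `HF_HOME=.cache` is an assumption chosen to match the cache path shown in the Success Check below:

```shell
# Download the model weights into the local cache; HF_HOME=.cache points
# the Hugging Face cache at the .cache/hub/... directory this tutorial uses.
HF_HOME=.cache huggingface-cli download nvidia/NVIDIA-Nemotron-Nano-9B-v2
```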
✅ Success Check: Model files are downloaded to .cache/hub/models--nvidia--NVIDIA-Nemotron-Nano-9B-v2/.
2. Configure the Chat Template
Estimated time: ~1 minute
The Nemotron Nano 9B v2 model ships with a custom chat template that must be adjusted for RL training. This step edits the cached copy of the template:
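As a sketch of the pattern only: the patterns below are placeholders, not the real edits, and the template's location within the snapshot directory may differ (on some snapshots the template lives inside `tokenizer_config.json` instead):

```shell
# Placeholder patterns -- substitute the actual template edits for your setup.
sed -i 's/OLD_PATTERN/NEW_PATTERN/' \
  .cache/hub/models--nvidia--NVIDIA-Nemotron-Nano-9B-v2/snapshots/*/chat_template.jinja
```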
✅ Success Check: The sed commands complete without errors.
3. Run Training
Estimated time: ~15-30 minutes
By default, this runs only 3 training steps (`grpo.max_num_steps=3`) as a small test run in preparation for multi-node training. If you plan to do the full training run on a single node, remove this override; the full run will take several hours.
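A sketch of the launch command, assuming NeMo RL's Hydra-style command-line overrides (`EXP_NAME` is a name you choose, and the `--config` flag is an assumption; use the invocation from your setup if it differs):

```shell
EXP_NAME=grpo_nano9b_single_node_test
mkdir -p results/$EXP_NAME

# grpo.max_num_steps=3 keeps this a short validation run; drop it for a
# full single-node training run.
uv run examples/nemo_gym/run_grpo_nemo_gym.py \
    --config examples/nemo_gym/grpo_workplace_assistant_nemotron_nano_v2_9b.yaml \
    grpo.max_num_steps=3 \
    &> results/$EXP_NAME/output.log &
```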
Single GPU Training: If you only have 1 GPU available, use these modifications:
Key differences:
- Added `CUDA_VISIBLE_DEVICES=0` to use only GPU 0
- Set `cluster.gpus_per_node=1`
- Set `policy.megatron_cfg.tensor_model_parallel_size=1`
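Combining the overrides above, a single-GPU launch might look like the following (same caveats as the full command: `EXP_NAME` is a name you choose and the `--config` flag is an assumption):

```shell
EXP_NAME=grpo_nano9b_single_gpu_test
mkdir -p results/$EXP_NAME

# Restrict the run to GPU 0 and disable tensor parallelism.
CUDA_VISIBLE_DEVICES=0 uv run examples/nemo_gym/run_grpo_nemo_gym.py \
    --config examples/nemo_gym/grpo_workplace_assistant_nemotron_nano_v2_9b.yaml \
    grpo.max_num_steps=3 \
    cluster.gpus_per_node=1 \
    policy.megatron_cfg.tensor_model_parallel_size=1 \
    &> results/$EXP_NAME/output.log &
```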
The end of the command above does the following:
- `&> results/$EXP_NAME/output.log`: Pipes the terminal output into a file at `results/$EXP_NAME/output.log` that you can view.
- `&`: This final ampersand runs the job in the background, which frees up your terminal to do other things. You can view all the background jobs using the `jobs` command. If you need to quit the training run, use the `fg` command to bring the job from the background into the foreground, then Ctrl+C like normal.
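While the job runs in the background, you can follow the log file (using the `EXP_NAME` you set when launching):

```shell
# Stream the training log as it is written; Ctrl+C here stops tail,
# not the training job itself.
tail -f results/$EXP_NAME/output.log

# List background jobs; `fg` brings the training job back to the
# foreground so a Ctrl+C can stop it.
jobs
```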
✅ Success Check: Training completes all 3 steps on a single node without issues. Check the logs for errors and verify that the training steps are progressing.
Next Steps
Your single-node run validated the environment. Scale to multiple nodes for production training:
Continue to Multi-Node Training →