With your environment set up and data prepared, you’re ready to run training. But before committing to a multi-hour, multi-node job, it’s important to verify everything works correctly on a single node first.
Goal: Run a single-node GRPO training session to validate your environment.
Time: ~45 minutes
In this section, you will:
Make sure you have:
Because this is RL training, run all of the following steps from the NeMo RL root directory, not the NeMo Gym directory.
✅ Success Check: You should see a YAML config at examples/nemo_gym/grpo_workplace_assistant_nemotron_nano_v2_9b.yaml and a Python script at examples/nemo_gym/run_grpo_nemo_gym.py.
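If you prefer a scripted check from the NeMo RL root, a minimal sketch in Python follows. The helper name `missing_files` is hypothetical; the only paths it assumes are the two named in the success check.

```python
from pathlib import Path

# Paths named in the success check, relative to the NeMo RL root directory.
REQUIRED_FILES = (
    "examples/nemo_gym/grpo_workplace_assistant_nemotron_nano_v2_9b.yaml",
    "examples/nemo_gym/run_grpo_nemo_gym.py",
)


def missing_files(root: str = ".") -> list[str]:
    """Return the required files that do not exist under `root`."""
    base = Path(root)
    return [rel for rel in REQUIRED_FILES if not (base / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing:", *missing, sep="\n  ")
    else:
        print("All required files found.")
```

An empty result means both files are in place and you can move on to downloading the model.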
Estimated time: ~5-10 minutes
Download NVIDIA Nemotron Nano 9B v2:
✅ Success Check: Model files are downloaded to .cache/hub/models--nvidia--NVIDIA-Nemotron-Nano-9B-v2/.
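The folder name in this check follows the Hugging Face Hub cache convention: a repo ID `org/name` is stored under `models--org--name`, with each downloaded revision in a `snapshots/<revision>` subfolder. A minimal sketch for confirming a snapshot is present, assuming the `.cache/hub` prefix from the success check (adjust `hub_dir` if your `HF_HOME` points elsewhere); both helper names are hypothetical.

```python
from pathlib import Path


def cache_dir_for(repo_id: str, hub_dir: str = ".cache/hub") -> Path:
    """Map a Hub repo ID to its cache folder: org/name -> models--org--name."""
    return Path(hub_dir) / ("models--" + repo_id.replace("/", "--"))


def has_snapshot(repo_id: str, hub_dir: str = ".cache/hub") -> bool:
    """True if at least one downloaded snapshot folder exists for the repo."""
    snapshots = cache_dir_for(repo_id, hub_dir) / "snapshots"
    return snapshots.is_dir() and any(snapshots.iterdir())
```

For example, `has_snapshot("nvidia/NVIDIA-Nemotron-Nano-9B-v2")` run from the NeMo RL root should return `True` once the download has finished.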
Estimated time: ~1 minute
The Nemotron Nano 9B v2 model uses a custom chat template that must be modified for RL training. The following commands edit the locally cached copy of that template:
✅ Success Check: The sed commands complete without errors.
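A clean exit alone does not prove the template changed: `sed` exits with status 0 even when its pattern matches nothing. One way to confirm the edit took effect is to copy the template file before running the commands and diff the two afterwards. A minimal sketch, assuming you point it at whichever cached file the `sed` commands target (that path is not reproduced here); `template_diff` is a hypothetical helper.

```python
import difflib
from pathlib import Path


def template_diff(backup: str, current: str) -> list[str]:
    """Unified diff between a pre-edit backup and the edited file.

    An empty list means the edit changed nothing.
    """
    old = Path(backup).read_text().splitlines(keepends=True)
    new = Path(current).read_text().splitlines(keepends=True)
    return list(difflib.unified_diff(old, new, fromfile=backup, tofile=current))
```

Before running `sed`, save a copy with `shutil.copy2(template, template + ".bak")`; afterwards, `print("".join(template_diff(template + ".bak", template)) or "No changes")` shows exactly which lines were rewritten.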
Estimated time: ~15-30 minutes
By default, this command runs only 3 training steps (grpo.max_num_steps=3), a short test run in preparation for multi-node training. If you are doing the full training run on a single node, remove this override; the full run takes several hours.
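Overrides such as `grpo.max_num_steps=3` address a value by its dotted path in the nested YAML config. A minimal sketch of what such an override does, assuming the config is loaded as plain nested dictionaries. This illustrates the syntax only and is not NeMo RL's own override parser, which may handle types differently; the starting value of 1000 is made up.

```python
import ast
from typing import Any


def apply_override(cfg: dict, override: str) -> None:
    """Apply one 'a.b.c=value' override to a nested dict in place.

    Illustrative only: mimics the dotted-path syntax, not NeMo RL's parser.
    """
    path, raw = override.split("=", 1)
    *parents, leaf = path.split(".")
    node = cfg
    for key in parents:
        node = node[key]  # raises KeyError if the path does not exist
    try:
        value: Any = ast.literal_eval(raw)  # "3" -> 3, "0.5" -> 0.5
    except (ValueError, SyntaxError):
        value = raw  # fall back to a plain string
    node[leaf] = value


config = {"grpo": {"max_num_steps": 1000}}  # hypothetical starting value
apply_override(config, "grpo.max_num_steps=3")
print(config)  # {'grpo': {'max_num_steps': 3}}
```

Removing the override for a full single-node run simply leaves whatever value the YAML file sets.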
Single-GPU Training: If you have only 1 GPU available, use these modifications:
Key differences:
- CUDA_VISIBLE_DEVICES=0 to use only GPU 0
- cluster.gpus_per_node=1
- policy.megatron_cfg.tensor_model_parallel_size=1

The end of the command above does the following:

- &> results/$EXP_NAME/output.log: Redirects both standard output and standard error to results/$EXP_NAME/output.log so you can review them later.
- &: Runs the job in the background, which frees up your terminal for other work. Use the jobs command to list background jobs. To stop the training run, bring the job back to the foreground with the fg command and then press Ctrl+C as usual.

✅ Success Check: Training completes 3 steps on a single node without errors. Check the log for errors and verify that the training steps are progressing.
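To check progress without reading the whole log, a small scanner can report the latest step and any error-looking lines. A minimal sketch, assuming the log prints a step banner of the form `Step N/M` and that errors show up as Python tracebacks or `...Error:` lines; both patterns are assumptions, so adjust `STEP_RE` and `ERROR_MARKERS` to what your output.log actually contains. `summarize_log` is a hypothetical helper.

```python
import re
from pathlib import Path

# Assumed step banner, e.g. "===== Step 2/3 =====". Adjust to your log format.
STEP_RE = re.compile(r"\bStep (\d+)/(\d+)\b")
# Heuristic error markers; they will not catch every failure mode.
ERROR_MARKERS = ("Traceback (most recent call last)", "Error:")


def summarize_log(path: str) -> tuple[int, int, list[str]]:
    """Return (last step seen, total steps, lines that look like errors)."""
    last, total, errors = 0, 0, []
    for line in Path(path).read_text(errors="replace").splitlines():
        match = STEP_RE.search(line)
        if match:
            last, total = int(match.group(1)), int(match.group(2))
        if any(marker in line for marker in ERROR_MARKERS):
            errors.append(line)
    return last, total, errors
```

Calling `summarize_log` on results/$EXP_NAME/output.log (with your experiment name substituted) while the job runs should show the step count climbing toward 3 and an empty error list.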
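The single-GPU overrides have to agree with each other. In Megatron-style model parallelism, the GPUs in a job are split into model-parallel groups of tensor_model_parallel_size × pipeline_model_parallel_size GPUs, so the total GPU count must be divisible by that product, and the quotient is the data-parallel size. This is why tensor_model_parallel_size drops to 1 when only one GPU is available. A simplified sketch of that arithmetic follows; it ignores context and expert parallelism, and the pipeline-parallel argument and multi-GPU numbers are illustrative assumptions rather than values from the training config.

```python
def data_parallel_size(num_nodes: int, gpus_per_node: int,
                       tensor_parallel: int, pipeline_parallel: int = 1) -> int:
    """Return the data-parallel size implied by a GPU layout.

    Simplified Megatron-style arithmetic (ignores context/expert parallelism):
    world_size = data_parallel * tensor_parallel * pipeline_parallel.
    """
    world_size = num_nodes * gpus_per_node
    group = tensor_parallel * pipeline_parallel
    if world_size < group or world_size % group != 0:
        raise ValueError(
            f"{world_size} GPU(s) cannot be split into model-parallel groups of {group}"
        )
    return world_size // group


print(data_parallel_size(num_nodes=1, gpus_per_node=1, tensor_parallel=1))  # 1
print(data_parallel_size(num_nodes=1, gpus_per_node=8, tensor_parallel=2))  # 4
```

Keeping a tensor-parallel size larger than 1 on a single GPU would make this check fail, which mirrors the kind of layout error you would otherwise hit at launch.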
Your single-node run validated the environment. Scale to multiple nodes for production training:
Continue to Multi-Node Training →