Multi-Node Training
Your single-node test run confirmed that the environment, model, and training loop work correctly. Now you can scale to multiple nodes for production training, where distributed compute accelerates your GRPO optimization.
Goal: Scale GRPO training to multiple nodes for production training.
Time: ~2-4 hours
In this section, you will:
- Launch a multi-node training job using Slurm batch mode
- Monitor training metrics in Weights & Biases
Prerequisites
Complete the Single Node Training first. Do not skip it. The single-node setup validates that your environment is configured correctly before attempting multi-node training.
Make sure you have:
- ✅ Successfully completed 3 training steps on a single node
- ✅ Access to the Slurm login/head node (not inside the interactive container)
- ✅ Weights & Biases API key for experiment tracking
1. Launch Multi-Node Training
Estimated time: Several hours (depending on configuration)
For production training, scale to multiple nodes by changing cluster.num_nodes. This example uses batch mode, where the COMMAND variable specifies what to run automatically when the job starts.
Run this command from the Slurm login/head node, not from inside the interactive container. This submits a new batch job that runs independently.
If you used enroot following the steps in the Setup doc and downloaded the container locally, point to the local container filepath instead:
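As a rough sketch of what the batch submission looks like, assuming a NeMo RL-style `ray.sub` launcher script (the account name, container path, entrypoint script, and config overrides below are all placeholders, not values from this guide; substitute your own from the Setup doc):

```shell
# All values below are placeholders -- substitute your own Slurm account,
# container path, and training entrypoint.
ACCOUNT=my_account
CONTAINER=/path/to/local/container.sqsh   # local filepath if you imported with enroot
COMMAND="uv run examples/run_grpo.py cluster.num_nodes=2"   # hypothetical entrypoint

# Printed as a dry run; remove the leading `echo` to actually submit the
# batch job from the Slurm login/head node.
echo COMMAND="$COMMAND" CONTAINER="$CONTAINER" \
  sbatch --nodes=2 --account="$ACCOUNT" ray.sub
```

The key idea is that `cluster.num_nodes` (and the matching `--nodes`) controls the scale-out, while `COMMAND` tells the batch job what to run automatically once it starts.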
✅ Success Check: The Slurm job is submitted and begins running on multiple nodes.
2. Monitor Training Progress
Monitor training metrics in W&B to track progress, in particular the reward mean and validation accuracy (val:accuracy).
The top three checkpoints (ranked by val:accuracy) are retained, per checkpointing.keep_top_k: 3. You can find checkpoints at the following path:
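To illustrate what the keep_top_k: 3 retention leaves on disk, here is a self-contained sketch; the directory layout and step names are hypothetical stand-ins, and the real location comes from the checkpointing settings in your training config:

```shell
# Hypothetical checkpoint layout -- the real directory comes from the
# checkpointing settings in your training config.
CKPT_DIR=$(mktemp -d)
mkdir -p "$CKPT_DIR"/step_20 "$CKPT_DIR"/step_40 "$CKPT_DIR"/step_60  # stand-ins

# With keep_top_k: 3, at most the three best-scoring steps remain on disk.
ls -1 "$CKPT_DIR"
```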
✅ Success Check: Training is successful when:
- Reward mean increases consistently over steps
- Validation accuracy consistently improves
- No OOM (Out of Memory) errors occur
- Checkpoints are saved at specified intervals
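Besides the W&B dashboard, you can also watch rewards directly from the job's stdout log. The sketch below fakes two log lines so the extraction pattern is runnable as-is; the log filename and the reward_mean= line format are assumptions, so adapt the grep pattern to whatever your training loop actually prints:

```shell
LOG=$(mktemp)   # stand-in for your real slurm-<jobid>.out file

# Two fake log lines in an assumed format, so the grep below is runnable.
printf 'step 1 reward_mean=0.12\nstep 2 reward_mean=0.18\n' > "$LOG"

# Pull the most recent reward value from the log.
grep -o 'reward_mean=[0-9.]*' "$LOG" | tail -n 1
```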
3. Measure Real-World Improvement
The Workplace Assistant environment’s tool-calling tasks correlate with performance on the Berkeley Function Calling Leaderboard (BFCL) v3 benchmark. To measure improvement, evaluate the Nemotron Nano v2 9B model on BFCL v3 before and after training, and compare the results. You should observe measurable improvement in tool-calling accuracy.
You can run BFCL v3 evaluations using NeMo Evaluator; refer to the NeMo Evaluator docs for full setup instructions and supported benchmarks.
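Once you have both scores, the before/after comparison is a one-liner. The accuracies below are placeholders, not real results; substitute the numbers NeMo Evaluator reports for your baseline and post-training runs:

```shell
# Placeholder scores -- substitute the accuracies from your own runs.
BASELINE=0.62
TRAINED=0.71

# awk handles the floating-point subtraction that plain shell cannot.
awk -v a="$BASELINE" -v b="$TRAINED" \
  'BEGIN { printf "BFCL v3 delta: %+.2f (%s)\n", b - a, (b > a ? "improved" : "regressed") }'
```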
✅ Success Check: BFCL v3 scores improve after training compared to the baseline model.
Next Steps
Congratulations! You’ve trained Nemotron Nano 9B v2 for multi-step tool calling using GRPO on the Workplace Assistant environment.