Nemotron-3-Super GRPO/DAPO Training

This directory contains Reinforcement Learning (RL) training assets for the Nemotron-3-Super-120B-A20B model.

Overview

Full-weight RL training with the GRPO/DAPO algorithm using NeMo RL, starting from a base model: grpo_training_cookbook.ipynb

This experiment reproduces the DeepSeek “aha” moment: GRPO/DAPO training can help a model fresh out of pretraining discover advanced math reasoning on its own.
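As a rough illustration of the idea behind GRPO, the sketch below computes group-relative advantages: several completions are sampled for one prompt, each gets a scalar reward, and each reward is normalized against the mean and standard deviation of its own group. This is not NeMo RL's implementation; the function name `group_relative_advantages`, the `eps` stabilizer, and the use of the population standard deviation are assumptions made for this example.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own sampling group.

    Hypothetical helper for illustration only; not part of NeMo RL.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std (an assumption)
    # eps avoids division by zero when every completion gets the same reward
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled completions for one math prompt: 1.0 = correct, 0.0 = incorrect
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Correct completions receive a positive advantage and incorrect ones a negative advantage. When every completion in a group gets the same reward, all advantages are zero and the group contributes no learning signal.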

Requirements

5x GB200 nodes on the same GB200 rack (4 GPUs each, i.e. 20x GB200 189 GB GPUs in total), or
3x B200 nodes (8 GPUs each, i.e. 24x B200 183 GB GPUs in total)
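The GPU counts above follow from nodes times GPUs per node. The short sketch below reproduces that arithmetic and also derives the aggregate GPU memory for each option; the dictionary layout and the `totals` helper are hypothetical names used only for this example, and the aggregate memory figures are derived here rather than stated in the requirements.

```python
# Hardware options as listed in the requirements (memory in GB per GPU)
configs = {
    "GB200": {"nodes": 5, "gpus_per_node": 4, "gb_per_gpu": 189},
    "B200": {"nodes": 3, "gpus_per_node": 8, "gb_per_gpu": 183},
}


def totals(cfg):
    """Return (total GPU count, aggregate GPU memory in GB) for one option."""
    gpus = cfg["nodes"] * cfg["gpus_per_node"]
    return gpus, gpus * cfg["gb_per_gpu"]


for name, cfg in configs.items():
    gpus, memory_gb = totals(cfg)
    print(f"{name}: {gpus} GPUs, {memory_gb} GB aggregate GPU memory")
```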