# NeMo RL Configuration
With the Gym configuration in place, the next step is understanding the core training parameters. These control the GRPO algorithm, model behavior, and optimization settings that determine how your model learns.
**Goal:** Understand the GRPO and model hyperparameters for RL training.

In this section, you will learn:

- Model configuration parameters
- GRPO hyperparameters
- Optimizer settings
## Configuration File Location

The full training configuration file is located at:

`examples/nemo_gym/grpo_workplace_assistant_nemotron_nano_v2_9b.yaml`
## Model Configuration
| Setting | Value |
|---|---|
| Base model | `nvidia/NVIDIA-Nemotron-Nano-9B-v2` |
| Maximum context length | `32768` |
| Training precision | `bfloat16` |
| Tensor parallelism across GPUs | `8` |
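In the YAML file, these settings live under the policy section. The sketch below shows one plausible layout; the key names here are assumptions based on common NeMo RL configurations (the exact names and nesting, especially for tensor parallelism, vary by training backend), so confirm them against the actual file:

```yaml
# Sketch only -- key names are assumptions, not copied from the real config.
policy:
  model_name: nvidia/NVIDIA-Nemotron-Nano-9B-v2  # base model
  max_total_sequence_length: 32768               # maximum context length
  precision: bfloat16                            # training precision
  # Tensor-parallel degree; the exact key depends on the backend
  # (DTensor vs. Megatron) in your config.
  tensor_parallel_size: 8
```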
## GRPO Hyperparameters
| Setting | Value |
|---|---|
| Prompts per training step | `4` |
| Rollouts generated per prompt | `4` |
| Total training steps | `10` |
| Variance-reduction baseline | `true` |
| Normalize rewards across batch | `true` |
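With 4 prompts per step and 4 rollouts per prompt, each training step optimizes over 16 rollouts. GRPO typically reduces variance with a leave-one-out baseline: each rollout's advantage is its reward minus the mean reward of the other rollouts for the same prompt. The YAML sketch below shows how these values might appear in the config; the key names are assumptions based on common NeMo RL configurations, so check them against the actual file:

```yaml
# Sketch only -- key names are assumptions, not copied from the real config.
grpo:
  num_prompts_per_step: 4        # prompts sampled each step
  num_generations_per_prompt: 4  # rollouts per prompt (4 x 4 = 16 rollouts/step)
  max_num_steps: 10              # total training steps
  normalize_rewards: true        # normalize rewards across the batch
loss_fn:
  use_leave_one_out_baseline: true  # variance-reduction baseline
```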
## Optimizer Settings
| Setting | Value |
|---|---|
| Optimizer type | Adam |
| Learning rate | `5.0e-6` |
| Minimum learning rate | `5.0e-7` |
| Weight decay | `0.01` |
| Adam beta1 / beta2 | `0.9` / `0.999` |
| Gradient clipping threshold | `1.0` |
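The minimum learning rate acts as the floor when a decay schedule anneals the learning rate from its initial value. As a rough guide to where these values sit in the YAML, a sketch follows; the key names are assumptions based on common NeMo RL optimizer sections and should be verified against the actual file:

```yaml
# Sketch only -- key names are assumptions, not copied from the real config.
policy:
  optimizer:
    name: adam
    lr: 5.0e-6          # initial learning rate
    min_lr: 5.0e-7      # floor for learning-rate decay
    weight_decay: 0.01
    adam_beta1: 0.9
    adam_beta2: 0.999
  max_grad_norm: 1.0    # gradient clipping threshold
```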