NeMo RL Configuration

With the Gym configuration in place, the next step is understanding the core training parameters. These control the GRPO algorithm, model behavior, and optimization settings that determine how your model learns.

Goal: Understand the GRPO and model hyperparameters for RL training.

Time: ~10 minutes (read)

In this section, you will learn:

  1. Model configuration parameters
  2. GRPO hyperparameters
  3. Optimizer settings

Prerequisites

  * Gym Configuration (previous section)

Configuration File Location

The full training configuration file is located at:

```
examples/nemo_gym/grpo_workplace_assistant_nemotron_nano_v2_9b.yaml
```

Model Configuration

| Parameter | Value | Description |
| --- | --- | --- |
| `model_name` | `nvidia/NVIDIA-Nemotron-Nano-9B-v2` | Base model |
| `max_total_sequence_length` | 32768 | Maximum context length |
| `precision` | bfloat16 | Training precision |
| `tensor_model_parallel_size` | 8 | Tensor parallelism across GPUs |
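
In the YAML file these map to the model (policy) settings. The snippet below is a minimal sketch, assuming the keys sit under a single `policy` section; in the shipped config the nesting may differ (tensor parallelism, for instance, is often set under a training-backend sub-section).

```yaml
# Illustrative sketch only -- key nesting may differ from the shipped file.
policy:
  model_name: "nvidia/NVIDIA-Nemotron-Nano-9B-v2"
  max_total_sequence_length: 32768   # maximum tokens per sequence (prompt + response)
  precision: "bfloat16"              # training dtype
  tensor_model_parallel_size: 8      # shard model weights across 8 GPUs
```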

GRPO Hyperparameters

| Parameter | Value | Description |
| --- | --- | --- |
| `num_prompts_per_step` | 4 | Prompts sampled per training step |
| `num_generations_per_prompt` | 4 | Rollouts generated per prompt |
| `max_num_steps` | 10 | Total training steps |
| `use_leave_one_out_baseline` | true | Use a leave-one-out baseline to reduce advantage variance |
| `normalize_rewards` | true | Normalize rewards within each batch |
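
In YAML form, these parameters look roughly like the sketch below. This is an illustration rather than a verbatim excerpt: the section name and the exact placement of the baseline and normalization flags (some configs keep them under a loss-function sub-section) may differ in the actual file.

```yaml
# Illustrative sketch only -- section names and nesting may differ.
grpo:
  num_prompts_per_step: 4            # prompts sampled each training step
  num_generations_per_prompt: 4      # rollouts per prompt (the GRPO "group")
  max_num_steps: 10                  # total training steps for this run
  use_leave_one_out_baseline: true   # baseline = mean reward of the other rollouts in the group
  normalize_rewards: true            # normalize rewards before computing advantages
```

With these values, each step optimizes over 16 rollouts (4 prompts × 4 generations). The leave-one-out baseline subtracts from each rollout's reward the mean reward of the other three rollouts for the same prompt, which reduces advantage variance without biasing the gradient.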

Optimizer Settings

| Parameter | Value | Description |
| --- | --- | --- |
| `optimizer` | Adam | Optimizer type |
| `lr` | 5.0e-6 | Learning rate |
| `min_lr` | 5.0e-7 | Minimum learning rate |
| `weight_decay` | 0.01 | Weight decay |
| `adam_beta1` / `adam_beta2` | 0.9 / 0.999 | Adam hyperparameters |
| `clip_grad` | 1.0 | Gradient clipping threshold |
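
These typically appear as an optimizer block nested under the policy/training settings. The sketch below is illustrative; the exact key names (for example, how the optimizer type is specified or whether the betas are passed as a list) may differ in the actual file.

```yaml
# Illustrative sketch only -- key names and nesting may differ.
optimizer:
  name: "adam"           # optimizer type (Adam)
  lr: 5.0e-6             # peak learning rate
  min_lr: 5.0e-7         # floor for the learning-rate schedule
  weight_decay: 0.01
  adam_beta1: 0.9        # first-moment decay rate
  adam_beta2: 0.999      # second-moment decay rate
  clip_grad: 1.0         # global gradient-norm clipping threshold
```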

Next Steps

With the configuration parameters understood, set up your training environment:

Continue to Setup →