# NeMo RL Configuration

With the Gym configuration in place, the next step is understanding the core training parameters. These control the GRPO algorithm, model behavior, and optimization settings that determine how your model learns.

Goal: Understand the GRPO and model hyperparameters for RL training.

In this section, you will learn:

1. Model configuration parameters
2. GRPO hyperparameters
3. Optimizer settings

## Configuration File Location

The full training configuration file is located at `examples/nemo_gym/grpo_workplace_assistant_nemotron_nano_v2_9b.yaml`.
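At the top level, NeMo RL recipe files group these settings into a `policy` section (model, parallelism, optimizer) and a `grpo` section (algorithm hyperparameters). A hedged outline of that structure, assuming the typical NeMo RL layout rather than quoting the shipped file:

```yaml
# Hedged outline; exact top-level keys in the shipped file may differ.
policy:
  # model, precision, parallelism, and optimizer settings (see tables below)
grpo:
  # GRPO algorithm hyperparameters (see tables below)
```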

## Model Configuration

| Parameter | Value | Description |
|---|---|---|
| `model_name` | `nvidia/NVIDIA-Nemotron-Nano-9B-v2` | Base model |
| `max_total_sequence_length` | 32768 | Maximum context length |
| `precision` | `bfloat16` | Training precision |
| `tensor_model_parallel_size` | 8 | Tensor parallelism across GPUs |
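In the YAML, these values map to keys under the `policy` section. A minimal sketch, assuming the usual NeMo RL layout (in particular, placing `tensor_model_parallel_size` under a Megatron parallelism block is an assumption; the shipped file's nesting may differ):

```yaml
policy:
  model_name: "nvidia/NVIDIA-Nemotron-Nano-9B-v2"
  max_total_sequence_length: 32768   # maximum context length in tokens
  precision: "bfloat16"
  megatron_cfg:                      # assumed nesting for the parallelism setting
    tensor_model_parallel_size: 8    # shard each layer's weights across 8 GPUs
```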


## GRPO Hyperparameters

| Parameter | Value | Description |
|---|---|---|
| `num_prompts_per_step` | 4 | Number of prompts per training step |
| `num_generations_per_prompt` | 4 | Rollouts generated per prompt |
| `max_num_steps` | 10 | Total training steps |
| `use_leave_one_out_baseline` | `true` | Variance reduction technique |
| `normalize_rewards` | `true` | Normalize rewards across batch |
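These parameter names correspond to the `grpo` section of the config. A minimal sketch, assuming the standard flat layout:

```yaml
grpo:
  num_prompts_per_step: 4          # distinct prompts sampled each step
  num_generations_per_prompt: 4    # rollouts per prompt, so 4 x 4 = 16 rollouts per step
  max_num_steps: 10
  use_leave_one_out_baseline: true
  normalize_rewards: true
```

With `use_leave_one_out_baseline` enabled, each rollout's baseline is the mean reward of the other three rollouts for the same prompt, and its advantage is its own reward minus that baseline. This reduces gradient variance without training a separate critic model.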


## Optimizer Settings

| Parameter | Value | Description |
|---|---|---|
| `optimizer` | Adam | Optimizer type |
| `lr` | 5.0e-6 | Learning rate |
| `min_lr` | 5.0e-7 | Minimum learning rate |
| `weight_decay` | 0.01 | Weight decay |
| `adam_beta1` / `adam_beta2` | 0.9 / 0.999 | Adam hyperparameters |
| `clip_grad` | 1.0 | Gradient clipping threshold |
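These key names follow Megatron-style optimizer options, so they plausibly sit in an optimizer block under `policy`. A hedged sketch (the exact nesting in the shipped file may differ):

```yaml
policy:
  megatron_cfg:          # assumed location of the optimizer block
    optimizer:
      optimizer: "adam"  # optimizer type
      lr: 5.0e-6
      min_lr: 5.0e-7     # floor for the learning-rate schedule
      weight_decay: 0.01
      adam_beta1: 0.9
      adam_beta2: 0.999
      clip_grad: 1.0     # clip gradients to a max norm of 1.0
```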

