> For clean Markdown of any page, append .md to the page URL.
> For a complete documentation index, see https://docs.nvidia.com/nemo/gym/llms.txt.
> For full documentation content, see https://docs.nvidia.com/nemo/gym/llms-full.txt.

# NeMo RL Configuration

With the Gym configuration in place, the next step is understanding the core training parameters. These control the GRPO algorithm, model behavior, and optimization settings that determine how your model learns.

<Info>
  **Goal**: Understand the GRPO and model hyperparameters for RL training.

  **Time**: \~10 minutes (read)

  **In this section, you will learn**:

  1. Model configuration parameters
  2. GRPO hyperparameters
  3. Optimizer settings
</Info>

<NavButton href="/latest/training-tutorials/nemo-rl-grpo/gym-configuration" label="Previous: Gym Configuration" direction="prev" />

## Prerequisites

* Read [Gym Configuration](/latest/training-tutorials/nemo-rl-grpo/gym-configuration) to understand the Gym-specific parameters

***

## Configuration File Location

The full training configuration file is located at:

```
examples/nemo_gym/grpo_workplace_assistant_nemotron_nano_v2_9b.yaml
```

***

## Model Configuration

| Parameter                    | Value                             | Description                    |
| ---------------------------- | --------------------------------- | ------------------------------ |
| `model_name`                 | nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Base model                     |
| `max_total_sequence_length`  | 32768                             | Maximum context length         |
| `precision`                  | bfloat16                          | Training precision             |
| `tensor_model_parallel_size` | 8                                 | Tensor parallelism across GPUs |

***

## GRPO Hyperparameters

| Parameter                    | Value | Description                         |
| ---------------------------- | ----- | ----------------------------------- |
| `num_prompts_per_step`       | 4     | Number of prompts per training step |
| `num_generations_per_prompt` | 4     | Rollouts generated per prompt       |
| `max_num_steps`              | 10    | Total training steps                |
| `use_leave_one_out_baseline` | true  | Variance reduction technique        |
| `normalize_rewards`          | true  | Normalize rewards across batch      |

***

## Optimizer Settings

| Parameter                   | Value       | Description                 |
| --------------------------- | ----------- | --------------------------- |
| `optimizer`                 | Adam        | Optimizer type              |
| `lr`                        | 5.0e-6      | Learning rate               |
| `min_lr`                    | 5.0e-7      | Minimum learning rate       |
| `weight_decay`              | 0.01        | Weight decay                |
| `adam_beta1` / `adam_beta2` | 0.9 / 0.999 | Adam hyperparameters        |
| `clip_grad`                 | 1.0         | Gradient clipping threshold |

***

## Next Steps

With the configuration parameters understood, set up your training environment:

<NavButton href="/latest/training-tutorials/nemo-rl-grpo/setup" label="Continue to Setup" direction="next" />