With the Gym configuration in place, the next step is to understand the core training parameters. These parameters cover the GRPO algorithm, model behavior, and optimization settings, which together determine how your model learns.
Goal: Understand the GRPO and model hyperparameters for RL training.
Time: ~10 minutes (read)
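For orientation, GRPO (Group Relative Policy Optimization) is commonly described as scoring each sampled response relative to the other responses generated for the same prompt. The sketch below illustrates that group-relative advantage under stated assumptions: scalar rewards, population standard deviation, and a small epsilon for stability. The function name and the `eps` value are illustrative and are not taken from the training configuration.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its group's mean and standard deviation.

    Assumption: population std and an additive eps; real implementations may differ.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division defined when every reward in the group is equal.
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four hypothetical responses sampled for one prompt.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Responses that beat their group's average get a positive advantage and the rest get a negative one. If every response in a group earns the same reward, all advantages are zero and that group contributes no learning signal.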
In this section, you will learn:
The full training configuration file is located at:
With the configuration parameters understood, set up your training environment:
Continue to Setup →