Copyright © 2026, NVIDIA Corporation.

NeMo RL Configuration


With the Gym configuration in place, the next step is to understand the core training parameters. These parameters configure the model, the GRPO algorithm, and the optimizer, which together determine how your model learns.

Goal: Understand the GRPO and model hyperparameters for RL training.

Time: ~10 minutes (read)

In this section, you will learn:

  1. Model configuration parameters
  2. GRPO hyperparameters
  3. Optimizer settings
← Previous: Gym Configuration

Prerequisites

  • Read Gym Configuration to understand the Gym-specific parameters

Configuration File Location

The full training configuration file is located at:

examples/nemo_gym/grpo_workplace_assistant_nemotron_nano_v2_9b.yaml

Model Configuration

| Parameter | Value | Description |
| --- | --- | --- |
| model_name | nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Base model |
| max_total_sequence_length | 32768 | Maximum context length |
| precision | bfloat16 | Training precision |
| tensor_model_parallel_size | 8 | Tensor parallelism across GPUs |
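
To get a feel for how precision and tensor_model_parallel_size interact, the rough arithmetic below estimates the weight memory held by each GPU. It is a back-of-the-envelope sketch, not a description of how NeMo RL allocates memory: the 9e9 parameter count is a nominal figure taken from the "9B" in the model name, the helper weight_bytes_per_gpu is hypothetical, and the estimate assumes every weight is sharded evenly while ignoring gradients, optimizer state, activations, and the generation cache.

```python
# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"bfloat16": 2, "float16": 2, "float32": 4}


def weight_bytes_per_gpu(num_params, precision, tensor_parallel_size):
    """Estimate weight bytes per GPU, assuming an even split across the group.

    Hypothetical helper for illustration; real sharding is not perfectly even.
    """
    if tensor_parallel_size < 1:
        raise ValueError("tensor_parallel_size must be at least 1")
    return num_params * BYTES_PER_PARAM[precision] / tensor_parallel_size


# Assumption: roughly 9e9 parameters, inferred from "9B" in the model name.
nominal_params = 9e9

per_gpu_bytes = weight_bytes_per_gpu(nominal_params, "bfloat16", 8)
per_gpu_gib = per_gpu_bytes / 2**30

print(f"Approximate weights per GPU: {per_gpu_gib:.2f} GiB")
```

Actual usage during training is considerably higher than this figure, because it covers only the model weights.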

GRPO Hyperparameters

| Parameter | Value | Description |
| --- | --- | --- |
| num_prompts_per_step | 4 | Number of prompts per training step |
| num_generations_per_prompt | 4 | Rollouts generated per prompt |
| max_num_steps | 10 | Total training steps |
| use_leave_one_out_baseline | true | Variance reduction technique |
| normalize_rewards | true | Normalize rewards across batch |
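
The values above imply a rollout budget that is easy to compute, and the leave-one-out baseline can be illustrated in a few lines. The following is a minimal sketch of the general technique rather than NeMo RL's implementation: the function leave_one_out_advantages and the example rewards are hypothetical, and reward normalization is left out.

```python
def leave_one_out_advantages(rewards):
    """Subtract from each reward the mean reward of the other rollouts.

    All rewards are assumed to belong to rollouts of the same prompt.
    Hypothetical helper for illustration only.
    """
    n = len(rewards)
    if n < 2:
        raise ValueError("need at least two rollouts per prompt")
    total = sum(rewards)
    return [r - (total - r) / (n - 1) for r in rewards]


# Values from the table above.
num_prompts_per_step = 4
num_generations_per_prompt = 4
max_num_steps = 10

# Rollouts generated in one training step and over the whole run.
rollouts_per_step = num_prompts_per_step * num_generations_per_prompt
total_rollouts = rollouts_per_step * max_num_steps

# Hypothetical rewards for the four rollouts of a single prompt.
example_rewards = [1.0, 0.0, 0.0, 1.0]
advantages = leave_one_out_advantages(example_rewards)

print(rollouts_per_step, total_rollouts)
print(advantages)
```

Because each rollout is compared against its siblings only, the advantages for a prompt always sum to zero, and a prompt whose rollouts all receive the same reward contributes no learning signal.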

Optimizer Settings

| Parameter | Value | Description |
| --- | --- | --- |
| optimizer | Adam | Optimizer type |
| lr | 5.0e-6 | Learning rate |
| min_lr | 5.0e-7 | Minimum learning rate |
| weight_decay | 0.01 | Weight decay |
| adam_beta1 / adam_beta2 | 0.9 / 0.999 | Adam hyperparameters |
| clip_grad | 1.0 | Gradient clipping threshold |
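
As an illustration of what a gradient clipping threshold does, the sketch below rescales a gradient vector whenever its L2 norm exceeds the threshold. It assumes clip_grad is applied to the global gradient norm, which this page does not state, and the helper clip_by_global_norm and the example gradient are hypothetical. The last lines also show that min_lr is one tenth of lr.

```python
import math


def clip_by_global_norm(grads, max_norm=1.0):
    """Rescale grads so their L2 norm does not exceed max_norm.

    Returns the (possibly rescaled) gradients and the norm before clipping.
    Hypothetical helper; assumes clipping is applied to the global norm.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads), total_norm
    scale = max_norm / total_norm
    return [g * scale for g in grads], total_norm


# Hypothetical gradient with L2 norm 5.0, clipped to the threshold of 1.0.
clipped, norm_before = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
norm_after = math.sqrt(sum(g * g for g in clipped))

# Values from the table above.
lr = 5.0e-6
min_lr = 5.0e-7
min_lr_ratio = min_lr / lr

print(norm_before, norm_after)
print(min_lr_ratio)
```

Clipping preserves the gradient's direction and changes only its length, so an unusually large update is scaled down rather than discarded.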

Next Steps

With the configuration parameters understood, set up your training environment:

Continue to Setup →