This page provides a reference for the components required to integrate Gym into your training framework. Each component includes links to the NeMo RL reference implementation and corresponding tests.
A complete Gym integration consists of five components, implemented in sequence:
As of December 8, 2025, end-to-end tests for GRPO train loop integration are still being implemented in the NeMo RL repository.
Purpose: Expose your generation backend as an OpenAI-compatible endpoint.
Prerequisites: vLLM or SGLang generation backend.
Reference: Refer to Generation Backend And Openai Compatible Http Server for implementation guidance.
Purpose: Prevent train-generation mismatch in multi-step and multi-turn scenarios.
Prerequisites: OpenAI-compatible HTTP server.
Reference: Refer to On-Policy Corrections for technical details.
Purpose: Initialize and connect to Gym training environments.
Key responsibilities:
Purpose: Coordinate rollout collection between the policy and Gym environments.
Key responsibilities:
Purpose: Integrate Gym rollouts into the policy optimization training loop.
Key responsibilities:
Use this checklist to track your integration progress: