Integration Footprint#
This page provides a reference for the components required to integrate Gym into your training framework. Each component includes links to the NeMo RL reference implementation and corresponding tests.
Integration Components#
A complete Gym integration consists of five components, implemented in sequence:
Component |
Implementation |
Tests |
|
|---|---|---|---|
1 |
OpenAI-Compatible HTTP Server |
||
2 |
On-Policy Token ID Fixes |
||
3 |
Gym Spinup and Integration |
||
4 |
Rollout Orchestration |
||
5 |
GRPO Train Loop Integration |
End-to-end tests in progress |
Note
As of December 8, 2025, end-to-end tests for GRPO train loop integration are still being implemented in the NeMo RL repository.
Component Details#
1. OpenAI-Compatible HTTP Server#
Purpose: Expose your generation backend as an OpenAI-compatible endpoint.
Prerequisites: vLLM or SGLang generation backend.
Reference: Refer to Generation Backend for implementation guidance.
2. On-Policy Token ID Fixes#
Purpose: Prevent train-generation mismatch in multi-step and multi-turn scenarios.
Prerequisites: OpenAI-compatible HTTP server.
Reference: Refer to On-Policy Corrections for technical details.
3. Gym Spinup and Integration#
Purpose: Initialize and connect to Gym training environments.
Key responsibilities:
Environment configuration loading
Connection management
State synchronization
4. Rollout Orchestration#
Purpose: Coordinate rollout collection between the policy and Gym environments.
Key responsibilities:
Batch rollout management
Multi-step and multi-turn handling
Token ID tracking for on-policy corrections
5. GRPO Train Loop Integration#
Purpose: Integrate Gym rollouts into the policy optimization training loop.
Key responsibilities:
Rollout scheduling within training iterations
Loss calculation with Gym-generated experiences
Weight synchronization between training and generation
Implementation Checklist#
Use this checklist to track your integration progress:
OpenAI-compatible HTTP server implemented and tested
On-policy token ID fixes implemented and tested
Gym spinup and environment connection working
Rollout orchestration handling multi-step/multi-turn scenarios
GRPO (or equivalent) train loop integration complete