Integration Footprint#

This page provides a reference for the components required to integrate Gym into your training framework. Each component includes links to the NeMo RL reference implementation and corresponding tests.

Integration Components#

A complete Gym integration consists of five components, implemented in sequence:

Component

Implementation

Tests

1

OpenAI-Compatible HTTP Server

vllm_worker_async.py:264

test_vllm_generation.py:1107

2

On-Policy Token ID Fixes

vllm_worker_async.py:40

test_vllm_generation.py:1250

3

Gym Spinup and Integration

nemo_gym.py

test_nemo_gym.py

4

Rollout Orchestration

rollouts.py:975

test_rollouts.py:754

5

GRPO Train Loop Integration

grpo.py:1157

End-to-end tests in progress

Note

As of December 8, 2025, end-to-end tests for GRPO train loop integration are still being implemented in the NeMo RL repository.

Component Details#

1. OpenAI-Compatible HTTP Server#

Purpose: Expose your generation backend as an OpenAI-compatible endpoint.

Prerequisites: vLLM or SGLang generation backend.

Reference: Refer to Generation Backend for implementation guidance.

2. On-Policy Token ID Fixes#

Purpose: Prevent train-generation mismatch in multi-step and multi-turn scenarios.

Prerequisites: OpenAI-compatible HTTP server.

Reference: Refer to On-Policy Corrections for technical details.

3. Gym Spinup and Integration#

Purpose: Initialize and connect to Gym training environments.

Key responsibilities:

  • Environment configuration loading

  • Connection management

  • State synchronization

4. Rollout Orchestration#

Purpose: Coordinate rollout collection between the policy and Gym environments.

Key responsibilities:

  • Batch rollout management

  • Multi-step and multi-turn handling

  • Token ID tracking for on-policy corrections

5. GRPO Train Loop Integration#

Purpose: Integrate Gym rollouts into the policy optimization training loop.

Key responsibilities:

  • Rollout scheduling within training iterations

  • Loss calculation with Gym-generated experiences

  • Weight synchronization between training and generation

Implementation Checklist#

Use this checklist to track your integration progress:

  • OpenAI-compatible HTTP server implemented and tested

  • On-policy token ID fixes implemented and tested

  • Gym spinup and environment connection working

  • Rollout orchestration handling multi-step/multi-turn scenarios

  • GRPO (or equivalent) train loop integration complete