When NeMo Gym is used for RL training (not standalone rollout collection), it runs alongside a training framework. NeMo Gym’s Model Server acts as an HTTP proxy for policy model inference — it translates between the Responses API and Chat Completions API formats, forwarding requests to the training framework’s generation endpoint (e.g., vLLM). NeMo Gym can also run other models on GPU (e.g., reward models, judge models) through its own resources servers.
This section covers resource requirements, cluster strategies, and how to choose between them. For a detailed integration walkthrough from the training framework side, see how NeMo RL integrated with NeMo Gym. For guidance on integrating a new training framework, see Integrate RL Frameworks.
NeMo Gym and the training framework have different compute profiles:
The deployment strategy depends on how the training framework manages its cluster.
If ray_head_node_address is specified in the config, NeMo Gym connects to that existing Ray cluster instead of starting its own. Training frameworks using Ray set this address so that NeMo Gym attaches to the same cluster.
How it works:
Both systems share a single Ray cluster, so Ray has visibility into all available resources.
Version Requirements
When NeMo Gym connects to an existing Ray cluster, the same Ray and Python versions must be used in both environments.
When the training framework does not use Ray, NeMo Gym spins up its own independent Ray cluster for coordination.
The training framework runs its own orchestration (non-Ray). NeMo Gym spins up a separate Ray cluster.
When to use:
When the training framework and NeMo Gym are not started together (independently deployed), they run on fully separate clusters connected only by HTTP.
When to use:
Requirements:
Understand the server-based architecture.
Implement NeMo Gym integration into a new training framework.
End-to-end GRPO training tutorial with NeMo RL.
Detailed integration architecture from the NeMo RL perspective.