For AI agents: a documentation index is available at the root level at /llms.txt and /llms-full.txt. Append /llms.txt to any URL for a page-level index, or .md for the markdown version of any page.
DocumentationAPI Reference
DocumentationAPI Reference
  • Documentation
    • Home
  • About
    • Concepts
    • Ecosystem
  • Get Started
    • Quickstart
    • Detailed Setup Guide
    • Install from PyPI
    • Rollout Collection
  • Agent Server
  • Model Server
    • vLLM
  • Resources Server
  • Data
    • Prepare and Validate
    • Download from Hugging Face
    • Prompt Config
  • Environment Tutorials
    • Single-Step Environment
    • Multi-Step Environment
    • Stateful Environment
    • Real-World Environment
    • Integrate external libraries
    • Aggregate Metrics
    • LLM-as-Judge Verification
  • Benchmarks
    • Run benchmarks
    • Add a benchmark
    • Design a customer evaluation
  • Training Tutorials
    • NeMo RL
    • Unsloth
    • Multi-Environment Training
    • Offline Training (SFT/DPO)
  • Model Recipes
    • Nemotron 3 Nano
    • Nemotron 3 Super
  • Infrastructure
    • Deployment Topology
    • Engineering Notes
  • Reference
    • Configuration
    • RL Framework Compatibility
    • CLI Commands
    • FAQ
  • Troubleshooting
    • Configuration Errors
  • Contribute
    • Development Setup
    • Environments
    • Integrate RL Frameworks
NVIDIANVIDIA
Developer-friendly docs for your API
Privacy Policy | Manage My Privacy | Do Not Sell or Share My Data | Terms of Service | Accessibility | Corporate Policies | Product Security | Contact

Copyright © 2026, NVIDIA Corporation.

LogoLogoNeMo Gym
On this page
  • Training Framework Deployment
  • Resource Requirements
  • Cluster Co-location Strategy
  • Single Ray Cluster
  • NeMo Gym’s Own Ray Cluster
  • Separate Clusters
  • Choosing a Deployment Strategy
  • Related Guides
Infrastructure

Deployment Topology

||View as Markdown|
Previous

Overview

Next

Engineering Notes

Training Framework Deployment

When NeMo Gym is used for RL training (not standalone rollout collection), it runs alongside a training framework. NeMo Gym’s Model Server acts as an HTTP proxy for policy model inference — it translates between the Responses API and Chat Completions API formats, forwarding requests to the training framework’s generation endpoint (e.g., vLLM). NeMo Gym can also run other models on GPU (e.g., reward models, judge models) through its own resources servers.

This section covers resource requirements, cluster strategies, and how to choose between them. For a detailed integration walkthrough from the training framework side, see how NeMo RL integrated with NeMo Gym. For guidance on integrating a new training framework, see Integrate RL Frameworks.

Resource Requirements

NeMo Gym and the training framework have different compute profiles:

ComponentComputeRole
NeMo GymCPU by defaultOrchestrates rollouts, executes tools, computes rewards. Some resources servers may use GPUs (e.g., running local reward or judge models via vLLM).
Training framework (e.g., NeMo RL)GPUHolds model weights, runs policy training, serves inference via an OpenAI-compatible HTTP endpoint (e.g., vLLM)

Cluster Co-location Strategy

The deployment strategy depends on how the training framework manages its cluster.

Single Ray Cluster

If ray_head_node_address is specified in the config, NeMo Gym connects to that existing Ray cluster instead of starting its own. Training frameworks using Ray set this address so that NeMo Gym attaches to the same cluster.

How it works:

  1. The training framework initializes the Ray cluster and creates vLLM workers with HTTP servers
  2. The training framework creates a NeMo Gym Ray actor within the same cluster
  3. The NeMo Gym actor spawns NeMo Gym servers (Head, Agent, Model, Resources) as subprocesses
  4. NeMo Gym’s Model Server proxies inference requests to the training framework’s vLLM HTTP endpoints
  5. Results flow back through the actor to the training loop

Both systems share a single Ray cluster, so Ray has visibility into all available resources.

Version Requirements

When NeMo Gym connects to an existing Ray cluster, the same Ray and Python versions must be used in both environments.


NeMo Gym’s Own Ray Cluster

When the training framework does not use Ray, NeMo Gym spins up its own independent Ray cluster for coordination.

The training framework runs its own orchestration (non-Ray). NeMo Gym spins up a separate Ray cluster.

When to use:

  • The training framework has its own orchestration (not Ray-based)
  • You still want NeMo Gym’s HTTP-based rollout collection
  • The generation backend exposes OpenAI-compatible HTTP endpoints that NeMo Gym can reach

Separate Clusters

When the training framework and NeMo Gym are not started together (independently deployed), they run on fully separate clusters connected only by HTTP.

When to use:

  • Training framework and NeMo Gym are deployed independently by different teams
  • Clusters have different lifecycle requirements (e.g., NeMo Gym always on, training runs are transient)
  • Network security policies require isolation between training and environment infrastructure
  • Hybrid cloud setups where training runs on GPU cloud and environments run on CPU cloud

Requirements:

  • The training cluster must expose its generation backend (e.g., vLLM) as HTTP endpoints reachable from the NeMo Gym cluster
  • Network connectivity and firewall rules between clusters must allow HTTP traffic on the configured ports

Choosing a Deployment Strategy

FactorSingle Ray ClusterNeMo Gym’s Own Ray ClusterSeparate Clusters
Training frameworkRay-basedNon-RayAny
StartupCo-launchedIndependentIndependent
Resource visibilityUnifiedSeparateSeparate
Network requirementsIntra-clusterIntra-clusterCross-cluster HTTP

Related Guides

Architecture Overview

Understand the server-based architecture.

Integrate RL Frameworks

Implement NeMo Gym integration into a new training framework.

NeMo RL GRPO Training

End-to-end GRPO training tutorial with NeMo RL.

NeMo RL Integration (RL-side)

Detailed integration architecture from the NeMo RL perspective.