Detailed Setup Guide
Goal: Get NeMo Gym installed and servers running, then verify all components work together.
Time: ~15 minutes | Cost: ~$0.05 (OpenAI API)
In this tutorial, you will:
- Clone the repository and install dependencies
- Configure your OpenAI API key
- Start the NeMo Gym servers
- Test the setup
Requirements
Hardware Requirements
NeMo Gym is designed to run on standard development machines without specialized hardware:
- GPU: Not required for NeMo Gym library operation
- GPU may be needed for specific resources servers or model inference (see individual server documentation). E.g. if you are intending to train your model with NeMo-RL, GPU resources are required (see training documentation)
- CPU: Any modern x86_64 or ARM64 processor (e.g., Intel, AMD, Apple Silicon)
- RAM: Minimum 8 GB (16 GB+ recommended for larger environments and datasets)
- Storage: Minimum 2 GB free disk space for installation and basic usage
Software Requirements
- Operating System:
- Linux (Ubuntu 20.04+, CentOS 7+, or equivalent)
- macOS (11.0+ for x86_64, 12.0+ for Apple Silicon)
- Windows via WSL2 (Ubuntu 20.04+ recommended)
- Python: 3.12 or higher (required)
- Git: For cloning the repository
- curl or wget: For installing the UV package manager
- Internet Connection: Required for:
- Downloading dependencies
- Accessing model APIs (OpenAI, Azure, etc.)
- Downloading datasets
Additional Requirements
- API Keys: Model provider access
- OpenAI API key with available credits (for quickstart and most examples)
- OR Azure OpenAI credentials
- OR self-hosted model setup (via vLLM or compatible inference server)
- Ray: Automatically installed as a dependency for distributed processing (no separate setup required)
Verified Configurations
The following configurations have been tested and verified:
While NeMo Gym itself does not require a GPU, some resources servers (particularly those involving local model inference or training) may have GPU requirements. Check the individual resources server documentation for specific requirements.
Prerequisites
Make sure you have these prerequisites ready before beginning:
- Git (for cloning the repository)
- OpenAI API key with available credits (for the tutorial agent)
1. Clone and Install
Clone the NeMo Gym repository and install dependencies:
SSH (recommended)
HTTPS
✅ Success Check: Verify that you can see something that indicates a newly activated environment such as (.venv) or (NeMo-Gym) in your terminal prompt.
2. Configure Your API Key
Create an env.yaml file in the project root to configure your Policy Model credentials:
Using terminal
Create manually
Create a file named env.yaml in the Gym/ directory with this content:
Replace sk-your-actual-openai-api-key-here with your real OpenAI API key. This file keeps secrets out of version control while making them available to NeMo Gym.
Requirements:
- Your API key must have available credits (check OpenAI billing 🔗)
- The model must support function calling (most GPT-4 models do)
- Refer to OpenAI’s models documentation 🔗 for available models
Refer to Configuration for additional env.yaml options.
Why GPT-4.1? We use GPT-4.1 for getting started because it provides low latency (no reasoning step) and reliable function calling support out-of-the-box, letting you focus on learning NeMo Gym without configuration complexity.
Can I use my own model? Yes! NeMo Gym works with any OpenAI-compatible inference server that supports function calling:
- Self-hosted models: Use vLLM to serve your own models (see the model-server-vllm)
- Other providers: Any inference server that implements the OpenAI API specification
Simply update policy_base_url, policy_api_key, and policy_model_name in your env.yaml to point to your chosen endpoint.
Optional: Validate your API key before proceeding
Want to catch configuration issues early? Test your API key before starting servers:
✅ Success Check: Verify that you can see “API key validated successfully!” and a response from the model.
If this step fails, you will see a clear error message (like quota exceeded or invalid key) before investing time in server setup.
Troubleshooting: 'Missing mandatory value: policy_api_key'
Check your env.yaml file has the correct API key format.
The cost for running rollouts using the OpenAI API can be calculated using the following rough formula: per token API cost × average number of input/output tokens × num_repeats × limit.
- Per token API cost: See the OpenAI API pricing for more details https://openai.com/api/pricing/.
- Average number of input/output tokens: After rollouts are run, you can see the input/output token usage in the returned response.
- Num repeats and limit: These parameters are set in the rollout collection command later.
3. Start the Servers
✅ Success Check: Verify that you can see output like:
The head server always uses port 11000. Other servers get automatically assigned ports (like 62920, 52341, etc.) - your port numbers will differ from the example above.
Finding your server ports: Query the head server to see all registered servers and their assigned ports:
When you ran ng_run, it started all the servers you configured:
- Head server: coordinating all components
- Resources server: defining tools and verification
- Model server: providing LLM inference
- Agent server: orchestrating how the model interacts with the resources. The agent server calls the model and resources servers using REST requests.
Stopping servers: Press Ctrl+C in the terminal running ng_run to stop all servers.
Troubleshooting: 'command not found: ng_run'
Make sure you activated the virtual environment:
4. Test the Setup
Open a new terminal (keep servers running in the first one).
Before running the test, make sure you:
cd /path/to/Gym— navigate to the project directorysource .venv/bin/activate— activate the virtual environment
✅ Success Check: Verify that you can see JSON output showing:
- Agent calling the weather tool
- Weather tool returning data
- Agent responding to the user
Example output:
Troubleshooting: 'python: command not found'
Try python3 instead of python, or check your virtual environment.
Troubleshooting: No output from client script
Make sure the servers are still running in the other terminal.
Troubleshooting: OpenAI API errors or '500 Internal Server Error'
If you encounter errors when running the client, check these common causes:
Quota/billing errors (most common):
- Solution: Add credits to your OpenAI account at platform.openai.com/account/billing 🔗
- The tutorial requires minimal credits (~$0.01-0.05 per run)
Invalid API key:
- Solution: Verify your API key in
env.yamlmatches your OpenAI API keys 🔗 - Ensure no extra quotes or spaces around the key
Model access errors:
- Solution: Ensure your account has access to the model specified in
policy_model_name - Try using
gpt-4oorgpt-4-turboifgpt-4.1-2025-04-14isn’t available
Testing your API key:
File Structure After Setup
Your directory should look like this:
Next Steps
You’ve confirmed that NeMo Gym is working — the agent can call tools and return results. But a single interaction isn’t enough for RL training. The next step is to collect batches of scored interactions (rollouts) that become your training data.
Continue to Rollout Collection →Rollout collection is required before proceeding to tutorials like Environment Tutorials or training workflows. Complete it next.