Inference Providers
The inference_provider server connects NeMo Gym to any hosted inference provider. The server manages the conversion to and from the Responses API: it translates incoming Responses requests to Chat Completions for the provider and converts the reply back into a Responses object — so your agent code stays the same across backends.
For training workloads that require token IDs and log probabilities, use vLLM instead. Hosted providers do not expose the token-level information needed for RL training.
Supported APIs
- OpenAI Responses —
/v1/responses - OpenAI Chat Completions —
/v1/chat/completions
Supported Providers
All provider configs live in responses_api_models/inference_provider/configs/.
Set Your Credentials
You need an API key and a model name from your provider; store values in env.yaml in the project root (gitignored):
If your provider is not in the table above, also set the base URL:
Configuration Reference
The model is fixed by configuration. This server always sends the configured model (from policy_model_name) to the provider. If an incoming request carries its own model field — as standard OpenAI-compatible clients and SDKs do — that value is overwritten, so you cannot switch models on a per-request basis. To run a different model, change the config and start a new server.
Usage Example
1. Set model and environment config
In this example, we use Together.ai:
For unlisted providers, use the generic config: responses_api_models/inference_provider/configs/inference_provider.yaml