Inference Providers

View as Markdown

The inference_provider server connects NeMo Gym to any hosted inference provider. The server manages the conversion to and from the Responses API: it translates incoming Responses requests to Chat Completions for the provider and converts the reply back into a Responses object — so your agent code stays the same across backends.

For training workloads that require token IDs and log probabilities, use vLLM instead. Hosted providers do not expose the token-level information needed for RL training.

Supported APIs

  • OpenAI Responses/v1/responses
  • OpenAI Chat Completions/v1/chat/completions

Supported Providers

All provider configs live in responses_api_models/inference_provider/configs/.

ProviderConfigBase URL
Basetenbaseten.yamlhttps://inference.baseten.co/v1
DeepInfradeepinfra.yamlhttps://api.deepinfra.com/v1/openai
Fireworksfireworks.yamlhttps://api.fireworks.ai/inference/v1
Friendlifriendli.yamlhttps://api.friendli.ai/serverless/v1
Geminigemini.yamlhttps://generativelanguage.googleapis.com/v1beta/openai
HF Inferencehf_inference.yamlhttps://router.huggingface.co/v1
Nebiusnebius.yamlhttps://api.tokenfactory.nebius.com/v1/
OpenRouteropenrouter.yamlhttps://openrouter.ai/api/v1
Together.aitogether.yamlhttps://api.together.xyz/v1

Set Your Credentials

You need an API key and a model name from your provider; store values in env.yaml in the project root (gitignored):

1policy_api_key: your-api-key
2policy_model_name: nvidia/Nemotron-3-Nano-30B-A3B

If your provider is not in the table above, also set the base URL:

1# env.yaml
2policy_base_url: https://your-provider.com/v1

Configuration Reference

ParameterTypeDefaultDescription
base_urlstrRequired. Provider’s OpenAI-compatible API base URL.
api_keystrRequired. Provider API key.
modelstrRequired. Model identifier (provider-specific format).
uses_reasoning_parserboolfalseParse <think> tags and reasoning_content fields from thinking models.
num_concurrent_requestsint1000Maximum concurrent requests to the provider. Reduce if your provider has a lower rate limit.
extra_bodydict{}Additional parameters merged into every request body.

The model is fixed by configuration. This server always sends the configured model (from policy_model_name) to the provider. If an incoming request carries its own model field — as standard OpenAI-compatible clients and SDKs do — that value is overwritten, so you cannot switch models on a per-request basis. To run a different model, change the config and start a new server.

Usage Example

1. Set model and environment config

In this example, we use Together.ai:

$environment_config="resources_servers/mcqa/configs/mcqa.yaml"
$model_config="responses_api_models/inference_provider/configs/together.yaml"

For unlisted providers, use the generic config: responses_api_models/inference_provider/configs/inference_provider.yaml

2. Start servers

$ng_run "+config_paths=[${environment_config},${model_config}]"

3. Evaluate your agent

$ng_collect_rollouts +agent_name=mcqa_simple_agent \
> +input_jsonl_fpath=resources_servers/mcqa/data/example.jsonl \
> +output_jsonl_fpath=results/mcqa_rollouts.jsonl