# Inference Options

NemoClaw supports multiple inference providers. During onboarding, the `nemoclaw onboard` wizard presents a numbered list of providers to choose from. Your selection determines where the agent’s inference traffic is routed.
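For orientation, the wizard runs on the host. The provider listing below is illustrative, assembled from the options documented on this page; exact wording and numbering may differ by version:

```bash
nemoclaw onboard
# Illustrative provider prompt:
#   1) NVIDIA Endpoints
#   2) OpenAI
#   3) Other OpenAI-compatible endpoint
#   4) Anthropic
#   5) Other Anthropic-compatible endpoint
#   6) Google Gemini
#   7) Local Ollama   (listed when Ollama is detected on the host)
```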
## How Inference Routing Works
The agent inside the sandbox talks to `inference.local`; it never connects to a provider directly. OpenShell intercepts inference traffic on the host and forwards it to the provider you selected.

Provider credentials stay on the host; the sandbox does not receive your API key. Local Ollama and local vLLM do not require your host `OPENAI_API_KEY`: NemoClaw uses provider-specific local tokens for those routes, and rebuilding a legacy local-inference sandbox removes its stale OpenAI credential requirement.
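A minimal sketch of the separation, assuming an OpenAI-compatible request shape (the `/v1/chat/completions` path and model name are illustrative, not NemoClaw internals):

```bash
# Inside the sandbox: the agent only ever addresses the interception endpoint.
curl http://inference.local/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "my-model", "messages": [{"role": "user", "content": "hello"}]}'

# On the host: OpenShell holds the real credential and attaches it when
# forwarding the request to the provider you selected during onboarding.
export NVIDIA_API_KEY="nvapi-..."   # never copied into the sandbox
```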
## Provider Status
| Provider | Status | Endpoint type | Notes |
|---|---|---|---|
| NVIDIA Endpoints | Tested | OpenAI-compatible | Hosted models on `integrate.api.nvidia.com` |
| OpenAI | Tested | Native OpenAI-compatible | Uses OpenAI model IDs |
| Other OpenAI-compatible endpoint | Tested | Custom OpenAI-compatible | For compatible proxies and gateways |
| Anthropic | Tested | Native Anthropic | Uses `anthropic-messages` |
| Other Anthropic-compatible endpoint | Tested | Custom Anthropic-compatible | For Claude proxies and compatible gateways |
| Google Gemini | Tested | OpenAI-compatible | Uses Google’s OpenAI-compatible endpoint |
| Local Ollama | Caveated | Local Ollama API | Available when Ollama is installed or running on the host |
| Local NVIDIA NIM | Experimental | Local OpenAI-compatible | Requires `NEMOCLAW_EXPERIMENTAL=1` |
| Local vLLM | Experimental | Local OpenAI-compatible | Requires `NEMOCLAW_EXPERIMENTAL=1` |
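Because the Local Ollama option only appears when Ollama is detected on the host, it can help to confirm the server is reachable before onboarding. Ollama’s standard local API listens on port 11434 by default:

```bash
# Lists locally pulled models; a JSON response means Ollama is up.
curl -s http://localhost:11434/api/tags
```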
## Provider Options
The onboard wizard presents the following provider options by default. The first six are always available; Local Ollama appears when Ollama is installed or running on the host.
| Option | Description | Curated models |
|---|---|---|
| NVIDIA Endpoints | Routes to models hosted on build.nvidia.com. You can also enter any model ID from the catalog. Set `NVIDIA_API_KEY`. | Nemotron 3 Super 120B, Kimi K2.5, GLM-5.1, MiniMax M2.5, GPT-OSS 120B |
| OpenAI | Routes to the OpenAI API. Set `OPENAI_API_KEY`. | |
| Other OpenAI-compatible endpoint | Routes to any server that implements the OpenAI-compatible chat completions API. | You provide the model name. |
| Anthropic | Routes to the Anthropic Messages API. Set `ANTHROPIC_API_KEY`. | |
| Other Anthropic-compatible endpoint | Routes to any server that implements the Anthropic Messages API (`/v1/messages`). | You provide the model name. |
| Google Gemini | Routes to Google’s OpenAI-compatible endpoint. NemoClaw prefers `GEMINI_API_KEY`. | |
| Local Ollama | Routes to a local Ollama instance on the host. | Selected during onboarding. For more information, refer to Use a Local Inference Server. |
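Credentials are read from the host environment, so a typical flow is to export the variable for your chosen provider before launching the wizard, for example for NVIDIA Endpoints:

```bash
# Host-side only; the sandbox never receives this key.
export NVIDIA_API_KEY="nvapi-..."   # NVIDIA keys carry the nvapi- prefix
nemoclaw onboard                    # then select "NVIDIA Endpoints"
```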
## Choosing the Right Option for Nemotron
NVIDIA Nemotron models expose OpenAI-compatible APIs across every supported deployment surface, so more than one onboarding option can route to Nemotron.
| Where Nemotron is hosted | Onboard wizard option | Why |
|---|---|---|
| Hosted on build.nvidia.com | Option 1: NVIDIA Endpoints | NemoClaw sets the base URL to `integrate.api.nvidia.com`. |
| Self-hosted NIM container | Option 3: Other OpenAI-compatible endpoint | NIM exposes an OpenAI-compatible API. |
| Enterprise NVIDIA AI Enterprise gateway | Option 3: Other OpenAI-compatible endpoint | Enterprise gateways front Nemotron with the same OpenAI-compatible contract. Use the gateway’s base URL and your enterprise token. |
| vLLM, SGLang, or TRT-LLM serving Nemotron weights | Option 3: Other OpenAI-compatible endpoint | Each runtime exposes Nemotron through an OpenAI-compatible endpoint. |
| Local NIM started by the wizard | Local NVIDIA NIM (experimental) | Requires `NEMOCLAW_EXPERIMENTAL=1`. |
For Option 3, the API key environment variable is `COMPATIBLE_API_KEY`. Set it to whatever credential your endpoint expects, or to any non-empty placeholder if your endpoint does not require auth.
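For example, to point Option 3 at a self-hosted OpenAI-compatible server (the base URL below assumes a local server on port 8000, the vLLM default; adjust it to your deployment):

```bash
# Endpoint without auth: any non-empty placeholder satisfies the wizard.
export COMPATIBLE_API_KEY="not-needed"
nemoclaw onboard
# When prompted, choose "Other OpenAI-compatible endpoint" and enter:
#   Base URL:   http://localhost:8000/v1
#   Model name: the model ID your server reports under /v1/models
```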
## Experimental Options

The following local inference options require `NEMOCLAW_EXPERIMENTAL=1` and, when their prerequisites are met, appear in the onboarding selection list.
| Option | Condition | Notes |
|---|---|---|
| Local NVIDIA NIM | NIM-capable GPU detected | Pulls and manages a NIM container. |
| Local vLLM | vLLM running on the host | Auto-detects the loaded model. |
For setup instructions, refer to Use a Local Inference Server.
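As a sketch, assuming you serve a model with vLLM’s standard OpenAI-compatible server first (the model ID is illustrative):

```bash
# Serve a Nemotron checkpoint with vLLM (OpenAI-compatible, port 8000 by default).
vllm serve nvidia/Llama-3.1-Nemotron-70B-Instruct-HF &

# Expose the experimental options in the wizard's provider list.
export NEMOCLAW_EXPERIMENTAL=1
nemoclaw onboard   # "Local vLLM" appears once the server is detected
```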
## Validation
NemoClaw validates the selected provider and model before creating the sandbox. If credential validation fails, the wizard asks whether to re-enter the API key, choose a different provider, retry, or exit.

The `nvapi-` prefix check applies only to `NVIDIA_API_KEY`. Other provider credentials, such as `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GEMINI_API_KEY`, and compatible-endpoint keys, use provider-aware validation during retry.
| Provider type | Validation method |
|---|---|
| OpenAI | Tries a `/v1/models` request. |
| NVIDIA Endpoints | Tries a `/v1/models` request. |
| Google Gemini | Tries a `/v1/models` request on the OpenAI-compatible endpoint. |
| Other OpenAI-compatible endpoint | Tries a minimal inference request. |
| Anthropic-compatible | Tries a minimal messages request. |
| NVIDIA Endpoints (manual model entry) | Validates the model name against the catalog API. |
| Compatible endpoints | Sends a real inference request because many proxies do not expose a `/v1/models` endpoint. |
| Local NVIDIA NIM | Uses the same validation behavior as NVIDIA Endpoints and skips the `nvapi-` prefix check. |
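For reference, the two validation styles in the table correspond to requests like the following. The paths are the standard OpenAI-compatible ones; the exact probes NemoClaw sends are an assumption here:

```bash
# Listing-style check (cheap, but many proxies don't implement it):
curl -s https://integrate.api.nvidia.com/v1/models \
  -H "Authorization: Bearer $NVIDIA_API_KEY"

# Real-inference check, as used for compatible endpoints:
curl -s "$BASE_URL/v1/chat/completions" \
  -H "Authorization: Bearer $COMPATIBLE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "my-model", "messages": [{"role": "user", "content": "ping"}], "max_tokens": 1}'
```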
## Next Steps

- Use a Local Inference Server for Ollama, vLLM, NIM, and compatible-endpoint setup details.
- Switch Inference Models for changing the model at runtime without re-onboarding.