Inference Profiles#
NemoClaw ships with three inference profiles defined in blueprint.yaml.
Each profile configures an OpenShell inference provider and model route.
The agent inside the sandbox uses whichever profile is active.
Inference requests are routed transparently through the OpenShell gateway.
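As a rough sketch of how these three profiles could be laid out, the fragment below uses the providers, endpoints, models, and credential variables documented on this page; the key names and nesting are illustrative assumptions, not the verified blueprint.yaml schema.

```yaml
# Hypothetical sketch of the three inference profiles in blueprint.yaml.
# Key names and structure are illustrative; consult the shipped
# blueprint.yaml for the authoritative schema.
profiles:
  default:
    provider: nvidia-nim            # NVIDIA hosted API
    endpoint: https://integrate.api.nvidia.com/v1
    model: nvidia/nemotron-3-super-120b-a12b
    credential_env: NVIDIA_API_KEY
  nim-local:
    provider: nim-local             # OpenAI-compatible NIM pod
    endpoint: http://nim-service.local:8000/v1
    model: nvidia/nemotron-3-super-120b-a12b
    credential_env: NIM_API_KEY
  vllm:
    provider: vllm-local            # OpenAI-compatible vLLM on the host
    endpoint: http://host.openshell.internal:8000/v1
    model: nvidia/nemotron-3-nano-30b-a3b
    credential_env: OPENAI_API_KEY
```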
Profile Summary#
| Profile | Provider | Model | Endpoint | Use Case |
|---|---|---|---|---|
| default | NVIDIA cloud | nvidia/nemotron-3-super-120b-a12b | https://integrate.api.nvidia.com/v1 | Production. Requires an NVIDIA API key. |
| nim-local | Local NIM service | nvidia/nemotron-3-super-120b-a12b | http://nim-service.local:8000/v1 | On-premises. NIM deployed as a local pod. |
| vllm | vLLM | nvidia/nemotron-3-nano-30b-a3b | http://host.openshell.internal:8000/v1 | Local development. vLLM on the host. |
Available Models#
The nvidia-nim provider registers the following models from build.nvidia.com:
| Model ID | Label | Context Window | Max Output |
|---|---|---|---|
| nvidia/nemotron-3-super-120b-a12b | Nemotron 3 Super 120B | 131,072 | 8,192 |
| | Nemotron Ultra 253B | 131,072 | 4,096 |
| | Nemotron Super 49B v1.5 | 131,072 | 4,096 |
| nvidia/nemotron-3-nano-30b-a3b | Nemotron 3 Nano 30B | 131,072 | 4,096 |
The default and nim-local profiles use Nemotron 3 Super 120B.
The vllm profile uses Nemotron 3 Nano 30B.
You can switch to any model in the catalog at runtime.
default – NVIDIA Cloud#
The default profile routes inference to NVIDIA’s hosted API through build.nvidia.com.
- Provider type: nvidia
- Endpoint: https://integrate.api.nvidia.com/v1
- Model: nvidia/nemotron-3-super-120b-a12b
- Credential: NVIDIA_API_KEY environment variable
Get an API key from build.nvidia.com.
The nemoclaw setup command prompts for this key and stores it in ~/.nemoclaw/credentials.json.
$ openshell inference set --provider nvidia-nim --model nvidia/nemotron-3-super-120b-a12b
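For reference, the stored credentials file might look like the fragment below; the field name is an assumption for illustration, since the actual layout of ~/.nemoclaw/credentials.json is not documented here.

```json
{
  "nvidia_api_key": "<your-nvidia-api-key>"
}
```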
nim-local – Local NIM Service#
Routes inference to a NIM container running on the local network.
- Provider type: openai, which uses the OpenAI-compatible API
- Endpoint: http://nim-service.local:8000/v1
- Model: nvidia/nemotron-3-super-120b-a12b
- Credential: NIM_API_KEY environment variable
The sandbox network policy includes a nim_service entry that allows traffic to nim-service.local:8000.
$ openshell inference set --provider nim-local --model nvidia/nemotron-3-super-120b-a12b
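A nim_service network-policy entry could be sketched as follows; the key names are illustrative assumptions, not the verified blueprint.yaml schema.

```yaml
# Hypothetical network-policy fragment allowing sandbox egress to the NIM pod.
network_policy:
  nim_service:
    host: nim-service.local
    port: 8000
    protocol: tcp
```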
vllm – Local vLLM#
Routes inference to a vLLM server running on the host and exposed to the OpenShell gateway.
- Provider type: openai, which uses the OpenAI-compatible API
- Endpoint: http://host.openshell.internal:8000/v1
- Model: nvidia/nemotron-3-nano-30b-a3b
- Credential: OPENAI_API_KEY environment variable. Defaults to dummy for local use.
Start the vLLM server with a non-loopback bind address such as 0.0.0.0.
On Linux hosts with UFW enabled, allow the Docker bridge subnet to reach port 8000.
$ openshell inference set --provider vllm-local --model nvidia/nemotron-3-nano-30b-a3b
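The host-side setup described above might look like the commands below; this assumes vLLM's standard vllm serve entry point, and 172.17.0.0/16 is Docker's usual default bridge subnet, so adjust it if your bridge network differs.

```shell
# Start vLLM bound to all interfaces so the OpenShell gateway can reach it.
vllm serve nvidia/nemotron-3-nano-30b-a3b --host 0.0.0.0 --port 8000

# On Linux hosts with UFW enabled, allow the Docker bridge subnet in.
sudo ufw allow from 172.17.0.0/16 to any port 8000 proto tcp
```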
Selecting a Profile at Launch#
Pass --profile when launching to select a profile:
$ openclaw nemoclaw launch --profile vllm
Switching Profiles at Runtime#
After the sandbox is running, switch inference providers with the OpenShell CLI:
$ openshell inference set --provider <provider-name> --model <model-name>
The change takes effect immediately. No sandbox restart is needed.