Switch Inference Models at Runtime

Change the active inference model while the sandbox is running. You do not need to restart the sandbox.

Prerequisites

A running NemoClaw sandbox.
The OpenShell CLI on your PATH, which NemoClaw uses under the hood.

Switch to a Different Model

Use nemohermes inference set with the provider and model that match the upstream you want to use. The command updates the OpenShell inference route and synchronizes the running agent config. For Hermes, it updates /sandbox/.hermes/config.yaml (model.default, model.base_url, and model.provider: custom) without rebuilding or restarting Hermes. Pass --sandbox <name> to target a sandbox other than the default registered one, for example when you have registered more than one Hermes sandbox.
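When several Hermes sandboxes are registered, the same switch can be applied to each one in turn. The following is a minimal sketch, not part of NemoClaw: the function name switch_sandboxes is hypothetical, and it uses only the inference set flags described above.

```shell
#!/bin/sh
# Illustrative helper (hypothetical, not shipped with NemoClaw):
# apply one provider/model switch to every sandbox named on the command line.

switch_sandboxes() {
  provider="$1"
  model="$2"
  shift 2            # the remaining arguments are sandbox names
  failed=0

  for sandbox in "$@"; do
    if nemohermes inference set --provider "$provider" --model "$model" --sandbox "$sandbox"; then
      echo "switched $sandbox to $provider / $model"
    else
      echo "failed to switch $sandbox" >&2
      failed=1       # keep going, but report the failure at the end
    fi
  done

  return "$failed"
}
```

For example, switch_sandboxes openai-api gpt-5.4 followed by two sandbox names of your own would run the switch once per sandbox and return non-zero if any of them failed.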

NVIDIA Endpoints

$ nemohermes inference set --provider nvidia-prod --model nvidia/nemotron-3-super-120b-a12b

OpenAI

$ nemohermes inference set --provider openai-api --model gpt-5.4

Anthropic

$ nemohermes inference set --provider anthropic-prod --model claude-sonnet-4-6

Google Gemini

$ nemohermes inference set --provider gemini-api --model gemini-2.5-flash

Compatible Endpoints

If you onboarded a custom compatible endpoint, switch models with the provider created for that endpoint:

$ nemohermes inference set --provider compatible-endpoint --model <model-name>

$ nemohermes inference set --provider compatible-anthropic-endpoint --model <model-name>

Hermes Provider

For a NemoClaw-managed Hermes sandbox, use the Hermes alias with the registered Hermes Provider route:

$ nemohermes inference set --provider hermes-provider --model openai/gpt-5.4-mini

Switching from Responses API to Chat Completions

If onboarding selected /v1/responses but the agent fails at runtime, the backend might not emit the streaming events OpenClaw requires. Re-run onboarding so the wizard re-probes the endpoint and bakes the correct API path into the image.

$ nemohermes onboard

Select the same provider and endpoint again. The updated streaming probe detects incomplete /v1/responses support and selects /v1/chat/completions automatically.

For the compatible-endpoint provider, NemoClaw uses /v1/chat/completions by default, so you do not need an environment variable to keep the safe path. To opt in to /v1/responses for a backend you have verified end to end, set NEMOCLAW_PREFERRED_API before onboarding:

$ NEMOCLAW_PREFERRED_API=openai-responses nemohermes onboard

NEMOCLAW_INFERENCE_API_OVERRIDE patches the config at container startup but does not update the Dockerfile ARG baked into the image. If you recreate the sandbox without the override environment variable, the image reverts to the original API path. A fresh nemohermes onboard is the reliable fix because it updates both the session and the baked image.

Cross-Provider Switching

Switching to a different provider family (for example, from NVIDIA Endpoints to Anthropic) also uses nemohermes inference set. The command updates both the gateway route and /sandbox/.hermes/config.yaml. If the Hermes config sync fails after the gateway route is updated, NemoClaw keeps the host registry aligned with the gateway and prints a rebuild hint. Run the rebuild before relying on the running agent if the warning says the image config could not be patched.

$ nemohermes inference set --provider anthropic-prod --model claude-sonnet-4-6 --no-verify

Use --no-verify only when OpenShell cannot verify the provider at switch time but you have already confirmed the provider and credential.

Tune Model Metadata

The sandbox image bakes model metadata (context window, max output tokens, reasoning mode, and accepted input modalities) into openclaw.json at build time. To change these values, set the corresponding environment variables before running nemohermes onboard so they patch into the Dockerfile before the image builds.

Variable	Values	Default
`NEMOCLAW_CONTEXT_WINDOW`	Positive integer (tokens)	`131072`
`NEMOCLAW_MAX_TOKENS`	Positive integer (tokens)	`4096`
`NEMOCLAW_REASONING`	`true` or `false`	`false`
`NEMOCLAW_INFERENCE_INPUTS`	`text` or `text,image`	`text`
`NEMOCLAW_AGENT_TIMEOUT`	Positive integer (seconds)	`600`
`NEMOCLAW_AGENT_HEARTBEAT_EVERY`	Go-style duration (`30m`, `1h`, `0m` to disable)	`unset` (OpenClaw default)

NemoClaw ignores invalid values and bakes the default into the image. For Local Ollama, onboarding loads the selected model first and uses Ollama’s reported runtime context length when NEMOCLAW_CONTEXT_WINDOW is unset. For local vLLM, onboarding uses the runtime max_model_len value when the server reports one and NEMOCLAW_CONTEXT_WINDOW is unset. Use NEMOCLAW_INFERENCE_INPUTS=text,image only for a model that accepts image input through the selected provider.

$ export NEMOCLAW_CONTEXT_WINDOW=65536
$ export NEMOCLAW_MAX_TOKENS=8192
$ export NEMOCLAW_REASONING=true
$ export NEMOCLAW_INFERENCE_INPUTS=text,image
$ export NEMOCLAW_AGENT_TIMEOUT=1800
$ export NEMOCLAW_AGENT_HEARTBEAT_EVERY=0m
$ nemohermes onboard
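Because NemoClaw ignores invalid values and falls back to the defaults, a typo in one of these variables can go unnoticed until after the image is built. The following pre-flight check is a sketch only: the function names is_positive_int and check_metadata_env are hypothetical, the accepted values are copied from the table above, and the Go-style heartbeat duration is deliberately not validated.

```shell
#!/bin/sh
# Illustrative pre-flight check (hypothetical, not part of NemoClaw).
# It mirrors the "Values" column of the table above.

is_positive_int() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
  esac
  [ "$1" -gt 0 ]               # reject zero
}

check_metadata_env() {
  rc=0

  for name in NEMOCLAW_CONTEXT_WINDOW NEMOCLAW_MAX_TOKENS NEMOCLAW_AGENT_TIMEOUT; do
    eval "value=\${$name-}"    # read the variable whose name is in $name
    if [ -z "$value" ]; then
      continue                 # unset: the documented default applies
    fi
    if ! is_positive_int "$value"; then
      echo "$name must be a positive integer, got '$value'" >&2
      rc=1
    fi
  done

  case "${NEMOCLAW_REASONING-}" in
    ''|true|false) ;;
    *) echo "NEMOCLAW_REASONING must be true or false" >&2; rc=1 ;;
  esac

  case "${NEMOCLAW_INFERENCE_INPUTS-}" in
    ''|text|text,image) ;;
    *) echo "NEMOCLAW_INFERENCE_INPUTS must be text or text,image" >&2; rc=1 ;;
  esac

  return "$rc"
}
```

One way to use it is to gate onboarding on the check, for example check_metadata_env && nemohermes onboard.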

NEMOCLAW_AGENT_TIMEOUT controls the per-request inference timeout baked into the Hermes sandbox image. Increase it for slow local inference, such as CPU-only Ollama or vLLM on modest hardware. Direct in-sandbox edits are not the supported or durable way to change NemoClaw-managed defaults. Rebuild the sandbox with nemohermes onboard to apply a new value.

Hermes does not use OpenClaw’s HEARTBEAT.md wake-up mechanism. Rebuild the sandbox with nemohermes onboard --resume to apply build-time inference metadata changes.

These variables are build-time settings. If you change them on an existing sandbox, recreate the sandbox so the new values bake into the image:

$ nemohermes onboard --resume --recreate-sandbox

Verify the Active Model

Use nemohermes inference get to print the provider and model the gateway is currently routing to. Run it before nemohermes inference set to confirm the starting state, or after a switch to verify the new route.

$ nemohermes inference get
Provider: nvidia-prod
Model:    nvidia/nemotron-3-super-120b-a12b

Pass --json for machine-readable output.

$ nemohermes inference get --json
{
  "provider": "nvidia-prod",
  "model": "nvidia/nemotron-3-super-120b-a12b"
}

When the gateway has no registered inference route, the command exits non-zero with the message "OpenShell inference route is not configured." Run nemohermes onboard to configure one.
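The switch and the verification can be combined into one step. The following sketch is not part of NemoClaw: the function name switch_and_verify and its return codes are hypothetical, it assumes the --json output carries the provider and model fields shown above, and it uses a plain substring match rather than a real JSON parser.

```shell
#!/bin/sh
# Illustrative helper (hypothetical): switch the route, then confirm that
# the gateway reports the provider and model that were requested.

switch_and_verify() {
  provider="$1"
  model="$2"

  # Apply the switch; stop here if the CLI reports a failure.
  nemohermes inference set --provider "$provider" --model "$model" || return 1

  # inference get exits non-zero when no inference route is registered.
  if ! current=$(nemohermes inference get --json); then
    echo "no inference route configured; run: nemohermes onboard" >&2
    return 2
  fi

  # Drop whitespace so the match does not depend on pretty-printing.
  compact=$(printf '%s' "$current" | tr -d ' \t\n')

  case "$compact" in
    *"\"provider\":\"$provider\""*) ;;
    *) echo "gateway reports a different provider" >&2; return 3 ;;
  esac
  case "$compact" in
    *"\"model\":\"$model\""*) ;;
    *) echo "gateway reports a different model" >&2; return 3 ;;
  esac

  echo "route is $provider / $model"
}
```

A call such as switch_and_verify openai-api gpt-5.4 returns zero only when both fields match. For stricter checks, a JSON parser is a better fit than the substring match used here.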

Run the status command when you also need sandbox, service, and messaging health:

$ nemohermes <name> status

The status output includes the active provider, model, and endpoint with the rest of the sandbox state.

Notes

The host keeps provider credentials.
The sandbox continues to use inference.local.
nemohermes inference set patches the selected running Hermes sandbox config and recomputes its config hash.
Use nemohermes onboard --resume --recreate-sandbox for build-time settings such as context window, max tokens, reasoning mode, heartbeat cadence, or image contents.
Local Ollama and local vLLM routes use local provider tokens rather than OPENAI_API_KEY. Rebuilds of older local-inference sandboxes clear the stale OpenAI credential requirement automatically.

Related Topics

See Inference Options for the full list of providers available during onboarding.
