# Switch Inference Models at Runtime [\#](https://docs.nvidia.com/nemoclaw/latest/inference/switch-inference-providers.html\#switch-inference-models-at-runtime "Link to this heading")

Change the active inference model while the sandbox is running.
Same-provider model switches take effect immediately with no restart; switching to a different provider family also requires a sandbox recreate, as described in Cross-Provider Switching below.

## Prerequisites [\#](https://docs.nvidia.com/nemoclaw/latest/inference/switch-inference-providers.html\#prerequisites "Link to this heading")

- A running NemoClaw sandbox.

- The OpenShell CLI on your `PATH` (a quick check follows this list).
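
A quick check for both prerequisites, assuming a POSIX shell (`<name>` is the sandbox name you chose at onboarding):

```
$ command -v openshell || echo "openshell: not on PATH"
$ nemoclaw <name> status
```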


## Switch to a Different Model [\#](https://docs.nvidia.com/nemoclaw/latest/inference/switch-inference-providers.html\#switch-to-a-different-model "Link to this heading")

Switching happens through the OpenShell inference route.
Pass the provider and model that match the upstream you want to target.
This is one of the cases where a NemoClaw workflow intentionally uses `openshell`; see [CLI Selection Guide](https://docs.nvidia.com/nemoclaw/latest/reference/cli-selection-guide.html) for the general boundary.

### NVIDIA Endpoints [\#](https://docs.nvidia.com/nemoclaw/latest/inference/switch-inference-providers.html\#nvidia-endpoints "Link to this heading")

```
$ openshell inference set --provider nvidia-prod --model nvidia/nemotron-3-super-120b-a12b
```

### OpenAI [\#](https://docs.nvidia.com/nemoclaw/latest/inference/switch-inference-providers.html\#openai "Link to this heading")

```
$ openshell inference set --provider openai-api --model gpt-5.4
```

### Anthropic [\#](https://docs.nvidia.com/nemoclaw/latest/inference/switch-inference-providers.html\#anthropic "Link to this heading")

```
$ openshell inference set --provider anthropic-prod --model claude-sonnet-4-6
```

### Google Gemini [\#](https://docs.nvidia.com/nemoclaw/latest/inference/switch-inference-providers.html\#google-gemini "Link to this heading")

```
$ openshell inference set --provider gemini-api --model gemini-2.5-flash
```

### Compatible Endpoints [\#](https://docs.nvidia.com/nemoclaw/latest/inference/switch-inference-providers.html\#compatible-endpoints "Link to this heading")

If you onboarded a custom compatible endpoint, switch models with the provider created for that endpoint:

```
$ openshell inference set --provider compatible-endpoint --model <model-name>
```

```
$ openshell inference set --provider compatible-anthropic-endpoint --model <model-name>
```

If the provider itself needs to change, rerun `nemoclaw onboard`.

#### Switching from Responses API to Chat Completions [\#](https://docs.nvidia.com/nemoclaw/latest/inference/switch-inference-providers.html\#switching-from-responses-api-to-chat-completions "Link to this heading")

If onboarding selected `/v1/responses` but the agent fails at runtime (for
example, because the backend does not emit the streaming events OpenClaw
requires), re-run onboarding so the wizard re-probes the endpoint and bakes
the correct API path into the image:

```
$ nemoclaw onboard
```

Select the same provider and endpoint again.
The updated streaming probe will detect incomplete `/v1/responses` support
and select `/v1/chat/completions` automatically.

For the compatible-endpoint provider, NemoClaw uses `/v1/chat/completions` by
default, so no env var is required to keep the safe path.
To opt in to `/v1/responses` for a backend you have verified end to end, set
`NEMOCLAW_PREFERRED_API` before onboarding:

```
$ NEMOCLAW_PREFERRED_API=openai-responses nemoclaw onboard
```

Note

`NEMOCLAW_INFERENCE_API_OVERRIDE` patches the config at container startup but
does not update the Dockerfile ARG baked into the image.
If you recreate the sandbox without the override env var, the image reverts to
the original API path.
A fresh `nemoclaw onboard` is the reliable fix because it updates both the
session and the baked image.

## Cross-Provider Switching [\#](https://docs.nvidia.com/nemoclaw/latest/inference/switch-inference-providers.html\#cross-provider-switching "Link to this heading")

Switching to a different provider family (for example, from NVIDIA Endpoints to Anthropic) requires updating both the gateway route and the sandbox config.

Set the gateway route on the host:

```
$ openshell inference set --provider anthropic-prod --model claude-sonnet-4-6 --no-verify
```

Then set the override env vars and recreate the sandbox so they take effect at startup:

```
$ export NEMOCLAW_MODEL_OVERRIDE="anthropic/claude-sonnet-4-6"
$ export NEMOCLAW_INFERENCE_API_OVERRIDE="anthropic-messages"
$ nemoclaw onboard --resume --recreate-sandbox
```

The entrypoint patches `openclaw.json` at container startup with the override values.
You do not need to rebuild the image.
Remove the env vars and recreate the sandbox to revert to the original model.
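
For example, to revert:

```
$ unset NEMOCLAW_MODEL_OVERRIDE NEMOCLAW_INFERENCE_API_OVERRIDE
$ nemoclaw onboard --resume --recreate-sandbox
```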

`NEMOCLAW_INFERENCE_API_OVERRIDE` accepts `openai-completions` (for NVIDIA, OpenAI, Gemini, compatible endpoints) or `anthropic-messages` (for Anthropic and Anthropic-compatible endpoints).
This variable is only needed when switching between provider families.
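
A sketch of the reverse direction (Anthropic back to NVIDIA Endpoints), assuming `NEMOCLAW_MODEL_OVERRIDE` takes the same `provider/model` form as the Anthropic example above:

```
$ openshell inference set --provider nvidia-prod --model nvidia/nemotron-3-super-120b-a12b --no-verify
$ export NEMOCLAW_MODEL_OVERRIDE="nvidia/nemotron-3-super-120b-a12b"  # format assumed; verify for your provider
$ export NEMOCLAW_INFERENCE_API_OVERRIDE="openai-completions"
$ nemoclaw onboard --resume --recreate-sandbox
```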

## Tune Model Metadata [\#](https://docs.nvidia.com/nemoclaw/latest/inference/switch-inference-providers.html\#tune-model-metadata "Link to this heading")

The sandbox image bakes model metadata (context window, max output tokens, reasoning mode, and accepted input modalities) into `openclaw.json` at build time.
To change these values, set the corresponding environment variables before running `nemoclaw onboard` so they are patched into the Dockerfile before the image builds.

| Variable | Values | Default |
| --- | --- | --- |
| `NEMOCLAW_CONTEXT_WINDOW` | Positive integer (tokens) | `131072` |
| `NEMOCLAW_MAX_TOKENS` | Positive integer (tokens) | `4096` |
| `NEMOCLAW_REASONING` | `true` or `false` | `false` |
| `NEMOCLAW_INFERENCE_INPUTS` | `text` or `text,image` | `text` |
| `NEMOCLAW_AGENT_TIMEOUT` | Positive integer (seconds) | `600` |
| `NEMOCLAW_AGENT_HEARTBEAT_EVERY` | Go-style duration (`30m`, `1h`, `0m` to disable) | `unset` (OpenClaw default) |

Invalid values are ignored, and the default is baked into the image instead.
Use `NEMOCLAW_INFERENCE_INPUTS=text,image` only for a model that accepts image input through the selected provider.

```
$ export NEMOCLAW_CONTEXT_WINDOW=65536
$ export NEMOCLAW_MAX_TOKENS=8192
$ export NEMOCLAW_REASONING=true
$ export NEMOCLAW_INFERENCE_INPUTS=text,image
$ export NEMOCLAW_AGENT_TIMEOUT=1800
$ export NEMOCLAW_AGENT_HEARTBEAT_EVERY=0m
$ nemoclaw onboard
```

`NEMOCLAW_AGENT_TIMEOUT` controls the per-request inference timeout baked into
`agents.defaults.timeoutSeconds`. Increase it for slow local inference (for
example, CPU-only Ollama or vLLM on modest hardware). `openclaw.json` is
immutable at runtime, so this value can only be changed by rebuilding the
sandbox via `nemoclaw onboard`.

`NEMOCLAW_AGENT_HEARTBEAT_EVERY` sets `agents.defaults.heartbeat.every`.
This controls OpenClaw’s periodic main-session agent turn.
Each interval, the agent wakes up to review follow-ups and read `HEARTBEAT.md` if present in the workspace.
The OpenClaw default is 30 minutes (1 hour for Anthropic OAuth / Claude CLI reuse).
Tune the cadence with a duration string like `5m` or `2h`, or set `0m` to disable the periodic turns entirely.
Disabling also drops `HEARTBEAT.md` from normal-run bootstrap context per upstream behavior, so the model no longer sees heartbeat-only instructions.
`openclaw.json` is immutable at runtime, so the in-sandbox `openclaw config set` command cannot change this.
Rebuild the sandbox via `nemoclaw onboard --resume` to apply a new value.
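
For example, to slow the heartbeat to every two hours:

```
$ export NEMOCLAW_AGENT_HEARTBEAT_EVERY=2h
$ nemoclaw onboard --resume
```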

These variables are build-time settings.
If you change them on an existing sandbox, recreate the sandbox so the new values bake into the image:

```
$ nemoclaw onboard --resume --recreate-sandbox
```

## Verify the Active Model [\#](https://docs.nvidia.com/nemoclaw/latest/inference/switch-inference-providers.html\#verify-the-active-model "Link to this heading")

Run the status command to confirm the change:

```
$ nemoclaw <name> status
```

Add the `--json` flag for machine-readable output:

```
$ nemoclaw <name> status --json
```

The output includes the active provider, model, and endpoint.
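
In a script, you can pull a single field from the JSON with `jq` (the `.model` key here is illustrative; check the actual field names in your output):

```
$ nemoclaw <name> status --json | jq -r '.model'
```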

## Notes [\#](https://docs.nvidia.com/nemoclaw/latest/inference/switch-inference-providers.html\#notes "Link to this heading")

- The host keeps provider credentials.

- The sandbox continues to use `inference.local`.

- Same-provider model switches take effect immediately via the gateway route alone.

- Cross-provider switches also require `NEMOCLAW_MODEL_OVERRIDE` (and `NEMOCLAW_INFERENCE_API_OVERRIDE`) plus a sandbox recreate so the entrypoint patches the config at startup.

- Overrides are applied at container startup. Changing or removing env vars requires a sandbox recreate to take effect.

- Local Ollama and local vLLM routes use local provider tokens rather than `OPENAI_API_KEY`. Rebuilds of older local-inference sandboxes clear the stale OpenAI credential requirement automatically.


## Related Topics [\#](https://docs.nvidia.com/nemoclaw/latest/inference/switch-inference-providers.html\#related-topics "Link to this heading")

- [Inference Options](https://docs.nvidia.com/nemoclaw/latest/inference/inference-options.html) for the full list of providers available during onboarding.