> For clean Markdown of any page, append .md to the page URL.
> For a complete documentation index, see https://docs.nvidia.com/nemoclaw/llms.txt.
> For full documentation content, see https://docs.nvidia.com/nemoclaw/llms-full.txt.
> For AI client integration (Claude Code, Cursor, etc.), connect to the MCP server at https://docs.nvidia.com/nemoclaw/_mcp/server.

# Troubleshooting

> Diagnose and resolve common NemoClaw installation, onboarding, and runtime issues.

# Troubleshooting

This page covers common issues you may encounter when installing, onboarding, or running NemoClaw, along with their resolution steps.

If your issue is not listed here, join the [NemoClaw Discord channel](https://discord.gg/XFpfPv9Uvx) to ask questions and get help from the community. You can also [file an issue on GitHub](https://github.com/NVIDIA/NemoClaw/issues/new).

## Installation

### `nemoclaw` not found after install

If you use nvm or fnm to manage Node.js, the installer may not update your current shell's PATH.
The `nemoclaw` binary is installed but the shell session does not know where to find it.

Run `source ~/.bashrc` (or `source ~/.zshrc` for zsh), or open a new terminal window.

When installing from a source checkout with `npm install`, NemoClaw first tries `npm link`.
If the global npm prefix is not writable, it writes a managed shim to `~/.local/bin/nemoclaw` instead.
Add `~/.local/bin` to your `PATH` if the command is still not found.
Source-checkout installs also bootstrap OpenShell when it is missing before running preflight.
If a source install still reports that `openshell` is not available, re-run the installer from the repository root and check that `~/.local/bin` is on your `PATH`.

### Installer fails on unsupported platform

The installer checks for a supported OS and architecture before proceeding.
If you see an unsupported platform error, verify that you are running on a tested platform listed in the Container Runtimes table in the quickstart guide.

### Node.js version is too old

NemoClaw requires Node.js 22.16 or later.
If the installer exits with a Node.js version error, check your current version:

```bash
node --version
```

If the version is below 22.16, install a supported release.
If you use nvm, run:

```bash
nvm install 22
nvm use 22
```

Then re-run the installer.

### Image push fails with out-of-memory errors

The sandbox image is approximately 2.4 GB compressed. During image push, the Docker daemon, k3s, and the OpenShell gateway run alongside the export pipeline, which buffers decompressed layers in memory. On machines with less than 8 GB of RAM, this combined usage can trigger the OOM killer.

If you cannot add memory, configure at least 8 GB of swap to work around the issue at the cost of slower performance.

### Docker is not running

The installer and onboard wizard require Docker to be running.
If you see a Docker connection error, start the Docker daemon:

```bash
sudo systemctl start docker
```

On macOS with Docker Desktop, open the Docker Desktop application and wait for it to finish starting before retrying.

### Docker permission denied on Linux

On Linux, if the Docker daemon is running but you see "permission denied" errors, your user may not be in the `docker` group.
The installer can add your user to the group, but Linux does not activate that membership in the current shell automatically.
Add your user and activate the group in the current shell:

NemoClaw needs Docker access.
On personal Linux development machines, adding your user to the `docker` group is the standard way to run Docker without sudo.
Members of the `docker` group can control the daemon with root-level impact, so grant this access only to trusted local accounts; on shared or managed systems, use your organization's approved Docker access path.
For background, review Docker's [daemon attack surface guidance](https://docs.docker.com/engine/security/#docker-daemon-attack-surface).

```bash
sudo usermod -aG docker $USER
newgrp docker
```

Then retry `nemoclaw onboard`.
If the installer stopped after printing `newgrp docker`, run that command and then re-run the installer:

```bash
newgrp docker
curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash
```

### macOS first-run failures

The two most common first-run failures on macOS are missing developer tools and Docker connection errors.

To avoid these issues, install the prerequisites in the following order before running the NemoClaw installer:

1. Install Xcode Command Line Tools (`xcode-select --install`). These are needed by the installer and Node.js toolchain.
2. Install and start a supported container runtime (Docker Desktop or Colima). Without a running runtime, the installer cannot connect to Docker.

### Permission errors during installation

The NemoClaw installer does not require `sudo` or root.
It installs Node.js via nvm and NemoClaw via npm, both into user-local directories.
The installer also handles OpenShell installation automatically using a pinned release.

If you see permission errors during installation, they typically come from Docker, not the NemoClaw installer itself.
Docker must be installed and running before you run the installer, and installing Docker may require elevated privileges on Linux.

### npm install fails with permission errors

If `npm install` fails with an `EACCES` permission error, do not run npm with `sudo`.
Instead, configure npm to use a directory you own:

```bash
mkdir -p ~/.npm-global
npm config set prefix ~/.npm-global
export PATH=~/.npm-global/bin:$PATH
```

Add the `export` line to your `~/.bashrc` or `~/.zshrc` to make it permanent, then re-run the installer.

### Installer fails on NVIDIA Jetson

The installer auto-detects NVIDIA Jetson devices (Orin and Thor) and applies required host configuration before the normal install flow.
If the Jetson setup step fails, verify that you have `sudo` access and that Docker is installed and running.

For JetPack 6 (L4T 36.x), the setup switches iptables to legacy mode and adjusts the Docker daemon configuration.
For JetPack 7 (L4T 38.x / Thor), only bridge netfilter and sysctl settings are applied.
For JetPack 7 (L4T 39.x), bridge netfilter is loaded only when the host is missing it. Some R39 images already ship with `br_netfilter` configured and are left untouched. On affected R39 hosts, the installer prints `loading br_netfilter (required by k3s inside the OpenShell gateway)`. Without this fix, sandbox pods fail DNS resolution against the in-cluster service and the onboard `Setting up OpenClaw inside sandbox` step times out.

If the L4T version is not recognized, the setup step is skipped and the installer continues normally.

### DNS resolution from inside docker fails (corporate firewall)

Some corporate networks block outbound UDP port 53 to public DNS servers and force all host name resolution through DNS over TLS on TCP port 853. Containers do not inherit the host's DNS-over-TLS configuration, so the sandbox build's `npm ci` step times out trying to resolve `registry.npmjs.org` against `1.1.1.1` or `8.8.8.8`.

NemoClaw's preflight runs a short `docker run --rm busybox nslookup nemoclaw-dns-probe-<random>.invalid` probe before starting the sandbox build. The fresh `.invalid` name should return NXDOMAIN through a working resolver, so cached answers cannot hide blocked DNS egress. When the probe confirms a DNS failure, onboarding stops with platform-specific remediation instead of hanging for \~15 minutes and printing a cryptic `Exit handler never called`.

The fix depends on your platform and runtime. Pick the matching path from the preflight output, apply it, then re-run `nemoclaw onboard`.

* **Linux with systemd-resolved.** Add a `DNSStubListenerExtra` drop-in pointing at the docker bridge gateway IP (the preflight prints the detected IP), then add the same IP to `/etc/docker/daemon.json` under `dns`. Restart `systemd-resolved` and `docker`.
* **macOS with Colima.** Restart Colima with the corporate DNS address, for example `colima stop && colima start --dns <corp-dns-ip>`.
* **macOS with Docker Desktop.** Add the corporate DNS address to `~/.docker/daemon.json` under `dns`, then restart Docker Desktop.
* **Windows or WSL.** Configure DNS in the Docker Desktop settings GUI, or apply the Linux fix above when running native docker inside WSL.

Verify the fix worked:

```bash
docker run --rm busybox nslookup example.com
```

When the lookup returns an answer, retry onboarding.

### Port already in use

The NemoClaw dashboard uses port `18789` by default and the gateway uses port `8080`.
If another sandbox already owns the dashboard port, onboarding scans ports `18789` through `18799` and uses the next free port.
If all ports in that range are occupied, the error lists the owner for each port and suggests using `--control-ui-port` with a port outside the range.

On macOS, the port check also tries a privileged `lsof` probe without prompting for a password so root-owned listeners are detected before the sandbox build starts.
If a port becomes occupied after preflight but before `openshell forward start` runs, onboarding deletes the just-created sandbox and exits with a retry message instead of leaving a sandbox with an unreachable dashboard URL.

When a previous onboard, upgrade, or sandbox crash leaves a stale `openclaw-gateway` host process holding the dashboard port, `nemoclaw onboard --fresh`, `nemoclaw <name> destroy` (when destroying the last sandbox), and `nemoclaw uninstall` automatically sweep the dashboard port range and signal `SIGTERM` then `SIGKILL` to recover.
The sweep only targets processes owned by the current user whose command line matches `openclaw-gateway` or `openshell forward` markers, and skips dashboard ports owned by other live sandboxes.

If a non-NemoClaw process is already bound to the dashboard port or the gateway port, identify the conflicting process, verify it is safe to stop, and terminate it:

```bash
sudo lsof -i :18789
kill <PID>
```

If the process does not exit, use `kill -9 <PID>` to force-terminate it.
Then retry onboarding.

Alternatively, override the conflicting port instead of stopping the other process.
Pass `--control-ui-port` with the desired dashboard port:

```bash
nemoclaw onboard --control-ui-port 19000
```

You can also set `CHAT_UI_URL` with the desired port:

```bash
CHAT_UI_URL=http://127.0.0.1:19000 nemoclaw onboard
```

Or set the port directly:

```bash
NEMOCLAW_DASHBOARD_PORT=19000 nemoclaw onboard
```

For an OpenShell gateway port conflict, set `NEMOCLAW_GATEWAY_PORT` to a free
non-privileged port that does not overlap NemoClaw's dashboard, vLLM, Ollama,
or Ollama proxy ports:

```bash
NEMOCLAW_GATEWAY_PORT=8990 nemoclaw onboard
```

Remote/headless hosts can bind the OpenShell gateway to all IPv4 interfaces:

```bash
NEMOCLAW_GATEWAY_BIND_ADDRESS=0.0.0.0 NEMOCLAW_GATEWAY_PORT=8990 nemoclaw onboard
```

Use `NEMOCLAW_GATEWAY_BIND_ADDRESS=0.0.0.0` only when other hosts on the
network should be able to reach the gateway.

See [Environment Variables](commands#environment-variables) for the full list of port overrides.

### Running multiple sandboxes simultaneously

Each sandbox requires its own dashboard port.
If you onboard a second sandbox without overriding the port, onboarding uses the next free port in the `18789` to `18799` range.
`onboard` checks `openshell forward list` before starting a new forward, so a second onboard cannot silently take over the first sandbox's port.

Assign a distinct port only when you want a specific value:

```bash
nemoclaw onboard                                                   # first sandbox uses default 18789
nemoclaw onboard                                                   # second sandbox uses the next free port
nemoclaw onboard --control-ui-port 19000                          # explicit port override
```

Each sandbox then has its own SSH tunnel and its own dashboard URL:

```text
http://localhost:18789   ← first sandbox
http://localhost:19000   ← second sandbox
```

You can verify which tunnel belongs to which sandbox with:

```bash
openshell forward list
nemoclaw list
```

`nemoclaw list` prints the recorded dashboard URL for each sandbox.

## Onboarding

### Cgroup v2 errors during onboard

Older NemoClaw releases relied on a Docker cgroup workaround on Ubuntu 24.04, DGX Spark, and WSL2.
Current OpenShell releases handle that behavior themselves, so NemoClaw no longer requires a Spark-specific setup step.

If onboarding reports that Docker is missing or unreachable, fix Docker first and retry onboarding:

```bash
nemoclaw onboard
```

Podman is not a tested runtime.
If onboarding or sandbox lifecycle fails, switch to a tested runtime (Docker Desktop, Colima, or Docker Engine) and rerun onboarding.

### Cluster fails with `overlayfs snapshotter cannot be enabled` on Docker 26+

Docker Engine 26 and later default fresh installations to the [containerd image store](https://docs.docker.com/engine/storage/containerd/), which exposes its layers via the `overlayfs` snapshotter rather than the legacy `overlay2` graph driver.
The k3s server inside the OpenShell cluster image needs to mount its own overlay filesystem on top, and the kernel rejects nesting two non-trivial overlay mounts.
The cluster container then loops with:

```text
"overlayfs" snapshotter cannot be enabled for "/var/lib/rancher/k3s/agent/containerd",
try using "fuse-overlayfs" or "native":
failed to mount overlay: ... err: invalid argument
```

This is a Docker default-driver change, not a NemoClaw or OpenShell regression.
The same hardware running Docker 25 or earlier — or any Docker version with the containerd image store disabled — uses the legacy `overlay2` driver and is unaffected.

NemoClaw detects the Docker 26+ containerd-snapshotter overlayfs configuration during onboarding and transparently builds a small drop-in replacement for the cluster image on the local Docker engine.
The patched image installs `fuse-overlayfs` and selects it as the k3s snapshotter, bypassing the kernel-level nested-overlay limitation.
No host configuration changes, sudo, or Docker restart required.

The auto-fix runs once per OpenShell version on the affected host.
Subsequent onboarding runs reuse the cached patched image.
Hosts without the conflict (`Driver: overlay2` in `docker info`, macOS Docker Desktop, or Linux installations that disable the containerd image store) see no change in behavior.

Override knobs:

* `NEMOCLAW_DISABLE_OVERLAY_FIX=1` — skip the auto-fix and run against the unmodified upstream cluster image.
  Useful for diagnosis or when you have already applied the manual workaround below.
* `NEMOCLAW_OVERLAY_SNAPSHOTTER=native` — build the patched image with k3s's `native` snapshotter instead of `fuse-overlayfs`.
  The `native` snapshotter copies image layers instead of overlaying them, so it uses more disk but does not depend on FUSE.
  Default is `fuse-overlayfs`.

If you prefer to disable the new Docker storage driver instead of running the patched image, edit `/etc/docker/daemon.json`:

```json
{
  "storage-driver": "overlay2",
  "features": { "containerd-snapshotter": false }
}
```

Then restart Docker (`sudo systemctl restart docker`) and re-run `nemoclaw onboard`.
This restores the legacy `overlay2` driver host-wide, which kills any other running containers — prefer the auto-fix unless you need the change for unrelated reasons.
Switching storage drivers also rebuilds the entire local image graph: previously-pulled images become unusable and Docker re-pulls them on first reference, so expect a cold cache and additional disk usage right after the restart.

### OpenShell version above maximum

Each NemoClaw release validates against a range of tested OpenShell versions.
If the installed OpenShell version exceeds the configured maximum, `nemoclaw onboard` exits with an error:

```text
✗ openshell <version> is above the maximum supported by this NemoClaw release.
  blueprint.yaml max_openshell_version: <max>
```

Upgrade NemoClaw to a version that supports your OpenShell release, or install a supported OpenShell version from the [OpenShell releases page](https://github.com/NVIDIA/OpenShell/releases).

For fresh installs, NemoClaw passes the blueprint range to `install-openshell.sh` and resolves a compatible published OpenShell release before downloading.
If GitHub release metadata is unavailable, the script uses its bundled fallback pin and the post-install gate still enforces the configured range.

### Sandbox containers cannot reach the gateway

On native Linux Docker-driver hosts, `nemoclaw onboard` verifies the route that sandbox containers use to reach the OpenShell gateway.
If a host firewall blocks that path, onboarding exits with output like:

```text
✗ Sandbox containers cannot reach the gateway at host.openshell.internal:8080.
  A host firewall may be blocking traffic from the OpenShell Docker bridge.
```

Apply the `ufw` command printed by onboarding, then rerun onboarding.
If the message does not include a subnet, derive it from the OpenShell Docker network:

```bash
SUBNET=$(docker network inspect openshell-docker --format '{{(index .IPAM.Config 0).Subnet}}')
sudo ufw allow from "$SUBNET" to any port 8080 proto tcp
nemoclaw onboard
```

### `connect` exits because the gateway is down

`nemoclaw <name> connect` checks the OpenShell gateway before it tries dashboard forwarding, SSH, or inference repair.
If the gateway is not reachable, the command exits early and prints recovery guidance.

Start the gateway or resume onboarding, then retry:

```bash
openshell gateway start --name nemoclaw
nemoclaw onboard --resume
nemoclaw <name> connect
```

Run `nemoclaw status` for a broader gateway health report.

### Invalid sandbox name

Sandbox names must be lowercase, start with a letter, contain only letters, numbers, and internal hyphens, and end with a letter or number.
The CLI rejects names that do not match these rules.
It prints a `Try: <suggested-slug>` recovery line whenever it can derive a valid lowercase, hyphen-separated form from the input, so passing `--name MyAssistant` reports `Try: myassistant` and you can rerun with the suggested slug.

Names that collide with global CLI commands are also rejected.
Reserved names include `onboard`, `list`, `deploy`, `setup`, `start`, `stop`, `status`, `debug`, `uninstall`, `credentials`, and `help`.
Using a reserved name would cause the CLI to route to the global command instead of the sandbox.

If the name does not match these rules or is reserved, the wizard exits with an error.
Choose a name such as `my-assistant` or `dev1`.

### Sandbox creation fails on DGX

On DGX machines, sandbox creation can fail if the gateway's DNS has not finished propagating or if a stale port forward from a previous onboard run is still active.

Run `nemoclaw onboard` to retry.
The wizard cleans up stale port forwards and waits for gateway readiness automatically.

### GPU setup fails with a placeholder GPU name

On Windows or WSL hosts, some systems report a placeholder display adapter name even when no NVIDIA GPU firmware is present.
NVIDIA NIM and GPU-backed sandbox setup require a real NVIDIA GPU.
If NemoClaw rejects the detected GPU name during preflight, select a CPU or remote inference provider, or move the setup to a host with a supported NVIDIA GPU and current drivers.

Jetson hosts can still run NemoClaw, but sandbox GPU passthrough is not supported there.
If onboarding reports that sandbox GPU passthrough is unavailable on Jetson, rerun onboarding without `--sandbox-gpu`.

### Colima socket not detected (macOS)

Newer Colima versions use the XDG base directory (`~/.config/colima/default/docker.sock`) instead of the legacy path (`~/.colima/default/docker.sock`).
Some installations expose a top-level Colima socket at `~/.colima/docker.sock`.
NemoClaw checks all three paths.
If neither is found, verify that Colima is running:

```bash
colima status
```

### Sandbox build is slow or hangs (under-provisioned container runtime)

Default Colima ships with 2 vCPU and 2 GiB of memory, which is not enough headroom for the BuildKit-driven sandbox image build.
On macOS Apple Silicon, the build can stall part-way through with no progress and no error, leaving the wizard waiting indefinitely.

Preflight inspects `docker info` for `NCPU` and `MemTotal` and prints a warning when the runtime falls below 4 vCPU or 8 GiB.
In interactive onboarding, the warning prompt defaults to abort, so pressing Enter stops the run before the sandbox build reaches the likely stall point.
Type `y` only when you intentionally want to continue on the smaller runtime.
Non-interactive onboarding prints the warning and continues.
On Colima, raise the resources before re-running onboard:

```bash
colima stop
colima start --cpu 6 --memory 12 --disk 100
```

On Docker Desktop, raise CPU and memory limits in *Settings → Resources*, then apply and restart.

To silence the warning when the host is intentionally small, set `NEMOCLAW_IGNORE_RUNTIME_RESOURCES=1` before running `nemoclaw onboard`.

### Re-onboard fails because port 18789 is held by SSH

After destroying a sandbox and gateway, the SSH port-forward process for the
dashboard can be left running.
Re-running onboard then fails preflight with `Port 18789 is not available.
Blocked by: ssh`.

Current NemoClaw detects this case and kills the orphaned SSH process
automatically before retrying the port check.
If you see the error on an older release, identify the SSH process and
terminate it manually:

```bash
sudo lsof -i :18789
kill <PID>
```

Then re-run `nemoclaw onboard`.

### Updated messaging token is not picked up

Re-running `nemoclaw onboard --non-interactive` with a new
`TELEGRAM_BOT_TOKEN`, `DISCORD_BOT_TOKEN`, or `SLACK_BOT_TOKEN` previously
reported success while the sandbox kept polling with the old credential.
Current NemoClaw stores SHA-256 hashes of messaging credentials in the
sandbox registry at creation time and detects when a token has changed.
When rotation is detected, NemoClaw automatically backs up workspace state,
deletes the sandbox, recreates it with the new credential, and restores the
backup.

If you suspect a sandbox is still using a stale token, re-run onboarding so
the credential check runs:

```bash
nemoclaw onboard --non-interactive
```

### Sandbox creation killed by OOM (exit 137)

On systems with 8 GB RAM or less and no swap configured, the sandbox image push can exhaust available memory and get killed by the Linux OOM killer (exit code 137).

NemoClaw automatically detects low memory during onboarding and prompts to create a 4 GB swap file.
If this automatic step fails or you are using a custom setup flow, create swap manually before running `nemoclaw onboard`:

```bash
sudo dd if=/dev/zero of=/swapfile bs=1M count=4096 status=none
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
nemoclaw onboard
```

### Previous onboarding session failed

If a previous `nemoclaw onboard` attempt fails partway through (for example, a provider or inference-setup step reporting an error), NemoClaw records the failure in `~/.nemoclaw/onboard-session.json`.

When you re-run the installer, it detects the failed session and does not silently retry it.
Silent retry would loop on the same failure if your original choice, such as an unreachable provider, was the cause.

* In an interactive terminal, the installer prompts whether to resume the failed session or start fresh.
  Press `R` (or Enter) to retry the same session, or `f` to discard it and make fresh choices.
* In non-interactive mode (piped `curl | bash` with `NEMOCLAW_NON_INTERACTIVE=1`, CI, scripts), the installer refuses and exits with a non-zero status so a scripted re-run cannot loop.
  You must opt in to one of two paths explicitly:

  Start over with new choices (discards the recorded session and provider/model selection):

  ```bash
  curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash -s -- --fresh
  ```

  Or equivalently, via env var.
  The variable must be set on the `bash` side of the pipe, not on `curl`, since only the right-hand process inherits it:

  ```bash
  curl -fsSL https://www.nvidia.com/nemoclaw.sh | NEMOCLAW_FRESH=1 bash
  ```

  Retry the same session without re-prompting.
  This is only useful if the original failure was transient, for example a network blip or a stopped Docker daemon, and not a wrong provider choice:

  ```bash
  nemoclaw onboard --resume
  ```

As a last resort, you can also delete the session file directly and re-run the installer:

```bash
rm ~/.nemoclaw/onboard-session.json
```

## Runtime

### Reconnect after a host reboot

After a host reboot, the container runtime, OpenShell gateway, and sandbox may not be running.
Follow these steps to reconnect.

1. Start the container runtime.

   * **Linux:** start Docker if it is not already running (`sudo systemctl start docker`)
   * **macOS:** open Docker Desktop or start Colima (`colima start`)

2. Check sandbox state.

   ```bash
   openshell sandbox list
   ```

   If the sandbox shows `Ready`, skip to step 4.

3. Restart the gateway (if needed).

   If the sandbox is not listed or the command fails, restart the OpenShell gateway:

   ```bash
   openshell gateway start --name nemoclaw
   ```

   Wait a few seconds, then re-check with `openshell sandbox list`.

4. Reconnect.

   ```bash
   nemoclaw <name> connect
   ```

   The gateway usually rotates its SSH host keys across a reboot.
   `connect` detects the resulting identity drift, prunes the stale `openshell-*` entries from `~/.ssh/known_hosts`, and retries automatically.
   You do not need to edit `known_hosts` by hand or re-run `nemoclaw onboard` in this case.

5. Start host auxiliary services (if needed).

   If you use the cloudflared tunnel started by `nemoclaw tunnel start`, start it again:

   ```bash
   nemoclaw tunnel start
   ```

   OpenShell-managed channel messaging handles Telegram, Discord, Slack, WeChat, and WhatsApp at onboarding, not through a separate bridge process from `nemoclaw tunnel start`.
   WeChat and WhatsApp are experimental.
   To pause a single bridge without destroying the sandbox, use `nemoclaw <name> channels stop <channel>`.

If the sandbox remains missing after restarting the gateway, run `nemoclaw onboard` to recreate it.
The wizard prompts for confirmation before destroying an existing sandbox. If you confirm, it **destroys and recreates** the sandbox. Workspace files (SOUL.md, USER.md, IDENTITY.md, AGENTS.md, MEMORY.md, and daily memory notes) are lost.
Back up your workspace first by following the instructions at [Back Up and Restore](../manage-sandboxes/backup-restore).

### Sandbox is running an outdated agent version

After upgrading NemoClaw, `nemoclaw <name> connect` and `nemoclaw <name> status` warn if the sandbox is running an older agent version than the current image.

To upgrade the sandbox while preserving workspace state, run:

```bash
nemoclaw <name> rebuild
```

The rebuild command backs up state, destroys the old sandbox, recreates it with the current image, and restores state.
Create a snapshot before rebuilding if you want an additional safety net:

```bash
nemoclaw <name> snapshot create
nemoclaw <name> rebuild
```

### Sandbox shows as stopped

The sandbox may have been stopped or deleted.
Run `nemoclaw onboard` to recreate the sandbox from the same blueprint and policy definitions.

### Status shows "not running" inside the sandbox

This is expected behavior.
When checking status inside an active sandbox, host-side sandbox state and inference configuration are not inspectable.
The status command detects the sandbox context and reports "active (inside sandbox)" instead.

Run `openshell sandbox list` on the host to check the underlying sandbox state.

### Git clone fails with a certificate verification error

In networks that inspect TLS, OpenShell injects a proxy CA bundle into the sandbox.
Current NemoClaw exports that bundle as `GIT_SSL_CAINFO` during sandbox startup and persists it for `nemoclaw <name> connect` sessions, so Git can trust the proxy CA.
It also forwards standard CA bundle variables for subprocesses, including `GIT_SSL_CAPATH`, `CURL_CA_BUNDLE`, and `REQUESTS_CA_BUNDLE`.

If Git still reports `server certificate verification failed`, reconnect to the sandbox and check that the CA variables are present:

```bash
env | grep -E 'SSL_CERT_FILE|GIT_SSL_CAINFO|CURL_CA_BUNDLE|REQUESTS_CA_BUNDLE'
```

If they are missing on an older sandbox, upgrade NemoClaw and run:

```bash
nemoclaw <name> rebuild
```

### Sandbox creation reports a TLS certificate mismatch

If sandbox creation reports a TLS or certificate mismatch, the OpenShell gateway certificate may have changed since the CLI last trusted it.
Refresh the gateway trust and then resume onboarding:

```bash
openshell gateway trust -g nemoclaw
nemoclaw onboard --resume
```

### `openclaw update` hangs or times out inside the sandbox

This is expected for the current NemoClaw deployment model.
NemoClaw installs `openclaw` into the sandbox image at build time, so the CLI is image-pinned rather than updated in place inside a running sandbox.

Do not run `openclaw update` inside the sandbox.
Instead:

1. Upgrade to a NemoClaw release that includes the newer `openclaw` version.
2. If you build NemoClaw from source, bump the pinned `openclaw` version in `Dockerfile.base` and rebuild the sandbox base image.
3. Run `nemoclaw <name> rebuild` to recreate the sandbox with the updated image. The rebuild command automatically backs up workspace state before destroying the old sandbox and restores it afterward.

### Inference requests time out

Verify that the inference provider endpoint is reachable from the host.
Check the active provider and endpoint:

```bash
nemoclaw <name> status
```

For local Ollama and local vLLM, `nemoclaw <name> status` also prints an `Inference` line that probes the host-side health endpoint directly.
If that line shows `unreachable`, start the local backend first and then retry the request.
For Local Ollama, current releases also print `Inference (auth proxy)` when a proxy token is available.
If the backend is healthy but the auth proxy is `unauthorized` or `unreachable`, re-run onboarding so NemoClaw can recreate the proxy token, restart the proxy, and refresh the route.

If the endpoint is correct but requests still fail, check for network policy rules that may block the connection.
Then verify the credential and base URL for the provider you selected during onboarding.

If you entered an AWS Bedrock Runtime URL such as `https://bedrock-runtime.us-east-1.amazonaws.com` in the **Other Anthropic-compatible endpoint** flow, NemoClaw auto-detects it and routes sandbox traffic through a host-local adapter.
Use the raw Bedrock Runtime host, not an Anthropic `/v1/messages` path, and verify that the model ID or inference profile ID is valid for that region.
For auth, export `AWS_BEARER_TOKEN_BEDROCK`, `AWS_PROFILE`, or standard IAM environment credentials before onboarding; if you paste a key at the `COMPATIBLE_ANTHROPIC_API_KEY` prompt, NemoClaw uses it only as the adapter's Bedrock bearer token.
Region errors usually mean the pasted endpoint region, `AWS_REGION`, `AWS_DEFAULT_REGION`, or the model/inference profile ID do not match.

For Ollama, vLLM, NIM, and compatible-endpoint setup, the default timeout is 180 seconds.
If large prompts still cause timeouts, increase it with `NEMOCLAW_LOCAL_INFERENCE_TIMEOUT` before re-running onboard:

```bash
export NEMOCLAW_LOCAL_INFERENCE_TIMEOUT=300
nemoclaw onboard
```

For local Ollama and vLLM, onboarding retries the container reachability check and can fall back to the host-side health check when the local backend is healthy.
If all attempts fail, the error includes container reachability diagnostics such as HTTP status and host gateway resolution.

`NEMOCLAW_LOCAL_INFERENCE_TIMEOUT` only covers the inference-server validation probe.
The post-create readiness wait has its own budget (`NEMOCLAW_SANDBOX_READY_TIMEOUT`); see [Sandbox onboard times out with "did not become ready within Ns"](#sandbox-onboard-times-out-with-did-not-become-ready-within-ns) for the readiness path.

### Sandbox onboard times out with "did not become ready within Ns"

Onboarding ends with:

```text
  Sandbox 'my-assistant' was created but did not become ready within 180s.
  The orphaned sandbox has been removed — you can safely retry.
```

This is a separate budget from `NEMOCLAW_LOCAL_INFERENCE_TIMEOUT` — it covers the readiness wait that follows sandbox creation (in-sandbox boot, OpenClaw start, policy load), not the inference probe.
The 180-second default fits typical workstations but can be exceeded when:

* The host is building or uploading the sandbox image for the first time (cold caches, slow link).
* The selected model is large (70B+ parameters or 4-bit/8-bit quantisations that take time to memory-map).
* Onboarding runs on a remote VM where image upload to the gateway streams over the network (for example DGX Station first-run installer).

Raise the budget before re-running onboard:

```bash
export NEMOCLAW_SANDBOX_READY_TIMEOUT=600
nemoclaw onboard
```

The variable accepts seconds and applies to the readiness wait only.
When the deadline expires, NemoClaw deletes the partially-created sandbox before printing the retry hint, so the next `nemoclaw onboard` starts from a clean state.
If readiness still fails after the extended budget, inspect the gateway and sandbox status:

```bash
openshell sandbox list
nemoclaw <name> status
```

### Agent fails at runtime after onboarding succeeds with a compatible endpoint

Some OpenAI-compatible servers (such as SGLang) expose `/v1/responses` but their
streaming mode is incomplete.
OpenClaw requires granular streaming events like `response.output_text.delta`
that these backends do not emit.

For the compatible-endpoint provider, NemoClaw now defaults to
`/v1/chat/completions` and skips the Responses API probe entirely unless you
opt in.
If you onboarded an older release that selected `/v1/responses`, re-run
onboarding so the wizard rebuilds the image with chat completions:

```bash
nemoclaw onboard
```

If you previously set `NEMOCLAW_PREFERRED_API=openai-responses` to force the
Responses API, unset it before re-running onboard.

When you enable Telegram messaging with an OpenAI-compatible endpoint, onboarding also checks `inference.local` from inside the sandbox.
If that smoke check fails, fix the compatible-endpoint base URL, credentials, model, or network route before testing the Telegram bot again.

Do not rely on `NEMOCLAW_INFERENCE_API_OVERRIDE` alone — it patches the config
at container startup but does not update the Dockerfile ARG baked into the
image.
A fresh `nemoclaw onboard` is the reliable fix.

### `NEMOCLAW_DISABLE_DEVICE_AUTH=1` does not change an existing sandbox

This is expected behavior.
`NEMOCLAW_DISABLE_DEVICE_AUTH` is a build-time setting used when NemoClaw creates the sandbox image.
Changing or exporting it later does not rewrite the baked `openclaw.json` inside an existing sandbox.

If you need a different device-auth setting, rerun onboarding so NemoClaw rebuilds the sandbox image with the desired configuration.
For the security trade-offs, refer to [Security Best Practices](../security/best-practices).

### `openclaw.json` is empty after changing inference

Some runtime inference changes can leave `/sandbox/.openclaw/openclaw.json` empty if the write fails partway through.
When that happens, OpenClaw commands may report that the config is empty instead of showing a raw JSON parse error.

Current NemoClaw sandboxes capture a known-good config baseline after a successful startup.
On the next sandbox startup, NemoClaw restores `openclaw.json` from OpenClaw's last-good copy when available, or from the NemoClaw baseline.
If the sandbox still cannot start or reports that no baseline is available, rebuild it from the host:

```bash
nemoclaw <name> rebuild
```

### `openclaw channels add` or `remove` is blocked inside the sandbox

This is expected.
The messaging channel list is frozen into the sandbox's container image when the image is built during `nemoclaw onboard` or `nemoclaw rebuild` (the selected channel names are passed to the `docker build` as `NEMOCLAW_MESSAGING_CHANNELS_B64` and written into agent config such as `/sandbox/.openclaw/openclaw.json` for OpenClaw or `/sandbox/.hermes/.env` for Hermes).
Changes made inside the running sandbox do not persist across rebuilds, so `openclaw channels` commands that mutate the config are intercepted.
NemoClaw's sandbox entrypoint installs a guard that intercepts `openclaw channels <add|remove>` and prints an actionable error pointing at the host-side commands below, instead of letting the call fail deep in the binary with a raw `EACCES` trace.

Run the equivalent host-side command instead:

```bash
nemoclaw <sandbox> channels list
nemoclaw <sandbox> channels add <telegram|discord|slack|wechat|whatsapp>
nemoclaw <sandbox> channels remove <telegram|discord|slack|wechat|whatsapp>
```

`channels add` registers credentials with the OpenShell gateway and `channels remove` clears them.
Both offer to rebuild the sandbox so the image reflects the new channel set.
In non-interactive mode (`NEMOCLAW_NON_INTERACTIVE=1`), the commands stage the change and leave the rebuild to a follow-up `nemoclaw <sandbox> rebuild`.
WeChat and WhatsApp are experimental.
Review [Messaging Channels](../manage-sandboxes/messaging-channels) before enabling them.

WeChat captures its bot token through a host-side QR scan during `nemoclaw onboard` or `channels add wechat`.
You scan the iLink QR from WeChat on your phone and NemoClaw registers the captured token with the OpenShell gateway.

WhatsApp pairs entirely inside the sandbox.
NemoClaw advertises WhatsApp for OpenClaw and Hermes sandboxes after you add the channel on the host.
Run `openclaw channels login --channel whatsapp` inside OpenClaw sandboxes, or run `hermes whatsapp` inside Hermes sandboxes.

### `scripts/rcf_patch.py` is missing from the blueprint

`scripts/rcf_patch.py` is intentionally absent from current NemoClaw blueprints.
Older QA plans used that helper for a Dockerfile "Patch-4" test that corrupted the build-time `replaceConfigFile` monkey-patch and expected `ERROR: Patch 4 (replaceConfigFile EACCES) not applied`.
The old Patch-4 fail-closed test no longer applies because NemoClaw no longer patches OpenClaw's compiled `replaceConfigFile` source at image build time.

Current sandboxes use a mutable-default config model instead.
Before a reviewed host-side lockdown, `/sandbox/.openclaw/openclaw.json` is group-writable by the sandbox and gateway users, so OpenClaw config mutations should write normally rather than requiring an EACCES swallow.
After lockdown, runtime config mutations should fail cleanly or route users to the supported host-side NemoClaw command.

To validate this area now, use the config lifecycle tests instead of looking for `rcf_patch.py`:

```bash
npm run build:cli
npm test -- test/repro-2681-group-writable.test.ts
```

### `openclaw config set` or `unset` is blocked inside the sandbox

This is expected.
NemoClaw builds the sandbox's OpenClaw configuration (`/sandbox/.openclaw/openclaw.json`) from host-side onboarding, rebuild, inference, policy, and messaging inputs.
Fresh sandboxes keep that file writable by default so the agent can manage runtime state, but direct in-sandbox edits are not the supported or durable path for NemoClaw-managed settings.
NemoClaw's sandbox entrypoint installs a guard that intercepts `openclaw config set` and `openclaw config unset` and prints an actionable error, because changes made inside the running sandbox do not persist across rebuilds.

For most configuration changes, exit the sandbox and rerun onboarding:

```bash
nemoclaw onboard
```

If NemoClaw reports a resumable failed onboarding session, run `nemoclaw onboard --resume` instead.
This rebuilds the sandbox with your updated settings.

For advanced live edits, use the host-side config command instead of running `openclaw config set` inside the sandbox:

```bash
nemoclaw <sandbox> config set --key <dotpath> --value '<json-or-string>' --restart
```

Host-side `config set` validates any HTTP or HTTPS URLs in the new value, including URLs nested inside JSON objects or arrays. NemoClaw rejects loopback, private, reserved, and internal hosts; DNS names must resolve successfully and must not resolve to private/internal addresses. HTTP URLs are written with the validated IP address pinned to reduce DNS-rebinding risk. Avoid putting credentials in config values; rotate provider credentials with the credential-management commands instead.

### `openclaw doctor --fix` cannot repair Discord channel config inside the sandbox

This is expected in NemoClaw-managed sandboxes.
NemoClaw bakes channel entries into `/sandbox/.openclaw/openclaw.json` at image build time.

As a result, commands that try to rewrite the baked config from inside the sandbox, including `openclaw doctor --fix`, cannot repair Discord, Telegram, or Slack channel entries in place.

If your Discord channel config is wrong, rerun onboarding so NemoClaw rebuilds the sandbox image with the correct messaging selection.
Do not treat a failed `doctor --fix` run as proof that the Discord gateway path itself is broken.

If `openclaw doctor` reports that it moved Telegram single-account values under `channels.telegram.accounts.default`, rerun onboarding and rebuild the sandbox rather than trying to patch `openclaw.json` in place.
Current NemoClaw rebuilds bake Telegram in the account-based layout and set Telegram group chats to `groupPolicy: open`, which avoids the empty `groupAllowFrom` warning path for default group-chat access.

### Discord bot logs in, but the channel still does not work

Separate the problem into two parts:

1. Baked config and provider wiring

   Check that onboarding selected Discord and that the sandbox was created with the Discord messaging provider attached.
   If Discord was skipped during onboarding, rerun onboarding and select Discord again.

2. Native Discord gateway path

   Successful login alone does not prove that Discord works end to end.
   Discord also needs a working gateway connection to `gateway.discord.gg`.
   If logs show errors such as `getaddrinfo EAI_AGAIN gateway.discord.gg`, repeated reconnect loops, or a `400` response while probing the gateway path, the problem is usually in the native gateway/proxy path rather than in the baked config.

Common signs of a native gateway-path failure:

* REST calls to `discord.com` succeed, but the Discord channel never becomes healthy
* `gateway.discord.gg` fails with DNS resolution errors
* the WebSocket path returns `400` instead of opening a tunnel
* native command deployment fails even though the bot token itself is valid

In that case:

* keep the Discord policy preset applied
* verify the sandbox was created with the Discord provider attached
* inspect gateway logs and blocked requests with `openshell term`
* treat the failure as a native Discord gateway problem, not as a bridge startup problem

### Discord preset validation behind a proxy

The built-in Discord policy preset intentionally allows the Node binaries used by the messaging runtime and does not allow `curl`.
As a result, `curl -s https://discord.com` failing, hanging, or printing no output is not proof that the Discord preset is broken.

Behind the OpenShell proxy, direct DNS-only checks can also be the wrong signal.
For example, `dns.resolve("gateway.discord.gg")` can fail even when HTTPS requests routed through the proxy are healthy.

Use Node HTTPS as the manual REST probe:

```bash
node - <<'NODE'
const https = require("node:https");

https
  .get("https://discord.com/api/v10/gateway", (res) => {
    console.log(`${res.statusCode} ${res.statusMessage || ""}`.trim());
    res.resume();
  })
  .on("error", (err) => {
    console.error(err.message);
    process.exitCode = 1;
  });
NODE
```

To check Discord CDN egress, use the same Node HTTPS path:

```bash
node - <<'NODE'
const https = require("node:https");

https
  .get("https://cdn.discordapp.com/", (res) => {
    console.log(`${res.statusCode} ${res.statusMessage || ""}`.trim());
    res.resume();
  })
  .on("error", (err) => {
    console.error(err.message);
    process.exitCode = 1;
  });
NODE
```

Any HTTP status from these probes means the Node process reached the endpoint; the exact status can vary by unauthenticated path.
If the Node REST probe works but the Discord channel is still unhealthy, investigate the native gateway path instead of widening the preset.
Check the gateway logs and blocked-request output with `openshell term`, and look for `gateway.discord.gg` connection or WebSocket upgrade failures.

### Messaging bridge appears running but no messages arrive

Bot tokens for Telegram (`getUpdates`), Discord (gateway), and Slack (Socket Mode) only allow one active consumer per token. If two NemoClaw sandboxes are configured with the same bot token, each one kicks the other off its polling connection and neither delivers messages. `nemoclaw status` still reports the bridge as running because the gateway process itself is alive.

For Telegram group chats, first check BotFather privacy mode.
New Telegram bots default to privacy mode enabled, which prevents group messages from reaching `getUpdates` even when the user mentions the bot.
In @BotFather, run `/setprivacy`, choose the bot, and choose **Disable**.
Then remove the bot from the affected group and add it back; Telegram applies the privacy-mode change to group delivery only after the bot rejoins.

For Telegram direct messages, make sure the rebuilt sandbox has a DM allowlist.
Set `TELEGRAM_ALLOWED_IDS` before rebuild; `TELEGRAM_AUTHORIZED_CHAT_IDS` and `TELEGRAM_CHAT_ID` are accepted as compatibility aliases.
Keep the aliases until QA automation and public repro templates have stopped exporting them for at least one full release.
Bot API `sendMessage` sends from the bot to a chat, so it only proves outbound Telegram API access.
To prove inbound agent routing, send a message from the Telegram client as an allowed user and then watch the gateway log for the agent turn and outbound reply.
For a reproducible live check that also exercises an alias, run `test/e2e/test-messaging-providers.sh` with `TELEGRAM_BOT_TOKEN_REAL`, either `TELEGRAM_AUTHORIZED_CHAT_IDS` or `TELEGRAM_CHAT_ID`, and `NEMOCLAW_TELEGRAM_INBOUND_REPLY_E2E=1`; when prompted, send a fresh direct message from that Telegram client.
The check waits for `[telegram] [default] inbound update received` and `[telegram] [default] outbound sendMessage attempted` in `/tmp/gateway.log`.

To diagnose, open a shell in the sandbox and inspect the gateway log:

```bash
nemoclaw <sandbox-name> connect
tail -f /tmp/gateway.log
```

A repeating line like the following confirms the conflict:

```text
[telegram] getUpdates conflict: 409: Conflict: terminated by other getUpdates request; retrying in 30s.
```

To fix, run `nemoclaw <other-sandbox> destroy` on whichever sandbox should stop polling, or rerun onboarding on it with the channel disabled. Current NemoClaw warns at `nemoclaw onboard` time when another sandbox already has the same channel enabled, but sandboxes created before that check was added may still be in a conflict loop.

### Landlock filesystem restrictions silently degraded

After sandbox creation, NemoClaw checks whether the host kernel supports Landlock (Linux 5.13+).
If the kernel is too old or you are running on macOS (where the Docker VM kernel may lack Landlock), a warning prints:

```text
⚠ Landlock: Docker VM kernel <version> does not support Landlock (requires ≥5.13).
  Sandbox filesystem restrictions will silently degrade (best_effort mode).
```

This warning is informational and does not block sandbox creation.
The sandbox runs without kernel-level filesystem restrictions, relying on container mount configuration instead.
For full filesystem enforcement, run on a Linux kernel 5.13 or later (Ubuntu 22.04 LTS and later include Landlock support).

### Sandbox lost after gateway restart

Sandboxes created with OpenShell versions older than 0.0.24 can become unreachable after a gateway restart because SSH secrets were not persisted.
Running `nemoclaw onboard` automatically upgrades OpenShell to 0.0.24 or later during the preflight check.
After the upgrade, recreate the sandbox with `nemoclaw onboard`.

### Agent cannot reach external hosts through a proxy

NemoClaw uses a default proxy address of `10.200.0.1:3128` (the OpenShell-injected gateway).
If your environment uses a different proxy, set `NEMOCLAW_PROXY_HOST` and `NEMOCLAW_PROXY_PORT` before onboarding:

```bash
export NEMOCLAW_PROXY_HOST=proxy.example.com
export NEMOCLAW_PROXY_PORT=8080
nemoclaw onboard
```

These are build-time settings baked into the sandbox image.
Changing them after onboarding requires re-running `nemoclaw onboard` to rebuild the image.

When `HTTP_PROXY` or `HTTPS_PROXY` is set on the host, NemoClaw adds `localhost` and `127.0.0.1` to `NO_PROXY` for managed subprocesses.
This keeps local Ollama health checks and model pulls from being routed through a corporate or desktop proxy while preserving the proxy for external hosts.

### Agent cannot reach a host-side HTTP service

When a sandbox needs to call an HTTP service running on the host, use the normal OpenShell network policy path.
Expose the service on a host IP address that the OpenShell gateway can reach, create a custom NemoClaw policy preset for that IP and port, and apply it with `nemoclaw <sandbox> policy-add --from-file`.
The sandbox request then flows through the OpenShell proxy while NemoClaw preserves the existing live policy entries.

Do not rely on `host.docker.internal` or `host.openshell.internal` as a general-purpose host-service path.
Those names may appear in the sandbox's `/etc/hosts`, but in OpenShell's sandbox network they are not guaranteed to point at a reachable host gateway.
Bypassing the proxy with `--noproxy '*'` also bypasses network policy enforcement and audit.

First, make sure the host-side service listens on a non-loopback address.
For example, a health endpoint on port `50001` should be reachable from the host IP, not only from `127.0.0.1`:

```console
$ curl -s http://10.0.0.5:50001/health
{"status":"ok"}
```

Then create a custom NemoClaw preset for the host-side service.
Replace `10.0.0.5`, `50001`, paths, methods, and binaries with the service you want the sandbox to reach:

```yaml
preset:
  name: host-memory-api
  description: "Host memory API"
network_policies:
  host_memory_api:
    name: host_memory_api
    endpoints:
      - host: 10.0.0.5
        port: 50001
        protocol: rest
        enforcement: enforce
        rules:
          - allow: { method: GET, path: "/health" }
    binaries:
      - { path: /usr/bin/curl }
```

Apply the preset to the running sandbox with the NemoClaw CLI:

```bash
nemoclaw my-assistant policy-add --from-file ./host-memory-api.yaml
```

After you apply the policy, retry the request from inside the sandbox without disabling the proxy:

```console
$ curl -s http://10.0.0.5:50001/health
{"status":"ok"}
```

If the request is still denied, check the blocked request in `openshell term`.
The policy `binaries` list must include the executable path that actually made the request.
If the response changes from `policy_denied` to `upstream_unreachable`, the policy matched, but the OpenShell gateway could not reach the host IP and port.

### Agent cannot reach an external host

OpenShell blocks outbound connections to hosts not listed in the network policy.
Open the TUI to see blocked requests and approve them:

```bash
openshell term
```

To permanently allow an endpoint, add it to the network policy.
Refer to [Customize the Network Policy](../network-policy/customize-network-policy) for details.

### Dashboard not reachable after setting a custom port

If you ran `nemoclaw onboard` with a custom dashboard port and onboarding completed
but the dashboard URL is unreachable (browser shows connection refused or the page fails
to load), the sandbox was most likely created with an older NemoClaw version that did not
pass the dashboard port into the sandbox at startup. The gateway inside the sandbox
continued listening on the default port 18789 while the SSH tunnel forwarded the custom
port — leaving nothing at the other end of the tunnel.

Re-run onboarding on the current NemoClaw release with the desired port. Current versions
derive the dashboard port from `CHAT_UI_URL` automatically and inject it into the sandbox:

```bash
CHAT_UI_URL=http://127.0.0.1:19000 nemoclaw onboard
```

If you need to run multiple sandboxes at different ports at the same time, see
[Running multiple sandboxes simultaneously](#running-multiple-sandboxes-simultaneously).

### Ollama auth proxy did not start

NemoClaw keeps Ollama bound to `127.0.0.1:11434` and starts a token-gated
reverse proxy on `0.0.0.0:11435` so the sandbox can reach Ollama without
exposing it to the local network.
If the proxy fails to start, onboarding exits before configuring inference.

Check whether the proxy port is occupied by another process:

```bash
sudo lsof -i :11435
```

Stop the conflicting process and re-run `nemoclaw onboard`.
The wizard cleans up stale proxy processes from previous runs automatically,
so most failures resolve by retrying.

The proxy token is persisted to `~/.nemoclaw/ollama-proxy-token` with `0600`
permissions.
If the file is missing or unreadable after a host reboot, re-running
`nemoclaw onboard` regenerates it.

### Ollama auth proxy is unreachable from the sandbox

On native Linux Docker-driver hosts, a host firewall can allow the host proxy check but block sandbox traffic to the Ollama auth proxy.
When that happens, onboarding exits before it saves the inference route and prints output like:

```text
✗ Sandbox containers cannot reach the Ollama auth proxy at host.openshell.internal:11435.
  A host firewall may be blocking traffic from the OpenShell Docker bridge.
```

Apply the `ufw` command printed by onboarding, then rerun onboarding.
If the message does not include a subnet, derive it from the OpenShell Docker network:

```bash
SUBNET=$(docker network inspect openshell-docker --format '{{(index .IPAM.Config 0).Subnet}}')
sudo ufw allow from "$SUBNET" to any port 11435 proto tcp
nemoclaw onboard
```

Docker Desktop, WSL, and hosts without the OpenShell Docker network use different routing models.
In those cases NemoClaw treats an unavailable sandbox-side probe as non-blocking and relies on the regular proxy health check.

### `host.docker.internal` does not reliably reach the host from the sandbox

Configuring an inference provider with a base URL like `http://host.docker.internal:11434/v1` does not reliably reach a host Ollama service from inside the OpenShell sandbox.
OpenShell runs sandboxes inside a k3s network, where `host.docker.internal` is not a portable host-service route.
Depending on the platform, it may fail DNS resolution or resolve to an internal gateway/bridge address where the host's port `11434` is not forwarded.
The sandbox then sees a DNS failure or `connection refused`:

```console
$ getent hosts host.docker.internal
172.17.0.1      host.docker.internal host.openshell.internal
$ no_proxy=host.docker.internal curl -v http://host.docker.internal:11434/api/tags
* connect to 172.17.0.1 port 11434 failed: Connection refused
```

For local Ollama, use the auth-proxy URL that NemoClaw's "Local Ollama" onboard
option configures automatically:

```text
http://host.openshell.internal:11435/v1
```

`host.openshell.internal` resolves to the same gateway IP, and the
[token-gated Ollama auth proxy](#ollama-auth-proxy-did-not-start) binds port
`11435` there and forwards requests to `127.0.0.1:11434` on the host.
If you need a different host service exposed to the sandbox, route it through
the OpenShell gateway rather than relying on `host.docker.internal`.
See issue [#3136](https://github.com/NVIDIA/NemoClaw/issues/3136).

### Local inference health check resolves to IPv6

Local inference health checks now use `127.0.0.1` instead of `localhost`.
On systems where `localhost` resolves to `::1` first, older NemoClaw releases
could probe the wrong address and report the local backend as unreachable
even when it was running.
If you see this on a current NemoClaw release, verify that the local backend
binds an IPv4 address and not only `::1`.

### Blueprint run failed

View the error output for the failed blueprint run:

```bash
nemoclaw <name> logs
```

Use `--follow` to stream logs in real time while debugging.

<a id="dgx-spark" />

## DGX Spark

For an end-to-end Ollama walkthrough on DGX Spark, refer to the [NVIDIA Spark playbook](https://build.nvidia.com/spark/nemoclaw).

### CoreDNS CrashLoop after onboarding

If CoreDNS in the embedded k3s cluster crashes shortly after setup, it is usually because it resolves against `127.0.0.11`, which does not route inside the gateway container.
Run `fix-coredns.sh` to point CoreDNS at the container gateway IP instead, then recreate the sandbox.

### `k3s` cannot find a freshly built image

After building a new sandbox image, `k3s` inside the gateway container sometimes fails to pull it even though the image exists on the host.
Remove the gateway registration, stop any leftover host gateway process, then re-run setup.

```bash
openshell gateway remove nemoclaw
sudo pkill -f openshell-gateway
nemoclaw onboard --resume
```

### GPU passthrough on Spark

GPU passthrough is not CI-tested on DGX Spark.
It is expected to work when you pass `--gpu` and the NVIDIA Container Toolkit is configured.
Verify the toolkit is configured by running `docker run --rm --runtime=nvidia --gpus all nvidia/cuda:12.8.0-base-ubuntu24.04 nvidia-smi` from the host.
If `nvidia-smi` works on the host but onboarding says GPU passthrough was not enabled, install or repair the NVIDIA Container Toolkit, then run `sudo nvidia-ctk runtime configure --runtime=docker && sudo systemctl restart docker`.
If a reusable gateway was previously started without GPU passthrough, NemoClaw replaces it automatically only when no other registered sandboxes depend on it, or when `--recreate-sandbox` is recreating the only registered sandbox with the same name.
When shared gateway cleanup would be unsafe, follow the targeted destroy or gateway-removal commands printed by onboarding.

### `unresolvable CDI devices nvidia.com/gpu=all` during gateway start

Recent NVIDIA Container Toolkit installs configure the Docker daemon for Container Device Interface (CDI) device injection, which OpenShell's `gateway start --gpu` then auto-selects.
If no `nvidia.com/gpu` CDI spec has been generated on the host yet, gateway start fails with `Docker responded with status code 500: CDI device injection failed: unresolvable CDI devices nvidia.com/gpu=all`.
The standard NemoClaw installer detects this gap before onboarding, first tries to enable the NVIDIA CDI refresh systemd units, and falls back to generating the spec directly with `nvidia-ctk`.
If you run `nemoclaw onboard` directly, preflight prints the manual remediation instead.
The native Linux fix is the same on Docker hosts whose `docker info` advertises a non-empty `CDISpecDirs`.
On WSL with Docker Desktop, Docker may advertise CDI directories even though `--device nvidia.com/gpu=all` is not usable from the WSL distro.
For that runtime, NemoClaw skips Linux CDI repair and uses Docker's `--gpus` compatibility path for sandbox GPU access.
This compatibility path can be retired once Docker Desktop exposes usable `nvidia.com/gpu` CDI specs inside WSL, or once OpenShell no longer requires host-visible CDI specs for Docker Desktop WSL GPU passthrough.

Enable the refresh units, verify they list `nvidia.com/gpu` entries, then rerun onboarding:

```bash
sudo systemctl enable --now nvidia-cdi-refresh.path nvidia-cdi-refresh.service
nvidia-ctk cdi list
nemoclaw onboard
```

If the refresh units are unavailable or do not generate CDI devices, generate the spec directly:

```bash
sudo mkdir -p /etc/cdi
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
nvidia-ctk cdi list
```

On WSL with Docker Desktop, confirm Docker Desktop WSL integration is enabled for your distro and verify Docker GPU access from WSL:

```bash
docker run --rm --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
```

If GPU passthrough is not required on this host, rerun onboarding with `--no-gpu` instead.

### Docker GPU patch failed during sandbox create

On Linux Docker-driver gateways, NemoClaw may create the sandbox first and then recreate the OpenShell-managed Docker container with NVIDIA GPU flags.
If that compatibility patch fails, onboarding leaves the failed sandbox and diagnostic bundle in place so you can inspect the OpenShell and Docker state.

Starting with NemoClaw v0.0.43, the standard installer handles the `/proc/<pid>/task/<tid>/comm` permission case during this patch path.
If an older release fails direct GPU proof with that path and `Permission denied`, upgrade NemoClaw and rerun onboarding.

The output includes a cleanup command such as:

```bash
openshell sandbox delete <sandbox-name>
```

Fix the NVIDIA Container Toolkit or CDI configuration reported in the diagnostics, clean up the failed sandbox, then rerun onboarding.
If you do not need GPU access inside the sandbox, rerun with `--no-sandbox-gpu`.
Set `NEMOCLAW_DOCKER_GPU_PATCH=0` only when you need to bypass this compatibility path during troubleshooting.

### `pip install` fails with a system-packages error

Recent Ubuntu releases (including DGX Spark's Ubuntu 24.04) mark the system Python install as externally managed, so `pip install` without a virtual environment fails.
Use a venv instead.
Avoid `--break-system-packages` unless you understand the risk, since it can break host tooling.

```bash
python3 -m venv ~/.venvs/nemoclaw
source ~/.venvs/nemoclaw/bin/activate
pip install ...
```

### Port 3000 conflict with AI Workbench

NVIDIA AI Workbench's Traefik proxy binds ports 3000 and 10000.
If you run other services on Spark that expect port 3000, bind them to a different port.

<a id="windows-wsl-2" />

## Windows Subsystem for Linux

For environment setup steps, see [Windows Prerequisites](../get-started/windows-preparation).

### `wsl --install --no-distribution` returns Forbidden (403)

Check your network connectivity.
If you are behind a VPN, try reconnecting or switching to a different network.

### `wsl -d Ubuntu` says "There is no distribution with the supplied name"

The Ubuntu package was installed with `--no-launch` but never registered.
Run `ubuntu.exe install --root` from PowerShell to register it, or reinstall without `--no-launch`:

```bash
wsl --unregister Ubuntu
wsl --install -d Ubuntu
```

### `docker info` fails inside WSL

Confirm that Docker Desktop is running and that WSL integration is enabled for Ubuntu (Settings > Resources > WSL integration).
Then restart WSL:

```bash
wsl --shutdown
wsl -d Ubuntu
docker info
```

### Windows-host Ollama is installed but not shown during onboarding

When NemoClaw runs inside WSL, it checks both the Windows-host Ollama HTTP endpoint and the Windows `ollama.exe` process.
If Ollama is installed but the daemon is not reachable through `host.docker.internal:11434`, the wizard should still offer a start or restart action.

If the Windows-host option does not appear, confirm that PowerShell interop is enabled in WSL and that Windows can locate Ollama:

```bash
powershell.exe -NoProfile -Command "Get-Process ollama -ErrorAction SilentlyContinue"
```

If the process is missing, start Ollama from Windows and rerun onboarding.
If the process exists but the endpoint is unreachable, use the restart action when the wizard offers it, or restart Ollama from Windows with `OLLAMA_HOST=0.0.0.0:11434`.

### Ollama inference fails or hangs in WSL

Ollama configures context length based on your hardware.
On some GPUs (for example RTX 3500), the default context length is not sufficient for OpenClaw.
Force a larger context length:

```bash
pkill -f 'ollama serve'
OLLAMA_CONTEXT_LENGTH=16384 ollama serve
```

Verify that Ollama inference works:

```bash
echo "Hello" | ollama run <model-id>
```

Replace `<model-id>` with the model you selected during onboarding (for example `qwen3.5:4b`).

If `ollama serve` fails with `Error: listen tcp 127.0.0.1:11434: bind: address already in use`, check whether Ollama is configured for automatic startup:

```bash
sudo systemctl status ollama
```

If it is active, stop it first, then start with the custom context length:

```bash
sudo systemctl stop ollama
OLLAMA_CONTEXT_LENGTH=16384 ollama serve
```

For additional troubleshooting, see the [Quickstart](../get-started/quickstart) and [Windows Setup](../get-started/windows-preparation) pages.

## Podman

Podman is not a tested runtime.
OpenShell officially documents Docker-based runtimes only.
If you encounter issues with Podman, switch to a tested runtime (Docker Engine, Docker Desktop, or Colima) and rerun onboarding.

## Brev

For Brev setup instructions, refer to [Brev Web UI](../deployment/brev-web-ui).

### Most OpenClaw skills show as blocked

After deploying NemoClaw on Brev, the Skills page in the OpenClaw gateway dashboard shows most bundled skills with a `blocked` status.
Only three skills are available by default: `healthcheck`, `skill-creator`, and `weather`.

Skills are blocked for one of three reasons.

* The skill requires a macOS-only binary (`memo`, `remindctl`, `grizzly`, and similar) that is not available on the Linux (GCP) instance Brev provisions.
* The skill requires a CLI binary that is not pre-installed in the sandbox image, such as `gh` for the GitHub skill.
* The skill requires API credentials that have not been configured, such as a Notion API key or Discord bot token.

Skills that require macOS-only binaries cannot be enabled on Brev.
Skills that require additional CLI binaries require a custom sandbox image rebuild.

For credentials, use the supported host-side setup flow.
Re-run onboarding for inference or Brave Search credentials, or use `nemoclaw <name> channels add <telegram|discord|slack|wechat|whatsapp>` for messaging channels.
To add a binary to the sandbox image, update the sandbox `Dockerfile.base` to install the required package, then rebuild:

```bash
nemoclaw <name> rebuild
```

After the rebuild completes, return to the Skills page to confirm the skill status has changed from `blocked` to `ready`.

### `openclaw config set` fails with a permission error on Brev

When the sandbox config has been locked from the host, `openclaw.json` is owned by root and mounted read-only inside the sandbox.
Running `openclaw config set` inside the sandbox then returns:

```text
EACCES: permission denied, open '/sandbox/.openclaw/openclaw.json'
```

In the default sandbox state, `openclaw.json` is writable by the sandbox user.
If you see this error, use the host-side config command instead:

```bash
nemoclaw <name> config set --key <dotpath> --value '<json-or-string>' --restart
```

Refer to [Commands](commands) for the full list of supported configuration keys.

### OpenClaw dashboard is unreachable after extended uptime on Brev

After leaving NemoClaw running for an extended period on Brev, the OpenClaw dashboard may return `ERR_CONNECTION_RESET` or fail to load in the browser.
The agent may still respond on messaging channels such as Telegram or Slack while the dashboard is unreachable.

Take a snapshot before running onboard to protect your workspace files.

```bash
nemoclaw <name> snapshot create
```

Re-run onboarding to restore dashboard connectivity:

```bash
nemoclaw onboard
```

Depending on current sandbox state, onboarding may prompt before recreating resources.

### Skill install buttons do not work on Brev

Clicking **Install** on a skill in the OpenClaw gateway dashboard on Brev shows no response or fails silently.

Skill installation runs against the sandbox environment.
Installing packages on the Brev host does not make them available inside the sandbox.
To install a skill dependency, add it to the sandbox image and rebuild:

```bash
nemoclaw <name> rebuild
```

After the rebuild completes, return to the Skills page to confirm the skill is ready.

## Hermes

The issues below are common problems you may encounter when running Hermes through `nemohermes`.
For setup, refer to [Quickstart with Hermes](/get-started/quickstart-hermes).

### Port 8642 in a browser shows a blank page or `Cannot GET /`

`nemohermes onboard` forwards port `8642`, but Hermes serves an OpenAI-compatible API at that port, not a chat dashboard.
A browser visit to `http://127.0.0.1:8642/` (or any non-API path) returns nothing renderable.

Confirm the agent is healthy with the API health endpoint instead:

```console
$ curl -sf http://127.0.0.1:8642/health
{"status":"ok","platform":"hermes-agent"}
```

Point an OpenAI-compatible client at `http://127.0.0.1:8642/v1` for chat completions.
For terminal use, run `nemohermes <name> connect` and then `hermes` inside the sandbox.

### `nemohermes` reports `Sandbox 'X' already exists as OpenClaw`

Each sandbox name maps to exactly one agent type.
If a sandbox named `X` was created with the default OpenClaw agent, a later `nemohermes onboard` for the same name exits with:

```text
Sandbox 'X' already exists as OpenClaw.
nemohermes is onboarding Hermes for this sandbox name.
Side-by-side agents are supported, but each sandbox name has one agent type.
```

Pick a distinct sandbox name (the Hermes default is `hermes`; a common pattern is `my-hermes`) so Hermes and OpenClaw sandboxes can coexist on the same host.
To convert an existing sandbox to Hermes instead, destroy and re-onboard:

```console
$ nemoclaw <name> destroy
$ NEMOCLAW_AGENT=hermes nemohermes onboard
```

### `nemohermes: command not found` immediately after install

`nemohermes` is a thin shim installed alongside `nemoclaw` that pre-selects the Hermes agent.
The installer drops the shim in the same directory as `nemoclaw`; if `nemoclaw` is on `PATH` but `nemohermes` is not, the shim symlink was skipped.

Verify the install:

```console
$ command -v nemoclaw
$ command -v nemohermes
```

If only `nemoclaw` resolves, re-run the installer with `NEMOCLAW_AGENT=hermes` set so the shim is published:

```console
$ export NEMOCLAW_AGENT=hermes
$ curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash
```

Equivalently, every `nemohermes <cmd>` invocation is `NEMOCLAW_AGENT=hermes nemoclaw <cmd>`.

### Choosing between OAuth and API key for the Hermes Provider

The Hermes Provider supports two authentication paths during onboarding.
Pick OAuth when you have a Nous Portal account and an interactive terminal; pick API key when you have a long-lived `NOUS_API_KEY` and want a non-interactive flow.

Set the method explicitly so the wizard skips the prompt:

```console
$ # OAuth (default; interactive)
$ export NEMOCLAW_HERMES_AUTH_METHOD=oauth
$ nemohermes onboard

$ # API key (non-interactive)
$ export NEMOCLAW_HERMES_AUTH_METHOD=api-key
$ export NOUS_API_KEY=nous_...
$ nemohermes onboard --non-interactive
```

`NEMOCLAW_HERMES_AUTH_METHOD` accepts `oauth`, `nous-portal-oauth`, `api-key`, and `nous-api-key`.
The `NEMOCLAW_HERMES_AUTH` and `NEMOCLAW_NOUS_AUTH_METHOD` variables are back-compatible aliases.

If OAuth is selected and onboarding cannot open the host's default browser (a headless host or SSH session), the device-code prompt still prints the verification URL and user code to the terminal.
Copy them to a browser on any other machine to complete the flow.

### API client returns `401 Unauthorized` against port 8642

Hermes uses bearer-token header authentication for client requests, not an OpenClaw-style URL fragment.
A request without an `Authorization: Bearer <token>` header (or with an OpenClaw `#token=` fragment appended to the URL) is rejected with `401`.

Configure your OpenAI-compatible client to pass the Hermes API key in the `Authorization` header.
Stored credentials (including `NOUS_API_KEY` and `OPENAI_API_KEY`) are listed by:

```console
$ nemohermes credentials list
```

Reset a specific provider's credentials with `nemohermes credentials reset <provider>` and re-onboard if the stored value is wrong.

### `Brave Search` policy preset has no effect under Hermes

The Hermes wizard intentionally omits the Brave Search preset because Hermes does not use NemoClaw's OpenClaw web-search configuration (see [Quickstart with Hermes](/get-started/quickstart-hermes) and [Network Policies](/reference/network-policies)).
If you add the `brave` preset to a Hermes sandbox after onboarding, the L7 egress allowlist opens for Brave's endpoints but the agent itself does not start consuming the credential.
Configure Hermes web search from the agent's own configuration inside the sandbox.

### Re-onboarding asks every messaging prompt again

`nemohermes onboard --resume` against a Hermes sandbox that was originally onboarded with Telegram, Discord, and Slack credentials re-prompts for each channel's bot token and per-channel settings rather than reusing the stored values.
This is tracked in [#3581](https://github.com/NVIDIA/NemoClaw/issues/3581).
For unattended re-onboards, export the messaging env vars first so the wizard skips the prompts:

```console
$ export TELEGRAM_BOT_TOKEN=...
$ export DISCORD_BOT_TOKEN=...
$ export SLACK_BOT_TOKEN=...
$ nemohermes onboard --resume --non-interactive
```