Troubleshooting#
Recovery steps for common NemoClaw / OpenClaw / VSS orchestrator issues. Commands assume the default sandbox name demo from Setup VSS and Skills; substitute your sandbox name wherever <sandbox> appears.
Cursor or IDE Kernel Uses Node.js 20#
Symptom: Section 3 fails during the NemoClaw installer with a Node.js version error.
Cause: NemoClaw’s installer needs Node.js >= 22.16. Cursor and some other IDEs can put their bundled Node.js 20 ahead of nvm on the Jupyter kernel’s PATH.
Solution: Fix the Node.js version manually in a host terminal, then restart the Jupyter kernel and re-run the failed Section 3 cell:
# Install nvm + Node 22 (skip the curl line if nvm is already installed)
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash
export NVM_DIR="$HOME/.nvm"; . "$NVM_DIR/nvm.sh"
nvm install 22 && nvm alias default 22
# Prepend Node 22 ahead of Cursor's bundled Node 20, then launch Cursor from THIS shell
export PATH="$NVM_DIR/versions/node/$(nvm version 22)/bin:$PATH"
node -v # must print v22.16 or newer
Launching Cursor from a terminal where node -v reports v22.x makes the kernel inherit the correct PATH. Alternatively, leave Cursor as-is and run this NemoClaw step from a plain terminal instead of the IDE kernel.
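Before re-running the Section 3 cell, the version requirement can be pre-checked in a terminal. The following is a minimal sketch and not part of the NemoClaw installer: the helper name node_version_ok is hypothetical, and the 22.16 threshold is taken from the cause described above.

```shell
# Hypothetical helper: return 0 when a version string such as "v22.16.0"
# is at least 22.16, and 1 otherwise.
node_version_ok() {
  ver="${1#v}"                      # drop the leading "v"
  major="${ver%%.*}"                # text before the first dot
  case "$ver" in
    *.*) rest="${ver#*.}" ;;        # text after the first dot
    *)   rest=0 ;;                  # no minor component given
  esac
  minor="${rest%%.*}"
  case "$major" in ''|*[!0-9]*) return 1 ;; esac   # reject non-numeric input
  case "$minor" in ''|*[!0-9]*) return 1 ;; esac
  [ "$major" -gt 22 ] && return 0
  [ "$major" -eq 22 ] && [ "$minor" -ge 16 ] && return 0
  return 1
}

# Only report when node is actually on PATH.
if command -v node >/dev/null 2>&1; then
  if node_version_ok "$(node -v)"; then
    echo "node $(node -v) satisfies >= 22.16"
  else
    echo "node $(node -v) is too old for the NemoClaw installer"
  fi
fi
```

Running this in the same shell that launches Cursor shows which Node.js the kernel is likely to inherit.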
NemoClaw Forward Restart Fails#
Symptom: Section 3 of deploy_nemoclaw_vss.ipynb gets stuck or fails after installing the VSS OpenClaw plugin and restarting the gateway. The log reports that the gateway is healthy, then fails while refreshing the dashboard port-forward on 18789 with FailedPrecondition and sandbox is not ready.
Figure: Section 3 forward restart failure after the OpenClaw gateway reports healthy.
Solution: From a terminal on the same instance, manually start the scoped OpenShell forward once:
openshell forward start --background 18789 <sandbox>
This can unblock the forward restart and let Section 3 run to completion.
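If the sandbox needs a little longer to become ready, a single manual start may hit the same error. The sketch below wraps a command in a small retry loop. It assumes that repeating the start command after a failure is safe, which the steps above do not confirm; the helper name retry, the attempt count, and the delay are illustrative only.

```shell
# Hypothetical helper: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS runs have failed.
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  n=1
  while :; do
    "$@" && return 0                      # stop at the first success
    [ "$n" -ge "$attempts" ] && return 1  # give up after the last attempt
    n=$((n + 1))
    sleep "$delay"
  done
}

# Example (replace demo with your sandbox name):
#   retry 5 3 openshell forward start --background 18789 demo
```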
OpenClaw UI Not Reachable#
Symptom: The OpenClaw UI link from Section 6.3 fails to load, hangs, or times out.
Cause: The OpenShell dashboard forward on 18789 stopped, was never created, or points at the wrong sandbox.
Solution:
Check the local health endpoint:
curl -fsS http://127.0.0.1:18789/health
List active forwards:
openshell forward list
Look for <sandbox> on port 18789 with status running.
Restart the forward:
openshell forward stop 18789 <sandbox>
openshell forward start --background 18789 <sandbox>
If you are not on the Brev instance directly, keep an SSH tunnel open from your laptop:
ssh -L 18789:127.0.0.1:18789 <user>@<nemoclaw-host>
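The check and restart steps above can be combined into one helper. This is a sketch that only chains the commands already listed; the function name ensure_forward is hypothetical, and treating a failed stop as harmless is an assumption.

```shell
# Hypothetical helper: ensure_forward SANDBOX [PORT]
# Probes the local health endpoint and restarts the forward if it fails.
ensure_forward() {
  sandbox="$1"
  port="${2:-18789}"
  if curl -fsS "http://127.0.0.1:${port}/health" >/dev/null 2>&1; then
    echo "forward on ${port} is healthy"
    return 0
  fi
  echo "forward on ${port} is not healthy; restarting"
  # Assumption: stop may fail when no forward exists, so ignore its status.
  openshell forward stop "$port" "$sandbox" || true
  openshell forward start --background "$port" "$sandbox"
}

# Example (replace demo with your sandbox name):
#   ensure_forward demo
```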
MCP Tools Not Visible or Not Reachable#
Symptom: The agent cannot list vss_orchestrator__* tools, tool calls time out, or Section 6.3 says the MCP server is not reachable.
Solution:
Re-run notebook Section 6.1. It stops any previously recorded VSS_ORCHESTRATOR_MCP_PID, starts the MCP server on 9988, and polls health.
Check the MCP log:
tail -n 100 ~/video-search-and-summarization/.orchestrator-artifacts/vss_orchestrator_mcp.log
From services/agent/, call the health tool directly:
uv run nat mcp client tool call vss_orchestrator__profiles \
  --url http://127.0.0.1:9988/mcp \
  --transport streamable-http
If the server fails before health, verify deploy/docker/scripts/vss_orchestrator_mcp_config.yml paths are valid and writable, especially mdx_data_dir and output_dir.
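When the MCP log is long, filtering it for likely failure lines can be quicker than reading the tail. The helper below is a sketch: the name recent_errors is hypothetical, and the keywords error, traceback, and exception are assumptions about what the log contains rather than a documented log format.

```shell
# Hypothetical helper: recent_errors LOGFILE [COUNT]
# Prints the last COUNT lines (default 20) that look like failures,
# each prefixed with its line number in the log.
recent_errors() {
  log="$1"
  count="${2:-20}"
  if [ ! -r "$log" ]; then
    echo "log not readable: $log" >&2
    return 2
  fi
  # Keywords are assumptions; adjust them to match the real log output.
  grep -inE 'error|traceback|exception' "$log" | tail -n "$count"
}

# Example:
#   recent_errors ~/video-search-and-summarization/.orchestrator-artifacts/vss_orchestrator_mcp.log
```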
OpenClaw Model or Provider Misconfigured#
Symptom: Chat fails, asking “what model are you using?” reports the wrong model, or Section 3 errors before running init_nemoclaw.sh.
Solution:
If the UI opens but chat fails, re-check the provider setup and active policy.
If you reproduce the failure from a terminal, use nemoclaw <sandbox> exec -- <command> rather than docker exec so the command runs through the documented NemoClaw path with workspace plugins and gateway routing.
For build.nvidia.com, clear NEMOCLAW_ENDPOINT_URL and set NVIDIA_API_KEY. NEMOCLAW_MODEL can be blank to use nvidia/nemotron-3-super-120b-a12b.
For a custom provider, set all three values: NEMOCLAW_ENDPOINT_URL, NEMOCLAW_MODEL, and COMPATIBLE_API_KEY.
For a local provider, bind the server to 0.0.0.0 or another non-loopback host address. From inside the sandbox, use host.openshell.internal rather than host 127.0.0.1.
Re-run the selected Section 1.2 provider cell, Section 1.3, and Section 3.
If the sandbox still keeps stale provider state, destroy and recreate it:
nemoclaw <sandbox> destroy
nemoclaw gc
Then re-run Section 3.
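The variable rules for build.nvidia.com and custom providers can be checked before re-running the notebook cells. This sketch encodes only the two rules stated above; the function name check_provider_env and the kind labels nvidia and custom are hypothetical, and the notebook may set these variables in its own environment rather than in your terminal.

```shell
# Hypothetical helper: check_provider_env KIND
# KIND is "nvidia" (build.nvidia.com) or "custom"; both labels are illustrative.
check_provider_env() {
  case "$1" in
    nvidia)
      if [ -n "${NEMOCLAW_ENDPOINT_URL:-}" ]; then
        echo "NEMOCLAW_ENDPOINT_URL must be unset"
        return 1
      fi
      if [ -z "${NVIDIA_API_KEY:-}" ]; then
        echo "NVIDIA_API_KEY is missing"
        return 1
      fi
      ;;
    custom)
      for name in NEMOCLAW_ENDPOINT_URL NEMOCLAW_MODEL COMPATIBLE_API_KEY; do
        eval "value=\${$name:-}"     # look up the variable by name
        if [ -z "$value" ]; then
          echo "$name is missing"
          return 1
        fi
      done
      ;;
    *)
      echo "unknown provider kind: $1"
      return 2
      ;;
  esac
  echo "ok"
}

# Example:
#   check_provider_env nvidia
```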
VSS Skills or Plugin Missing#
Symptom: Asking the agent to “list your available skills” does not show VSS skills, the Skills tab is empty, or Section 4 reports no non-bundled skills.
Solution:
Run Section 4 to inspect openclaw plugins list, openclaw plugins doctor, and openclaw skills list --json.
Check skills from the host:
openshell sandbox exec -n <sandbox> -- sh -lc 'openclaw skills list --json'
Confirm the VSS checkout includes .openclaw/package.json and skills/.
Re-run Section 3. The init script repacks .openclaw with the repository skills/ directory and reinstalls the plugin into the sandbox.
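The checkout check above can be scripted so that a missing file is reported by name. This is a minimal sketch: the function name check_vss_checkout is hypothetical, and it verifies only the two paths named above, not their contents.

```shell
# Hypothetical helper: check_vss_checkout CHECKOUT_DIR
# Reports whether the two paths the init script repacks are present.
check_vss_checkout() {
  root="$1"
  status=0
  if [ ! -f "$root/.openclaw/package.json" ]; then
    echo "missing: .openclaw/package.json"
    status=1
  fi
  if [ ! -d "$root/skills" ]; then
    echo "missing: skills/"
    status=1
  fi
  [ "$status" -eq 0 ] && echo "checkout looks complete"
  return "$status"
}

# Example, assuming the checkout lives at the path used for the MCP log:
#   check_vss_checkout ~/video-search-and-summarization
```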
NGC or Model Artifact Downloads Fail#
Symptom: docker_up fails during ensure_model_artifacts, NGC CLI commands fail, or docker login nvcr.io fails.
Solution:
Verify NGC_CLI_API_KEY is set in Section 1.1.
Re-run Section 3 so the sandbox-side ngc credential provider and NGC CLI install are refreshed.
Re-run Section 5.1 and 5.2 on the host to configure ~/.ngc/config and docker login nvcr.io.
Check that mdx_data_dir in deploy/docker/scripts/vss_orchestrator_mcp_config.yml is writable. Model artifacts are extracted under <mdx_data_dir>/models.
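The writability check in the last step can be done from a host terminal. The sketch below takes the directory as an argument instead of parsing the YAML file; the function name check_writable_dir is hypothetical, and the result reflects the permissions of the user running it, which may differ from the user the deployment runs as.

```shell
# Hypothetical helper: check_writable_dir DIR
# Pass the mdx_data_dir value copied from vss_orchestrator_mcp_config.yml.
check_writable_dir() {
  dir="$1"
  if [ ! -d "$dir" ]; then
    echo "not a directory: $dir"
    return 1
  fi
  if [ ! -w "$dir" ]; then
    echo "not writable: $dir"
    return 1
  fi
  echo "writable: $dir"
}

# Example (substitute the configured value):
#   check_writable_dir /path/to/mdx_data_dir
```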
Compose Operation Stuck or Retried#
Symptom: A deployment appears stuck, repeated deploy prompts fail with “operation already running”, or teardown is needed after a failed docker_up.
Solution:
Poll status with the docker_status tool using the docker_compose_ops_id returned by docker_up or docker_down.
Ask the agent to fetch logs with docker_logs for the failing container.
To stop a running deployment, ask the agent to tear it down. docker_down can preempt a running docker_up for the same docker_compose_id.
Use deep_clean=true on docker_down only when you intentionally want to delete the configured mdx_data_dir after a successful teardown.
Sandbox or Gateway in a Bad State#
Symptom: nemoclaw or openshell commands hang or error out, the agent stops responding, the OpenShell gateway container is missing or unhealthy, or VSS skills fail to load after repeated reinstall attempts.
Solution: Destroy the sandbox, garbage-collect leftover state, then recreate it from the notebook.
nemoclaw <sandbox> destroy
nemoclaw gc
Then re-run Section 3 of deploy_nemoclaw_vss.ipynb. This recreates the sandbox, reapplies the VSS policy, reinstalls the VSS OpenClaw plugin and skills, refreshes the OpenClaw config, and restarts the dashboard forward.
Clean Reinstall#
Symptom: Destroying the sandbox and restarting the forward does not restore a working environment.
Solution: Completely uninstall NemoClaw and OpenClaw, then reinstall from the notebook.
Run the bundled NemoClaw uninstaller:
bash ~/NemoClaw/uninstall.sh
The uninstaller removes:
All OpenShell sandboxes and the NemoClaw gateway/providers.
NemoClaw helper services.
NemoClaw / OpenShell / OpenClaw Docker images created during onboarding.
~/.nemoclaw and ~/.config/{openshell,nemoclaw} state.
The global nemoclaw npm install/link and the openshell binary.
Docker, Node.js, npm, and Ollama are preserved.
Reinstall by re-running Section 3 of deploy_nemoclaw_vss.ipynb. Section 3 fetches the pinned NEMOCLAW_INSTALL_REF, recreates the sandbox, reapplies the VSS policy, and reinstalls the VSS OpenClaw plugin and skills.
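Before reinstalling, it can help to confirm that the uninstaller removed the state directories listed above. This is a sketch that checks only those three paths; the function name leftover_state is hypothetical, and it does not inspect Docker images or helper services.

```shell
# Hypothetical helper: leftover_state HOME_DIR
# Prints any NemoClaw / OpenShell state directories that still exist.
leftover_state() {
  home_dir="$1"
  found=0
  for path in \
    "$home_dir/.nemoclaw" \
    "$home_dir/.config/openshell" \
    "$home_dir/.config/nemoclaw"
  do
    if [ -e "$path" ]; then
      echo "still present: $path"
      found=1
    fi
  done
  [ "$found" -eq 0 ] && echo "no leftover state"
  return "$found"
}

# Example:
#   leftover_state "$HOME"
#   command -v nemoclaw openshell   # prints nothing once both binaries are gone
```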