Tool-Calling Reliability for Local Inference#

Local inference is useful for privacy, cost control, and offline development, but tool-calling agents place stricter demands on the model server than plain chat does: the server must return structured tool_calls, not a JSON-looking string inside normal assistant text.

Use this page when the TUI shows raw JSON such as:

{"arguments":{"query":"robotics"},"name":"memory_search"}

If that appears as text in the assistant reply, OpenClaw cannot dispatch the tool because the inference response did not include a structured tool call.
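
For contrast, this is the shape of a structured tool call in an OpenAI-compatible response (trimmed; the id is illustrative). The call lives in a dedicated tool_calls field on the assistant message, and content is typically null:

{
  "role": "assistant",
  "content": null,
  "tool_calls": [
    {
      "id": "call_abc123",
      "type": "function",
      "function": {
        "name": "memory_search",
        "arguments": "{\"query\": \"robotics\"}"
      }
    }
  ]
}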

Quick Choice Guide#

| Workload | Ollama is usually sufficient | Prefer vLLM with a parser |
| --- | --- | --- |
| Plain chat | Yes | Optional |
| Embeddings-only or retrieval setup | Yes | Optional |
| One simple tool with short prompts | Often | Optional |
| Agent loops with several tools | Risky | Yes |
| Long system prompts or sender metadata | Risky | Yes |
| Multi-turn tool dispatch | Risky | Yes |

Ollama can work well for lightweight local chat and some simple tool surfaces. For OpenClaw-style agent loops with multiple tools, long instructions, or multi-turn dispatch, use a server that exposes OpenAI-compatible /v1/chat/completions with a tool-call parser. vLLM is the common local choice.
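
A minimal vLLM launch that enables the parser, as a sketch: the model name and port below are assumptions matching the provider config later on this page, and the parser value must match your model family (hermes is correct for Hermes models; other families need a different --tool-call-parser value):

$ vllm serve NousResearch/Hermes-3-Llama-3.1-8B \
    --port 8002 \
    --enable-auto-tool-choice \
    --tool-call-parser hermes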

Symptom#

The common failure mode is:

  • The model emits text that looks like a tool call.

  • The response does not include a structured tool_calls field.

  • The gateway treats the response as normal text.

  • No tool runs, and the user sees raw JSON in the TUI.

This is different from a network or policy block. nemoclaw <name> status, nemoclaw <name> logs, and nemoclaw debug --quick can all look healthy while tool dispatch still fails inside the conversation.
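
To check the endpoint directly, send a request that offers one tool and inspect where the call lands. A sketch with curl and jq, reusing the host and tool from this page; the model name must match what the server reports at /v1/models, and add an Authorization header if the server was started with an API key:

$ curl -s http://host.openshell.internal:8002/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "NousResearch/Hermes-3-Llama-3.1-8B",
      "messages": [{"role": "user", "content": "Search memory for robotics"}],
      "tools": [{
        "type": "function",
        "function": {
          "name": "memory_search",
          "description": "Search stored memories",
          "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"]
          }
        }
      }]
    }' | jq '.choices[0].message | {content, tool_calls}'

On a healthy setup, tool_calls is a non-empty array and content is typically null. If the same JSON shows up inside content instead, the server is not parsing tool calls, and OpenClaw will render it as text.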

Advanced Temporary Repointing#

NemoClaw-managed sandboxes normally block direct openclaw config set writes inside the sandbox because those edits do not survive rebuilds. Prefer rerunning nemoclaw onboard for a persistent provider change.

If you are intentionally testing a mutable OpenClaw config, prepare a batch file like this:

{
  "models": {
    "providers": {
      "vllm-local": {
        "baseUrl": "http://host.openshell.internal:8002/v1",
        "api": "openai",
        "apiKey": "${VLLM_API_KEY}"
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "vllm-local/hermes-3-llama-3.1-8b"
      }
    }
  }
}
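
This assumes OpenClaw expands the ${VLLM_API_KEY} environment reference when applying the batch file, and that the value matches the key, if any, the vLLM server was started with (vllm serve accepts an --api-key flag). A throwaway value is fine for local-only testing:

$ export VLLM_API_KEY="local-test-key"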

Apply it only in environments where OpenClaw config writes are allowed:

$ openclaw config set --batch-file /sandbox/.openclaw/vllm-tool-calls.json

After testing, persist the working provider through nemoclaw onboard so the sandbox image, OpenShell inference route, and host-managed credentials stay in sync.

Verify the Fix#

After switching to vLLM, ask for an action that should use a tool. Good signs:

  • The TUI does not show JSON blobs as assistant text.

  • The gateway log shows tool dispatch and a follow-up answer.

  • nemoclaw <name> status reports the local vLLM or compatible endpoint as the active provider.

If JSON still appears as text, confirm that vLLM was started with both --enable-auto-tool-choice and the correct --tool-call-parser value for your model.
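
If the server was launched in another shell, a quick process check confirms the flag is present on the running command line (a sketch; adjust to however you launched vLLM):

$ ps aux | grep '[t]ool-call-parser'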

Next Steps#