Tool-Calling Reliability for Local Inference#
Local inference is useful for privacy, cost control, and offline development, but
tool-calling agents place stricter demands on the model server than simple chat.
The model server must return structured tool_calls, not a JSON-looking string
inside normal assistant text.
Use this page when the TUI shows raw JSON such as:
{"arguments":{"query":"robotics"},"name":"memory_search"}
If that appears as text in the assistant reply, OpenClaw cannot dispatch the tool because the inference response did not include a structured tool call.
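For comparison, a healthy OpenAI-compatible response carries the call in a dedicated field instead of the message text. The snippet below is abbreviated to the relevant part of a chat completion response, and the id value is illustrative:

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "memory_search",
              "arguments": "{\"query\": \"robotics\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}

OpenClaw can only dispatch memory_search when the call arrives in this structured form.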
Quick Choice Guide#
| Workload | Ollama is usually sufficient | Prefer vLLM with a parser |
|---|---|---|
| Plain chat | Yes | Optional |
| Embeddings-only or retrieval setup | Yes | Optional |
| One simple tool with short prompts | Often | Optional |
| Agent loops with several tools | Risky | Yes |
| Long system prompts or sender metadata | Risky | Yes |
| Multi-turn tool dispatch | Risky | Yes |
Ollama can work well for lightweight local chat and some simple tool surfaces.
For OpenClaw-style agent loops with multiple tools, long instructions, or
multi-turn dispatch, use a server that exposes OpenAI-compatible
/v1/chat/completions with a tool-call parser. vLLM is the common local choice.
Symptom#
The common failure mode is:
1. The model emits text that looks like a tool call.
2. The response does not include a structured tool_calls field.
3. The gateway treats the response as normal text.
4. No tool runs, and the user sees raw JSON in the TUI.
This is different from a network or policy block. nemoclaw <name> status,
nemoclaw <name> logs, and nemoclaw debug --quick can all look healthy while
tool dispatch still fails inside the conversation.
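To isolate the problem from NemoClaw's own health checks, probe the inference endpoint directly with a request that declares a tool. The example below is a sketch: it assumes the endpoint, model name, and API key used in the fix that follows, plus a minimal illustrative memory_search tool definition; substitute the values for whatever endpoint you are currently pointed at:

$ curl -s http://localhost:8002/v1/chat/completions \
    -H "Authorization: Bearer $VLLM_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "hermes-3-llama-3.1-8b",
      "messages": [{"role": "user", "content": "Search memory for robotics notes."}],
      "tools": [{
        "type": "function",
        "function": {
          "name": "memory_search",
          "description": "Search stored notes",
          "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"]
          }
        }
      }]
    }'

A server that supports structured tool calling answers with a tool_calls entry in choices[0].message. A server that does not puts the call JSON inside choices[0].message.content, which is exactly what surfaces as raw text in the TUI.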
Recommended Fix#
For persistent NemoClaw use, start vLLM with auto tool choice and the parser that matches your model family, then rerun onboarding and select Local vLLM [experimental] or Other OpenAI-compatible endpoint.
For Hermes 3 style models, a known-good vLLM command shape is:
$ vllm serve /models/Hermes-3-Llama-3.1-8B \
--served-model-name hermes-3-llama-3.1-8b \
--enable-auto-tool-choice \
--tool-call-parser hermes \
--port 8000
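The parser value is model-family specific. As an illustration only, Llama 3.1 Instruct style models are commonly served with vLLM's llama3_json parser; the model path and served name below are placeholders, and depending on your vLLM version the model may also need its matching tool-use chat template passed via --chat-template. Check the vLLM tool-calling docs for the parser that matches your model:

$ vllm serve /models/Meta-Llama-3.1-8B-Instruct \
    --served-model-name llama-3.1-8b-instruct \
    --enable-auto-tool-choice \
    --tool-call-parser llama3_json \
    --port 8000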
For a Docker Compose setup:
services:
  vllm-nemoclaw:
    image: vllm/vllm-openai:latest
    container_name: vllm-nemoclaw
    restart: unless-stopped
    ports:
      - "8002:8000"
    volumes:
      - /path/to/models:/models:ro
      - /path/to/hf-cache:/root/.cache/huggingface
    ipc: host
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
              count: all
    command: >
      --model /models/Hermes-3-Llama-3.1-8B
      --served-model-name hermes-3-llama-3.1-8b
      --enable-auto-tool-choice
      --tool-call-parser hermes
      --gpu-memory-utilization 0.20
      --max-model-len 32768
      --api-key ${VLLM_API_KEY}
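Before onboarding, confirm the container is up and serving the expected model name. A quick check against the standard models listing endpoint, assuming the port mapping and API key from the compose file above:

$ curl -s http://localhost:8002/v1/models \
    -H "Authorization: Bearer $VLLM_API_KEY"

The served model id in the response should match the NEMOCLAW_MODEL value used below.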
Then onboard against that endpoint:
$ NEMOCLAW_PROVIDER=custom \
NEMOCLAW_ENDPOINT_URL=http://localhost:8002/v1 \
NEMOCLAW_MODEL=hermes-3-llama-3.1-8b \
COMPATIBLE_API_KEY=$VLLM_API_KEY \
nemoclaw onboard --non-interactive
If the endpoint does not require authentication, set COMPATIBLE_API_KEY to any
non-empty placeholder, such as dummy.
Advanced Temporary Repointing#
NemoClaw-managed sandboxes normally block direct openclaw config set writes
inside the sandbox because those edits do not survive rebuilds. Prefer rerunning
nemoclaw onboard for a persistent provider change.
If you are intentionally testing a mutable OpenClaw config, prepare a batch file like this:
{
  "models": {
    "providers": {
      "vllm-local": {
        "baseUrl": "http://host.openshell.internal:8002/v1",
        "api": "openai",
        "apiKey": "${VLLM_API_KEY}"
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "vllm-local/hermes-3-llama-3.1-8b"
      }
    }
  }
}
Apply it only in environments where OpenClaw config writes are allowed:
$ openclaw config set --batch-file /sandbox/.openclaw/vllm-tool-calls.json
After testing, persist the working provider through nemoclaw onboard so the
sandbox image, OpenShell inference route, and host-managed credentials stay in
sync.
Verify the Fix#
After switching to vLLM, ask for an action that should use a tool. Good signs:
- The TUI does not show JSON blobs as assistant text.
- The gateway log shows tool dispatch and a follow-up answer.
- nemoclaw <name> status reports the local vLLM or compatible endpoint as the active provider.
If JSON still appears as text, confirm that vLLM was started with both
--enable-auto-tool-choice and the correct --tool-call-parser value for your
model.
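For a check outside the TUI, re-run the direct probe from the Symptom section, save its output to a file, and inspect the parsed message with jq. Both jq and the response.json filename are assumptions here, not anything NemoClaw provides:

$ jq '.choices[0].message | {content, tool_calls}' response.json

A populated tool_calls array with empty or null content is the healthy shape; null tool_calls with the call JSON inside content means the parser is still not engaged.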