Tool-Calling Reliability for Local Inference
Tool-Calling Reliability for Local Inference
Local inference is useful for privacy, cost control, and offline development, but
tool-calling agents place stricter demands on the model server than simple chat.
The model server must return structured tool_calls, not a JSON-looking string
inside normal assistant text.
Use this page when the TUI shows raw JSON such as:
If that appears as text in the assistant reply, OpenClaw cannot dispatch the tool because the inference response did not include a structured tool call.
Quick Choice Guide
Ollama can work well for lightweight local chat and some simple tool surfaces.
For OpenClaw-style agent loops with multiple tools, long instructions, or
multi-turn dispatch, use a server that exposes OpenAI-compatible
/v1/chat/completions with a tool-call parser. vLLM is the common local choice.
Symptom
The common failure mode is:
- The model emits text that looks like a tool call.
- The response does not include a structured
tool_callsfield. - The gateway treats the response as normal text.
- No tool runs, and the user sees raw JSON in the TUI.
This is different from a network or policy block. nemoclaw <name> status,
nemoclaw <name> logs, and nemoclaw debug --quick can all look healthy while
tool dispatch still fails inside the conversation.
Recommended Fix
For persistent NemoClaw use, start vLLM with auto tool choice and the parser that matches your model family, then rerun onboarding and select Local vLLM [experimental] or Other OpenAI-compatible endpoint.
For Hermes 3 style models, a known-good vLLM command shape is:
For a Docker Compose setup:
Then onboard against that endpoint:
If the endpoint does not require authentication, set COMPATIBLE_API_KEY to any
non-empty placeholder, such as dummy.
Advanced Temporary Repointing
NemoClaw-managed sandboxes normally block direct openclaw config set writes
inside the sandbox because those edits do not survive rebuilds. Prefer rerunning
nemoclaw onboard for a persistent provider change.
If you are intentionally testing a mutable OpenClaw config, prepare a batch file like this:
Apply it only in environments where OpenClaw config writes are allowed:
After testing, persist the working provider through nemoclaw onboard so the
sandbox image, OpenShell inference route, and host-managed credentials stay in
sync.
Verify the Fix
After switching to vLLM, ask for an action that should use a tool. Good signs:
- The TUI does not show JSON blobs as assistant text.
- The gateway log shows tool dispatch and a follow-up answer.
nemoclaw <name> statusreports the local vLLM or compatible endpoint as the active provider.
If JSON still appears as text, confirm that vLLM was started with both
--enable-auto-tool-choice and the correct --tool-call-parser value for your
model.