Tool-Calling Reliability for Local Inference

View as Markdown

Local inference is useful for privacy, cost control, and offline development, but tool-calling agents place stricter demands on the model server than simple chat. The model server must return structured tool_calls, not a JSON-looking string inside normal assistant text.

Use this page when the TUI shows raw JSON such as:

1{"arguments":{"query":"robotics"},"name":"memory_search"}

If that appears as text in the assistant reply, OpenClaw cannot dispatch the tool because the inference response did not include a structured tool call.

Quick Choice Guide

WorkloadOllama is usually sufficientPrefer vLLM with a parser
Plain chatYesOptional
Embeddings-only or retrieval setupYesOptional
One simple tool with short promptsOftenOptional
Agent loops with several toolsRiskyYes
Long system prompts or sender metadataRiskyYes
Multi-turn tool dispatchRiskyYes

Ollama can work well for lightweight local chat and some simple tool surfaces. For OpenClaw-style agent loops with multiple tools, long instructions, or multi-turn dispatch, use a server that exposes OpenAI-compatible /v1/chat/completions with a tool-call parser. vLLM is the common local choice.

Symptom

The common failure mode is:

  • The model emits text that looks like a tool call.
  • The response does not include a structured tool_calls field.
  • The gateway treats the response as normal text.
  • No tool runs, and the user sees raw JSON in the TUI.

This is different from a network or policy block. nemoclaw <name> status, nemoclaw <name> logs, and nemoclaw debug --quick can all look healthy while tool dispatch still fails inside the conversation.

For persistent NemoClaw use, start vLLM with auto tool choice and the parser that matches your model family, then rerun onboarding and select Local vLLM [experimental] or Other OpenAI-compatible endpoint.

For Hermes 3 style models, a known-good vLLM command shape is:

1$ vllm serve /models/Hermes-3-Llama-3.1-8B \
2 --served-model-name hermes-3-llama-3.1-8b \
3 --enable-auto-tool-choice \
4 --tool-call-parser hermes \
5 --port 8000

For a Docker Compose setup:

1services:
2 vllm-nemoclaw:
3 image: vllm/vllm-openai:latest
4 container_name: vllm-nemoclaw
5 restart: unless-stopped
6 ports:
7 - "8002:8000"
8 volumes:
9 - /path/to/models:/models:ro
10 - /path/to/hf-cache:/root/.cache/huggingface
11 ipc: host
12 deploy:
13 resources:
14 reservations:
15 devices:
16 - capabilities: [gpu]
17 count: all
18 command: >
19 --model /models/Hermes-3-Llama-3.1-8B
20 --served-model-name hermes-3-llama-3.1-8b
21 --enable-auto-tool-choice
22 --tool-call-parser hermes
23 --gpu-memory-utilization 0.20
24 --max-model-len 32768
25 --api-key ${VLLM_API_KEY}

Then onboard against that endpoint:

1$ NEMOCLAW_PROVIDER=custom \
2 NEMOCLAW_ENDPOINT_URL=http://localhost:8002/v1 \
3 NEMOCLAW_MODEL=hermes-3-llama-3.1-8b \
4 COMPATIBLE_API_KEY=$VLLM_API_KEY \
5 nemoclaw onboard --non-interactive

If the endpoint does not require authentication, set COMPATIBLE_API_KEY to any non-empty placeholder, such as dummy.

Advanced Temporary Repointing

NemoClaw-managed sandboxes normally block direct openclaw config set writes inside the sandbox because those edits do not survive rebuilds. Prefer rerunning nemoclaw onboard for a persistent provider change.

If you are intentionally testing a mutable OpenClaw config, prepare a batch file like this:

1{
2 "models": {
3 "providers": {
4 "vllm-local": {
5 "baseUrl": "http://host.openshell.internal:8002/v1",
6 "api": "openai",
7 "apiKey": "${VLLM_API_KEY}"
8 }
9 }
10 },
11 "agents": {
12 "defaults": {
13 "model": {
14 "primary": "vllm-local/hermes-3-llama-3.1-8b"
15 }
16 }
17 }
18}

Apply it only in environments where OpenClaw config writes are allowed:

1$ openclaw config set --batch-file /sandbox/.openclaw/vllm-tool-calls.json

After testing, persist the working provider through nemoclaw onboard so the sandbox image, OpenShell inference route, and host-managed credentials stay in sync.

Verify the Fix

After switching to vLLM, ask for an action that should use a tool. Good signs:

  • The TUI does not show JSON blobs as assistant text.
  • The gateway log shows tool dispatch and a follow-up answer.
  • nemoclaw <name> status reports the local vLLM or compatible endpoint as the active provider.

If JSON still appears as text, confirm that vLLM was started with both --enable-auto-tool-choice and the correct --tool-call-parser value for your model.

Next Steps