For AI agents: a documentation index is available at the root level at /llms.txt and /llms-full.txt. Append /llms.txt to any URL for a page-level index, or .md for the markdown version of any page.
User Guide
User Guide
    • Home
      • Overview
      • Architecture Overview
      • Ecosystem
      • Release Notes
      • Prerequisites
      • Quickstart with OpenClaw
      • Inference Options
      • Use Local Inference
      • Tool-Calling Reliability
      • Switch Inference Providers
      • Set Up Task-Specific Sub-Agents
      • Manage Sandbox Lifecycle
      • Runtime Controls
      • Set Up Messaging Channels
      • Workspace Files
      • Backup and Restore
      • Install OpenClaw Plugins
      • Sandbox Hardening
      • Approve or Deny Network Requests
      • Customize the Network Policy
      • Integration Policy Examples
      • Deploy to Remote GPU Instances
      • Brev Web UI
      • Monitor Sandbox Activity
      • Security Best Practices
      • Credential Storage
      • OpenClaw Controls
      • Architecture Details
      • Commands
      • Which CLI to Use
      • Network Policies
      • Troubleshooting
      • Agent Skills
      • Report Vulnerabilities
      • License
      • Discord
NVIDIANVIDIA
Developer-friendly docs for your API
Privacy Policy | Your Privacy Choices | Terms of Service | Accessibility | Corporate Policies | Product Security | Contact

Copyright © 2026, NVIDIA Corporation.

LogoLogoNemoClaw
On this page
  • Quick Choice Guide
  • Symptom
  • Recommended Fix
  • Advanced Temporary Repointing
  • Verify the Fix
  • Next Steps
Inference

Tool-Calling Reliability for Local Inference

||View as Markdown|
Previous

Use a Local Inference Server

Next

Switch Inference Models at Runtime

Local inference is useful for privacy, cost control, and offline development, but tool-calling agents place stricter demands on the model server than simple chat. The model server must return structured tool_calls, not a JSON-looking string inside normal assistant text.

Use this page when the TUI shows raw JSON such as:

1{"arguments":{"query":"robotics"},"name":"memory_search"}

If that appears as text in the assistant reply, OpenClaw cannot dispatch the tool because the inference response did not include a structured tool call.

Quick Choice Guide

WorkloadOllama is usually sufficientPrefer vLLM with a parser
Plain chatYesOptional
Embeddings-only or retrieval setupYesOptional
One simple tool with short promptsOftenOptional
Agent loops with several toolsRiskyYes
Long system prompts or sender metadataRiskyYes
Multi-turn tool dispatchRiskyYes

Ollama can work well for lightweight local chat and some simple tool surfaces. For OpenClaw-style agent loops with multiple tools, long instructions, or multi-turn dispatch, use a server that exposes OpenAI-compatible /v1/chat/completions with a tool-call parser. vLLM is the common local choice.

Symptom

The common failure mode is:

  • The model emits text that looks like a tool call.
  • The response does not include a structured tool_calls field.
  • The gateway treats the response as normal text.
  • No tool runs, and the user sees raw JSON in the TUI.

This is different from a network or policy block. nemoclaw <name> status, nemoclaw <name> logs, and nemoclaw debug --quick can all look healthy while tool dispatch still fails inside the conversation.

Recommended Fix

For persistent NemoClaw use, start vLLM with auto tool choice and the parser that matches your model family, then rerun onboarding and select Local vLLM [experimental] or Other OpenAI-compatible endpoint.

For Hermes 3 style models, a known-good vLLM command shape is:

$vllm serve /models/Hermes-3-Llama-3.1-8B \
> --served-model-name hermes-3-llama-3.1-8b \
> --enable-auto-tool-choice \
> --tool-call-parser hermes \
> --port 8000

For a Docker Compose setup:

1services:
2 vllm-nemoclaw:
3 image: vllm/vllm-openai:latest
4 container_name: vllm-nemoclaw
5 restart: unless-stopped
6 ports:
7 - "8002:8000"
8 volumes:
9 - /path/to/models:/models:ro
10 - /path/to/hf-cache:/root/.cache/huggingface
11 ipc: host
12 deploy:
13 resources:
14 reservations:
15 devices:
16 - capabilities: [gpu]
17 count: all
18 command: >
19 --model /models/Hermes-3-Llama-3.1-8B
20 --served-model-name hermes-3-llama-3.1-8b
21 --enable-auto-tool-choice
22 --tool-call-parser hermes
23 --gpu-memory-utilization 0.20
24 --max-model-len 32768
25 --api-key ${VLLM_API_KEY}

Then onboard against that endpoint:

$NEMOCLAW_PROVIDER=custom \
> NEMOCLAW_ENDPOINT_URL=http://localhost:8002/v1 \
> NEMOCLAW_MODEL=hermes-3-llama-3.1-8b \
> COMPATIBLE_API_KEY=$VLLM_API_KEY \
> nemoclaw onboard --non-interactive

If the endpoint does not require authentication, set COMPATIBLE_API_KEY to any non-empty placeholder, such as dummy.

Advanced Temporary Repointing

NemoClaw-managed sandboxes normally block direct openclaw config set writes inside the sandbox because those edits do not survive rebuilds. Prefer rerunning nemoclaw onboard for a persistent provider change.

If you are intentionally testing a mutable OpenClaw config, prepare a batch file like this:

1{
2 "models": {
3 "providers": {
4 "vllm-local": {
5 "baseUrl": "http://host.openshell.internal:8002/v1",
6 "api": "openai",
7 "apiKey": "${VLLM_API_KEY}"
8 }
9 }
10 },
11 "agents": {
12 "defaults": {
13 "model": {
14 "primary": "vllm-local/hermes-3-llama-3.1-8b"
15 }
16 }
17 }
18}

Apply it only in environments where OpenClaw config writes are allowed:

$openclaw config set --batch-file /sandbox/.openclaw/vllm-tool-calls.json

After testing, persist the working provider through nemoclaw onboard so the sandbox image, OpenShell inference route, and host-managed credentials stay in sync.

Verify the Fix

After switching to vLLM, ask for an action that should use a tool. Good signs:

  • The TUI does not show JSON blobs as assistant text.
  • The gateway log shows tool dispatch and a follow-up answer.
  • nemoclaw <name> status reports the local vLLM or compatible endpoint as the active provider.

If JSON still appears as text, confirm that vLLM was started with both --enable-auto-tool-choice and the correct --tool-call-parser value for your model.

Next Steps

  • Use a Local Inference Server
  • Inference Options
  • Switch Inference Models