
Copyright © 2026, NVIDIA Corporation.

Troubleshooting Tool Calls

Capture raw model output with logprobs so issues can be localized

When a tool call comes back wrong (tool_calls is null, the arguments look malformed, raw <tool_call> markers appear in message.content, or finish_reason is "stop" when you expected "tool_calls"), the request and response alone usually do not say where the bug is. Seen from the response side, a model failure and a parser failure look the same.

Adding "logprobs": true to a single repro request makes the engine’s raw token output visible in the response. That is enough for someone on the Dynamo team to identify whether the issue is in the model, the parser configuration, or the parser itself. This page shows the field to add and what the response will look like, so you can capture and share useful diagnostic info.

This recipe applies to non-streaming requests against Dynamo’s OpenAI /v1/chat/completions endpoint. For multi-channel reasoning models (harmony, kimi_k2, kimi_k25, gemma4), the recipe recovers only the assistant-content channel; the reasoning channel is not surfaced in logprobs.content.

If the worker uses the SGLang backend, logprobs: true is rejected by default because SGLang’s tokenizer manager detokenizes top-k tokens serially, which degrades latency. To opt in, launch the worker with DYN_SGL_ALLOW_TOP_LOGPROBS=1 set in the environment for the duration of the repro request, then unset it afterward. Tracked at sgl-project/sglang#24447.

The request

Add "logprobs": true to your failing request:

curl -s http://localhost:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "Qwen/Qwen2.5-7B-Instruct",
    "messages": [
      {"role": "user", "content": "What is the weather in NYC?"}
    ],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {"type": "string"},
            "unit": {"enum": ["celsius", "fahrenheit"]}
          },
          "required": ["location"]
        }
      }
    }],
    "tool_choice": "auto",
    "temperature": 0.0,
    "logprobs": true
  }'
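
If the failing request already lives in a script as a JSON body, the same change can be made programmatically. The following is a minimal Python sketch using only the standard library. The helper names with_logprobs and send_chat_request are hypothetical, and the URL is the localhost address from the curl example, so adjust it for your deployment. Forcing "stream" to false is an assumption that follows from this recipe covering non-streaming requests only.

```python
import copy
import json
import urllib.request

# Assumed endpoint, taken from the curl example; adjust for your deployment.
CHAT_URL = "http://localhost:8000/v1/chat/completions"


def with_logprobs(body: dict) -> dict:
    """Return a copy of a chat-completions request body with logprobs enabled."""
    patched = copy.deepcopy(body)
    patched["logprobs"] = True
    # The recipe covers non-streaming requests only, so force streaming off.
    patched["stream"] = False
    return patched


def send_chat_request(body: dict, url: str = CHAT_URL) -> dict:
    """POST the request body as JSON and return the decoded response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

A repro would then be response = send_chat_request(with_logprobs(failing_body)), where failing_body is your original request. The original dict is left unmodified, so the failing request and the diagnostic request can be shared side by side.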

The response

You will get back the usual fields (message.tool_calls, message.content, finish_reason) plus a new choices[0].logprobs.content field carrying the engine’s raw token stream:

{
  "choices": [{
    "finish_reason": "tool_calls",
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "type": "function",
        "function": {
          "name": "get_weather",
          "arguments": "{\"location\":\"New York, NY\",\"unit\":\"fahrenheit\"}"
        }
      }]
    },
    "logprobs": {
      "content": [
        {"token": "<tool_call>", "bytes": [60, 116, 111, 111, 108, 95, 99, 97, 108, 108, 62]},
        {"token": "\n", "bytes": [10]},
        {"token": "{\"", "bytes": [123, 34]},
        {"token": "name", "bytes": [110, 97, 109, 101]},
        "...",
        {"token": "</tool_call>", "bytes": [60, 47, 116, 111, 111, 108, 95, 99, 97, 108, 108, 62]}
      ]
    }
  }]
}

Each entry in logprobs.content is one generated token with its exact UTF-8 bytes. Concatenating those bytes in order reconstructs the raw model output, before any tool-call parser touched it. That is the key piece for triage: it tells us what the model actually produced, separately from what the parser made of it.
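
The concatenation described above can be sketched in a few lines of Python. This is an illustration, not Dynamo tooling: the function name reconstruct_raw_output is hypothetical, and falling back to the token text when an entry has no bytes array is an assumption. It expects a complete response, not the abbreviated sample shown above with its "..." placeholder.

```python
def reconstruct_raw_output(response: dict, choice_index: int = 0) -> str:
    """Concatenate the per-token bytes in logprobs.content into one string."""
    logprobs = response["choices"][choice_index].get("logprobs") or {}
    raw = bytearray()
    for entry in logprobs.get("content") or []:
        token_bytes = entry.get("bytes")
        if token_bytes is not None:
            raw.extend(token_bytes)
        else:
            # Assumption: fall back to the token text if bytes is absent.
            raw.extend(entry.get("token", "").encode("utf-8"))
    # Join first, decode once: a multi-byte character can span two tokens.
    return raw.decode("utf-8", errors="replace")
```

Decoding only after all bytes are joined matters because a single UTF-8 character may be split across adjacent tokens, so decoding token by token could corrupt it.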

What to include when reporting an issue

Share these four things in the bug report or issue thread:

  1. The full request body (model name, messages, tools, sampling params, and logprobs: true).
  2. The full response body. Do not truncate logprobs.content — the per-token entries are the part that matters.
  3. The Dynamo version and the backend (vLLM, SGLang, or TRT-LLM), including versions if known.
  4. The worker launch command, especially the --dyn-tool-call-parser value if set.
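
One way to keep these items together is to write them into a single JSON file and attach that file to the issue. The sketch below is only a convenience: the function names and the report layout are hypothetical, not a format the Dynamo team requires.

```python
import json


def build_report(request_body, response_body, dynamo_version, backend, launch_command):
    """Collect the four items into one dict; nothing is truncated."""
    return {
        "request": request_body,
        "response": response_body,
        "versions": {"dynamo": dynamo_version, "backend": backend},
        "worker_launch_command": launch_command,
    }


def write_report(report: dict, path: str) -> None:
    """Write the report as indented JSON, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(report, handle, indent=2, ensure_ascii=False)
```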

With those four pieces, the Dynamo team can usually localize the bug without standing up your model. The team will reconstruct the raw stream from the bytes arrays and compare it against message.content and message.tool_calls to decide whether the issue is in the model output, the parser configuration, or the parser logic.
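
If you want a first look before filing, the comparison can be approximated locally. The sketch below is a rough aid under stated assumptions, not the Dynamo team's actual procedure: the function name summarize_response is hypothetical, and raw_output is the string you obtain by concatenating the bytes arrays from logprobs.content.

```python
import json


def summarize_response(response: dict, raw_output: str) -> dict:
    """Put the raw stream next to the parsed fields for a side-by-side look."""
    choice = response["choices"][0]
    message = choice["message"]
    tool_calls = message.get("tool_calls") or []

    # Check that each parsed arguments string is valid JSON.
    arguments_valid = []
    for call in tool_calls:
        try:
            json.loads(call["function"]["arguments"])
            arguments_valid.append(True)
        except (KeyError, TypeError, ValueError):
            arguments_valid.append(False)

    return {
        "finish_reason": choice.get("finish_reason"),
        "raw_output": raw_output,
        "content": message.get("content"),
        "tool_call_count": len(tool_calls),
        "arguments_valid_json": arguments_valid,
        # True when message.content is identical to the raw text.
        "content_equals_raw": message.get("content") == raw_output,
    }
```

As a loose reading only: if raw_output contains a well-formed tool call but tool_call_count is 0, the parser or its configuration is the more likely suspect; if raw_output itself is malformed, the model output is. Treat this as a hint and still include the full request and response in the report.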

See also

  • Tool Call Parsing (Dynamo) — Dynamo-native parser names and request examples
  • Tool Call Parsing (Engine Fallback) — --dyn-chat-processor fallback path to vLLM and SGLang parsers
  • Frontend Configuration Reference — full CLI flag reference for the frontend and worker