Troubleshooting Tool Calls

Capture raw model output with logprobs so issues can be localized

When a tool call comes back wrong (tool_calls is null, the arguments look malformed, raw <tool_call> markers appear in message.content, or finish_reason is "stop" when you expected "tool_calls"), the request and response alone usually do not reveal where the bug is: from the response side, a model failure and a parser failure look identical.

Adding "logprobs": true to a single repro request makes the engine’s raw token output visible in the response. That is enough for someone on the Dynamo team to identify whether the issue is in the model, the parser configuration, or the parser itself. This page shows the field to add and what the response will look like, so you can capture and share useful diagnostic info.

This recipe applies to non-streaming requests against Dynamo’s OpenAI-compatible /v1/chat/completions endpoint. For multi-channel reasoning models (harmony, kimi_k2, kimi_k25, gemma4), it recovers only the assistant-content channel; the reasoning channel is not surfaced in logprobs.content.

If the worker runs the SGLang backend, logprobs: true is rejected by default because SGLang’s tokenizer manager detokenizes top-k tokens serially, which degrades latency. To opt in, launch the worker with DYN_SGL_ALLOW_TOP_LOGPROBS=1 set in its environment for the duration of the repro, then remove it afterward. Tracked at sgl-project/sglang#24447.

The request

Add "logprobs": true to your failing request:

curl -s http://localhost:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "Qwen/Qwen2.5-7B-Instruct",
    "messages": [
      {"role": "user", "content": "What is the weather in NYC?"}
    ],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {"type": "string"},
            "unit": {"enum": ["celsius", "fahrenheit"]}
          },
          "required": ["location"]
        }
      }
    }],
    "tool_choice": "auto",
    "temperature": 0.0,
    "logprobs": true
  }'

The response

You will get back the usual fields (message.tool_calls, message.content, finish_reason) plus a new choices[0].logprobs.content field carrying the engine’s raw token stream:

{
  "choices": [{
    "finish_reason": "tool_calls",
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "type": "function",
        "function": {
          "name": "get_weather",
          "arguments": "{\"location\":\"New York, NY\",\"unit\":\"fahrenheit\"}"
        }
      }]
    },
    "logprobs": {
      "content": [
        {"token": "<tool_call>", "bytes": [60, 116, 111, 111, 108, 95, 99, 97, 108, 108, 62]},
        {"token": "\n", "bytes": [10]},
        {"token": "{\"", "bytes": [123, 34]},
        {"token": "name", "bytes": [110, 97, 109, 101]},
        "...",
        {"token": "</tool_call>", "bytes": [60, 47, 116, 111, 111, 108, 95, 99, 97, 108, 108, 62]}
      ]
    }
  }]
}

Each entry in logprobs.content is one generated token with its exact UTF-8 bytes. Concatenating those bytes in order reconstructs the raw model output, before any tool-call parser touched it. That is the key piece for triage: it tells us what the model actually produced, separately from what the parser made of it.
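
To reconstruct the stream yourself, a minimal sketch, assuming the response above has been saved to response.json (the filename is illustrative):

import json

# Load the captured response (filename is illustrative).
with open("response.json") as f:
    resp = json.load(f)

# Concatenate the per-token byte arrays in order, then decode once.
entries = resp["choices"][0]["logprobs"]["content"]
raw = bytes(
    b
    for entry in entries
    if isinstance(entry, dict)  # skip "..." placeholders in trimmed examples
    for b in entry["bytes"]
)
print(raw.decode("utf-8", errors="replace"))

Decoding the concatenated bytes, rather than joining the token strings, avoids mangling output where a multibyte UTF-8 character is split across token boundaries.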

What to include when reporting an issue

Share these four things in the bug report or issue thread:

  1. The full request body (model name, messages, tools, sampling params, and logprobs: true).
  2. The full response body. Do not truncate logprobs.content — the per-token entries are the part that matters (a capture sketch follows this list).
  3. The Dynamo version and the backend (vLLM, SGLang, TRT-LLM, including versions if known).
  4. The worker launch command, especially the --dyn-tool-call-parser value if set.
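
For step 2, a minimal capture sketch using only the Python standard library; the endpoint and filenames are illustrative, and repro_request.json is assumed to hold your failing request body with "logprobs": true added:

import json
import urllib.request

# The failing request body, with "logprobs": true added (see above).
with open("repro_request.json", "rb") as f:
    body = f.read()

req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",  # illustrative endpoint
    data=body,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as http_resp:
    response = json.load(http_resp)

# Write the complete, untruncated response for the bug report.
with open("repro_response.json", "w") as f:
    json.dump(response, f, indent=2)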

With those four pieces, the Dynamo team can usually localize the bug without standing up your model. The team will reconstruct the raw stream from the bytes arrays and compare it against message.content and message.tool_calls to decide whether the issue is in the model output, the parser configuration, or the parser logic.
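
To illustrate that comparison (a sketch of the idea, not the team's actual tooling), a first-pass classification might look like the following; the <tool_call> marker is specific to Qwen/Hermes-style formats, so substitute whatever marker your model's parser expects:

def triage(raw: str, message: dict) -> str:
    # `raw` is the stream reconstructed from the bytes arrays (see above);
    # `message` is choices[0].message from the same response.
    # "<tool_call>" is format-specific (Qwen/Hermes-style); other parsers
    # expect different markers.
    model_emitted_call = "<tool_call>" in raw
    parser_extracted_call = bool(message.get("tool_calls"))

    if model_emitted_call and not parser_extracted_call:
        return "parser or parser config: markers present in raw output but not parsed"
    if not model_emitted_call and not parser_extracted_call:
        return "model: no tool-call markers in the raw output"
    if not model_emitted_call and parser_extracted_call:
        return "unexpected: tool_calls without markers; check parser configuration"
    return "parsed: compare function.arguments against the raw stream for drift"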

See also