
Copyright © 2026, NVIDIA Corporation.


Reasoning Parsing (Dynamo)

Configure Dynamo’s built-in reasoning parsers for models that emit thinking content


Some models emit reasoning or thinking content separately from their final response. Dynamo can split that output into reasoning_content and normal assistant content by configuring --dyn-reasoning-parser on the backend worker.

This page covers parser names for the default Dynamo-native path. If Dynamo does not list a parser for your model, see Parser Engine Fallback. For how --dyn-reasoning-parser combines with --dyn-chat-processor and --dyn-tool-call-parser (and which combinations are invalid), see Parser Configuration.

Prerequisites

To enable reasoning parsing, launch the backend worker with:

  • --dyn-reasoning-parser: select the reasoning parser from the supported list below

To check which parser flags your installed backend accepts, print its help output:

    # <backend> can be sglang, trtllm, vllm, etc., depending on your installation
    python -m dynamo.<backend> --help

Some models need both a reasoning parser and a tool call parser. For supported tool call parser names, see Tool Call Parsing (Dynamo).

Supported Reasoning Parsers

The table below lists the reasoning parsers currently supported in Dynamo’s registry. The Upstream name column shows where the vLLM or SGLang parser name differs from Dynamo’s, which matters when using --dyn-chat-processor vllm or sglang (see Parser Engine Fallback). A blank Upstream name cell means the same name works everywhere. Dynamo-only means no upstream parser exists for this format.

Parsers marked force-reasoning treat output as reasoning content from the first token, without requiring an explicit opening tag (such as <think>). All other parsers require the opening tag to be present in the model output.

| Parser Name | Models | Upstream name | Force-reasoning | Notes |
|---|---|---|---|---|
| kimi_k25 | Kimi K2.5 / Kimi K2.6 format-compatible thinking models | Dynamo-only | Yes | <think>...</think> with force-reasoning |
| kimi | Kimi K2 Instruct / Thinking with Unicode delimiters | Dynamo-only | No | ◁think▷...◁/think▷ |
| minimax_append_think | MiniMax M2 / M2.1 | Dynamo-only | No | Implicit opening <think> prepended |
| deepseek_v4 | DeepSeek V4 Pro / Flash | vLLM: deepseek_v4; SGLang: deepseek-v4 | No | <think>...</think>. Aliases: deepseek-v4, deepseekv4 |
| deepseek_r1 | DeepSeek R1, DeepSeek V3.1, DeepSeek V3.2 | | Yes | Pass explicitly for V3.1/V3.2 (no alias) |
| qwen3 | Qwen3.5, QwQ-32B, Qwen3-Think, Qwen3-Coder | | No | <think>...</think> |
| glm45 | GLM-4.5, GLM-4.7 | Dynamo-only | No | Alias for nemotron_deci. <think>...</think> |
| nemotron3 | Nemotron-3 / Mini | vLLM: nemotron_v3 | Yes | Alias for deepseek_r1. Also accepts nemotron_v3 |
| nemotron_deci | Nemotron-Super / -Ultra / -Deci, Llama-Nemotron | Dynamo-only | No | <think>...</think> |
| nemotron_nano | Nemotron-Nano | Dynamo-only | Yes | Alias for deepseek_r1 |
| gemma4 | Google Gemma 4 (thinking models) | vLLM: gemma4 | No | <\|channel>thought\n...<channel\|> with thought\n role label stripped. Aliases: gemma-4 |
| gpt_oss | gpt-oss-20b / -120b | Dynamo-only | No | Harmony channel reasoning format |
| mistral | Magistral | | Yes | [THINK]...[/THINK] |
| granite | IBM Granite 3.x / Granite 3.2 language models | | No | Here's my thought process: / Here's my response: |
| step3 | Step-3 / Step-3-Reasoning | Dynamo-only | Yes | <think>...</think> |
| basic | Generic CoT models | Dynamo-only | No | Plain <think>...</think> |
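
The difference between force-reasoning parsers and tag-requiring parsers can be illustrated with a small sketch. This is not Dynamo's implementation (the real parsers also handle streaming output and model-specific delimiters); split_reasoning and its parameters are hypothetical names used only for illustration, and the handling of a missing closing tag is an assumption.

```python
def split_reasoning(text, open_tag="<think>", close_tag="</think>", force_reasoning=False):
    """Split a complete model output into (reasoning, content).

    Illustrative only: hypothetical helper, not part of Dynamo.
    """
    start = text.find(open_tag)
    if start != -1:
        # Explicit opening tag: reasoning begins right after it.
        prefix = text[:start]
        body = text[start + len(open_tag):]
    elif force_reasoning:
        # Force-reasoning: treat the output as reasoning from the first token.
        prefix, body = "", text
    else:
        # No opening tag and no force-reasoning: everything is normal content.
        return "", text

    end = body.find(close_tag)
    if end == -1:
        # Assumption: without a closing tag, the rest is treated as reasoning.
        return body.strip(), prefix.strip()

    reasoning = body[:end]
    content = prefix + body[end + len(close_tag):]
    return reasoning.strip(), content.strip()
```

With the same untagged output, such as "plan the answer</think>42", the sketch returns everything as content when force_reasoning is False and splits at the closing tag when it is True.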

Common Parser Pairings

Some models need both parsers configured together. Common pairings include:

  • openai/gpt-oss-*: --dyn-tool-call-parser harmony --dyn-reasoning-parser gpt_oss
  • deepseek-ai/DeepSeek-V4-*: --dyn-tool-call-parser deepseek_v4 --dyn-reasoning-parser deepseek_v4
  • zai-org/GLM-4.7: --dyn-tool-call-parser glm47 --dyn-reasoning-parser glm45
  • moonshotai/Kimi-K2.5* / Kimi K2.6 format-compatible outputs: --dyn-tool-call-parser kimi_k2 --dyn-reasoning-parser kimi_k25
  • google/gemma-4-* thinking models: --dyn-tool-call-parser gemma4 --dyn-reasoning-parser gemma4 --custom-jinja-template examples/chat_templates/gemma4_tool.jinja
  • Qwen/Qwen3.5*: --dyn-tool-call-parser qwen3_coder --dyn-reasoning-parser qwen3
  • MiniMax M2.1 style outputs: --dyn-tool-call-parser minimax_m2 --dyn-reasoning-parser minimax_append_think
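
For launch scripts that serve several models, the pairings above could be kept in a small lookup table. The sketch below is a hypothetical helper, not part of Dynamo: the PAIRINGS list and parser_flags function are illustrative names, and only the flag values listed above are used. The MiniMax pairing is omitted because the list above does not give a model ID pattern for it.

```python
from fnmatch import fnmatchcase

# (model ID pattern, flags) pairs copied from the pairings listed above.
PAIRINGS = [
    ("openai/gpt-oss-*",
     ["--dyn-tool-call-parser", "harmony", "--dyn-reasoning-parser", "gpt_oss"]),
    ("deepseek-ai/DeepSeek-V4-*",
     ["--dyn-tool-call-parser", "deepseek_v4", "--dyn-reasoning-parser", "deepseek_v4"]),
    ("zai-org/GLM-4.7",
     ["--dyn-tool-call-parser", "glm47", "--dyn-reasoning-parser", "glm45"]),
    ("moonshotai/Kimi-K2.5*",
     ["--dyn-tool-call-parser", "kimi_k2", "--dyn-reasoning-parser", "kimi_k25"]),
    ("google/gemma-4-*",
     ["--dyn-tool-call-parser", "gemma4", "--dyn-reasoning-parser", "gemma4",
      "--custom-jinja-template", "examples/chat_templates/gemma4_tool.jinja"]),
    ("Qwen/Qwen3.5*",
     ["--dyn-tool-call-parser", "qwen3_coder", "--dyn-reasoning-parser", "qwen3"]),
]


def parser_flags(model):
    """Return the parser flags for the first matching pattern, or an empty list."""
    for pattern, flags in PAIRINGS:
        # fnmatchcase gives case-sensitive shell-style matching on every platform.
        if fnmatchcase(model, pattern):
            return list(flags)
    return []
```

For example, parser_flags("Qwen/Qwen3.5-4B") yields the qwen3_coder and qwen3 flags, which can be appended to a python -m dynamo.<backend> command line.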

Tool Calling Interplay

Reasoning parsing happens before tool call parsing. If a model emits both reasoning content and tool calls, configure both parsers so Dynamo can first separate reasoning text and then parse tool calls from the remaining assistant output.
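
The ordering described above can be sketched as a two-stage pipeline. This is a simplified illustration rather than Dynamo's parser code: the <tool_call>...</tool_call> wrapper with a JSON body is an assumed format chosen for the example, and parse_output is a hypothetical name.

```python
import json
import re

# Assumed delimiters for illustration; real models and parsers vary.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
TOOL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def parse_output(text):
    """Separate reasoning first, then parse tool calls from what remains."""
    # Stage 1: pull reasoning spans out of the raw output.
    reasoning_parts = THINK_RE.findall(text)
    remaining = THINK_RE.sub("", text)

    # Stage 2: parse tool calls only from the remaining assistant output.
    tool_calls = [json.loads(body) for body in TOOL_RE.findall(remaining)]
    content = TOOL_RE.sub("", remaining).strip()

    return {
        "reasoning_content": "\n".join(part.strip() for part in reasoning_parts),
        "tool_calls": tool_calls,
        "content": content,
    }
```

Because reasoning is removed first, text inside the reasoning span that merely looks like a tool call is not handed to the tool call stage in this sketch.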

Examples

Launch Dynamo Frontend and Backend

    # launch backend worker (or dynamo.vllm)
    python -m dynamo.sglang --model Qwen/Qwen3.5-4B --dyn-tool-call-parser qwen3_coder --dyn-reasoning-parser qwen3

    # launch frontend worker
    python -m dynamo.frontend

Reasoning Request Example

    curl -s http://localhost:8000/v1/chat/completions \
      -H 'Content-Type: application/json' \
      -d '{
        "model": "Qwen/Qwen3.5-4B",
        "messages": [{"role": "user", "content": "If a train leaves at 3pm going 60 mph and another leaves at 4pm going 80 mph, when does the second catch up?"}]
      }'

Dynamo splits the model output so the chain-of-thought lands in reasoning_content and the user-facing answer stays in content:

    {
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "reasoning_content": "The first train has a 1-hour head start at 60 mph, so it is 60 miles ahead at 4pm. The second train closes the gap at 80 - 60 = 20 mph. 60 / 20 = 3 hours after 4pm.",
            "content": "The second train catches up at 7pm."
          },
          "finish_reason": "stop"
        }
      ]
    }
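
A client can read the two fields separately. The following is a minimal sketch using only the Python standard library, assuming the frontend listens on http://localhost:8000 as in the curl example; build_request, split_message, and ask are hypothetical helper names, not part of Dynamo.

```python
import json
import urllib.request


def build_request(model, prompt, base_url="http://localhost:8000"):
    """Build a chat completions request like the curl example above."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def split_message(response):
    """Return (reasoning_content, content) from the first choice."""
    message = response["choices"][0]["message"]
    # reasoning_content may be absent if no reasoning parser is configured.
    return message.get("reasoning_content"), message.get("content")


def ask(model, prompt):
    """Send the request to a running frontend and split the reply."""
    with urllib.request.urlopen(build_request(model, prompt)) as resp:
        return split_message(json.load(resp))
```

Calling ask("Qwen/Qwen3.5-4B", "...") against the deployment launched above would return the reasoning text and the final answer as separate strings, so an application can log or hide the reasoning without string parsing.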