For AI agents: a documentation index is available at the root level at /llms.txt and /llms-full.txt. Append /llms.txt to any URL for a page-level index, or .md for the markdown version of any page.
Digest
  • Getting Started
    • Quickstart
    • Introduction
    • Local Installation
    • Building from Source
    • Kubernetes Deployment
    • Contribution Guide
  • Resources
    • Support Matrix
    • Feature Matrix
    • Release Artifacts
    • Examples
    • Glossary
  • Digest
    • NVIDIA Dynamo Snapshot: Fast Startup for Inference Workloads on Kubernetes
    • DynoSim: Simulating the Pareto Frontier
    • Dynamo Day 0 support for TokenSpeed
    • Multi-Turn Agentic Harnesses
    • Full-Stack Optimizations for Agentic Inference
    • Flash Indexer: Inter-Galactic KV Routing
  • Kubernetes Deployment
    • API Reference
  • User Guides
    • Disaggregated Serving
    • KV Cache Aware Routing
    • KV Cache Offloading
    • Tool Call and Reasoning Parsing
      • Tool Call Parsing (Dynamo)
      • Reasoning Parsing (Dynamo)
      • Parser Engine Fallback
      • Parser Configuration
      • Troubleshooting Tool Calls
    • Agents
    • Multimodal
    • Diffusion
    • LoRA Adapters
    • Fastokens Tokenizer
    • Observability (Local)
    • Fault Tolerance
    • Benchmarking
    • Writing Python Workers
    • Writing Python Unified Backends
    • Writing Rust Unified Backends
  • Backends
    • SGLang
    • TensorRT-LLM
    • vLLM
  • Components
    • Frontend
    • Router
    • Planner
    • Profiler
    • KVBM
  • Integrations
    • LMCache
    • FlexKV
    • KV Events for Custom Engines
  • Design Docs
    • Overall Architecture
    • Architecture Flow
    • Disaggregated Serving
    • Distributed Runtime
  • Documentation
    • Dynamo Docs Guide
NVIDIANVIDIA
Developer-friendly docs for your API
Privacy Policy | Your Privacy Choices | Terms of Service | Accessibility | Corporate Policies | Product Security | Contact

Copyright © 2026, NVIDIA Corporation.

LogoLogoDocumentation
Digest
On this page
  • Choose a parsing path
  • Why Dynamo parses in the frontend
  • See Also
User Guides

Tool Call and Reasoning Parsing

Parse tool calls and reasoning out of model output into OpenAI-compatible tool_calls and reasoning_content

||View as Markdown|
Previous

KVBM Guide

Next

Tool Call Parsing (Dynamo)

Dynamo parses tool-call and reasoning markup out of raw model output and surfaces it as OpenAI-compatible tool_calls and reasoning_content on the response. Tool calling is controlled by the tool_choice and tools request parameters; reasoning parsing is enabled per-model with a reasoning parser.

There are two ways to parse, depending on whether the parser lives in Dynamo’s own registry or in an upstream engine frontend (vllm serve, sglang serve, or trtllm-serve).

Choose a parsing path

PathWhen to usePages
DynamoDynamo ships a framework-agnostic Rust parser for the model’s tool-call or reasoning format. Default path.Tool Call Parsing (Dynamo), Reasoning Parsing (Dynamo)
Engine FallbackUse the framework’s own parser (vLLM or SGLang today; TRT-LLM in progress) when Dynamo doesn’t ship one for your model.Parser Engine Fallback

Start with the Dynamo path. Fall back to the engine path only when Dynamo’s registry doesn’t list a parser for your model. For exactly which flags combine and which combinations don’t make sense, see Parser Configuration.

Why Dynamo parses in the frontend

In vllm serve, sglang serve, and trtllm-serve, tool-call and reasoning parsing happen in each engine’s own frontend, with subtle behavioral differences across them. For performance, Dynamo orchestrates routing and tokenization and passes tokens directly to each engine, bypassing the engine’s OpenAI server to avoid duplicate work per request. So Dynamo implements parsing in its frontend as a framework-agnostic Rust layer — one tested OpenAI-compatible contract across vLLM, SGLang, and TRT-LLM, on a hot path that stays concurrent without a Python GIL bottleneck. The vllm/sglang chat processors (engine fallback) opt back into the engine’s own parser when Dynamo doesn’t ship one for your model.

See Also

  • Parser Configuration — which flags combine, and which combinations don’t make sense
  • Tool Call Parsing (Dynamo) / Reasoning Parsing (Dynamo) — Dynamo-native parser names
  • Parser Engine Fallback — upstream vLLM / SGLang parsers
  • Troubleshooting Tool Calls — capture logprobs so issues can be localized
  • Frontend Configuration Reference — full CLI flag reference