For AI agents: a documentation index is available at the root level at /llms.txt and /llms-full.txt. Append /llms.txt to any URL for a page-level index, or .md for the markdown version of any page.
Digest
  • Getting Started
    • Quickstart
    • Introduction
    • Local Installation
    • Building from Source
    • Kubernetes Deployment
    • Contribution Guide
  • Resources
    • Support Matrix
    • Feature Matrix
    • Release Artifacts
    • Examples
    • Glossary
  • Digest
    • NVIDIA Dynamo Snapshot: Fast Startup for Inference Workloads on Kubernetes
    • DynoSim: Simulating the Pareto Frontier
    • Dynamo Day 0 support for TokenSpeed
    • Multi-Turn Agentic Harnesses
    • Full-Stack Optimizations for Agentic Inference
    • Flash Indexer: Inter-Galactic KV Routing
  • Kubernetes Deployment
    • API Reference
  • User Guides
    • Disaggregated Serving
    • KV Cache Aware Routing
    • KV Cache Offloading
    • Tool Call and Reasoning Parsing
      • Tool Call Parsing (Dynamo)
      • Reasoning Parsing (Dynamo)
      • Parser Engine Fallback
      • Parser Configuration
      • Troubleshooting Tool Calls
    • Agents
    • Multimodal
    • Diffusion
    • LoRA Adapters
    • Fastokens Tokenizer
    • Observability (Local)
    • Fault Tolerance
    • Benchmarking
    • Writing Python Workers
    • Writing Python Unified Backends
    • Writing Rust Unified Backends
  • Backends
    • SGLang
    • TensorRT-LLM
    • vLLM
  • Components
    • Frontend
    • Router
    • Planner
    • Profiler
    • KVBM
  • Integrations
    • LMCache
    • FlexKV
    • KV Events for Custom Engines
  • Design Docs
    • Overall Architecture
    • Architecture Flow
    • Disaggregated Serving
    • Distributed Runtime
  • Documentation
    • Dynamo Docs Guide
NVIDIANVIDIA
Developer-friendly docs for your API
Privacy Policy | Your Privacy Choices | Terms of Service | Accessibility | Corporate Policies | Product Security | Contact

Copyright © 2026, NVIDIA Corporation.

LogoLogoDocumentation
Digest
On this page
  • Configuration
  • Examples
  • See Also
User GuidesTool Call and Reasoning Parsing

Parser Engine Fallback

Use upstream vLLM or SGLang tool-call and reasoning parsers when Dynamo does not ship one

||View as Markdown|
Previous

Reasoning Parsing (Dynamo)

Next

Parser Configuration

When Dynamo’s registry does not list a tool-call or reasoning parser for your model, fall back to the upstream engine’s parser via a chat-processor swap, which keeps frontend tokenization and KV routing.

For the Dynamo-native default path, see Tool Call Parsing (Dynamo) and Reasoning Parsing (Dynamo).

How --dyn-chat-processor combines with the parser flags — and which combinations are invalid (engine fallback supports disaggregated serving on vLLM and SGLang; TRT-LLM engine fallback is a work in progress) — is documented once in Parser Configuration. Read that first; this page covers only the engine-fallback specifics.

Configuration

Engine fallback runs parsing in the engine’s own Python frontend. Select it with --dyn-chat-processor vllm or sglang, then name the parser with the engine’s frontend flags:

  • --tool-call-parser <name> — the engine’s tool-call parser
  • --reasoning-parser <name> — the engine’s reasoning parser

These are distinct from the Dynamo-native --dyn-tool-call-parser / --dyn-reasoning-parser (which go on the worker). The accepted names come from the engine’s registry and may differ from Dynamo’s — e.g. vLLM nemotron_v3 vs Dynamo nemotron3, SGLang deepseekv3 vs Dynamo deepseek_v3.

Examples

$# vLLM chat processor — frontend carries the parser flags, then launch the worker:
$python -m dynamo.frontend --dyn-chat-processor vllm --tool-call-parser hermes --reasoning-parser qwen3
$python -m dynamo.vllm --model Qwen/Qwen3-0.6B
$
$# SGLang chat processor
$python -m dynamo.frontend --dyn-chat-processor sglang --tool-call-parser qwen25 --reasoning-parser qwen3
$python -m dynamo.sglang --model Qwen/Qwen3-0.6B

If a tool call or reasoning split comes back wrong, add "logprobs": true to a single repro request and share the response. See Troubleshooting Tool Calls for what to capture.

See Also

  • Parser Configuration — how the chat-processor and parser flags combine, and which combinations are invalid (start here)
  • Tool Call Parsing (Dynamo) — Dynamo-native tool-call parser names
  • Reasoning Parsing (Dynamo) — Dynamo-native reasoning parser names
  • vLLM Chat Processor — vLLM chat-processor details
  • SGLang Chat Processor — SGLang chat-processor details
  • Frontend Configuration Reference — Full CLI flag reference