Parser Engine Fallback | NVIDIA Dynamo Documentation

When Dynamo’s registry does not list a tool-call or reasoning parser for your model, fall back to the upstream engine’s parser via a chat-processor swap, which keeps frontend tokenization and KV routing.

For the Dynamo-native default path, see Tool Call Parsing (Dynamo) and Reasoning Parsing (Dynamo).

How --dyn-chat-processor combines with the parser flags — and which combinations are invalid (engine fallback supports disaggregated serving on vLLM and SGLang; TRT-LLM engine fallback is a work in progress) — is documented once in Parser Configuration. Read that first; this page covers only the engine-fallback specifics.

Configuration

Engine fallback runs parsing in the engine’s own Python frontend. Select it with --dyn-chat-processor vllm or sglang, then name the parser with the engine’s frontend flags:

--tool-call-parser <name> — the engine’s tool-call parser
--reasoning-parser <name> — the engine’s reasoning parser

These are distinct from the Dynamo-native --dyn-tool-call-parser / --dyn-reasoning-parser (which go on the worker). The accepted names come from the engine’s registry and may differ from Dynamo’s — e.g. vLLM nemotron_v3 vs Dynamo nemotron3, SGLang deepseekv3 vs Dynamo deepseek_v3.

Examples

$ # vLLM chat processor — frontend carries the parser flags, then launch the worker:
$ python -m dynamo.frontend --dyn-chat-processor vllm   --tool-call-parser hermes --reasoning-parser qwen3
$ python -m dynamo.vllm   --model Qwen/Qwen3-0.6B
$ 
$ # SGLang chat processor
$ python -m dynamo.frontend --dyn-chat-processor sglang --tool-call-parser qwen25 --reasoning-parser qwen3
$ python -m dynamo.sglang --model Qwen/Qwen3-0.6B

If a tool call or reasoning split comes back wrong, add "logprobs": true to a single repro request and share the response. See Troubleshooting Tool Calls for what to capture.

Configuration

Examples

See Also