Reasoning Parsing (Engine Fallback) | NVIDIA Dynamo Documentation

When Dynamo’s registry does not list a reasoning parser for your model, fall back to the upstream engine’s parser via a chat-processor swap, which keeps frontend tokenization and KV routing.

For Dynamo-native parsers, see Reasoning Parsing (Dynamo). For the equivalent tool-call fallback, see Tool Call Parsing (Engine Fallback).

Known Issue: Engine-fallback reasoning parsing does not currently work with disaggregated serving. Use the Dynamo-native reasoning parser for disaggregated deployments.

Configurations

| Chat processor | Frontend flags | Worker flags | KV routing | Notes |
| --- | --- | --- | --- | --- |
| vLLM chat processor | `--dyn-chat-processor vllm --reasoning-parser <name>` | (none) | Yes | Parsing runs in vLLM’s Python preprocessor. See vLLM Chat Processor. |
| SGLang chat processor | `--dyn-chat-processor sglang --reasoning-parser <name>` | (none) | Yes | Parsing runs in SGLang’s Python preprocessor. See SGLang Chat Processor. |

Upstream parser names come from the engine’s registry and may differ from Dynamo’s name for the same model (e.g., vLLM’s nemotron_v3 vs Dynamo’s nemotron3). They are pinned to the engine version shipped in the Dynamo container.

Examples

# vLLM chat processor
python -m dynamo.vllm ...
python -m dynamo.frontend --dyn-chat-processor vllm --reasoning-parser deepseek_r1

# SGLang chat processor
python -m dynamo.sglang ...
python -m dynamo.frontend --dyn-chat-processor sglang --reasoning-parser kimi_k25
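
Both frontend invocations follow the same pattern, so a launch script can assemble the command from the chat processor and the upstream parser name. The sketch below is a dry-run helper that only prints the command. The function name `frontend_cmd` is hypothetical and not part of Dynamo, and it uses only the two flags listed in the Configurations table. It rejects any chat processor other than `vllm` or `sglang`, but it does not check the parser name against the engine's registry.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Dynamo): print the frontend command for an
# engine-fallback reasoning parser without running it.
frontend_cmd() {
  local processor="$1"
  local parser="$2"

  # Only the two chat processors listed in the Configurations table are accepted.
  case "$processor" in
    vllm|sglang) ;;
    *)
      echo "unsupported chat processor: $processor" >&2
      return 1
      ;;
  esac

  # The parser name must be the upstream engine's name, which may differ
  # from Dynamo's name for the same model.
  if [ -z "$parser" ]; then
    echo "missing upstream reasoning parser name" >&2
    return 1
  fi

  echo "python -m dynamo.frontend --dyn-chat-processor $processor --reasoning-parser $parser"
}

frontend_cmd vllm deepseek_r1
frontend_cmd sglang kimi_k25
```

Printing the command instead of executing it makes it easy to review the flags before launching the frontend next to the matching worker.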
