Chat Processor Options

Choose the right preprocessing pipeline for tool calling, reasoning, and tokenization

Dynamo splits work between a frontend process (HTTP server, tokenization, routing, parsing) and one or more worker processes (the engine running the model). Several CLI flags control which code path handles chat template rendering, tool-call parsing, and reasoning-content separation. This page explains the available configurations, when to use each, and how they interact with KV cache routing.

For the list of individual parser names, see Tool Calling and Reasoning.

Configurations

There are five supported configurations. Each is set at startup — Dynamo does not switch between them per request.

| Option | Frontend flags | Worker flags | KV routing | Notes |
| --- | --- | --- | --- | --- |
| A Dynamo-native (default) | --dyn-chat-processor dynamo | --dyn-tool-call-parser <name> --dyn-reasoning-parser <name> | Yes | Rust preprocessor. Lowest latency. |
| B vLLM chat processor | --dyn-chat-processor vllm --tool-call-parser <name> --reasoning-parser <name> | (none) | Yes | Delegates to vLLM’s Python preprocessor. |
| C SGLang chat processor | --dyn-chat-processor sglang --tool-call-parser <name> --reasoning-parser <name> | (none) | Yes | Delegates to SGLang’s Python preprocessor. See SGLang Chat Processor. |
| D vLLM tokenizer delegation | --router-mode round-robin | --use-vllm-tokenizer | No | Engine-side tokenization. Day-0 model fallback. |
| E SGLang tokenizer delegation | --router-mode round-robin | --use-sglang-tokenizer | No | Deprecated; use option C instead. |

Although dynamo is the default for --dyn-chat-processor, specifying it explicitly in launch scripts makes the choice visible in logs and support diagnostics.

Flag reference

--dyn-chat-processor {dynamo | vllm | sglang}

Frontend flag (default dynamo). Selects the chat processor that renders templates, tokenizes, and dispatches parsing.

  • dynamo — Rust preprocessor. Parser names come from Dynamo’s registry (see Tool Calling and Reasoning).
  • vllm — vLLM’s Python preprocessor. Parser names come from vLLM’s registry, which may differ from Dynamo’s.
  • sglang — SGLang’s Python preprocessor. Parser names come from SGLang’s registry. See SGLang Chat Processor.

--dyn-tool-call-parser <name> / --dyn-reasoning-parser <name>

Worker flags. Names from Dynamo’s parser registry. Only effective under --dyn-chat-processor dynamo (option A); silently ignored under other chat processors.

The flags are declared on the worker CLI, but the parser runs on the frontend — the name propagates via model metadata. For supported names, see Tool Calling and Reasoning.

--tool-call-parser <name> / --reasoning-parser <name>

Frontend flags (no --dyn- prefix). Names from the upstream engine’s registry. Only accepted when paired with the matching chat processor:

  • Under --dyn-chat-processor vllm: accepted. Use vLLM parser names.
  • Under --dyn-chat-processor sglang: accepted. Use SGLang parser names.
  • Under --dyn-chat-processor dynamo: rejected at startup with Unknown arguments specified: .... Use the --dyn-* worker flags instead.

Upstream parser names are pinned to the engine version shipped in the Dynamo container. They may differ from Dynamo’s names for the same model (e.g., SGLang uses deepseekv3 where Dynamo uses deepseek_v3).
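The placement rules above (prefixed parser flags on the worker under the Dynamo chat processor, un-prefixed flags on the frontend under an upstream chat processor) can be sketched as a small command builder. This is an illustrative helper, not part of Dynamo: `build_commands` is a hypothetical name, only the vLLM and SGLang engines named on this page are handled, and model and engine arguments are omitted.

```python
def build_commands(chat_processor, engine, tool_parser=None, reasoning_parser=None):
    """Return (worker_argv, frontend_argv) with parser flags on the side this page describes.

    Hypothetical helper for illustration; model/engine arguments are left out.
    """
    if chat_processor not in ("dynamo", "vllm", "sglang"):
        raise ValueError(f"unknown chat processor: {chat_processor}")
    if engine not in ("vllm", "sglang"):
        raise ValueError(f"unknown engine: {engine}")

    worker = ["python", "-m", f"dynamo.{engine}"]
    frontend = ["python", "-m", "dynamo.frontend", "--dyn-chat-processor", chat_processor]

    if chat_processor == "dynamo":
        # Option A: names from Dynamo's registry, declared on the worker CLI.
        target, prefix = worker, "--dyn-"
    else:
        # Options B/C: names from the upstream registry, declared on the frontend.
        target, prefix = frontend, "--"

    if tool_parser:
        target.extend([f"{prefix}tool-call-parser", tool_parser])
    if reasoning_parser:
        target.extend([f"{prefix}reasoning-parser", reasoning_parser])
    return worker, frontend


if __name__ == "__main__":
    for argv in build_commands("vllm", "vllm", "hermes", "deepseek_r1"):
        print(" ".join(argv))
```

The helper does not translate parser names between registries; the caller is expected to pass a name that is valid for the selected chat processor.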

--use-vllm-tokenizer / --use-sglang-tokenizer

Worker flags (boolean). Hand tokenization to the engine instead of the frontend. The flag must match the engine on the worker.

--use-sglang-tokenizer is deprecated. New SGLang deployments should use --dyn-chat-processor sglang (option C) instead. See Migration from --use-sglang-tokenizer.

Which option should I pick?

  1. Does Dynamo have a parser for your model? Check the per-model tables in Tool Calling and Reasoning. If yes, use option A. This is the default path: Rust parsing on the frontend, KV-routable, lowest latency.

  2. Does the upstream engine have a parser but Dynamo doesn’t? Use option B (vLLM) or option C (SGLang). Still KV-routable.

  3. Is the tokenizer itself the problem (day-0 model, custom special tokens, rope variants)? Use option D. KV routing is off; pair with --router-mode round-robin.

  4. SGLang + day-0 model? Use option C with the appropriate upstream parser name. Do not use option E (deprecated).
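The four questions above can be read as an ordered decision procedure. The sketch below encodes them in the order listed; `pick_option` and its parameters are hypothetical names, and the final fallback to option A (when no parser is needed and the tokenizer works) is an assumption rather than something this page states.

```python
def pick_option(engine, dynamo_has_parser, upstream_has_parser, tokenizer_is_problem):
    """Return the option letter suggested by the checklist, evaluated top to bottom."""
    if engine not in ("vllm", "sglang"):
        raise ValueError(f"unknown engine: {engine}")

    # 1. Dynamo has a parser for the model: default Rust path.
    if dynamo_has_parser:
        return "A"
    # 2. Only the upstream engine has a parser: swap the chat processor.
    if upstream_has_parser:
        return "B" if engine == "vllm" else "C"
    # 3 and 4. The tokenizer itself is the problem: delegate on vLLM (option D);
    # on SGLang use option C, because option E is deprecated.
    if tokenizer_is_problem:
        return "D" if engine == "vllm" else "C"
    # Assumption: nothing special is needed, so stay on the default path.
    return "A"


if __name__ == "__main__":
    print(pick_option("vllm", False, False, True))    # D
    print(pick_option("sglang", False, True, False))  # C
```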

Invalid and silently broken combinations

Rejected at startup

  • --dyn-chat-processor dynamo with --tool-call-parser <name> (or --reasoning-parser). The un-prefixed flags are not recognized under the Dynamo chat processor. Use --dyn-tool-call-parser on the worker instead.

  • --tool-call-parser and --dyn-tool-call-parser together on the same SGLang worker. SGLang rejects this: Cannot use both --tool-call-parser and --dyn-tool-call-parser. Pick one namespace.

  • --use-vllm-tokenizer on an SGLang worker (and vice versa). The flag must match the engine.

Silently broken (no startup error, wrong results)

  • Tokenizer delegation + --router-mode kv. Options D/E with KV routing produce prefix-hash mismatches and silent cache misses.

  • --dyn-tool-call-parser + --use-vllm-tokenizer on the same vLLM worker. The worker bypasses Dynamo’s preprocessor while the frontend-side parser is still wired up, producing mismatched token streams. No mutual-exclusivity check exists today.
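Because the silently broken combinations raise no startup error, a pre-launch check in a deployment script can help catch them. The following is a minimal sketch that encodes only the rules listed on this page; `check_launch` and its arguments are hypothetical and do not correspond to any Dynamo API.

```python
UPSTREAM_PARSER_FLAGS = {"--tool-call-parser", "--reasoning-parser"}
DYN_PARSER_FLAGS = {"--dyn-tool-call-parser", "--dyn-reasoning-parser"}
TOKENIZER_FLAGS = {"vllm": "--use-vllm-tokenizer", "sglang": "--use-sglang-tokenizer"}


def check_launch(engine, chat_processor, router_mode, frontend_flags, worker_flags):
    """Return (errors, warnings) for a planned launch.

    errors   -> combinations this page lists as rejected at startup
    warnings -> combinations this page lists as silently broken or silently ignored
    """
    if engine not in TOKENIZER_FLAGS:
        raise ValueError(f"unknown engine: {engine}")
    frontend_flags, worker_flags = set(frontend_flags), set(worker_flags)
    errors, warnings = [], []

    if chat_processor == "dynamo" and frontend_flags & UPSTREAM_PARSER_FLAGS:
        errors.append("un-prefixed parser flags are rejected under --dyn-chat-processor dynamo")
    if engine == "sglang" and {"--tool-call-parser", "--dyn-tool-call-parser"} <= worker_flags:
        errors.append("cannot use both --tool-call-parser and --dyn-tool-call-parser")
    for other_engine, flag in TOKENIZER_FLAGS.items():
        if other_engine != engine and flag in worker_flags:
            errors.append(f"{flag} does not match a {engine} worker")

    delegated = TOKENIZER_FLAGS[engine] in worker_flags
    if delegated and router_mode == "kv":
        warnings.append("tokenizer delegation with kv routing causes silent cache misses")
    if engine == "vllm" and delegated and "--dyn-tool-call-parser" in worker_flags:
        warnings.append("--dyn-tool-call-parser with --use-vllm-tokenizer yields mismatched token streams")
    if chat_processor != "dynamo" and worker_flags & DYN_PARSER_FLAGS:
        warnings.append("--dyn-* parser flags are silently ignored under this chat processor")
    return errors, warnings


if __name__ == "__main__":
    errs, warns = check_launch(
        "vllm", "dynamo", "kv",
        frontend_flags=[],
        worker_flags=["--use-vllm-tokenizer", "--dyn-tool-call-parser"],
    )
    print(errs, warns)
```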

Routing compatibility

--router-mode kv needs frontend tokenization to compute prefix-hash routing keys. Options A, B, and C keep the tokenizer on the frontend and are KV-routable. Options D and E move tokenization to the worker and are not KV-routable — pair them with round-robin or random.

| Option | kv routing | round-robin / random |
| --- | --- | --- |
| A (Dynamo-native) | Yes | Yes |
| B (vLLM processor) | Yes | Yes |
| C (SGLang processor) | Yes | Yes |
| D (vLLM tokenizer delegation) | No | Yes |
| E (SGLang tokenizer delegation) | No | Yes |

Why each flag exists

  • Frontend tokenization is required for KV cache routing. The frontend needs token IDs to compute prefix-hash routing keys before the request reaches a worker. Parser flags on the Rust-native path (option A) co-locate with tokenization on the frontend for this reason.

  • Backend tokenization is a fallback for when frontend tokenization can’t or shouldn’t run: unsupported model, day-0 support, tokenizer edge cases (custom special tokens, rope variants). The engine owns the tokenizer in this mode, so KV routing drops out.

  • Chat-processor swap (options B/C) is the middle ground: tokenization stays on the frontend (KV-routable), but parsing delegates to the upstream engine’s Python implementation. This covers models where Dynamo’s Rust parser hasn’t been written yet.
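To make the dependency between frontend tokenization and prefix-hash routing concrete, here is a toy sketch of block-wise prefix hashing over token IDs. It is not Dynamo's actual hashing scheme: the block size, the use of SHA-256, and the function names are all assumptions chosen for illustration. It only shows that routing keys can be computed once token IDs are available, and that a single differing token changes every key from that block onward.

```python
import hashlib


def prefix_block_hashes(token_ids, block_size=4):
    """Hash each full block of tokens together with everything before it.

    Toy scheme (assumption): a running SHA-256 over the token IDs, sampled
    after each complete block. A trailing partial block is ignored.
    """
    hashes = []
    running = hashlib.sha256()
    full = len(token_ids) - len(token_ids) % block_size
    for start in range(0, full, block_size):
        block = token_ids[start:start + block_size]
        running.update(",".join(map(str, block)).encode() + b";")
        hashes.append(running.hexdigest()[:12])
    return hashes


def shared_prefix_blocks(a, b):
    """Count leading block hashes that two requests have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


if __name__ == "__main__":
    cached = prefix_block_hashes([1, 15, 27, 9, 4, 88, 3, 2])
    same_prefix = prefix_block_hashes([1, 15, 27, 9, 4, 88, 3, 7])
    other_tokenizer = prefix_block_hashes([1, 16, 27, 9, 4, 88, 3, 2])
    print(shared_prefix_blocks(cached, same_prefix))      # first block still matches
    print(shared_prefix_blocks(cached, other_tokenizer))  # no block matches
```

In this toy model, a router that hashed token IDs different from the ones the engine actually produced would look up keys that never match the cached blocks, which is the kind of silent miss described for options D and E.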

Parser names by model

For the full list of supported parser names, the models they cover, and upstream name divergences (relevant for options B and C), see:

  • Tool Calling — supported tool call parsers with model mappings and upstream name differences
  • Reasoning — supported reasoning parsers with model mappings and force-reasoning behavior

Canonical launch examples

# A -- Dynamo-native (default).
python -m dynamo.vllm \
  --dyn-tool-call-parser kimi_k2 \
  --dyn-reasoning-parser kimi_k25
python -m dynamo.frontend --dyn-chat-processor dynamo

# B -- vLLM chat-processor (upstream parser names on the frontend).
python -m dynamo.vllm ...
python -m dynamo.frontend \
  --dyn-chat-processor vllm \
  --tool-call-parser hermes \
  --reasoning-parser deepseek_r1

# C -- SGLang chat-processor.
python -m dynamo.sglang ...
python -m dynamo.frontend \
  --dyn-chat-processor sglang \
  --tool-call-parser kimi_k2 \
  --reasoning-parser kimi_k25

# D -- vLLM tokenizer delegation (no KV routing).
python -m dynamo.vllm --use-vllm-tokenizer ...
python -m dynamo.frontend --router-mode round-robin

See Also