SGLang Chat Processor

SGLang-native preprocessing and postprocessing for chat completions


The SGLang chat processor enables SGLang-native preprocessing and postprocessing in the Dynamo frontend. It uses SGLang’s tokenizer, chat templates, tool call parser, and reasoning parser directly — bypassing the default Rust preprocessor for v1/chat/completions requests.

When to Use

Use --dyn-chat-processor sglang when Dynamo’s built-in Rust preprocessor does not yet support a tool call parser or reasoning parser you need. The SGLang processor delegates to SGLang’s Python implementations, so any parser SGLang supports works immediately.

Common cases:

  • A tool call format not yet in the Rust tool_calling library
  • A reasoning parser not yet supported natively
  • A chat template that the Rust preprocessor doesn’t handle correctly

If the parser you need is missing from the Rust preprocessor, consider opening an issue or PR to add native support — native parsers avoid the Python GIL overhead entirely.

Quick Start

    # Frontend with SGLang processor, tool calling, and reasoning
    python -m dynamo.frontend \
      --router-mode kv \
      --dyn-chat-processor sglang \
      --tool-call-parser hermes \
      --reasoning-parser qwen3

    # Workers (unchanged)
    CUDA_VISIBLE_DEVICES=0 python -m dynamo.sglang \
      --model-path Qwen/Qwen3-14B-FP8 \
      --served-model-name Qwen/Qwen3-14B-FP8 \
      --tp 1 --trust-remote-code \
      --kv-events-config '{"publisher":"zmq","topic":"kv-events","endpoint":"tcp://*:5557"}'

Frontend Arguments

These arguments are passed to the frontend (not the worker) when using --dyn-chat-processor sglang:

| Argument | Default | Description |
| --- | --- | --- |
| --dyn-chat-processor sglang | (none) | Enable the SGLang chat processor |
| --tool-call-parser | None | Tool call parser name (any SGLang-supported parser) |
| --reasoning-parser | None | Reasoning parser name (any SGLang-supported parser) |

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| DYN_SGLANG_STREAM_INTERVAL | 20 | Number of tokens to accumulate before detokenizing. Higher values improve throughput. The first chunk always emits immediately (interval=1) to minimize time-to-first-token. |
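
The batching behavior described for DYN_SGLANG_STREAM_INTERVAL can be pictured with a small sketch. This is only an illustration of the documented behavior (first token emitted alone, later tokens grouped by the interval), not Dynamo's actual implementation; the function name `batch_tokens` is hypothetical.

```python
import os
from typing import Iterable, Iterator, List


def batch_tokens(token_ids: Iterable[int], interval: int) -> Iterator[List[int]]:
    """Yield groups of token IDs: the first token alone, then `interval` at a time."""
    buffer: List[int] = []
    first = True
    for tok in token_ids:
        buffer.append(tok)
        # The first chunk uses an effective interval of 1 to keep time-to-first-token low.
        limit = 1 if first else interval
        if len(buffer) >= limit:
            yield buffer
            buffer = []
            first = False
    if buffer:
        # Flush whatever is left when generation ends.
        yield buffer


# Read the interval the same way a process would, falling back to the documented default.
interval = int(os.environ.get("DYN_SGLANG_STREAM_INTERVAL", "20"))
chunk_sizes = [len(chunk) for chunk in batch_tokens(range(45), interval)]
print(chunk_sizes)
```

A larger interval means fewer detokenization calls per request, at the cost of coarser streaming granularity after the first chunk.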

Tool Calling

The processor supports all SGLang tool call formats. Pass --tool-call-parser on the frontend:

    python -m dynamo.frontend \
      --dyn-chat-processor sglang \
      --tool-call-parser hermes

Any parser supported by SGLang can be used. See the SGLang documentation for the full list of available tool call parsers.

Example: Tool Call Request

    curl http://localhost:8000/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "Qwen/Qwen3-14B-FP8",
        "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
        "tools": [{
          "type": "function",
          "function": {
            "name": "get_weather",
            "description": "Get weather for a city",
            "parameters": {
              "type": "object",
              "properties": {"city": {"type": "string"}},
              "required": ["city"]
            }
          }
        }],
        "tool_choice": "auto"
      }'

Response:

    {
      "choices": [{
        "message": {
          "role": "assistant",
          "tool_calls": [{
            "id": "call_8cd24396f3671048",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"city\": \"Paris\"}"
            }
          }],
          "reasoning_content": "The user wants weather info for Paris..."
        },
        "finish_reason": "tool_calls"
      }]
    }
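
On the client side, a response like the one above is typically handled by decoding each call's `arguments` string and dispatching to a local function. The sketch below assumes the OpenAI-style convention of replying with `tool` role messages keyed by `tool_call_id`; the `get_weather` implementation, the `TOOLS` registry, and `run_tool_calls` are hypothetical names used only for illustration.

```python
import json


def get_weather(city: str) -> str:
    # Hypothetical local implementation of the tool declared in the request.
    return f"Weather for {city}: (placeholder)"


# Hypothetical registry mapping tool names to Python callables.
TOOLS = {"get_weather": get_weather}


def run_tool_calls(response: dict) -> list:
    """Execute each tool call in the first choice and build `tool` role messages."""
    choice = response["choices"][0]
    if choice.get("finish_reason") != "tool_calls":
        return []
    results = []
    for call in choice["message"].get("tool_calls") or []:
        fn = call["function"]
        args = json.loads(fn["arguments"])  # arguments arrive as a JSON-encoded string
        output = TOOLS[fn["name"]](**args)
        results.append({"role": "tool", "tool_call_id": call["id"], "content": output})
    return results


# Same shape as the example response shown above.
response = {
    "choices": [{
        "message": {
            "role": "assistant",
            "tool_calls": [{
                "id": "call_8cd24396f3671048",
                "type": "function",
                "function": {"name": "get_weather", "arguments": "{\"city\": \"Paris\"}"},
            }],
            "reasoning_content": "The user wants weather info for Paris...",
        },
        "finish_reason": "tool_calls",
    }]
}

tool_messages = run_tool_calls(response)
print(tool_messages)
```

The resulting messages would be appended to the conversation, after the assistant message, in a follow-up request so the model can produce its final answer.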

Reasoning Parsing

For models that produce chain-of-thought reasoning (e.g., Qwen3, DeepSeek-R1), pass --reasoning-parser:

    python -m dynamo.frontend \
      --dyn-chat-processor sglang \
      --reasoning-parser qwen3

The parser places the text inside the model's <think>...</think> tags in the reasoning_content field and the remaining text in the content field.
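
As a rough mental model of that separation, the sketch below splits a completed (non-streaming) output on `<think>`...`</think>` tags, which is the tag pair Qwen3 uses. It is a simplified illustration, not SGLang's reasoning parser, and the function name `split_reasoning` is hypothetical.

```python
from typing import Optional, Tuple


def split_reasoning(
    text: str, open_tag: str = "<think>", close_tag: str = "</think>"
) -> Tuple[Optional[str], str]:
    """Return (reasoning_content, content) for a fully generated model output."""
    start = text.find(open_tag)
    end = text.find(close_tag)
    if start == -1 or end == -1 or end < start:
        # No complete reasoning block: everything is regular content.
        return None, text
    reasoning = text[start + len(open_tag):end].strip()
    content = (text[:start] + text[end + len(close_tag):]).strip()
    return reasoning, content


reasoning, content = split_reasoning(
    "<think>The user wants weather info.</think>It is sunny in Paris."
)
print(reasoning)
print(content)
```

A real parser also has to handle streaming, where a tag may be split across chunks, which this sketch deliberately ignores.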

Migration from --use-sglang-tokenizer

--use-sglang-tokenizer on the worker is deprecated. Replace with --dyn-chat-processor sglang on the frontend:

      # Before (deprecated)
    - python -m dynamo.sglang --model-path <model> --use-sglang-tokenizer
    - python -m dynamo.frontend

      # After
      python -m dynamo.sglang --model-path <model>
    + python -m dynamo.frontend --dyn-chat-processor sglang

Key differences:

| | --use-sglang-tokenizer | --dyn-chat-processor sglang |
| --- | --- | --- |
| Location | Worker flag | Frontend flag |
| KV router | Not supported | Supported |
| Tool calling | Not supported | Supported |
| Reasoning | Not supported | Supported |
| Endpoints | v1/chat/completions only | v1/chat/completions only |

See Also