Dynamo supports disaggregated serving where prefill (prompt processing) and decode (token generation) are handled by separate worker pools. When you register workers with ModelType.Prefill, the frontend automatically detects them and activates an internal prefill router.
For the high-level deployment matrix, see the Router Guide. For the router flags used in this setup, see Configuration and Tuning.
The prefill router is automatically created when:

- A decode worker is registered through register_model() with ModelType.Chat | ModelType.Completions.
- A prefill worker is registered with ModelType.Prefill.

Key characteristics of the prefill router:

- Active block tracking is disabled (track_active_blocks=false) since prefill workers do not perform decode.

Key characteristics of the decode routing stage in disaggregated mode:

- Overlap score credit is set to zero (overlap_score_credit=0) because decode routing should not chase prefix reuse.
- KV reuse is not assumed (assume_kv_reuse=false) unless the backend can truly deduplicate transferred blocks.
- Prefill token tracking is disabled (track_prefill_tokens=false) so decode-side load reflects decode work rather than already-completed prompt work.

When both workers are registered, requests are automatically routed through both stages.
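The activation condition and the per-stage settings above can be summarized in a small, self-contained Python sketch. It does not use the Dynamo API: the ModelType enum here is a local stand-in that only mirrors the flag names from the text, and StageOverrides and prefill_router_active are hypothetical helpers introduced for illustration. Only the four settings named above are modeled; anything left as None is assumed to stay at the router's own default.

```python
from dataclasses import asdict, dataclass
from enum import Flag, auto
from typing import Optional


class ModelType(Flag):
    """Local stand-in for the model type flags named in the text (not the Dynamo class)."""
    Chat = auto()
    Completions = auto()
    Prefill = auto()


def prefill_router_active(registrations: list) -> bool:
    """Hypothetical check: both a decode-capable and a prefill registration exist."""
    decode_types = ModelType.Chat | ModelType.Completions
    has_decode = any(bool(r & decode_types) for r in registrations)
    has_prefill = any(bool(r & ModelType.Prefill) for r in registrations)
    return has_decode and has_prefill


@dataclass(frozen=True)
class StageOverrides:
    """Hypothetical container for the settings named in the text; None means 'not overridden'."""
    track_active_blocks: Optional[bool] = None
    overlap_score_credit: Optional[float] = None
    assume_kv_reuse: Optional[bool] = None
    track_prefill_tokens: Optional[bool] = None

    def as_overrides(self) -> dict:
        # Keep explicit False/0 values; drop only the fields that were never set.
        return {k: v for k, v in asdict(self).items() if v is not None}


# Prefill stage: no decode happens here, so active blocks are not tracked.
PREFILL_STAGE = StageOverrides(track_active_blocks=False)

# Decode stage: no credit for prefix overlap, no assumed KV reuse,
# and no accounting for prompt tokens that prefill already processed.
DECODE_STAGE = StageOverrides(
    overlap_score_credit=0,
    assume_kv_reuse=False,
    track_prefill_tokens=False,
)

decode_only = [ModelType.Chat | ModelType.Completions]
both = decode_only + [ModelType.Prefill]
print(prefill_router_active(decode_only), prefill_router_active(both))
print(PREFILL_STAGE.as_overrides())
print(DECODE_STAGE.as_overrides())
```

With only the decode registration present the check is false; adding a ModelType.Prefill registration makes it true, which corresponds to the point at which the frontend would activate the prefill router.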
The automatic disaggregated routing setup described here is currently supported by the integrated dynamo.frontend path. It is not provided as a single turnkey mode by the standalone Python router (python -m dynamo.router). If you build this topology with standalone routers, you must launch and connect the prefill and decode routing stages yourself and handle request handoff, including the disaggregated_params returned by prefill. For an advanced reference, see the Global Router, which composes local prefill and decode router pools explicitly.
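To make the manual handoff concrete, here is a minimal sketch of the two-stage flow you would have to wire up yourself with standalone routers. It is not the Dynamo API: route_disaggregated, the worker callables, and the request and response dictionaries are all hypothetical, and the contents of disaggregated_params (the kv_handle field below) are invented purely so the example runs. The only detail taken from the text is that prefill returns disaggregated_params and the decode stage needs them.

```python
from typing import Any, Callable, Dict

Request = Dict[str, Any]
Response = Dict[str, Any]
Worker = Callable[[Request], Response]
Picker = Callable[[Request], Worker]


def route_disaggregated(request: Request, pick_prefill: Picker, pick_decode: Picker) -> Response:
    """Hypothetical handoff: run prefill, then forward its params to decode."""
    # Stage 1: choose a prefill worker and process the prompt.
    prefill_worker = pick_prefill(request)
    prefill_response = prefill_worker(request)

    # The decode stage cannot proceed without the params returned by prefill.
    params = prefill_response.get("disaggregated_params")
    if params is None:
        raise RuntimeError("prefill response carried no disaggregated_params")

    # Stage 2: attach the params and choose a decode worker.
    decode_request = {**request, "disaggregated_params": params}
    decode_worker = pick_decode(decode_request)
    return decode_worker(decode_request)


# In-memory stand-ins for real workers (assumptions for illustration only).
def fake_prefill_worker(request: Request) -> Response:
    # "kv_handle" is a made-up field; the real payload is backend-specific.
    return {"disaggregated_params": {"kv_handle": f"kv-{len(request['token_ids'])}"}}


def fake_decode_worker(request: Request) -> Response:
    return {"used_params": request["disaggregated_params"], "text": "<generated>"}


result = route_disaggregated(
    {"token_ids": [1, 2, 3]},
    pick_prefill=lambda req: fake_prefill_worker,
    pick_decode=lambda req: fake_decode_worker,
)
print(result)
```

In a real deployment the two picker callables would be the prefill and decode routing stages, each applying its own settings; the integrated dynamo.frontend path performs this sequencing for you.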
The following diagram shows an overview of the major components in disaggregated serving: