DP Rank Routing (Attention Data Parallelism)
For general TensorRT-LLM features and configuration, see the Reference Guide.
TensorRT-LLM supports attention data parallelism (attention DP) for models like DeepSeek. When enabled, multiple attention DP ranks run within a single worker, each with its own KV cache. Dynamo can route requests to specific DP ranks based on KV cache state.
attention_dp_relax=False). Use this with --router-mode kv for cache-aware routing.
--router-mode round-robin or random when KV-aware routing isn’t needed.
The --enable-attention-dp flag sets attention_dp_size = tensor_parallel_size and configures Dynamo to publish KV events per DP rank. The router automatically creates routing targets for each (worker_id, dp_rank) combination.
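To make the (worker_id, dp_rank) targets concrete, here is a minimal Python sketch. It is not Dynamo's router code: `RoutingTarget`, `build_routing_targets`, and `pick_target` are hypothetical names, and the selection rule (longest cached prefix of the request's blocks) is a simplifying assumption. A real KV-aware router would likely also weigh load and other signals.

```python
from itertools import product
from typing import NamedTuple


class RoutingTarget(NamedTuple):
    """Hypothetical routing target: one DP rank inside one worker."""
    worker_id: int
    dp_rank: int


def build_routing_targets(worker_ids, tensor_parallel_size):
    """Enumerate one target per (worker_id, dp_rank) combination."""
    # With attention DP enabled, attention_dp_size = tensor_parallel_size.
    attention_dp_size = tensor_parallel_size
    return [
        RoutingTarget(worker_id, dp_rank)
        for worker_id, dp_rank in product(worker_ids, range(attention_dp_size))
    ]


def pick_target(targets, cached_blocks, request_blocks):
    """Pick the target whose KV cache holds the longest prefix of the request.

    cached_blocks maps a RoutingTarget to the set of block IDs it has cached
    (an assumed stand-in for per-rank KV events). Ties go to the first target.
    """
    def prefix_overlap(target):
        cached = cached_blocks.get(target, set())
        matched = 0
        for block in request_blocks:
            if block not in cached:
                break
            matched += 1
        return matched

    return max(targets, key=prefix_overlap)


if __name__ == "__main__":
    targets = build_routing_targets(worker_ids=[0, 1], tensor_parallel_size=2)
    cached = {
        RoutingTarget(0, 1): {"a"},
        RoutingTarget(1, 0): {"a", "b"},
    }
    print(pick_target(targets, cached, ["a", "b", "c"]))
```

Because each DP rank keeps its own KV cache, the sketch treats ranks as independent targets: two ranks in the same worker can score differently for the same request.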
Attention DP requires the TensorRT-LLM PyTorch backend. AutoDeploy does not support attention DP.