Topology-Aware KV Transfer
Keep disaggregated prefill and decode KV-cache transfers within a selected topology domain
Topology-aware KV transfer lets a disaggregated Dynamo deployment route decode requests toward workers that share the selected prefill worker’s topology domain, such as zone or rack. This reduces slow cross-domain KV-cache transfers when prefill and decode workers exchange KV data over NIXL.
Use this feature when:
topology.kubernetes.io/zone or a rack/block label.
This page covers the Kubernetes operator path. For router and runtime behavior, see Router Topology-Aware KV Transfer. For RDMA/NIXL transport setup, see Disagg Communication.
The operator configures worker pods from spec.experimental.kvTransferPolicy:
nvidia.com/topology-label-key annotation to worker pods.
/etc/dynamo/topology/<domain> with a Downward API volume.
The frontend does not read this policy from its own environment. Workers publish the topology metadata in their ModelRuntimeConfig; the router reads it from runtime discovery.
Confirm that the label you plan to use exists on worker nodes:
enforcement: required constrains decode worker selection to workers whose topology value matches the selected prefill worker for the configured domain. If no decode worker satisfies the generated constraint, the router fails the request instead of silently crossing the domain.
enforcement defaults to required when omitted.
required is a decode-routing constraint, not a capacity planner. The DynamoGraphDeployment author or cluster administrator must ensure that every topology domain that can receive prefill workers also has sufficient same-domain decode capacity. If a domain has prefill workers but no matching decode workers, or too little decode capacity, the router cannot spill to another domain without violating the policy.
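The required behavior described above can be modeled in a few lines. This is a minimal sketch, not the router's implementation: the Worker record, the exception name, and the function are hypothetical names introduced for illustration, and the topology mapping only mirrors the idea of a per-domain value published by each worker.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    """Hypothetical worker record for illustration; not a Dynamo type."""
    name: str
    topology: dict = field(default_factory=dict)  # e.g. {"zone": "az-1"}


class NoSameDomainDecodeWorker(RuntimeError):
    """Raised when no decode worker matches the prefill worker's domain value."""


def eligible_decode_workers(prefill, decode_workers, domain):
    """Return the decode workers whose value for `domain` equals the prefill worker's."""
    wanted = prefill.topology.get(domain)
    matches = [
        w for w in decode_workers
        if wanted is not None and w.topology.get(domain) == wanted
    ]
    if not matches:
        # Mirrors the documented behavior: fail instead of crossing the domain.
        raise NoSameDomainDecodeWorker(f"no decode worker with {domain}={wanted!r}")
    return matches
```

In this model, a prefill worker in a domain with no decode workers always produces an error, which is why same-domain decode capacity has to be planned ahead of time.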
Plan prefill and decode capacity per topology domain before enabling enforcement: required. For example, assume:
az-1 and az-2.
That means each zone should run 30 prefill workers and 60 decode workers:
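The per-zone figures can be reproduced with a short calculation. This is a sketch under stated assumptions: the totals of 60 prefill and 120 decode workers are chosen here only because an even split over two zones yields the 30 and 60 quoted above, and the function name is hypothetical.

```python
def per_domain_replicas(total_prefill, total_decode, domains):
    """Split worker totals evenly across topology domains."""
    n = len(domains)
    if n == 0:
        raise ValueError("at least one topology domain is required")
    if total_prefill % n or total_decode % n:
        raise ValueError("totals do not divide evenly across domains")
    return {
        d: {"prefill": total_prefill // n, "decode": total_decode // n}
        for d in domains
    }


# Assumed totals (not taken from the text): 60 prefill and 120 decode workers.
plan = per_domain_replicas(60, 120, ["az-1", "az-2"])
for zone, counts in plan.items():
    print(zone, counts)
```

The function rejects uneven splits on purpose, since a zone that ends up with a lower decode-to-prefill ratio than the others becomes the bottleneck under required enforcement.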
In a DynamoGraphDeployment, express this as separate prefill and decode components per zone. Pin each component to its zone and set kvTransferPolicy.enforcement to required so the router refuses cross-zone decode selection. The DGD author or cluster administrator must ensure each zone has enough schedulable capacity for its pinned replicas. Worker command and args are omitted here; configure each worker for prefill or decode mode as in the base disaggregated serving manifest:
enforcement: preferred keeps all decode workers eligible but biases worker selection toward the same topology domain.
preferredWeight is required with enforcement: preferred. It must be between 0 and 1. A higher value creates a stronger same-domain preference, but it is not a probability and does not guarantee same-domain selection.
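The field rules stated so far can be summarized as a small validator. This is an illustrative sketch, not the operator's admission logic: the function name is hypothetical, it assumes required and preferred are the only enforcement values, and it treats the 0 to 1 range as inclusive, which the text does not specify.

```python
def normalize_kv_transfer_policy(policy):
    """Validate a kvTransferPolicy-like mapping and apply the documented default."""
    result = dict(policy)
    # enforcement defaults to required when omitted.
    enforcement = result.setdefault("enforcement", "required")
    if enforcement not in ("required", "preferred"):
        # Assumption: only the two values named in the text are accepted.
        raise ValueError(f"unsupported enforcement: {enforcement!r}")
    if enforcement == "preferred":
        weight = result.get("preferredWeight")
        if weight is None:
            raise ValueError("preferredWeight is required with enforcement: preferred")
        # Assumption: the bounds are inclusive.
        if not 0 <= weight <= 1:
            raise ValueError("preferredWeight must be between 0 and 1")
    return result
```

For example, `normalize_kv_transfer_policy({"labelKey": "topology.kubernetes.io/zone", "domain": "zone"})` returns a copy of the mapping with enforcement set to required.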
The runtime uses domain, not the Kubernetes label key, when creating routing constraints. For example, labelKey: topology.kubernetes.io/zone and domain: zone produce worker topology metadata like:
After the DGD creates worker pods, verify the operator pipeline from node label to runtime topology file.
Expected results:
labelKey.
/etc/dynamo/topology/<domain> exists and contains the topology value.
Worker logs should include topology config during startup:
Check whether the node has the configured label:
If the label is missing, the topology-label controller emits a warning event with reason TopologyLabelMissing and leaves topology metadata unavailable for that worker.
When topology is enabled, the worker waits for the transfer-domain file to appear and contain data. If it stays empty, check:
spec.experimental.kvTransferPolicy.domain matches the projected file name.
spec.experimental.kvTransferPolicy.labelKey exists on the worker’s node.
nvidia.com/topology-label-key annotation.
get RBAC.
With enforcement: required, decode routing fails if no decode worker has the same generated topology taint as the selected prefill worker. Verify both prefill and decode workers publish the same domain, and that each domain where prefill workers can be selected has enough matching decode workers for the expected p/d ratio.
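For the empty-file case described above, a small script run inside the worker container can show whether the projected file ever receives a value. This is a minimal sketch of the wait-until-non-empty pattern, not the worker's actual startup code; the function name, timeout, and poll interval are assumptions.

```python
import time
from pathlib import Path


def wait_for_topology_value(path, timeout_s=30.0, poll_s=0.5):
    """Poll until `path` exists and is non-empty, then return its stripped content."""
    deadline = time.monotonic() + timeout_s
    target = Path(path)
    while True:
        if target.is_file():
            value = target.read_text().strip()
            if value:
                return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{path} is missing or empty after {timeout_s}s")
        time.sleep(poll_s)
```

With domain: zone, the call would be `wait_for_topology_value("/etc/dynamo/topology/zone")`. A timeout here points back to the checklist above rather than to the router.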
Use preferred while validating a heterogeneous rollout if cross-domain routing is acceptable during partial capacity.
Topology Aware Scheduling controls where Kubernetes places pods. Topology-aware KV transfer controls how Dynamo routes between already-running prefill and decode workers.
Use them together when possible: