Topology-Aware KV Transfer

Keep disaggregated prefill and decode KV-cache transfers within a selected topology domain

Topology-aware KV transfer lets a disaggregated Dynamo deployment route decode requests toward workers that share the selected prefill worker’s topology domain, such as zone or rack. This reduces slow cross-domain KV-cache transfers when prefill and decode workers exchange KV data over NIXL.

Use this feature when:

Your deployment uses separate prefill and decode workers.
Your cluster exposes useful node labels, such as topology.kubernetes.io/zone or a rack/block label.
Same-domain KV transfer is required for correctness or strongly preferred for latency and bandwidth.

This page covers the Kubernetes operator path. For router and runtime behavior, see Router Topology-Aware KV Transfer. For RDMA/NIXL transport setup, see Disagg Communication.

How It Works

The operator configures worker pods from spec.experimental.kvTransferPolicy:

Adds a nvidia.com/topology-label-key annotation to worker pods.
Runs a topology-label controller that copies the configured node label onto the worker pod after scheduling.
Projects that pod label into /etc/dynamo/topology/<domain> with a Downward API volume.
Injects worker environment variables that tell the backend runtime which topology domain and enforcement policy to publish.

The frontend does not read this policy from its own environment. Workers publish the topology metadata in their ModelRuntimeConfig; the router reads it from runtime discovery.
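As a rough illustration of the last two steps, the sketch below shows how a worker-side helper might turn the projected topology file into the metadata it publishes. The function name, its parameters, and the error handling are hypothetical and are not Dynamo's actual implementation; the key names follow the metadata example in the Field Reference section. It also assumes the projected file contains only the label value.

```python
import tempfile
from pathlib import Path


def build_topology_metadata(topology_dir, domain, enforcement="required"):
    """Build worker topology metadata from the projected topology file.

    Hypothetical helper: name, signature, and errors are illustrative only.
    """
    # Assumption: the Downward API volume projects one file per domain,
    # for example <topology_dir>/zone, holding just the label value.
    value = (Path(topology_dir) / domain).read_text().strip()
    if not value:
        raise ValueError(f"topology file for domain {domain!r} is empty")
    return {
        "topology_domains": {domain: value},
        "kv_transfer_domain": domain,
        "kv_transfer_enforcement": enforcement,
    }


if __name__ == "__main__":
    # Simulate /etc/dynamo/topology with a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "zone").write_text("us-east-1a\n")
        print(build_topology_metadata(tmp, "zone"))
```

On a real worker pod the first argument would be /etc/dynamo/topology; the temporary directory here only stands in for that mount.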

Prerequisites

Requirement	Details
Disaggregated serving	Separate prefill and decode worker services.
KV router	The frontend should use `DYN_ROUTER_MODE=kv`.
Node topology labels	Every node that can host a worker must carry the configured `labelKey`.
Dynamo operator	The operator must include the topology-label controller and node-read RBAC.
KV transfer transport	RDMA, EFA, or another NIXL-compatible transport should already be configured for production disaggregated deployments.

Confirm that the label you plan to use exists on worker nodes:

$ kubectl get nodes -L topology.kubernetes.io/zone

Required Same-Domain Routing

enforcement: required constrains decode worker selection to workers whose topology value matches the selected prefill worker for the configured domain. If no decode worker satisfies the generated constraint, the router fails the request instead of silently crossing the domain.

apiVersion: nvidia.com/v1beta1
kind: DynamoGraphDeployment
metadata:
  name: qwen3-disagg-zone
spec:
  experimental:
    kvTransferPolicy:
      labelKey: topology.kubernetes.io/zone
      domain: zone
      enforcement: required
  components:
  - name: Frontend
    type: frontend
    replicas: 1
    podTemplate:
      spec:
        containers:
        - name: main
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.1.1
          env:
          - name: DYN_ROUTER_MODE
            value: kv
  - name: VllmPrefillWorker
    type: worker
    replicas: 2
    podTemplate:
      spec:
        containers:
        - name: main
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.1.1
          command: ["python3", "-m", "dynamo.vllm"]
          args: ["--model", "Qwen/Qwen3-0.6B", "--disaggregation-mode", "prefill"]
          envFrom:
          - secretRef:
              name: hf-token-secret
          resources:
            limits:
              nvidia.com/gpu: "1"
  - name: VllmDecodeWorker
    type: worker
    replicas: 2
    podTemplate:
      spec:
        containers:
        - name: main
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.1.1
          command: ["python3", "-m", "dynamo.vllm"]
          args: ["--model", "Qwen/Qwen3-0.6B", "--disaggregation-mode", "decode"]
          envFrom:
          - secretRef:
              name: hf-token-secret
          resources:
            limits:
              nvidia.com/gpu: "1"

enforcement defaults to required when omitted.

required is a decode-routing constraint, not a capacity planner. The DynamoGraphDeployment author or cluster administrator must ensure that every topology domain that can receive prefill workers also has sufficient same-domain decode capacity. If a domain has prefill workers but no matching decode workers, or too little decode capacity, the router cannot spill to another domain without violating the policy.
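The constraint can be pictured as a filter over decode workers. The following is a minimal sketch under stated assumptions, not the router's actual code: the function name, the `name` field, and the use of `LookupError` are hypothetical, while the metadata keys mirror the worker metadata example in the Field Reference section.

```python
def required_decode_candidates(prefill, decode_workers):
    """Return decode workers in the same topology domain as `prefill`.

    Hypothetical sketch of `enforcement: required`; it fails instead of
    falling back to a worker in another domain.
    """
    domain = prefill["kv_transfer_domain"]
    target = prefill["topology_domains"].get(domain)
    if target is None:
        raise LookupError(f"prefill worker publishes no value for {domain!r}")
    matches = [
        w for w in decode_workers
        if w["topology_domains"].get(domain) == target
    ]
    if not matches:
        # No spill-over: crossing the domain would violate the policy.
        raise LookupError(f"no decode worker with {domain}={target}")
    return matches


# Illustrative data only; worker names and zone values are made up.
prefill_a = {
    "name": "prefill-0",
    "kv_transfer_domain": "zone",
    "topology_domains": {"zone": "az-1"},
}
decode_pool = [
    {"name": "decode-0", "topology_domains": {"zone": "az-1"}},
    {"name": "decode-1", "topology_domains": {"zone": "az-2"}},
]

print([w["name"] for w in required_decode_candidates(prefill_a, decode_pool)])
```

With this data only decode-0 is eligible. A prefill worker in a zone with no decode workers would raise an error, which corresponds to the failed request described above.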

Capacity Planning Across Domains

Plan prefill and decode capacity per topology domain before enabling enforcement: required. For example, assume:

Two availability zones: az-1 and az-2.
The target fleet is 60 prefill workers and 120 decode workers.
The fleet should be split evenly across the two zones.
The target prefill-to-decode ratio is 1:2 in each zone.

That means each zone should run 30 prefill workers and 60 decode workers:

Zone	Prefill workers	Decode workers	Ratio
`az-1`	30	60	1:2
`az-2`	30	60	1:2
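The arithmetic behind this table can be checked with a short script. This is only a planning aid sketched for this example; the function name and the even-split rule are assumptions, and nothing here is part of Dynamo.

```python
from fractions import Fraction


def plan_per_zone(total_prefill, total_decode, zones):
    """Split a fleet evenly across zones and report the per-zone ratio."""
    n = len(zones)
    if n == 0 or total_prefill <= 0 or total_decode <= 0:
        raise ValueError("need at least one zone and positive worker counts")
    if total_prefill % n or total_decode % n:
        raise ValueError("fleet does not split evenly across zones")
    prefill, decode = total_prefill // n, total_decode // n
    return {
        zone: {
            "prefill": prefill,
            "decode": decode,
            # Fraction(30, 60) reduces to 1/2, i.e. a 1:2 ratio.
            "ratio": Fraction(prefill, decode),
        }
        for zone in zones
    }


plan = plan_per_zone(60, 120, ["az-1", "az-2"])
for zone, counts in plan.items():
    print(zone, counts["prefill"], counts["decode"], counts["ratio"])
```

For the fleet above this yields 30 prefill and 60 decode workers in each zone, with the ratio shown as 1/2.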

In a DynamoGraphDeployment (DGD), express this as separate prefill and decode components per zone. Pin each component to its zone and set kvTransferPolicy.enforcement to required so the router refuses cross-zone decode selection. The DGD author or cluster administrator must ensure each zone has enough schedulable capacity for its pinned replicas. Worker command and args are omitted here; configure each worker for prefill or decode mode as in the base disaggregated serving manifest:

apiVersion: nvidia.com/v1beta1
kind: DynamoGraphDeployment
metadata:
  name: qwen3-disagg-zone-capacity
spec:
  experimental:
    kvTransferPolicy:
      labelKey: topology.kubernetes.io/zone
      domain: zone
      enforcement: required
  components:
  - name: Frontend
    type: frontend
    replicas: 1
    podTemplate:
      spec:
        containers:
        - name: main
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.1.1
          env:
          - name: DYN_ROUTER_MODE
            value: kv
  - name: VllmPrefillWorkerAz1
    type: worker
    replicas: 30
    podTemplate:
      spec:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: topology.kubernetes.io/zone
                  operator: In
                  values: ["az-1"]
        containers:
        - name: main
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.1.1
          envFrom:
          - secretRef:
              name: hf-token-secret
  - name: VllmDecodeWorkerAz1
    type: worker
    replicas: 60
    podTemplate:
      spec:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: topology.kubernetes.io/zone
                  operator: In
                  values: ["az-1"]
        containers:
        - name: main
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.1.1
          envFrom:
          - secretRef:
              name: hf-token-secret
  - name: VllmPrefillWorkerAz2
    type: worker
    replicas: 30
    podTemplate:
      spec:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: topology.kubernetes.io/zone
                  operator: In
                  values: ["az-2"]
        containers:
        - name: main
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.1.1
          envFrom:
          - secretRef:
              name: hf-token-secret
  - name: VllmDecodeWorkerAz2
    type: worker
    replicas: 60
    podTemplate:
      spec:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: topology.kubernetes.io/zone
                  operator: In
                  values: ["az-2"]
        containers:
        - name: main
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.1.1
          envFrom:
          - secretRef:
              name: hf-token-secret

Preferred Same-Domain Routing

enforcement: preferred keeps all decode workers eligible but biases worker selection toward the same topology domain.

spec:
  experimental:
    kvTransferPolicy:
      labelKey: topology.kubernetes.io/zone
      domain: zone
      enforcement: preferred
      preferredWeight: 0.85

preferredWeight is required with enforcement: preferred. It must be between 0 and 1. A higher value creates a stronger same-domain preference, but it is not a probability and does not guarantee same-domain selection.

Field Reference

Field	Required	Description
`labelKey`	Yes	Kubernetes node label key to copy onto worker pods, for example `topology.kubernetes.io/zone`.
`domain`	Yes	Logical topology domain name published by workers, for example `zone` or `rack`. Must match `^[a-z0-9]([a-z0-9-]*[a-z0-9])?$`.
`enforcement`	No	`required` or `preferred`. Defaults to `required`.
`preferredWeight`	Only with `preferred`	Bias weight from `0` to `1`; only valid with `enforcement: preferred`.
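The rules in this table can be expressed as a small pre-flight check to run before applying a manifest. This is a hedged sketch rather than the operator's validation logic: the function name and error messages are hypothetical, and it assumes the 0 to 1 range for preferredWeight is inclusive, which this page does not state.

```python
import re

# Pattern copied from the `domain` row of the field reference.
DOMAIN_PATTERN = re.compile(r"^[a-z0-9]([a-z0-9-]*[a-z0-9])?$")


def validate_kv_transfer_policy(policy):
    """Return a list of problems found in a kvTransferPolicy mapping."""
    errors = []
    if not policy.get("labelKey"):
        errors.append("labelKey is required")
    domain = policy.get("domain")
    if not domain:
        errors.append("domain is required")
    elif not DOMAIN_PATTERN.fullmatch(domain):
        errors.append(f"domain {domain!r} does not match the required pattern")
    # enforcement defaults to required when omitted.
    enforcement = policy.get("enforcement", "required")
    if enforcement not in ("required", "preferred"):
        errors.append("enforcement must be 'required' or 'preferred'")
    weight = policy.get("preferredWeight")
    if enforcement == "preferred":
        if weight is None:
            errors.append("preferredWeight is required with enforcement: preferred")
        elif isinstance(weight, bool) or not isinstance(weight, (int, float)):
            errors.append("preferredWeight must be a number")
        elif not 0 <= weight <= 1:  # assumption: bounds are inclusive
            errors.append("preferredWeight must be between 0 and 1")
    elif weight is not None:
        errors.append("preferredWeight is only valid with enforcement: preferred")
    return errors


print(validate_kv_transfer_policy({
    "labelKey": "topology.kubernetes.io/zone",
    "domain": "zone",
    "enforcement": "preferred",
    "preferredWeight": 0.85,
}))
```

An empty list means the mapping satisfies the rules listed above; the operator may apply additional checks that this sketch does not model.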

The runtime uses domain, not the Kubernetes label key, when creating routing constraints. For example, labelKey: topology.kubernetes.io/zone and domain: zone produce worker topology metadata like:

{
  "topology_domains": {
    "zone": "us-east-1a"
  },
  "kv_transfer_domain": "zone",
  "kv_transfer_enforcement": "required"
}

Verify the Deployment

After the DGD creates worker pods, verify the operator pipeline from node label to runtime topology file.

$ export NAMESPACE=<namespace>
$ export POD=<worker-pod>

$ kubectl get pod "$POD" -n "$NAMESPACE" \
>   -o jsonpath='{.metadata.annotations.nvidia\.com/topology-label-key}{"\n"}'

$ kubectl get pod "$POD" -n "$NAMESPACE" \
>   -o jsonpath='{.metadata.labels.topology\.kubernetes\.io/zone}{"\n"}'

$ kubectl exec "$POD" -n "$NAMESPACE" -- \
>   sh -c 'find /etc/dynamo/topology -maxdepth 1 -type f -print -exec cat {} \;'

Expected results:

The annotation value is the configured labelKey.
The worker pod has the copied topology label.
/etc/dynamo/topology/<domain> exists and contains the topology value.

Worker logs should include topology config during startup:

$ kubectl logs "$POD" -n "$NAMESPACE" | grep -i "Topology config"

Troubleshooting

Pod Has No Copied Topology Label

Check whether the node has the configured label:

$ NODE=$(kubectl get pod "$POD" -n "$NAMESPACE" -o jsonpath='{.spec.nodeName}')
$ kubectl get node "$NODE" -o jsonpath='{.metadata.labels.topology\.kubernetes\.io/zone}{"\n"}'

If the label is missing, the topology-label controller emits a warning event with reason TopologyLabelMissing and leaves topology metadata unavailable for that worker.

$ kubectl get events -n "$NAMESPACE" \
>   --field-selector involvedObject.name="$POD",reason=TopologyLabelMissing

Worker Exits While Waiting for Topology

When topology is enabled, the worker waits for the transfer-domain file to appear and contain data. If it stays empty, check:

spec.experimental.kvTransferPolicy.domain matches the projected file name.
spec.experimental.kvTransferPolicy.labelKey exists on the worker’s node.
The worker pod has the nvidia.com/topology-label-key annotation.
The topology-label controller is running and has node get RBAC.
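To make the waiting behavior concrete, the sketch below polls a file until it exists and is non-empty, then gives up after a deadline. It only illustrates the pattern: the function name, the timeout, and the polling interval are hypothetical, and the worker's real wait logic and limits may differ.

```python
import tempfile
import time
from pathlib import Path


def wait_for_topology_value(path, timeout_s=30.0, poll_s=0.5):
    """Poll `path` until it holds a non-empty value or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    file_path = Path(path)
    while True:
        if file_path.is_file():
            value = file_path.read_text().strip()
            if value:
                return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{path} stayed missing or empty for {timeout_s}s")
        time.sleep(poll_s)


if __name__ == "__main__":
    # Simulate the projected file with a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        topology_file = Path(tmp) / "zone"
        topology_file.write_text("us-east-1a\n")
        print(wait_for_topology_value(topology_file, timeout_s=1.0, poll_s=0.1))
```

A file that exists but stays empty, which is what happens when the pod label was never copied, ends in the timeout branch rather than returning an empty string.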

Required Policy Fails Requests

With enforcement: required, decode routing fails if no decode worker's topology value matches that of the selected prefill worker for the configured domain. Verify that both prefill and decode workers publish the same domain, and that each domain where prefill workers can be selected has enough matching decode workers for the expected prefill-to-decode ratio.

Use preferred while validating a heterogeneous rollout if cross-domain routing is acceptable during partial capacity.

Relationship to Topology Aware Scheduling

Topology Aware Scheduling controls where Kubernetes places pods. Topology-aware KV transfer controls how Dynamo routes between already-running prefill and decode workers.

Use them together when possible:

Topology Aware Scheduling keeps workers placed inside useful topology boundaries.
Topology-aware KV transfer prevents the router from choosing a decode worker outside the selected prefill worker’s transfer domain.
