Copyright © 2026, NVIDIA Corporation.

Topology-Aware KV Transfer

Runtime metadata and decode routing semantics for topology-aware prefill/decode handoff

Topology-aware KV transfer constrains or biases decode worker selection after a prefill worker has been selected. The router derives standard RoutingConstraints from the selected prefill worker’s published topology metadata, then merges those constraints into the decode request.

Use the Kubernetes operator path when possible. For deployment examples, see Kubernetes Topology-Aware KV Transfer.

Runtime Contract

Workers publish topology and policy fields through ModelRuntimeConfig:

| Field | Meaning |
| --- | --- |
| topology_domains | Map of logical domain name to this worker’s topology value, for example {"zone": "us-east-1a"}. |
| kv_transfer_domain | Domain key used for prefill-to-decode KV transfer routing, for example zone. |
| kv_transfer_enforcement | required or preferred. |
| kv_transfer_preferred_weight | Preferred-taint weight used only when enforcement is preferred. |

Each topology entry also becomes a canonical worker taint:

dynamo.topology/<domain>=<value>

For example:

{
  "topology_domains": {
    "zone": "us-east-1a",
    "rack": "rack-22"
  },
  "kv_transfer_domain": "zone",
  "kv_transfer_enforcement": "preferred",
  "kv_transfer_preferred_weight": 0.85
}

This creates worker taints:

dynamo.topology/zone=us-east-1a
dynamo.topology/rack=rack-22

The KV-transfer policy uses only kv_transfer_domain to derive the decode constraint. Other topology domains remain available as ordinary routing taints.
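
As an illustration of the mapping above, the following minimal Python sketch renders taints from a runtime config shaped like the JSON example. The helper names topology_taints and kv_transfer_taint are hypothetical and exist only for this example; the real validation and taint generation are owned by the Rust runtime.

```python
def topology_taints(topology_domains):
    """Render every topology entry as a dynamo.topology/<domain>=<value> taint."""
    return {
        f"dynamo.topology/{domain}={value}"
        for domain, value in topology_domains.items()
    }


def kv_transfer_taint(runtime_config):
    """Pick the single taint that the KV-transfer policy is derived from."""
    domain = runtime_config["kv_transfer_domain"]
    value = runtime_config["topology_domains"][domain]
    return f"dynamo.topology/{domain}={value}"


# Same values as the JSON example above.
runtime_config = {
    "topology_domains": {"zone": "us-east-1a", "rack": "rack-22"},
    "kv_transfer_domain": "zone",
    "kv_transfer_enforcement": "preferred",
    "kv_transfer_preferred_weight": 0.85,
}

all_taints = topology_taints(runtime_config["topology_domains"])
transfer_taint = kv_transfer_taint(runtime_config)
```

Here all_taints holds both the zone and rack taints, while transfer_taint is only the zone taint, because zone is the configured kv_transfer_domain.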

Request Flow

When the selected prefill worker is already known, the prefill router builds the decode constraint before dispatching prefill. This keeps the required policy fail-closed: if the router cannot derive authoritative decode constraints for a required policy, it fails the request instead of dispatching prefill and only then discovering that decode cannot be routed safely.
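
The ordering described above can be sketched in Python as follows. This is not the router's code: derive_decode_constraints, route, and DecodeConstraintError are hypothetical names, and the choice to skip a preferred policy whose topology value is missing is an assumption made for the example.

```python
class DecodeConstraintError(RuntimeError):
    """Raised when a required policy cannot produce a decode constraint."""


def derive_decode_constraints(prefill_config):
    """Build decode constraints from the selected prefill worker's metadata."""
    empty = {"required_taints": set(), "preferred_taints": {}}
    domain = prefill_config.get("kv_transfer_domain")
    if domain is None:
        return empty  # no KV-transfer policy published
    enforcement = prefill_config.get("kv_transfer_enforcement", "required")
    value = prefill_config.get("topology_domains", {}).get(domain)
    if value is None:
        if enforcement == "required":
            # Fail closed: refuse to continue without an authoritative constraint.
            raise DecodeConstraintError(f"no topology value for domain {domain!r}")
        return empty  # assumption: a preferred policy degrades to no bias
    taint = f"dynamo.topology/{domain}={value}"
    if enforcement == "required":
        return {"required_taints": {taint}, "preferred_taints": {}}
    weight = prefill_config["kv_transfer_preferred_weight"]
    return {"required_taints": set(), "preferred_taints": {taint: weight}}


def route(prefill_config, dispatch_prefill):
    """Derive constraints first, so a failure happens before any prefill work."""
    constraints = derive_decode_constraints(prefill_config)
    dispatch_prefill()
    return constraints


dispatched = []
broken = {
    "topology_domains": {},
    "kv_transfer_domain": "zone",
    "kv_transfer_enforcement": "required",
}
try:
    route(broken, lambda: dispatched.append("prefill"))
    failed_closed = False
except DecodeConstraintError:
    failed_closed = True
```

In this run failed_closed is True and dispatched stays empty: the request fails before prefill is ever dispatched.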

Enforcement Modes

Required

required turns the selected prefill worker’s transfer-domain topology into a required taint.

required_taints = {"dynamo.topology/zone=us-east-1a"}

Decode workers without that taint are ineligible. If no eligible decode worker exists, routing returns no endpoint for that request.
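
A minimal sketch of that eligibility rule, assuming each decode worker is represented by the set of taints it carries. The worker names and the eligible_decode_workers helper are hypothetical.

```python
def eligible_decode_workers(worker_taints, required_taints):
    """Keep only workers that carry every required taint."""
    return sorted(
        name
        for name, taints in worker_taints.items()
        if required_taints <= taints  # subset test: all required taints present
    )


worker_taints = {
    "decode-a": {"dynamo.topology/zone=us-east-1a"},
    "decode-b": {"dynamo.topology/zone=us-east-1b"},
}

same_zone = eligible_decode_workers(
    worker_taints, {"dynamo.topology/zone=us-east-1a"}
)
no_match = eligible_decode_workers(
    worker_taints, {"dynamo.topology/zone=us-east-1c"}
)
```

same_zone contains only decode-a, and no_match is empty, which corresponds to the "no endpoint" outcome described above.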

Preferred

preferred turns the same topology into a preferred taint.

preferred_taints = {"dynamo.topology/zone=us-east-1a": 0.85}

All decode workers remain eligible, but matching workers receive a lower routing cost. preferredWeight (published at runtime as kv_transfer_preferred_weight) controls the strength of the preference, from 0 to 1.
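
The exact cost formula is not specified here. The sketch below assumes, purely for illustration, that a matching taint discounts a worker's base cost by its weight; the pick_decode_worker helper, the base costs, and the multiplicative discount are all hypothetical.

```python
def pick_decode_worker(workers, preferred_taints):
    """Return the worker name with the lowest adjusted cost.

    workers maps name -> (taints, base_cost). A worker that matches a
    preferred taint has its cost scaled by (1 - weight). This scaling is
    an assumption for the example, not the router's actual formula.
    """

    def adjusted_cost(item):
        _, (taints, base_cost) = item
        discount = max(
            (w for taint, w in preferred_taints.items() if taint in taints),
            default=0.0,
        )
        return base_cost * (1.0 - discount)

    return min(workers.items(), key=adjusted_cost)[0]


preferred = {"dynamo.topology/zone=us-east-1a": 0.85}

balanced = {
    "decode-a": ({"dynamo.topology/zone=us-east-1a"}, 10.0),
    "decode-b": ({"dynamo.topology/zone=us-east-1b"}, 4.0),
}
overloaded = {
    "decode-a": ({"dynamo.topology/zone=us-east-1a"}, 40.0),
    "decode-b": ({"dynamo.topology/zone=us-east-1b"}, 4.0),
}

choice_balanced = pick_decode_worker(balanced, preferred)
choice_overloaded = pick_decode_worker(overloaded, preferred)
```

Under these assumed costs the same-zone worker wins in the balanced case, while the cross-zone worker wins once the same-zone worker is heavily loaded. That is the expected behavior of a preference as opposed to a requirement.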

Worker Environment Contract

The Python backend utility reads topology from files and transfer policy from environment variables:

| Environment variable | Description |
| --- | --- |
| DYN_TOPOLOGY_ENABLED | Set to true to enable topology reading. |
| DYN_TOPOLOGY_MOUNT_PATH | Directory containing topology files. Defaults to /etc/dynamo/topology. |
| DYN_KV_TRANSFER_DOMAIN | Required when topology is enabled. Names the topology file and runtime domain to use for KV transfer constraints. |
| DYN_KV_TRANSFER_ENFORCEMENT | required or preferred. Defaults to required when a domain is set. |
| DYN_KV_TRANSFER_PREFERRED_WEIGHT | Weight used when enforcement is preferred. |
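
One way to interpret these variables is sketched below. This is not the backend utility itself: read_transfer_policy is a hypothetical helper, and the exact-match check on "true" and the rejection of unknown enforcement values are assumptions.

```python
def read_transfer_policy(env):
    """Turn the environment contract into a policy dict, or None if disabled."""
    if env.get("DYN_TOPOLOGY_ENABLED") != "true":
        return None  # topology reading is off
    domain = env.get("DYN_KV_TRANSFER_DOMAIN")
    if not domain:
        raise ValueError("DYN_KV_TRANSFER_DOMAIN is required when topology is enabled")
    # Documented default: required when a domain is set.
    enforcement = env.get("DYN_KV_TRANSFER_ENFORCEMENT", "required")
    if enforcement not in ("required", "preferred"):
        raise ValueError(f"unsupported enforcement: {enforcement!r}")
    policy = {
        "mount_path": env.get("DYN_TOPOLOGY_MOUNT_PATH", "/etc/dynamo/topology"),
        "kv_transfer_domain": domain,
        "kv_transfer_enforcement": enforcement,
    }
    weight = env.get("DYN_KV_TRANSFER_PREFERRED_WEIGHT")
    if enforcement == "preferred" and weight is not None:
        # The weight only matters for the preferred mode.
        policy["kv_transfer_preferred_weight"] = float(weight)
    return policy


disabled = read_transfer_policy({})
defaults = read_transfer_policy(
    {"DYN_TOPOLOGY_ENABLED": "true", "DYN_KV_TRANSFER_DOMAIN": "zone"}
)
```

In a worker process the same function could be called with os.environ instead of a literal dict.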

Each non-hidden, non-empty file under DYN_TOPOLOGY_MOUNT_PATH is interpreted as one topology domain. The file name is the domain; the file content is the worker’s value for that domain.

For example:

$ mkdir -p /tmp/dynamo-topology
$ printf 'us-east-1a\n' > /tmp/dynamo-topology/zone

$ export DYN_TOPOLOGY_ENABLED=true
$ export DYN_TOPOLOGY_MOUNT_PATH=/tmp/dynamo-topology
$ export DYN_KV_TRANSFER_DOMAIN=zone
$ export DYN_KV_TRANSFER_ENFORCEMENT=required
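
The file-per-domain layout can be read with a few lines of Python. The sketch below is an approximation of the documented rule (non-hidden, non-empty files), not the utility's source; read_topology_domains is a hypothetical name, and stripping surrounding whitespace from the file content is an assumption.

```python
import tempfile
from pathlib import Path


def read_topology_domains(mount_path):
    """Map each non-hidden, non-empty file name to its trimmed content."""
    domains = {}
    for entry in sorted(Path(mount_path).iterdir()):
        if entry.name.startswith(".") or not entry.is_file():
            continue  # skip hidden entries and directories
        value = entry.read_text().strip()
        if value:  # empty files do not define a domain
            domains[entry.name] = value
    return domains


# Build a throwaway mount directory to exercise the function.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "zone").write_text("us-east-1a\n")
    (root / "rack").write_text("rack-22\n")
    (root / ".hidden").write_text("ignored\n")
    (root / "empty").write_text("")
    topology_domains = read_topology_domains(root)
```

Only zone and rack end up in topology_domains; the hidden file and the empty file are skipped.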

When topology is enabled, the worker polls until the selected transfer-domain file exists and has content. If it remains missing or empty through the timeout window, the worker exits so the bad topology source is visible during startup.
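
A polling loop of this kind might look like the following sketch. The timeout and poll interval are not stated here, so both are exposed as hypothetical parameters, and wait_for_topology_value is an illustrative name rather than the utility's API.

```python
import tempfile
import time
from pathlib import Path


def wait_for_topology_value(path, timeout_s, poll_interval_s=0.5):
    """Poll until the file exists and has content, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            value = Path(path).read_text().strip()
        except FileNotFoundError:
            value = ""  # treat a missing file like an empty one
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"topology file {path} missing or empty")
        time.sleep(poll_interval_s)


with tempfile.TemporaryDirectory() as tmp:
    zone_file = Path(tmp) / "zone"
    zone_file.write_text("us-east-1a\n")
    found = wait_for_topology_value(zone_file, timeout_s=1.0)

    try:
        wait_for_topology_value(Path(tmp) / "rack", timeout_s=0.1, poll_interval_s=0.02)
        timed_out = False
    except TimeoutError:
        timed_out = True
```

A worker following the documented behavior would treat the TimeoutError as fatal and exit, which surfaces the bad topology source at startup.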

Backend Support

The integrated Python backends apply the topology config during worker registration:

  • vLLM
  • SGLang
  • TensorRT-LLM

The topology utility writes the fields onto ModelRuntimeConfig; Rust owns validation and canonical topology-taint generation.

Interactions with Existing Routing Constraints

Topology-aware KV transfer uses the existing RoutingConstraints path. It does not add a topology-specific selector. If a request already has routing constraints, the prefill router merges the generated topology constraints into the decode request:

  • Required topology taints are appended to existing required_taints.
  • Preferred topology taints are appended to existing preferred_taints.

User-provided constraints still apply. A decode worker must satisfy all required constraints to be eligible.
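
The merge can be pictured with plain sets and dicts. The sketch below assumes constraints are shaped like the required_taints and preferred_taints examples earlier on this page; merge_constraints is a hypothetical helper, the user-supplied taint names are invented for the example, and letting a generated weight override a user weight for the same taint is an assumption.

```python
def merge_constraints(existing, generated):
    """Combine user-provided constraints with generated topology constraints."""
    return {
        # Required taints accumulate: a worker must satisfy all of them.
        "required_taints": set(existing.get("required_taints", ()))
        | set(generated.get("required_taints", ())),
        # Preferred taints accumulate too; on a key clash the generated
        # weight wins (assumption for this sketch).
        "preferred_taints": {
            **existing.get("preferred_taints", {}),
            **generated.get("preferred_taints", {}),
        },
    }


# Hypothetical user-provided constraints on the request.
user_constraints = {
    "required_taints": {"example.com/gpu=h100"},
    "preferred_taints": {"example.com/tier=gold": 0.5},
}
# Constraint generated from the selected prefill worker.
topology_constraints = {
    "required_taints": {"dynamo.topology/zone=us-east-1a"},
    "preferred_taints": {},
}

decode_constraints = merge_constraints(user_constraints, topology_constraints)
```

After the merge, a decode worker needs both the user-supplied taint and the topology taint to be eligible, and the user's preferred taint is carried over unchanged.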

Operational Notes

  • Configure this only for disaggregated prefill/decode deployments. Aggregated workers do not perform a remote prefill-to-decode KV transfer.
  • Keep DYN_ROUTER_MODE=kv on the frontend so the prefill and decode routing paths use the KV router.
  • Make sure every prefill domain has enough decode capacity when using required; otherwise the router can legitimately fail requests in domains without decode workers.
  • Use preferred during incremental rollouts when same-domain transfer is a latency preference rather than a hard placement requirement.
  • Transport health is separate from topology selection. Topology-aware routing chooses a better peer, but RDMA, EFA, UCX, or libfabric still need to be configured correctly for NIXL KV transfer.

Troubleshooting Signals

| Symptom | Likely cause | Check |
| --- | --- | --- |
| Worker exits during startup | DYN_KV_TRANSFER_DOMAIN missing, or topology file never populated. | Worker logs and contents of DYN_TOPOLOGY_MOUNT_PATH. |
| Required policy returns no endpoint | No decode worker has the selected prefill worker’s generated topology taint. | Worker ModelRuntimeConfig topology metadata and decode worker placement. |
| Preferred policy still routes cross-domain | Matching domain is overloaded or unavailable, or weight is too low relative to load. | Increase preferredWeight, add same-domain decode capacity, or switch to required. |
| Router sees no topology metadata | Worker did not publish topology fields. | Backend startup logs and runtime config metrics/discovery data. |

For Kubernetes-specific verification commands, see Verify the Deployment.