Copyright © 2026, NVIDIA Corporation.

Discovery Plane

Dynamo’s service discovery layer lets components find each other at runtime. Workers register their endpoints when they start, and frontends discover them automatically. The backend depends on the deployment environment: native Kubernetes resources when running under the Dynamo operator, and etcd otherwise.

[Figure: Discovery plane architecture showing Kubernetes and etcd backends]

Discovery Backends

| Deployment | Discovery Backend | Configuration |
|---|---|---|
| Kubernetes (with Dynamo operator) | Native K8s (CRDs, EndpointSlices) | Operator sets DYN_DISCOVERY_BACKEND=kubernetes |
| Bare metal / Local (default) | etcd | ETCD_ENDPOINTS (defaults to http://localhost:2379) |

Note: The runtime always defaults to etcd. Kubernetes discovery must be explicitly enabled — the Dynamo operator handles this automatically.

Kubernetes Discovery

When running on Kubernetes with the Dynamo operator, service discovery uses native Kubernetes resources instead of etcd.

How It Works

  1. Workers register their endpoints by creating DynamoWorkerMetadata custom resources.
  2. EndpointSlices signal pod readiness to the system.
  3. Components watch for CRD changes to discover available workers.
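
The three steps above can be pictured as a join between registered metadata and pod readiness. The following is only an illustrative in-memory sketch: `WorkerMetadata`, `discoverable_workers`, and the field names are hypothetical, not the schema of the DynamoWorkerMetadata resource, and the rule that a worker is usable only when it has both metadata and a ready pod is an assumption read from the steps, not confirmed behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerMetadata:
    """Hypothetical stand-in for one DynamoWorkerMetadata resource."""
    pod_name: str
    endpoint: str


def discoverable_workers(metadata, ready_pods):
    """Return metadata entries whose pod is currently reported ready.

    `metadata` models what workers registered (step 1) and `ready_pods`
    models the readiness signal carried by EndpointSlices (step 2).
    Assumption: both are required before a worker is considered usable.
    """
    return [m for m in metadata if m.pod_name in ready_pods]


registered = [
    WorkerMetadata("worker-0", "generate"),
    WorkerMetadata("worker-1", "generate"),
]
ready = {"worker-0"}  # worker-1 has registered but is not ready yet

usable = discoverable_workers(registered, ready)
print([m.pod_name for m in usable])
```

A real component would receive these two inputs through Kubernetes watches rather than static lists; the sketch only shows how the two signals might combine.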

Benefits

  • No external etcd cluster required.
  • Native integration with Kubernetes pod lifecycle.
  • Automatic cleanup when pods terminate.
  • Works with standard Kubernetes RBAC.

Environment Variables (Injected by Operator)

| Variable | Description |
|---|---|
| DYN_DISCOVERY_BACKEND | Set to kubernetes |
| POD_NAME | Current pod name |
| POD_NAMESPACE | Current namespace |
| POD_UID | Pod unique identifier |
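
As a rough illustration of how a process might consume the injected variables, the sketch below reads them into a small record. `PodIdentity` and `read_pod_identity` are hypothetical helpers, not part of Dynamo, and failing fast on a missing variable is a choice made for this example.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class PodIdentity:
    name: str
    namespace: str
    uid: str


def read_pod_identity(env=None):
    """Build a PodIdentity from POD_NAME, POD_NAMESPACE, and POD_UID.

    Raises KeyError listing any variable that is unset or empty.
    """
    env = os.environ if env is None else env
    required = ("POD_NAME", "POD_NAMESPACE", "POD_UID")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError("missing pod environment variables: " + ", ".join(missing))
    return PodIdentity(env["POD_NAME"], env["POD_NAMESPACE"], env["POD_UID"])


# Example values are made up; on a cluster the operator injects real ones.
identity = read_pod_identity(
    {"POD_NAME": "worker-0", "POD_NAMESPACE": "dynamo", "POD_UID": "abc-123"}
)
print(identity)
```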

etcd Discovery (Default)

When DYN_DISCOVERY_BACKEND is not set (or set to etcd), etcd is used for service discovery.
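
The selection rule can be sketched in a few lines of Python. This is not the runtime's code: `select_discovery_backend` is a hypothetical helper, and both the case-insensitive comparison and the rejection of unknown values are assumptions, since the handling of unrecognized values is not described here.

```python
import os

SUPPORTED_BACKENDS = ("etcd", "kubernetes")


def select_discovery_backend(env=None):
    """Return the backend name implied by DYN_DISCOVERY_BACKEND.

    An unset or empty variable selects etcd, matching the documented default.
    """
    env = os.environ if env is None else env
    value = env.get("DYN_DISCOVERY_BACKEND", "").strip().lower()
    if not value:
        return "etcd"
    if value not in SUPPORTED_BACKENDS:
        # Assumption: this sketch treats unknown values as an error.
        raise ValueError(f"unsupported discovery backend: {value!r}")
    return value


print(select_discovery_backend({}))
print(select_discovery_backend({"DYN_DISCOVERY_BACKEND": "kubernetes"}))
```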

Connection Configuration

| Variable | Description | Default |
|---|---|---|
| ETCD_ENDPOINTS | Comma-separated etcd URLs | http://localhost:2379 |
| ETCD_AUTH_USERNAME | Basic auth username | None |
| ETCD_AUTH_PASSWORD | Basic auth password | None |
| ETCD_AUTH_CA | CA certificate path (TLS) | None |
| ETCD_AUTH_CLIENT_CERT | Client certificate path | None |
| ETCD_AUTH_CLIENT_KEY | Client key path | None |

Example:

$ export ETCD_ENDPOINTS=http://etcd-0:2379,http://etcd-1:2379,http://etcd-2:2379
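
To make the comma-separated format concrete, here is a minimal Python sketch that splits ETCD_ENDPOINTS and falls back to the documented default. `etcd_endpoints` is a hypothetical helper; trimming whitespace and rejecting entries without an http or https scheme are assumptions of this example, not documented runtime behavior.

```python
import os
from urllib.parse import urlparse

DEFAULT_ETCD_ENDPOINT = "http://localhost:2379"


def etcd_endpoints(env=None):
    """Return the list of etcd URLs named by ETCD_ENDPOINTS."""
    env = os.environ if env is None else env
    raw = env.get("ETCD_ENDPOINTS", "")
    endpoints = [part.strip() for part in raw.split(",") if part.strip()]
    if not endpoints:
        # Unset or empty: use the documented default.
        endpoints = [DEFAULT_ETCD_ENDPOINT]
    for url in endpoints:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.hostname:
            raise ValueError(f"invalid etcd endpoint: {url!r}")
    return endpoints


print(etcd_endpoints({"ETCD_ENDPOINTS": "http://etcd-0:2379,http://etcd-1:2379"}))
```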

Service Registration

Workers register their endpoints in etcd under the following key hierarchy:

/services/{namespace}/{component}/{endpoint}/{instance_id}

For example:

/services/vllm-agg/backend/generate/694d98147d54be25

Frontends and routers discover available workers by watching the relevant prefix and receiving real-time updates when workers join or leave.
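
The key layout lends itself to simple string handling. The sketch below builds and parses keys of that shape and derives the prefix a watcher might subscribe to. `InstanceKey`, `format_key`, `parse_key`, and `endpoint_prefix` are hypothetical names for illustration; they are not Dynamo APIs, and the trailing slash on the prefix is a choice made here.

```python
from typing import NamedTuple

SERVICES_ROOT = "/services"


class InstanceKey(NamedTuple):
    namespace: str
    component: str
    endpoint: str
    instance_id: str


def format_key(key):
    """Render /services/{namespace}/{component}/{endpoint}/{instance_id}."""
    return "/".join((SERVICES_ROOT, *key))


def parse_key(path):
    """Split a registration key back into its four fields."""
    parts = path.split("/")
    if len(parts) != 6 or parts[0] != "" or parts[1] != "services" or not all(parts[2:]):
        raise ValueError(f"not a service registration key: {path!r}")
    return InstanceKey(*parts[2:])


def endpoint_prefix(namespace, component, endpoint):
    """Prefix covering every instance of one endpoint.

    The trailing slash keeps "generate" from also matching "generate_v2".
    """
    return f"{SERVICES_ROOT}/{namespace}/{component}/{endpoint}/"


example = "/services/vllm-agg/backend/generate/694d98147d54be25"
key = parse_key(example)
print(key)
print(example.startswith(endpoint_prefix("vllm-agg", "backend", "generate")))
```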

Lease-Based Cleanup

[Figure: Lease lifecycle showing DistributedRuntime keep-alive heartbeat to etcd]

Each runtime maintains a lease with etcd (default TTL: 10 seconds). If a worker crashes or loses connectivity:

  1. Keep-alive heartbeats stop.
  2. The lease expires after the TTL.
  3. All registered endpoints are automatically deleted.
  4. Clients receive removal events and reroute traffic to healthy workers.

This ensures stale endpoints are cleaned up without manual intervention.
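
The cleanup sequence can be modeled without a running etcd. The sketch below is a toy in-memory registry with an explicit clock, meant only to show how keys attached to a lease disappear once heartbeats stop. `LeaseRegistry` and its methods are hypothetical and do not correspond to an etcd client API.

```python
class LeaseRegistry:
    """In-memory model of lease-scoped keys; not an etcd client."""

    def __init__(self, ttl_seconds=10.0):
        self.ttl = ttl_seconds
        self._expiry = {}  # lease_id -> time at which the lease expires
        self._keys = {}    # key -> lease_id that owns it

    def grant(self, lease_id, now):
        self._expiry[lease_id] = now + self.ttl

    def keep_alive(self, lease_id, now):
        # A heartbeat pushes the deadline out by one full TTL.
        if lease_id in self._expiry:
            self._expiry[lease_id] = now + self.ttl

    def put(self, key, lease_id):
        if lease_id not in self._expiry:
            raise KeyError(f"unknown or expired lease: {lease_id}")
        self._keys[key] = lease_id

    def expire(self, now):
        """Drop leases past their deadline and return the keys removed."""
        dead = {lease for lease, t in self._expiry.items() if t <= now}
        removed = sorted(k for k, lease in self._keys.items() if lease in dead)
        for k in removed:
            del self._keys[k]
        for lease in dead:
            del self._expiry[lease]
        return removed

    def keys(self):
        return sorted(self._keys)


registry = LeaseRegistry(ttl_seconds=10)
registry.grant("worker-a", now=0)
registry.grant("worker-b", now=0)
registry.put("/services/ns/backend/generate/a", "worker-a")
registry.put("/services/ns/backend/generate/b", "worker-b")

registry.keep_alive("worker-a", now=8)  # worker-a is healthy
removed = registry.expire(now=12)       # worker-b sent no heartbeat after t=0
print(removed, registry.keys())
```

In this model the list returned by `expire` plays the role of the removal events that clients would receive from a watch.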

KV Store

Dynamo provides a KV store abstraction for storing metadata (endpoint instances, model deployment cards, event channels). Multiple backends are supported:

| Backend | Use Case |
|---|---|
| etcd | Production deployments |
| Memory | Testing and development |
| NATS | NATS-only deployments |
| File | Local persistence |
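
To illustrate what a backend-agnostic KV abstraction might look like, here is a small Python sketch with an in-memory implementation of the kind suited to tests. The `KeyValueStore` interface and its method names are hypothetical; Dynamo's actual abstraction may expose different operations.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class KeyValueStore(ABC):
    """Hypothetical interface; the real abstraction may differ."""

    @abstractmethod
    def put(self, key: str, value: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> Optional[bytes]: ...

    @abstractmethod
    def delete(self, key: str) -> bool: ...

    @abstractmethod
    def list_prefix(self, prefix: str) -> Dict[str, bytes]: ...


class MemoryStore(KeyValueStore):
    """Dictionary-backed store, useful for tests and local development."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        # True when the key existed and was removed.
        return self._data.pop(key, None) is not None

    def list_prefix(self, prefix: str) -> Dict[str, bytes]:
        return {k: v for k, v in sorted(self._data.items()) if k.startswith(prefix)}


store = MemoryStore()
store.put("/services/vllm-agg/backend/generate/694d98147d54be25", b"{}")
store.put("/other/example", b"x")  # arbitrary key outside the services tree
print(list(store.list_prefix("/services/")))
```

Code written against the abstract interface could then be pointed at a different backend without changes, which is the main reason to keep the interface this narrow.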

Operational Guidance

Use Kubernetes Discovery on K8s

The Dynamo operator automatically sets DYN_DISCOVERY_BACKEND=kubernetes for pods, so no additional setup is required.

Deploy an etcd Cluster for Bare Metal

For bare-metal production deployments, deploy a 3-node etcd cluster for high availability. A three-member cluster keeps its quorum, and therefore stays writable, when one member fails.
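
The reasoning behind the three-node recommendation is majority arithmetic: etcd replicates through Raft, which needs more than half of the members to agree. The helper names below are illustrative only.

```python
def quorum(members):
    """Smallest majority of a cluster with `members` voting nodes."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def tolerated_failures(members):
    """How many members can fail while a majority remains."""
    return members - quorum(members)


for size in (1, 3, 4, 5):
    print(size, quorum(size), tolerated_failures(size))
```

A single node tolerates no failures, three nodes tolerate one, and a fourth node adds no extra tolerance over three, which is why odd cluster sizes are the usual choice.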

Tune Lease TTLs

The lease TTL trades failure detection speed against keep-alive overhead:

  • Short TTL (5s) — Faster failure detection, more keep-alive traffic.
  • Long TTL (30s) — Less overhead, slower detection.

The default (10s) is a reasonable starting point for most deployments.
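
A back-of-the-envelope model can make the trade-off concrete. The sketch below assumes one keep-alive every third of the TTL, which is a hypothetical ratio chosen for illustration and not a documented Dynamo setting. It also ignores network delay and treats lease expiry as the moment of detection. `lease_tradeoff` is a made-up helper.

```python
def lease_tradeoff(ttl_seconds, workers, keepalives_per_ttl=3):
    """Estimate heartbeat load and detection delay for a given TTL.

    Assumption: each worker sends `keepalives_per_ttl` evenly spaced
    keep-alives per TTL window, and a crash is noticed when the lease
    expires, one TTL after the last keep-alive that was received.
    """
    interval = ttl_seconds / keepalives_per_ttl
    return {
        "keepalives_per_second": workers / interval,
        # Crash just before the next keep-alive was due.
        "fastest_detection_s": ttl_seconds - interval,
        # Crash immediately after a keep-alive was sent.
        "slowest_detection_s": ttl_seconds,
    }


for ttl in (5, 10, 30):
    print(ttl, lease_tradeoff(ttl, workers=100))
```

Under these assumptions, moving from a 30-second to a 5-second TTL cuts the slowest detection time by a factor of six and multiplies keep-alive traffic by the same factor.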

Related Documentation

  • Event Plane — Pub/sub for KV cache events and worker metrics
  • Distributed Runtime — Runtime architecture
  • Request Plane — Request transport configuration
  • Fault Tolerance — Failure handling