
Copyright © 2026, NVIDIA Corporation.


Service Discovery

Dynamo components (frontends, workers, planner) must discover each other and their capabilities at runtime. We refer to this as service discovery. Two service discovery backends are supported on Kubernetes.

Discovery Backends

| Backend | Default | Dependencies | Use Case |
|---|---|---|---|
| Kubernetes | ✅ Yes | None (native K8s) | Recommended for all Kubernetes deployments |
| KV Store (etcd) | No | etcd cluster | Legacy deployments |

Kubernetes Discovery (Default)

Kubernetes discovery is the default and recommended backend when running on Kubernetes. It uses native Kubernetes primitives to facilitate discovery of components:

  • DynamoWorkerMetadata CRD: Each worker stores its registered endpoints and model cards in a Custom Resource
  • EndpointSlices: EndpointSlices signal each component’s readiness status

Implementation Details

Each pod runs a discovery daemon that watches both EndpointSlices and DynamoWorkerMetadata CRs. A pod is only discoverable when it appears as “ready” in an EndpointSlice AND has a corresponding DynamoWorkerMetadata CR. This correlation ensures pods aren’t discoverable until they’re ready, metadata is immediately available, and stale entries are cleaned up when pods terminate.

DynamoWorkerMetadata CRD

Each worker pod creates a DynamoWorkerMetadata CR that stores its discovery metadata:

    apiVersion: nvidia.com/v1alpha1
    kind: DynamoWorkerMetadata
    metadata:
      name: my-worker-pod-abc123
      namespace: dynamo-system
      ownerReferences:
      - apiVersion: v1
        kind: Pod
        name: my-worker-pod-abc123
        uid: <pod-uid>
        controller: true
    spec:
      data:
        endpoints:
          "dynamo/backend/generate":
            type: Endpoint
            namespace: dynamo
            component: backend
            endpoint: generate
            instance_id: 12345678901234567890
            transport:
              nats_tcp: "dynamo_backend.generate-abc123"
        model_cards: {}

The CR is named after the pod and includes an owner reference for automatic garbage collection when the pod is deleted.

EndpointSlices

While DynamoWorkerMetadata resources provide an up-to-date snapshot of a component’s capabilities, EndpointSlices give a snapshot of the health of the various Dynamo components.

The operator creates a Kubernetes Service targeting the Dynamo components. The Kubernetes EndpointSlice controller in turn creates and maintains EndpointSlice resources that track the readiness of the pods targeted by the Service. Watching these slices gives us an up-to-date snapshot of which Dynamo components are ready to serve traffic.
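For orientation, an EndpointSlice maintained for such a Service might look roughly like the sketch below. The structure follows the standard discovery.k8s.io/v1 schema, but the slice name, Service name, IP address, and port are hypothetical placeholders, not values taken from Dynamo. The fields that matter for discovery are conditions.ready and targetRef.name, which identifies the pod that the DynamoWorkerMetadata CR is named after.

```yaml
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: my-worker-service-x7k2p            # hypothetical; generated by Kubernetes
  namespace: dynamo-system
  labels:
    # Standard label linking the slice to its owning Service (Service name is hypothetical)
    kubernetes.io/service-name: my-worker-service
addressType: IPv4
ports:
- name: http                               # hypothetical port name and number
  protocol: TCP
  port: 9090
endpoints:
- addresses:
  - "10.0.0.12"                            # hypothetical pod IP
  conditions:
    ready: true                            # reflects the pod's readiness probe result
  targetRef:
    kind: Pod
    name: my-worker-pod-abc123             # matches the DynamoWorkerMetadata CR name
    namespace: dynamo-system
```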

Readiness Probes

A pod is marked ready when its readiness probe succeeds. On Dynamo workers, the probe succeeds once the generate endpoint is available and healthy. The Dynamo operator configures these probes for each pod/component.
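As a rough illustration of the kind of probe involved, the fragment below uses the standard Kubernetes readinessProbe fields. The path, port, and timing values are assumptions chosen for the example; the operator sets the real values, so this is not something you normally write by hand.

```yaml
# Container-level fragment of a pod spec (illustrative only)
readinessProbe:
  httpGet:
    path: /health          # hypothetical health path
    port: 9090             # hypothetical port
  periodSeconds: 10        # how often the kubelet runs the probe
  failureThreshold: 3      # consecutive failures before the pod is marked not ready
```

When the probe fails, the pod's entry in its EndpointSlice flips to ready: false, and the pod stops being discoverable.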

RBAC

Each Dynamo component pod is automatically given a ServiceAccount that allows it to watch EndpointSlice and DynamoWorkerMetadata resources within its namespace.
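The permissions involved can be pictured as a namespaced Role along the lines of the sketch below. The Role name, the plural resource name dynamoworkermetadatas, and the write verbs are assumptions; the operator provisions the actual RBAC objects, and their exact rules may differ.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: dynamo-discovery                   # hypothetical name
  namespace: dynamo-system
rules:
# Read access to the EndpointSlices that carry readiness status
- apiGroups: ["discovery.k8s.io"]
  resources: ["endpointslices"]
  verbs: ["get", "list", "watch"]
# Access to the worker metadata CRs; the plural resource name is assumed
- apiGroups: ["nvidia.com"]
  resources: ["dynamoworkermetadatas"]
  # Write verbs are an assumption, included because workers create their own CR
  verbs: ["get", "list", "watch", "create", "update"]
```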

Environment Variables

The following environment variables are automatically injected into pods by the operator to facilitate service discovery:

| Variable | Description |
|---|---|
| DYN_DISCOVERY_BACKEND | Set to kubernetes |
| POD_NAME | Pod name (via downward API) |
| POD_NAMESPACE | Pod namespace (via downward API) |
| POD_UID | Pod UID (via downward API) |
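For reference, the injected variables correspond to a container env section along these lines. This is a sketch of the standard downward API syntax, not a copy of what the operator generates, and you do not need to add it yourself.

```yaml
# Container-level fragment of a pod spec (illustrative only)
env:
- name: DYN_DISCOVERY_BACKEND
  value: kubernetes
- name: POD_NAME
  valueFrom:
    fieldRef:
      fieldPath: metadata.name        # downward API: the pod's name
- name: POD_NAMESPACE
  valueFrom:
    fieldRef:
      fieldPath: metadata.namespace   # downward API: the pod's namespace
- name: POD_UID
  valueFrom:
    fieldRef:
      fieldPath: metadata.uid         # downward API: the pod's UID
```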

The pod’s instance ID is deterministically generated by hashing the pod name, ensuring consistent identity and correlation between EndpointSlices and CRs.

KV Store Discovery (etcd)

To use etcd-based discovery instead of Kubernetes-native discovery, add the nvidia.com/dynamo-discovery-backend: etcd annotation to your DynamoGraphDeployment:

    apiVersion: nvidia.com/v1alpha1
    kind: DynamoGraphDeployment
    metadata:
      name: my-deployment
      annotations:
        nvidia.com/dynamo-discovery-backend: etcd
    spec:
      services:
        # ...

This requires an etcd cluster to be available. The etcd connection is configured via the platform Helm chart.