Dynamo’s service discovery layer lets components find each other at runtime. Workers register their endpoints when they start, and frontends discover them automatically. The discovery backend adapts to the deployment environment.
Note: The runtime defaults to etcd. Kubernetes discovery must be enabled explicitly; the Dynamo operator does this automatically.
When running on Kubernetes with the Dynamo operator, service discovery uses native Kubernetes resources instead of etcd.
When DYN_DISCOVERY_BACKEND is not set (or set to etcd), etcd is used for service discovery.
Example:
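A minimal Python sketch of the selection rule described above. The function name `resolve_discovery_backend` is hypothetical and not part of Dynamo. Treating an empty value as unset and rejecting unknown values are assumptions of this sketch, not documented behavior.

```python
import os

# Backend names taken from the text above. Other values may exist in Dynamo;
# this sketch only knows about these two.
KNOWN_BACKENDS = {"etcd", "kubernetes"}


def resolve_discovery_backend(env=None):
    """Return the backend named by DYN_DISCOVERY_BACKEND, defaulting to etcd."""
    env = os.environ if env is None else env
    value = env.get("DYN_DISCOVERY_BACKEND", "").strip()
    if not value:
        # Unset (or empty, by assumption) means etcd.
        return "etcd"
    if value not in KNOWN_BACKENDS:
        # Assumption: fail loudly rather than silently fall back.
        raise ValueError(f"unsupported discovery backend: {value!r}")
    return value


if __name__ == "__main__":
    print(resolve_discovery_backend())
```

Passing a plain dictionary instead of the real environment keeps the function easy to exercise in isolation.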
Workers register their endpoints in etcd with a key hierarchy:
For example:
Frontends and routers discover available workers by watching the relevant prefix and receiving real-time updates when workers join or leave.
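The watch pattern can be illustrated without a running etcd. The sketch below is an in-memory stand-in, not Dynamo's implementation. The `PrefixWatcher` class, the event names `put` and `delete`, and the keys and addresses are hypothetical placeholders and do not reflect Dynamo's actual key layout.

```python
from dataclasses import dataclass, field


@dataclass
class PrefixWatcher:
    """Tracks live keys under one prefix from a stream of put/delete events."""

    prefix: str
    live: dict = field(default_factory=dict)

    def apply(self, event_type, key, value=None):
        # Events for keys outside the watched prefix are ignored.
        if not key.startswith(self.prefix):
            return
        if event_type == "put":
            # A worker joined or refreshed its registration.
            self.live[key] = value
        elif event_type == "delete":
            # A worker left, or its registration was removed.
            self.live.pop(key, None)
        else:
            raise ValueError(f"unknown event type: {event_type!r}")


if __name__ == "__main__":
    # Hypothetical keys, for illustration only.
    watcher = PrefixWatcher("example/endpoint/")
    watcher.apply("put", "example/endpoint/worker-a", "10.0.0.1:9000")
    watcher.apply("put", "example/endpoint/worker-b", "10.0.0.2:9000")
    watcher.apply("delete", "example/endpoint/worker-a")
    print(sorted(watcher.live))
```

A frontend built this way never polls: its view of available workers changes only when an event arrives.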
Each runtime maintains a lease with etcd (default TTL: 10 seconds). If a worker crashes or loses connectivity:
This ensures stale endpoints are cleaned up without manual intervention.
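In etcd, keys attached to a lease are deleted when the lease expires. The sketch below simulates that cleanup in memory with an explicit clock so the behavior is deterministic. The `LeaseTable` class and its method names are hypothetical and model only the expiry rule, not etcd's API or Dynamo's runtime.

```python
class LeaseTable:
    """In-memory model of lease-bound keys with a fixed TTL."""

    def __init__(self, ttl_seconds=10.0):
        self.ttl = ttl_seconds
        self.deadlines = {}  # lease_id -> time at which the lease expires
        self.keys = {}       # key -> lease_id the key is attached to

    def grant(self, lease_id, now):
        self.deadlines[lease_id] = now + self.ttl

    def put(self, key, lease_id):
        if lease_id not in self.deadlines:
            raise KeyError(f"no such lease: {lease_id!r}")
        self.keys[key] = lease_id

    def keep_alive(self, lease_id, now):
        """Renew a live lease; return False if it has already been removed."""
        if lease_id not in self.deadlines:
            return False
        self.deadlines[lease_id] = now + self.ttl
        return True

    def expire(self, now):
        """Drop expired leases and return the keys removed with them."""
        expired = {l for l, d in self.deadlines.items() if d <= now}
        for lease_id in expired:
            del self.deadlines[lease_id]
        removed = sorted(k for k, l in self.keys.items() if l in expired)
        for key in removed:
            del self.keys[key]
        return removed


if __name__ == "__main__":
    table = LeaseTable(ttl_seconds=10.0)
    table.grant("healthy", now=0.0)
    table.grant("crashed", now=0.0)
    table.put("example/worker-a", "healthy")  # hypothetical keys
    table.put("example/worker-b", "crashed")
    table.keep_alive("healthy", now=8.0)      # only one worker keeps renewing
    print(table.expire(now=12.0))
```

The worker that stops renewing loses its key once the TTL has passed, while the healthy worker's registration survives.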
Dynamo provides a KV store abstraction for storing metadata (endpoint instances, model deployment cards, event channels). Multiple backends are supported:
The Dynamo operator automatically sets DYN_DISCOVERY_BACKEND=kubernetes for pods. No additional setup is required.
For bare-metal production deployments, deploy a 3-node etcd cluster for high availability.
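The reason for three nodes follows from majority quorum: etcd needs more than half of its members available to accept writes. The helper names below are illustrative, and the arithmetic is the standard majority rule.

```python
def quorum(members):
    """Smallest majority of a cluster with the given number of members."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def fault_tolerance(members):
    """Number of members that can fail while a majority remains."""
    return members - quorum(members)


if __name__ == "__main__":
    for size in (1, 2, 3, 4, 5):
        print(size, quorum(size), fault_tolerance(size))
```

A 3-node cluster has a quorum of 2 and tolerates one failure. A 4-node cluster also tolerates only one, which is why odd sizes are the usual choice.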
Choose the lease TTL to balance failure detection speed against overhead:
The default (10s) is a reasonable starting point for most deployments.
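A rough way to reason about the tradeoff is sketched below. It assumes each runtime renews its lease three times per TTL, a common client convention that is not confirmed for Dynamo, and it ignores server-side expiry granularity. The function name and the returned field names are hypothetical.

```python
def lease_tradeoff(ttl_seconds, runtimes, renewals_per_ttl=3):
    """Estimate detection delay and keep-alive load for a given lease TTL."""
    if ttl_seconds <= 0 or renewals_per_ttl <= 0 or runtimes < 0:
        raise ValueError("ttl and renewals must be positive, runtimes non-negative")
    # Assumption: renewals are evenly spaced within one TTL.
    interval = ttl_seconds / renewals_per_ttl
    return {
        # A crash just after a renewal goes unnoticed for about one TTL.
        "worst_case_detection_s": ttl_seconds,
        "keepalive_interval_s": interval,
        # Aggregate renewal requests reaching etcd per second.
        "keepalives_per_second": runtimes / interval,
    }


if __name__ == "__main__":
    for ttl in (5, 10, 30):
        print(ttl, lease_tradeoff(ttl, runtimes=100))
```

Under these assumptions, halving the TTL halves the worst-case detection delay and doubles the renewal traffic.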