Dynamo provides health check and liveness HTTP endpoints for each component. These endpoints can be used to configure startup, liveness, and readiness probes in orchestration frameworks such as Kubernetes.
Enable health checks and query endpoints:
Check health status:
The frontend liveness endpoint reports a status of live as long as
the service is running.
Frontend liveness does not depend on worker health or liveness, only on the Frontend service itself.
The frontend health endpoint reports a status of healthy as long as
the service is running. Once workers have been registered, the
health endpoint will also list registered endpoints and instances.
Before workers are registered:
After workers are registered:
Health checks for components other than the frontend are enabled
selectively based on environment variables. If a health check for a
component is enabled, the starting status can be set, along with the set
of endpoints that must be served before the component is
declared ready.
Once all endpoints declared in DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS
are served, the component transitions to a ready state and stays there
until the component is shut down. The endpoints return
HTTP/1.1 503 Service Unavailable while initializing and
HTTP/1.1 200 OK once ready.
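The readiness rule described above can be pictured with a small model. This is only an illustrative sketch, not Dynamo's implementation: the class name `ComponentHealth`, the `starting_ready` flag, and the endpoint names used in the usage note are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ComponentHealth:
    """Toy model of component readiness (hypothetical, for illustration only)."""

    # Endpoints that must be served before the component counts as ready.
    required_endpoints: frozenset[str]
    # Hypothetical stand-in for the configurable starting status.
    starting_ready: bool = False
    served: set[str] = field(default_factory=set)
    ready: bool = field(default=False, init=False)

    def __post_init__(self) -> None:
        self.ready = self.starting_ready

    def mark_served(self, endpoint: str) -> None:
        """Record that an endpoint is now being served."""
        self.served.add(endpoint)
        # Readiness latches once every required endpoint is served.
        if self.required_endpoints <= self.served:
            self.ready = True

    def http_status(self) -> int:
        """503 while initializing, 200 once ready."""
        return 200 if self.ready else 503
```

With two hypothetical required endpoints, `generate` and `embed`, `http_status()` stays at 503 after only `generate` is served and switches to 200 once `embed` is served as well.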
Both /live and /ready return the same information.
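A client-side probe of these endpoints might look like the following sketch. It uses only the Python standard library. The base URL is an assumption supplied by the caller (the host and port depend on how the component is deployed), and the state names returned here are illustrative labels, not values taken from Dynamo's responses.

```python
import urllib.error
import urllib.request


def status_to_state(status_code: int) -> str:
    """Map the HTTP status codes described above to a coarse state label."""
    if status_code == 200:
        return "ready"
    if status_code == 503:
        return "initializing"
    return "unknown"


def probe(base_url: str, path: str = "/ready", timeout: float = 2.0) -> str:
    """Issue one GET request and classify the result."""
    try:
        with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
            return status_to_state(resp.status)
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses, so a 503 is handled here.
        return status_to_state(err.code)
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout.
        return "unreachable"
```

Calling `probe(base_url)` in a loop with a short sleep gives a simple wait-until-ready helper for local testing. In Kubernetes the same check is normally expressed as an HTTP probe in the pod spec.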
Before endpoints are being served:
After endpoints are being served:
In addition to the HTTP endpoints described above, Dynamo includes a canary health check system that actively monitors worker endpoints.
The canary health check system:
Health checks are automatically enabled by the Dynamo operator. No additional configuration is required.
To enable health checks locally:
Each backend defines its own minimal health check payload:
These payloads are designed to:
When health checks are enabled, you’ll see logs like:
If an endpoint fails:
Enable in production (Kubernetes):
Disable in development:
Health checks timing out:
Increase DYN_HEALTH_CHECK_REQUEST_TIMEOUT.
Too many health check logs:
Increase DYN_CANARY_WAIT_TIME to reduce frequency, or set DYN_HEALTH_CHECK_ENABLED=false in dev.
Health checks not running:
Check that DYN_HEALTH_CHECK_ENABLED=true is set and that DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS includes the endpoint.
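The two settings named for the "Health checks not running" case can be checked with a short script. This is a minimal sketch: the function name `diagnose` is hypothetical, and because the exact format of DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS is not assumed here, the endpoint is matched as a plain substring.

```python
import os
from typing import Mapping


def diagnose(endpoint: str, env: Mapping[str, str] = os.environ) -> list[str]:
    """Return a list of likely reasons health checks are not running."""
    problems = []
    if env.get("DYN_HEALTH_CHECK_ENABLED", "").strip().lower() != "true":
        problems.append("DYN_HEALTH_CHECK_ENABLED is not set to true")
    # Substring match on purpose: the variable's exact format is not assumed.
    if endpoint not in env.get("DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS", ""):
        problems.append(
            f"DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS does not mention {endpoint!r}"
        )
    return problems
```

An empty list means both settings look consistent for that endpoint. Run the function in the same environment as the worker process, since it reads `os.environ` by default.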