The Dynamo Operator exposes Prometheus metrics for monitoring its own health and performance. These metrics are separate from application metrics (frontend/worker) and provide visibility into the operator's reconciliation activity, webhook validation, and managed resource inventory.
The operator metrics feature requires the same monitoring infrastructure as application metrics. For detailed setup instructions, see the Kubernetes Metrics Guide.
Quick checklist:
Operator metrics are automatically collected via a ServiceMonitor, which is created by the Helm chart when metricsService.enabled: true (default).
Unlike application metrics (which use PodMonitor), the operator uses ServiceMonitor and requires no manual RBAC configuration. The operator’s metrics endpoint uses controller-runtime’s built-in WithAuthenticationAndAuthorization filter for secure serving.
To verify the ServiceMonitor is created:
To disable operator metrics collection:
All metrics use the dynamo_operator namespace prefix.
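Because every operator metric shares this prefix, operator series are easy to separate from the runtime metrics exposed on the same endpoint. The following is a minimal sketch, assuming you already have the endpoint's Prometheus text-format output as a string; the metric name in the sample is a placeholder, not one of the operator's real metric names.

```python
# Sketch: keep only operator metrics from Prometheus text-format output.
PREFIX = "dynamo_operator_"


def operator_metric_lines(exposition_text: str, prefix: str = PREFIX) -> list[str]:
    """Return the sample lines whose metric name starts with the prefix."""
    selected = []
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        if line.startswith(prefix):
            selected.append(line)
    return selected


# Hypothetical scrape output. "dynamo_operator_example_total" is a
# placeholder name used only for illustration.
SAMPLE = """\
# HELP dynamo_operator_example_total Placeholder counter.
# TYPE dynamo_operator_example_total counter
dynamo_operator_example_total{resource_type="DynamoModel",result="success"} 3
go_goroutines 42
"""

if __name__ == "__main__":
    for sample_line in operator_metric_lines(SAMPLE):
        print(sample_line)
```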
Labels:
resource_type: DynamoGraphDeployment, DynamoComponentDeployment, DynamoModel, DynamoGraphDeploymentRequest, DynamoGraphDeploymentScalingAdapter
namespace: Target namespace of the resource
result: success, error, requeue
error_type: not_found, already_exists, conflict, validation, bad_request, unauthorized, forbidden, timeout, server_timeout, unavailable, rate_limited, internal
Labels:
resource_type: Same as reconciliation metrics
operation: CREATE, UPDATE, DELETE
result: allowed, denied
reason: Validation failure reason (e.g., immutable_field_changed, invalid_config)
Labels:
resource_type: DynamoGraphDeployment, DynamoComponentDeployment, DynamoModel, DynamoGraphDeploymentRequest, DynamoGraphDeploymentScalingAdapter
namespace: Resource namespace
status: Resource state derived from each CRD’s status. Common values:
"ready" - Resource is healthy and operational (DCD, DM, DGDSA)
"not_ready" - Resource exists but is not operational (DCD, DM, DGDSA)
"unknown" - State cannot be determined (default for empty status)
"pending", "successful", "failed" from .status.state
"Pending", "Profiling", "Ready", "Deploying", "Deployed", "Failed" from .status.phase
A pre-built Grafana dashboard is available for visualizing operator metrics.
Reconciliation Metrics (3 panels)
Webhook Metrics (3 panels)
Resource Inventory (2 panels)
Operational Health (2 panels)
The dashboard will automatically appear in Grafana (assuming you have the Grafana dashboard sidecar configured, which is included in kube-prometheus-stack).
Port-forward to Grafana (if needed):
Log in to Grafana at http://localhost:3000
Navigate to Dashboards → Search for “Dynamo Operator”
The dashboard includes two filter variables:
When “All” is selected for Resource Type, every panel shows data for all five managed CRDs, distinguished by the resource_type label.
For instructions on accessing Prometheus and Grafana, see the Kubernetes Metrics Guide.
Once you have access to Prometheus, you can query operator metrics directly:
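As one way to get started, the sketch below builds a request URL for Prometheus' instant-query HTTP endpoint (/api/v1/query). It relies only on the dynamo_operator prefix described earlier, so no specific metric names are assumed. The base URL http://localhost:9090 is an assumption (for example, a local port-forward); adjust it to your environment.

```python
from urllib.parse import urlencode

# Assumption: Prometheus is reachable locally, e.g. through a port-forward.
PROMETHEUS_URL = "http://localhost:9090"

# PromQL: match every series whose name starts with the operator prefix
# and count the number of series per metric name.
SERIES_PER_METRIC = 'count by (__name__) ({__name__=~"dynamo_operator_.+"})'


def instant_query_url(expr: str, base_url: str = PROMETHEUS_URL) -> str:
    """Build a URL for the Prometheus instant-query endpoint."""
    query_string = urlencode({"query": expr})
    return f"{base_url.rstrip('/')}/api/v1/query?{query_string}"


if __name__ == "__main__":
    # Print the URL; fetch it with any HTTP client or open it in a browser.
    print(instant_query_url(SERIES_PER_METRIC))
```

The same expression can be pasted directly into the Prometheus expression browser to list which operator metrics are currently being scraped.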
Check ServiceMonitor exists:
Check ServiceMonitor is discovered by Prometheus:
The target serviceMonitor/dynamo-system/dynamo-platform-dynamo-operator-operator should be listed with state UP.
Check Prometheus selector configuration:
Ensure serviceMonitorSelectorNilUsesHelmValues: false was set during kube-prometheus-stack installation.
Check ConfigMap is created:
Check ConfigMap has the label:
Should return "1"
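The label check can also be scripted. This is a minimal sketch, assuming you have the ConfigMap manifest as JSON (for example, from kubectl with JSON output); the ConfigMap name in the sample is a placeholder. Note that the value is compared as the string "1", since Kubernetes label values are strings.

```python
import json


def has_dashboard_label(
    configmap_json: str,
    label: str = "grafana_dashboard",
    expected: str = "1",
) -> bool:
    """Return True if the ConfigMap manifest carries label == expected."""
    manifest = json.loads(configmap_json)
    # "labels" may be absent or null on a ConfigMap without labels.
    labels = manifest.get("metadata", {}).get("labels") or {}
    return labels.get(label) == expected


# Hypothetical manifest; "example-dashboard" is a placeholder name.
SAMPLE = json.dumps(
    {
        "kind": "ConfigMap",
        "metadata": {
            "name": "example-dashboard",
            "labels": {"grafana_dashboard": "1"},
        },
    }
)

if __name__ == "__main__":
    print(has_dashboard_label(SAMPLE))
```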
Check Grafana dashboard sidecar configuration:
The sidecar should be configured to watch for the grafana_dashboard: "1" label.
Restart Grafana pod to force dashboard refresh: