Observability
Config Manager provides structured logs, Prometheus metrics, optional PodMonitor resources, and an observability stack for local development. Production deployments can feed these signals into an existing Prometheus, Loki, or Grafana environment.
Structured Logs
Python services emit JSON logs by default. Each service calls configure_logging(service="<name>") at startup, and modules get a categorized logger with get_logger(__name__, category=LogCategory.<CATEGORY>).
The main environment variables are:
JSON log records include the message, level, logger name, timestamp, module, line number, service name, and category. Categories use dotted names so downstream tooling can filter broadly or narrowly.
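To make the record shape concrete, the following is a minimal sketch built only on the Python standard library. It is not Config Manager's implementation of configure_logging or get_logger: the JsonFormatter class, the category_matches helper, the JSON key names, the "uncategorized" default, and the example category render.template are all assumptions chosen for illustration.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Hypothetical formatter that mirrors the fields described above."""

    def __init__(self, service: str) -> None:
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "message": record.getMessage(),
            "level": record.levelname,
            "logger": record.name,
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "module": record.module,
            "lineno": record.lineno,
            "service": self.service,
            # Assumption: the category travels on the record as an attribute.
            "category": getattr(record, "category", "uncategorized"),
        }
        return json.dumps(payload)


def category_matches(category: str, prefix: str) -> bool:
    """Return True when category equals prefix or is nested beneath it."""
    return category == prefix or category.startswith(prefix + ".")


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter(service="demo-service"))
    logger = logging.getLogger("demo")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    # "render.template" is a made-up category name used only for this demo.
    logger.info("rendered %d templates", 3, extra={"category": "render.template"})
```

Matching on the dotted prefix rather than on a plain substring is what lets a filter for render select render.template without also selecting an unrelated category such as renderer.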
Common categories include:
For the detailed warning and error message catalog, see the Log Message Reference.
Custom Labels
Set global.customLabels in Helm values to attach deployment-specific labels to every Config Manager log line and every scraped Prometheus metric:
The chart serializes the map into NV_CONFIG_MANAGER_CUSTOM_LABELS for service containers and also applies those values as pod labels. PodMonitor resources promote the pod labels onto scraped samples.
Custom label keys must work as Kubernetes labels, Prometheus labels, and Python log-record attributes. Use letters, numbers, and underscores, and avoid reserved fields such as service, category, message, levelname, name, module, and lineno. Values are truncated to 63 characters to stay within Kubernetes label limits.
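The rules above can be checked before a values file is applied. The sketch below is a hypothetical pre-flight helper, not part of the chart: the function name normalize_custom_labels and the exact key pattern are assumptions. The pattern requires a leading letter and a trailing letter or digit, which is stricter than the text states but keeps a key valid as a Kubernetes label name, a Prometheus label name, and a Python attribute at the same time.

```python
import re
from typing import Dict

# Reserved fields listed in the text above.
RESERVED_KEYS = {
    "service", "category", "message", "levelname", "name", "module", "lineno",
}

# Assumption: letters, digits, and underscores only, starting with a letter
# and ending with a letter or digit.
KEY_PATTERN = re.compile(r"[A-Za-z](?:[A-Za-z0-9_]*[A-Za-z0-9])?")

# Kubernetes limits label values to 63 characters.
MAX_VALUE_LENGTH = 63


def normalize_custom_labels(labels: Dict[str, str]) -> Dict[str, str]:
    """Validate label keys and truncate values to the 63-character limit."""
    normalized = {}
    for key, value in labels.items():
        if key in RESERVED_KEYS:
            raise ValueError(f"custom label key {key!r} is reserved")
        if not KEY_PATTERN.fullmatch(key):
            raise ValueError(f"custom label key {key!r} has unsupported characters")
        normalized[key] = str(value)[:MAX_VALUE_LENGTH]
    return normalized


if __name__ == "__main__":
    # Example keys and values are invented for the demo.
    print(normalize_custom_labels({"site": "lab-1", "owner_team": "platform"}))
```

Rejecting bad keys outright while silently truncating long values mirrors the behavior described in the text; a stricter variant could raise on overlong values instead.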
Metrics
Services expose Prometheus metrics on their operational /metrics endpoints. Enable chart monitoring resources with monitoring.enabled and monitoring.podMonitors.enabled when the target cluster has Prometheus Operator CRDs installed.
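When checking what a service publishes on /metrics, it can help to list the metric families and their types. The sketch below reads the standard "# TYPE" comment lines of the Prometheus text exposition format. The function name metric_types and the sample metric names are hypothetical and do not come from Config Manager.

```python
from typing import Dict


def metric_types(exposition_text: str) -> Dict[str, str]:
    """Map each metric family name to its declared type."""
    types = {}
    for line in exposition_text.splitlines():
        parts = line.split()
        # A type declaration has the form: # TYPE <metric name> <type>
        if len(parts) == 4 and parts[0] == "#" and parts[1] == "TYPE":
            types[parts[2]] = parts[3]
    return types


if __name__ == "__main__":
    # Invented sample payload; a real one would be fetched from /metrics.
    sample = "\n".join([
        "# HELP demo_requests_total Total requests handled.",
        "# TYPE demo_requests_total counter",
        'demo_requests_total{method="GET"} 12',
        "# TYPE demo_config_age_seconds gauge",
        "demo_config_age_seconds 4.5",
    ])
    for name, kind in metric_types(sample).items():
        print(name, kind)
```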
The chart can also render HTTP probes for gateway-facing endpoints when monitoring.probes.enabled is set and a Blackbox Exporter is available. Set monitoring.grafanaUrl to display a Grafana link in the Config Manager UI.
Key service metrics include:
Local Observability Stack
For Kind and airgapped demo environments, the installer can deploy a local observability stack. Enable it by setting infrastructure.monitoring.observability_enabled: true or by toggling "Enable local observability stack" on the TUI Infrastructure screen.
This path is for local development and demos only. It installs:
The stack uses ephemeral storage, so collected data is lost when its pods restart, and it uses cluster-scoped CRDs. Do not enable it in a shared cluster that already has a production monitoring stack or another owner for the Prometheus Operator CRDs.
To inspect the local stack:
Use Prometheus at http://localhost:9090 to query {namespace="nv-config-manager"}. Use Alloy at http://localhost:12345 to inspect discovered targets.
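The same selector can be run from a script through the Prometheus HTTP API. The sketch below assumes Prometheus is already reachable at http://localhost:9090 as described above. It uses the standard /api/v1/query instant-query endpoint; the helper names build_query_url and instant_query are illustrative.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"  # address given in the text above
SELECTOR = '{namespace="nv-config-manager"}'


def build_query_url(base_url: str, selector: str) -> str:
    """Build an instant-query URL with the selector URL-encoded."""
    query = urllib.parse.urlencode({"query": selector})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def instant_query(base_url: str, selector: str, timeout: float = 5.0) -> list:
    """Run an instant query and return the list of result samples."""
    url = build_query_url(base_url, selector)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = json.load(response)
    if body.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {body}")
    return body["data"]["result"]


if __name__ == "__main__":
    for sample in instant_query(PROMETHEUS_URL, SELECTOR):
        # Each sample carries its label set under the "metric" key.
        print(sample["metric"])
```

An empty result usually means no targets in that namespace have been scraped yet, which is a good moment to check the discovered targets in Alloy.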
Grafana Dashboard
A reference dashboard is included in the Helm chart and available here for download:
The dashboard contains panels for error logs, service logs, render event throughput, DHCP config age, and HTTP request rate.
Import the dashboard into Grafana and select the Prometheus and Loki data sources for the environment. The Loki stream selector depends on your log shipper; common selectors use labels such as namespace, k8s_namespace_name, or cluster.
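Because the right label depends on the log shipper, it can be convenient to generate the LogQL stream selector from whichever labels the environment provides. This is a small illustrative helper, not something shipped with the dashboard; the function name build_stream_selector and the example label values are assumptions.

```python
from typing import Dict


def build_stream_selector(labels: Dict[str, str]) -> str:
    """Render a LogQL stream selector such as {namespace="nv-config-manager"}."""
    if not labels:
        # Loki requires at least one matcher in a stream selector.
        raise ValueError("at least one label matcher is required")
    parts = []
    for name, value in sorted(labels.items()):
        # Escape backslashes and double quotes inside the quoted value.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        parts.append(f'{name}="{escaped}"')
    return "{" + ", ".join(parts) + "}"


if __name__ == "__main__":
    # One shipper might label streams with "namespace" ...
    print(build_stream_selector({"namespace": "nv-config-manager"}))
    # ... another with "k8s_namespace_name" plus a cluster label.
    print(build_stream_selector(
        {"k8s_namespace_name": "nv-config-manager", "cluster": "demo"}
    ))
```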
When the local observability overlay enables Grafana, the chart also renders the dashboard as a ConfigMap for the in-chart local Grafana instance.