Observability (Local)
This is an example to get started quickly on a single machine.
Install these on your machine:
Dynamo provides a Docker Compose-based observability stack that includes Prometheus, Grafana, Tempo, Loki, an OpenTelemetry Collector, and various exporters for metrics, tracing, logging, and visualization.
From the Dynamo root directory:
For detailed setup instructions and configuration, see Prometheus + Grafana Setup.
Variables marked with † are shared across multiple observability systems.
For Kubernetes-specific setup and configuration, see docs/kubernetes/observability/.
Operator Metrics: The Dynamo Operator running in Kubernetes exposes its own set of metrics for monitoring controller reconciliation, webhook validation, and resource inventory. See the Operator Metrics Guide.
This provides:
http://localhost:9090 - metrics collection and querying
http://localhost:3000 - visualization dashboards (username: dynamo, password: dynamo)
http://localhost:3200 - distributed tracing backend
http://localhost:3100 - log aggregation backend
http://localhost:4317 (gRPC) / http://localhost:4318 (HTTP) - receives OTLP signals and routes traces to Tempo and logs to Loki
http://localhost:9401/metrics - GPU metrics
http://localhost:7777/metrics - NATS messaging metrics
The dcgm-exporter service in the Docker Compose network is configured to use port 9401 instead of the default port 9400. This adjustment is made to avoid port conflicts with other dcgm-exporter instances that may be running simultaneously. Such a configuration is typical in distributed systems like SLURM.
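Once the stack is running, a quick way to confirm that each endpoint listed above is listening is a plain TCP connection check. The following is a minimal sketch, not part of Dynamo: the names ENDPOINTS, port_is_open, and check_all are hypothetical, and the labels are only the descriptions given above. It verifies that a port accepts connections; it does not verify that the service behind it is healthy.

```python
import socket
from urllib.parse import urlparse

# Endpoints copied from the list above. Labels are descriptive only and
# are an assumption of this sketch, not names defined by Dynamo.
ENDPOINTS = {
    "metrics collection and querying": "http://localhost:9090",
    "visualization dashboards": "http://localhost:3000",
    "distributed tracing backend": "http://localhost:3200",
    "log aggregation backend": "http://localhost:3100",
    "OTLP receiver (gRPC)": "http://localhost:4317",
    "OTLP receiver (HTTP)": "http://localhost:4318",
    "GPU metrics": "http://localhost:9401/metrics",
    "NATS messaging metrics": "http://localhost:7777/metrics",
}


def port_is_open(url, timeout=1.0):
    """Return True if the host and port in `url` accept a TCP connection."""
    parsed = urlparse(url)
    # Fall back to the scheme's default port when the URL omits one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


def check_all(endpoints=ENDPOINTS, timeout=1.0):
    """Map each label to whether its port accepted a connection."""
    return {label: port_is_open(url, timeout) for label, url in endpoints.items()}


if __name__ == "__main__":
    for label, ok in check_all().items():
        print(f"{'up' if ok else 'DOWN':<5} {label}")
```

A TCP check is used rather than an HTTP request so that the same function also works for the gRPC port, which does not answer plain HTTP GET requests.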
The following configuration files are located in the deploy/observability/ directory: