
Copyright © 2026, NVIDIA Corporation.


Dynamo Observability


Getting Started Quickly

This example gets the observability stack running quickly on a single machine.

Prerequisites

Install these on your machine:

  • Docker
  • Docker Compose

Starting the Observability Stack

Dynamo provides a Docker Compose-based observability stack that includes Prometheus, Grafana, Tempo, and various exporters for metrics, tracing, and visualization.

From the Dynamo root directory:

# Start infrastructure (NATS, etcd)
docker compose -f deploy/docker-compose.yml up -d

# Start observability stack (Prometheus, Grafana, Tempo, DCGM GPU exporter, NATS exporter)
docker compose -f deploy/docker-observability.yml up -d

For detailed setup instructions and configuration, see Prometheus + Grafana Setup.
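
As a minimal sketch, the two commands above can be wrapped in small helper functions so the stack is started, inspected, and stopped in a consistent order. The function names and the DRY_RUN switch are hypothetical helpers introduced here for illustration; only the two Compose file paths come from this page. Stopping in reverse order is a conservative choice, not a documented requirement.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; only the two file paths are taken from this page.
INFRA_FILE="deploy/docker-compose.yml"
OBS_FILE="deploy/docker-observability.yml"

run() {
  # With DRY_RUN=1 (an illustrative switch), print the command instead of running it.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

stack_up() {
  # Infrastructure first, then the observability stack.
  run docker compose -f "$INFRA_FILE" up -d &&
  run docker compose -f "$OBS_FILE" up -d
}

stack_status() {
  # List the containers defined by each Compose file.
  run docker compose -f "$INFRA_FILE" ps &&
  run docker compose -f "$OBS_FILE" ps
}

stack_down() {
  # Stop in reverse order of startup.
  run docker compose -f "$OBS_FILE" down &&
  run docker compose -f "$INFRA_FILE" down
}
```

Run from the Dynamo root directory, `DRY_RUN=1 stack_up` prints the two commands without executing them, and `stack_status` shows whether the containers are up.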

Observability Documentation

| Guide | Description | Environment Variables to Control |
|---|---|---|
| Metrics | Available metrics reference | DYN_SYSTEM_PORT† |
| Health Checks | Component health monitoring and readiness probes | DYN_SYSTEM_PORT†, DYN_SYSTEM_STARTING_HEALTH_STATUS, DYN_SYSTEM_HEALTH_PATH, DYN_SYSTEM_LIVE_PATH, DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS |
| Tracing | Distributed tracing with OpenTelemetry and Tempo | DYN_LOGGING_JSONL†, OTEL_EXPORT_ENABLED†, OTEL_EXPORTER_OTLP_TRACES_ENDPOINT†, OTEL_SERVICE_NAME† |
| Logging | Structured logging configuration | DYN_LOGGING_JSONL†, DYN_LOG, DYN_LOG_USE_LOCAL_TZ, DYN_LOGGING_CONFIG_PATH, OTEL_SERVICE_NAME†, OTEL_EXPORT_ENABLED†, OTEL_EXPORTER_OTLP_TRACES_ENDPOINT† |

Variables marked with † are shared across multiple observability systems.
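
The shared (†) variables can be set once in the shell that launches a Dynamo component. The sketch below only illustrates the mechanics: every value shown is an assumption chosen for the example (the port number, the boolean spelling, the OTLP endpoint, and the service name are not taken from this page), so check the individual guides for the accepted values. The show_observability_env helper is hypothetical.

```shell
#!/usr/bin/env bash
# All values below are assumptions for illustration; see each guide for accepted values.
export DYN_SYSTEM_PORT=8081                  # assumed port for metrics and health endpoints
export DYN_LOGGING_JSONL=true                # assumed boolean spelling
export OTEL_EXPORT_ENABLED=true              # assumed boolean spelling
export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://localhost:4317  # assumed OTLP endpoint
export OTEL_SERVICE_NAME=dynamo-worker       # hypothetical service name

# Hypothetical helper: print the shared variables so the launch environment can be reviewed.
show_observability_env() {
  local name value
  for name in DYN_SYSTEM_PORT DYN_LOGGING_JSONL OTEL_EXPORT_ENABLED \
              OTEL_EXPORTER_OTLP_TRACES_ENDPOINT OTEL_SERVICE_NAME; do
    # printenv exits non-zero when the variable is not exported.
    value="$(printenv "$name")" || value="<unset>"
    printf '%s=%s\n' "$name" "$value"
  done
}
```

Calling `show_observability_env` before starting a component makes it easy to spot a variable that was not exported.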

Developer Guides

| Guide | Description | Environment Variables to Control |
|---|---|---|
| Metrics Developer Guide | Creating custom metrics in Rust and Python | DYN_SYSTEM_PORT† |

Kubernetes

For Kubernetes-specific setup and configuration, see Kubernetes Observability.


Topology

The Docker Compose observability stack provides:

  • Prometheus on http://localhost:9090 - metrics collection and querying
  • Grafana on http://localhost:3000 - visualization dashboards (username: dynamo, password: dynamo)
  • Tempo on http://localhost:3200 - distributed tracing backend
  • DCGM Exporter on http://localhost:9401/metrics - GPU metrics
  • NATS Exporter on http://localhost:7777/metrics - NATS messaging metrics
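
One way to confirm that each service in the list above is reachable is to probe it with curl. This is a sketch under assumptions: the host and ports come from the list, while the health paths for Prometheus (/-/healthy), Grafana (/api/health), and Tempo (/ready) are the endpoints those upstream projects expose and are not stated on this page. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# name|url pairs; ports are from this page, health paths are upstream defaults (assumption).
endpoints() {
  cat <<'EOF'
prometheus|http://localhost:9090/-/healthy
grafana|http://localhost:3000/api/health
tempo|http://localhost:3200/ready
dcgm-exporter|http://localhost:9401/metrics
nats-exporter|http://localhost:7777/metrics
EOF
}

probe() {
  # Succeeds when the URL answers with a non-error HTTP status within 5 seconds.
  curl --silent --fail --max-time 5 --output /dev/null "$1"
}

probe_all() {
  local name url
  endpoints | while IFS='|' read -r name url; do
    if probe "$url"; then
      printf '%-14s UP    %s\n' "$name" "$url"
    else
      printf '%-14s DOWN  %s\n' "$name" "$url"
    fi
  done
}
```

Running `probe_all` after the stack starts prints one UP or DOWN line per service. A service may report DOWN for a short time while it is still initializing.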

Service Relationship Diagram

The dcgm-exporter service in the Docker Compose network listens on port 9401 instead of the default port 9400. This avoids port conflicts with other dcgm-exporter instances that may already be running on the same host, which is common in distributed environments such as SLURM clusters.
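
To check that GPU metrics are served on the remapped port, the exporter's text output can be filtered for a single metric. The sketch below assumes the exporter publishes DCGM_FI_DEV_GPU_UTIL, a field that dcgm-exporter normally exports but that this page does not mention; the function names are hypothetical.

```shell
#!/usr/bin/env bash
DCGM_URL="http://localhost:9401/metrics"   # remapped port described above

# Keep only sample lines for one metric, dropping "# HELP" and "# TYPE" comment lines.
metric_samples() {
  local metric="$1"
  grep -E "^${metric}(\{| )"
}

# Fetch the exporter output and show GPU utilization samples (metric name is an assumption).
gpu_utilization() {
  curl --silent --fail --max-time 5 "$DCGM_URL" | metric_samples DCGM_FI_DEV_GPU_UTIL
}
```

If `gpu_utilization` prints nothing, either the exporter is not listening on port 9401 or the metric is not enabled in its configuration.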

Configuration Files

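The following configuration files are located in the deploy/observability/ directory, except for the two Docker Compose files, which the startup commands reference directly under deploy/: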

  • docker-compose.yml: Defines NATS and etcd services
  • docker-observability.yml: Defines Prometheus, Grafana, Tempo, and exporters
  • prometheus.yml: Contains Prometheus scraping configuration
  • grafana-datasources.yml: Contains Grafana datasource configuration
  • grafana_dashboards/dashboard-providers.yml: Contains Grafana dashboard provider configuration
  • grafana_dashboards/dynamo.json: A general Dynamo dashboard covering both software and hardware metrics
  • grafana_dashboards/dcgm-metrics.json: Contains Grafana dashboard configuration for DCGM GPU metrics
  • grafana_dashboards/kvbm.json: Contains Grafana dashboard configuration for KVBM metrics
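
Before starting the stack, it can help to confirm that the Prometheus and Grafana files listed above are present in a checkout. The following is a minimal sketch: missing_config_files is a hypothetical helper, and the directory to check is passed in by the caller rather than assumed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each expected file that is missing under the given directory.
# Returns 0 when every file exists and 1 otherwise.
missing_config_files() {
  local dir="$1" f status=0
  for f in \
    prometheus.yml \
    grafana-datasources.yml \
    grafana_dashboards/dashboard-providers.yml \
    grafana_dashboards/dynamo.json \
    grafana_dashboards/dcgm-metrics.json \
    grafana_dashboards/kvbm.json
  do
    if [ ! -f "$dir/$f" ]; then
      echo "$f"
      status=1
    fi
  done
  return "$status"
}
```

For example, `missing_config_files deploy/observability` run from the Dynamo root directory prints nothing when all six files are in place.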