Dynamo MetricsRegistry#
Overview#
Dynamo provides built-in metrics capabilities through the MetricsRegistry trait, which is automatically available whenever you use the DistributedRuntime framework. This guide explains how to use metrics for observability and monitoring across all Dynamo components.
Automatic Metrics#
Dynamo automatically exposes metrics with the dynamo_ name prefix. It also adds the labels dynamo_namespace, dynamo_component, and dynamo_endpoint to indicate which component is providing the metric.
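To make the prefix and labels concrete, the sketch below parses Prometheus text-format output and keeps only the samples whose name starts with dynamo_. It is a minimal illustration using the Python standard library, not part of Dynamo. The metric name dynamo_component_example_total and the label values in EXAMPLE are hypothetical.

```python
import re

# One sample line of the Prometheus text format: name{labels} value
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
# One label pair inside the braces: key="value"
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Return (name, labels, value) for a sample line, or None for comments and blanks."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    match = SAMPLE_RE.match(line)
    if match is None:
        return None
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


def dynamo_samples(text):
    """Yield only the samples whose metric name carries the dynamo_ prefix."""
    for line in text.splitlines():
        sample = parse_sample(line)
        if sample is not None and sample[0].startswith("dynamo_"):
            yield sample


# Illustrative scrape output; the metric name and label values are made up.
EXAMPLE = """\
# HELP dynamo_component_example_total Hypothetical counter
# TYPE dynamo_component_example_total counter
dynamo_component_example_total{dynamo_namespace="demo",dynamo_component="worker",dynamo_endpoint="generate"} 42
process_cpu_seconds_total 1.5
"""

samples = list(dynamo_samples(EXAMPLE))
```

Here samples holds a single entry: the dynamo_ metric together with its three identifying labels. The process_cpu_seconds_total line is skipped because it lacks the prefix.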
Frontend Metrics: When using the Dynamo HTTP Frontend (--framework VLLM or --framework TRTLLM), these metrics are automatically exposed with the dynamo_frontend_* prefix and include a model label containing the model name. They cover request handling, token processing, and latency measurements. See the Available Metrics section for the complete list of frontend metrics.
Component Metrics: The core Dynamo backend system automatically exposes metrics with the dynamo_component_* prefix for all components that use the DistributedRuntime framework. These include request counts, processing times, byte transfers, and system uptime metrics. See the Available Metrics section for the complete list of component metrics.
Specialized Component Metrics: Components can also expose additional metrics specific to their functionality. For example, a preprocessor component exposes metrics with the dynamo_preprocessor_* prefix. See the Available Metrics section for details on specialized component metrics.
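The three prefix families above can be told apart by name alone. The following sketch groups metric names by family, which can help when deciding which names belong on which dashboard. The helper names and the sample metric names are hypothetical; only the prefixes come from this guide.

```python
from collections import defaultdict

# Prefix families named in this guide, mapped to a short family label.
FAMILIES = [
    ("dynamo_frontend_", "frontend"),
    ("dynamo_component_", "component"),
    ("dynamo_preprocessor_", "preprocessor"),
]


def metric_family(name):
    """Classify a metric name by its prefix."""
    for prefix, family in FAMILIES:
        if name.startswith(prefix):
            return family
    # Other dynamo_ metrics come from specialized components not listed here.
    return "other-dynamo" if name.startswith("dynamo_") else "external"


def group_by_family(names):
    """Return a dict mapping each family to the metric names that fall in it."""
    groups = defaultdict(list)
    for name in names:
        groups[metric_family(name)].append(name)
    return dict(groups)


# Hypothetical metric names, used only to exercise the helpers.
grouped = group_by_family([
    "dynamo_frontend_example_total",
    "dynamo_component_example_seconds",
    "dynamo_preprocessor_example_tokens",
    "process_cpu_seconds_total",
])
```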
Kubernetes Integration: For comprehensive Kubernetes deployment and monitoring setup, see the Kubernetes Metrics Guide. This includes Prometheus Operator setup, metrics collection configuration, and visualization in Grafana.
Metrics Hierarchy#
The MetricsRegistry trait is implemented by DistributedRuntime, Namespace, Component, and Endpoint, providing a hierarchical approach to metric collection that matches Dynamo’s distributed architecture:
DistributedRuntime: Global metrics across the entire runtime
Namespace: Metrics scoped to a specific dynamo_namespace
Component: Metrics for a specific dynamo_component within a namespace
Endpoint: Metrics for an individual dynamo_endpoint within a component
This hierarchical structure allows you to create metrics at the appropriate level of granularity for your monitoring needs.
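One way to picture the hierarchy is as a chain of scopes, where each level contributes one label and a metric created at a given level carries the labels of every level above it. The sketch below is a conceptual model only, written under that assumption. The Scope class is hypothetical and is not the MetricsRegistry API, and the label values are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Scope:
    """One level of the hierarchy (conceptual model, not Dynamo's API)."""
    label: Optional[str]   # label key this level contributes; None for the runtime root
    value: Optional[str]   # label value for this level
    parent: Optional["Scope"] = None

    def child(self, label, value):
        """Create the next level down, remembering this level as its parent."""
        return Scope(label, value, parent=self)

    def labels(self):
        """Collect the labels from the root down to this level."""
        inherited = self.parent.labels() if self.parent else {}
        if self.label is not None:
            inherited[self.label] = self.value
        return inherited


# Placeholder names for each level.
runtime = Scope(None, None)
namespace = runtime.child("dynamo_namespace", "demo")
component = namespace.child("dynamo_component", "worker")
endpoint = component.child("dynamo_endpoint", "generate")
```

In this model, runtime.labels() is empty, namespace.labels() holds only dynamo_namespace, and endpoint.labels() holds all three labels, which mirrors how narrower levels give finer-grained metrics.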
Getting Started#
For a complete setup guide including Docker Compose configuration, Prometheus setup, and Grafana dashboards, see the Getting Started section in the deploy metrics documentation.
The quick start includes:
Docker Compose setup for Prometheus and Grafana
Pre-configured dashboards and datasources
Access URLs for all monitoring endpoints
GPU targeting configuration
Implementation Examples#
See Implementation Examples for detailed examples of creating metrics at different hierarchy levels and using dynamic labels.
Grafana Dashboards#
Use dashboards in deploy/metrics/grafana_dashboards/:
grafana-dynamo-dashboard.json: General Dynamo dashboard
grafana-dcgm-metrics.json: DCGM GPU metrics dashboard
Metrics Visualization Architecture#
Service Topology#
The metrics system follows this architecture for collecting and visualizing metrics:
graph TD
    BROWSER[Browser] -->|:3001| GRAFANA[Grafana :3001]
    subgraph DockerComposeNetwork [Network inside Docker Compose]
        NATS_PROM_EXP[nats-prom-exp :7777 /metrics] -->|:8222/varz| NATS_SERVER[nats-server :4222, :6222, :8222]
        PROMETHEUS[Prometheus server :9090] -->|:2379/metrics| ETCD_SERVER[etcd-server :2379, :2380]
        PROMETHEUS -->|:9401/metrics| DCGM_EXPORTER[dcgm-exporter :9401]
        PROMETHEUS -->|:7777/metrics| NATS_PROM_EXP
        PROMETHEUS -->|:8080/metrics| DYNAMOFE[Dynamo HTTP FE :8080]
        PROMETHEUS -->|:8081/metrics| DYNAMOBACKEND[Dynamo backend :8081]
        DYNAMOFE --> DYNAMOBACKEND
        GRAFANA -->|:9090/query API| PROMETHEUS
    end
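The scrape edges in the diagram can be read as a list of Prometheus targets. The sketch below builds a structure shaped like Prometheus' scrape_configs section from the ports shown above. The job names for the Dynamo frontend and backend and the default host localhost are assumptions made for illustration; the Prometheus configuration shipped with the deployment may differ.

```python
# Scrape targets read off the diagram: job name -> port serving /metrics.
# "dynamo-frontend" and "dynamo-backend" are assumed job names.
SCRAPE_PORTS = {
    "etcd-server": 2379,
    "dcgm-exporter": 9401,
    "nats-prom-exp": 7777,
    "dynamo-frontend": 8080,
    "dynamo-backend": 8081,
}


def metrics_url(job, host="localhost"):
    """Return the URL Prometheus would scrape for the given job."""
    return f"http://{host}:{SCRAPE_PORTS[job]}/metrics"


def scrape_config(host="localhost"):
    """Build a dict shaped like the scrape_configs section of a Prometheus config."""
    return {
        "scrape_configs": [
            {
                "job_name": job,
                "metrics_path": "/metrics",
                "static_configs": [{"targets": [f"{host}:{port}"]}],
            }
            for job, port in SCRAPE_PORTS.items()
        ]
    }


config = scrape_config()
```

Inside the Docker Compose network the host would normally be the service name rather than localhost, so pass host explicitly when adapting this to a real setup.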
Grafana Dashboard#
The metrics system includes a pre-configured Grafana dashboard for visualizing service metrics: