
Copyright © 2026, NVIDIA Corporation.


Metrics Developer Guide


This guide explains how to create and use custom metrics in Dynamo components using the Dynamo metrics API.

Metrics Exposure

All metrics created via the Dynamo metrics API are automatically exposed on the /metrics HTTP endpoint, in the Prometheus text exposition format, when the following environment variable is set:

  • DYN_SYSTEM_PORT=<port> - Port for the metrics endpoint (set to a positive value to enable; default: -1, which disables it)

Example:

DYN_SYSTEM_PORT=8081 python -m dynamo.vllm --model <model>

The metrics are then available in the Prometheus text exposition format at: http://localhost:8081/metrics
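The enable/disable rule for DYN_SYSTEM_PORT can be sketched in a few lines of standard-library Rust. The helper names `parse_system_port`, `metrics_url`, and `metrics_url_from_env` are hypothetical and not part of the Dynamo API. Treating 0 and unparsable values as "disabled" is an assumption drawn from the "positive value" wording above.

```rust
/// Interpret a raw `DYN_SYSTEM_PORT` value (hypothetical helper).
/// Assumption: only a positive, valid TCP port enables the endpoint;
/// unset, -1, 0, and unparsable values all mean "disabled".
fn parse_system_port(raw: Option<&str>) -> Option<u16> {
    let value: i64 = raw?.trim().parse().ok()?;
    if value > 0 && value <= u16::MAX as i64 {
        Some(value as u16)
    } else {
        None
    }
}

/// Build the scrape URL for a worker running on the local machine.
fn metrics_url(port: u16) -> String {
    format!("http://localhost:{port}/metrics")
}

/// Read the variable from the process environment and return the scrape
/// URL, or `None` when the metrics endpoint would be disabled.
fn metrics_url_from_env() -> Option<String> {
    let raw = std::env::var("DYN_SYSTEM_PORT").ok();
    parse_system_port(raw.as_deref()).map(metrics_url)
}
```

A check like this can be handy in a test harness or launcher script that needs to know whether there is anything to scrape before pointing Prometheus at the process.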

Metric Name Constants

The prometheus_names.rs module provides centralized metric name constants and sanitization functions to ensure consistency across all Dynamo components.
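To illustrate the idea of centralized names plus sanitization, here is a minimal standard-library sketch. It is not the contents of prometheus_names.rs: the module `names`, its constants, and the function `sanitize_metric_name` are hypothetical. The only rule it encodes is the classic Prometheus one that a metric name must match `[a-zA-Z_:][a-zA-Z0-9_:]*`.

```rust
/// Hypothetical constants module: defining each name once avoids
/// typos and drift between components that emit the same metric.
mod names {
    pub const REQUESTS_TOTAL: &str = "requests_total";
    pub const LATENCY_SECONDS: &str = "latency_seconds";
}

/// Map an arbitrary string onto a valid Prometheus metric name
/// (hypothetical helper). Invalid characters become `_`, and a leading
/// digit is prefixed with `_` because names may not start with a digit.
fn sanitize_metric_name(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len() + 1);
    for (i, c) in raw.chars().enumerate() {
        let valid = c.is_ascii_alphabetic()
            || c == '_'
            || c == ':'
            || (i > 0 && c.is_ascii_digit());
        if valid {
            out.push(c);
        } else if i == 0 && c.is_ascii_digit() {
            out.push('_');
            out.push(c);
        } else {
            out.push('_');
        }
    }
    if out.is_empty() {
        out.push('_');
    }
    out
}
```

The sanitization functions in the real module may apply different or additional rules, so treat this only as a picture of the general technique.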


Metrics API in Rust

The metrics API is accessible through the .metrics() method on runtime, namespace, component, and endpoint objects. See Runtime Hierarchy for details on the hierarchical structure.

Available Methods

  • .metrics().create_counter(): Create a counter metric
  • .metrics().create_gauge(): Create a gauge metric
  • .metrics().create_histogram(): Create a histogram metric
  • .metrics().create_countervec(): Create a counter with labels
  • .metrics().create_gaugevec(): Create a gauge with labels
  • .metrics().create_histogramvec(): Create a histogram with labels

Creating Metrics

use dynamo_runtime::DistributedRuntime;

let runtime = DistributedRuntime::new()?;
let endpoint = runtime.namespace("my_namespace").component("my_component").endpoint("my_endpoint");

// Simple metrics. Arguments: name, description, constant labels (none here).
let requests_total = endpoint.metrics().create_counter(
    "requests_total",
    "Total requests",
    &[]
)?;

let active_connections = endpoint.metrics().create_gauge(
    "active_connections",
    "Active connections",
    &[]
)?;

// Histograms take an optional list of bucket upper bounds as a fourth argument.
let latency = endpoint.metrics().create_histogram(
    "latency_seconds",
    "Request latency",
    &[],
    Some(vec![0.001, 0.01, 0.1, 1.0, 10.0])
)?;

Using Metrics

// Counters
requests_total.inc();

// Gauges
active_connections.set(42.0);
active_connections.inc();
active_connections.dec();

// Histograms
latency.observe(0.023); // 23 ms
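In practice the value passed to `observe` usually comes from a timer rather than a literal. The following sketch shows one way to measure a block of work with `std::time::Instant` and hand the elapsed seconds to any observer. `time_seconds` is a hypothetical helper, not part of the Dynamo API, and it takes a closure so that it stays independent of the concrete histogram type.

```rust
use std::time::Instant;

/// Run `work`, then report the elapsed wall-clock time in seconds to
/// `observe` (hypothetical helper). Returns whatever `work` returned.
fn time_seconds<T>(observe: impl FnOnce(f64), work: impl FnOnce() -> T) -> T {
    let start = Instant::now();
    let result = work();
    // as_secs_f64 yields fractional seconds, matching a `*_seconds` metric.
    observe(start.elapsed().as_secs_f64());
    result
}
```

With the `latency` histogram from the snippet above, a call would look roughly like `time_seconds(|s| latency.observe(s), || handle(request))`, where `handle` and `request` stand in for the caller's own code.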

Vector Metrics with Labels

// Create vector metrics with label names.
// Arguments: name, description, label names, constant labels (none here).
let requests_by_model = endpoint.metrics().create_countervec(
    "requests_by_model",
    "Requests by model",
    &["model_type", "model_size"],
    &[]
)?;

let memory_by_gpu = endpoint.metrics().create_gaugevec(
    "gpu_memory_bytes",
    "GPU memory by device",
    &["gpu_id", "memory_type"],
    &[]
)?;

// Use with specific label values, given in the same order as the label names
requests_by_model.with_label_values(&["llama", "7b"]).inc();
memory_by_gpu.with_label_values(&["0", "allocated"]).set(8192.0);
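Conceptually, a vector metric is a family of independent series, one per distinct combination of label values. The standard-library model below makes that concrete. `CounterVecSketch` is a hypothetical type written for illustration, not the type returned by `create_countervec`, and returning an error on a label-count mismatch is this sketch's own choice.

```rust
use std::collections::HashMap;

/// Toy model of a labeled counter: one u64 per label-value tuple.
struct CounterVecSketch {
    label_names: Vec<String>,
    series: HashMap<Vec<String>, u64>,
}

impl CounterVecSketch {
    fn new(label_names: &[&str]) -> Self {
        Self {
            label_names: label_names.iter().map(|n| n.to_string()).collect(),
            series: HashMap::new(),
        }
    }

    /// Increment the series selected by `values` and return its new count.
    /// The number of values must equal the number of label names.
    fn inc(&mut self, values: &[&str]) -> Result<u64, String> {
        if values.len() != self.label_names.len() {
            return Err(format!(
                "expected {} label values, got {}",
                self.label_names.len(),
                values.len()
            ));
        }
        let key: Vec<String> = values.iter().map(|v| v.to_string()).collect();
        let count = self.series.entry(key).or_insert(0);
        *count += 1;
        Ok(*count)
    }

    /// Current count for a label-value tuple (0 if never incremented).
    fn get(&self, values: &[&str]) -> u64 {
        let key: Vec<String> = values.iter().map(|v| v.to_string()).collect();
        self.series.get(&key).copied().unwrap_or(0)
    }
}
```

Because every new combination creates a new series, label values with unbounded cardinality (request IDs, raw prompts) are best avoided.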

Advanced Features

Custom histogram buckets:

let latency = endpoint.metrics().create_histogram(
    "latency_seconds",
    "Request latency",
    &[],
    Some(vec![0.001, 0.01, 0.1, 1.0, 10.0])
)?;
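Bucket bounds such as the ones above are inclusive upper limits. In the Prometheus data model each bucket is exported cumulatively (the `le` label), with an implicit final `+Inf` bucket, alongside a running sum and count. The sketch below models that bookkeeping with the standard library only. `HistogramSketch` is a hypothetical type for illustration and says nothing about how Dynamo stores observations internally.

```rust
/// Toy histogram: per-bucket counts plus running sum and count.
struct HistogramSketch {
    bounds: Vec<f64>, // sorted upper bounds, e.g. [0.001, 0.01, 0.1, 1.0, 10.0]
    counts: Vec<u64>, // one slot per bound, plus a final slot for +Inf
    sum: f64,
    count: u64,
}

impl HistogramSketch {
    fn new(bounds: Vec<f64>) -> Self {
        let slots = bounds.len() + 1;
        Self { bounds, counts: vec![0; slots], sum: 0.0, count: 0 }
    }

    /// Record a value in the first bucket whose bound is >= the value,
    /// or in the +Inf slot if it exceeds every bound.
    fn observe(&mut self, value: f64) {
        let idx = self
            .bounds
            .iter()
            .position(|bound| value <= *bound)
            .unwrap_or(self.bounds.len());
        self.counts[idx] += 1;
        self.sum += value;
        self.count += 1;
    }

    /// Cumulative counts, as they would appear under increasing `le` values.
    fn cumulative(&self) -> Vec<u64> {
        let mut running = 0u64;
        self.counts
            .iter()
            .map(|c| {
                running += *c;
                running
            })
            .collect()
    }
}
```

With the bounds above, an observation of 0.023 lands in the `le="0.1"` bucket, so bounds are best chosen around the latencies that matter for the service's objectives.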

Constant labels:

let counter = endpoint.metrics().create_counter(
    "requests_total",
    "Total requests",
    &[("region", "us-west"), ("env", "prod")]
)?;
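Constant labels are attached to every sample of the metric, so they show up on each line of the scraped output. The sketch below renders a single sample line in the Prometheus text exposition format, including the escaping that format requires for label values. `render_sample` and `escape_label_value` are hypothetical helpers. The real /metrics output may order labels differently and may carry additional labels that this page does not describe.

```rust
/// Escape a label value per the Prometheus text format:
/// backslash, double quote, and newline must be escaped.
fn escape_label_value(value: &str) -> String {
    value
        .replace('\\', "\\\\")
        .replace('"', "\\\"")
        .replace('\n', "\\n")
}

/// Render one sample line, e.g. `requests_total{region="us-west"} 3`.
/// Labels are emitted in the order given (an assumption of this sketch).
fn render_sample(name: &str, const_labels: &[(&str, &str)], value: f64) -> String {
    if const_labels.is_empty() {
        return format!("{name} {value}");
    }
    let labels: Vec<String> = const_labels
        .iter()
        .map(|(k, v)| format!("{k}=\"{}\"", escape_label_value(v)))
        .collect();
    format!("{name}{{{}}} {value}", labels.join(","))
}
```

Constant labels suit values that are fixed for the lifetime of the process, such as region or environment. Values that change per request belong in a vector metric instead.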

Related Documentation

  • Metrics Overview
  • Prometheus and Grafana Setup
  • Distributed Runtime Architecture