Metrics Developer Guide#
This guide explains how to create and use custom Prometheus metrics in Dynamo components through the Dynamo metrics API.
Metrics Exposure#
All metrics created via the Dynamo metrics API are automatically exposed in Prometheus exposition format on the /metrics HTTP endpoint when the following environment variable is set:
DYN_SYSTEM_PORT=<port> - Port for the metrics endpoint (set to a positive value to enable; default: -1, disabled)
Example:
DYN_SYSTEM_PORT=8081 python -m dynamo.vllm --model <model>
Metrics in Prometheus exposition format will then be available at: http://localhost:8081/metrics
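The exposition format is plain text, one sample per line, with `# HELP` and `# TYPE` comment lines describing each metric. As an illustration only (not part of the Dynamo API; real scrapers should use an existing Prometheus client), a minimal parser for label-free sample lines might look like this:

```python
# Minimal sketch: parse simple Prometheus exposition-format lines.
# Illustrative only -- real scrapers should use a Prometheus client library.

def parse_samples(text):
    """Return {metric_name: value} for plain (label-free) sample lines."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip comments (# HELP / # TYPE) and blank lines
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        if name and "{" not in name:  # keep it to label-free samples
            samples[name] = float(value)
    return samples

payload = """\
# HELP requests_total Total requests
# TYPE requests_total counter
requests_total 42
active_connections 7
"""
print(parse_samples(payload))  # {'requests_total': 42.0, 'active_connections': 7.0}
```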
Metric Name Constants#
The prometheus_names.rs module provides centralized metric name constants and sanitization functions to ensure consistency across all Dynamo components.
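The authoritative naming rules and sanitization logic live in prometheus_names.rs. For intuition, a typical Prometheus-style sanitizer replaces disallowed characters with underscores and guards against a leading digit; the sketch below illustrates that general pattern (it is not the Dynamo implementation):

```python
import re

# Illustrative sketch of typical Prometheus metric-name sanitization;
# the authoritative rules for Dynamo live in prometheus_names.rs.

VALID_FIRST = re.compile(r"[a-zA-Z_:]")
INVALID_CHARS = re.compile(r"[^a-zA-Z0-9_:]")

def sanitize_metric_name(name: str) -> str:
    # Replace characters not allowed in Prometheus metric names
    name = INVALID_CHARS.sub("_", name)
    # Metric names may not start with a digit
    if name and not VALID_FIRST.match(name[0]):
        name = "_" + name
    return name

print(sanitize_metric_name("my-metric.name"))  # my_metric_name
print(sanitize_metric_name("2fast"))           # _2fast
```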
Metrics API in Rust#
The metrics API is accessible through the .metrics() method on runtime, namespace, component, and endpoint objects. See Runtime Hierarchy for details on the hierarchical structure.
Available Methods#
- .metrics().create_counter(): Create a counter metric
- .metrics().create_gauge(): Create a gauge metric
- .metrics().create_histogram(): Create a histogram metric
- .metrics().create_countervec(): Create a counter with labels
- .metrics().create_gaugevec(): Create a gauge with labels
- .metrics().create_histogramvec(): Create a histogram with labels
Creating Metrics#
use dynamo_runtime::DistributedRuntime;
let runtime = DistributedRuntime::new()?;
let endpoint = runtime.namespace("my_namespace").component("my_component").endpoint("my_endpoint");
// Simple metrics
let requests_total = endpoint.metrics().create_counter(
    "requests_total",
    "Total requests",
    &[]
)?;
let active_connections = endpoint.metrics().create_gauge(
    "active_connections",
    "Active connections",
    &[]
)?;
let latency = endpoint.metrics().create_histogram(
    "latency_seconds",
    "Request latency",
    &[],
    Some(vec![0.001, 0.01, 0.1, 1.0, 10.0])
)?;
Using Metrics#
// Counters
requests_total.inc();
// Gauges
active_connections.set(42.0);
active_connections.inc();
active_connections.dec();
// Histograms
latency.observe(0.023); // 23ms
Vector Metrics with Labels#
// Create vector metrics with label names
let requests_by_model = endpoint.metrics().create_countervec(
    "requests_by_model",
    "Requests by model",
    &["model_type", "model_size"],
    &[]
)?;
let memory_by_gpu = endpoint.metrics().create_gaugevec(
    "gpu_memory_bytes",
    "GPU memory by device",
    &["gpu_id", "memory_type"],
    &[]
)?;
// Use with specific label values
requests_by_model.with_label_values(&["llama", "7b"]).inc();
memory_by_gpu.with_label_values(&["0", "allocated"]).set(8192.0);
Advanced Features#
Custom histogram buckets:
let latency = endpoint.metrics().create_histogram(
    "latency_seconds",
    "Request latency",
    &[],
    Some(vec![0.001, 0.01, 0.1, 1.0, 10.0])
)?;
Constant labels:
let counter = endpoint.metrics().create_counter(
    "requests_total",
    "Total requests",
    &[("region", "us-west"), ("env", "prod")]
)?;
Metrics API in Python#
Python components can create and manage Prometheus metrics using the same metrics API through Python bindings.
Available Methods#
- endpoint.metrics.create_counter() / create_intcounter(): Create a counter metric
- endpoint.metrics.create_gauge() / create_intgauge(): Create a gauge metric
- endpoint.metrics.create_histogram(): Create a histogram metric
- endpoint.metrics.create_countervec() / create_intcountervec(): Create a counter with labels
- endpoint.metrics.create_gaugevec() / create_intgaugevec(): Create a gauge with labels
- endpoint.metrics.create_histogramvec(): Create a histogram with labels
All metrics are imported from dynamo.prometheus_metrics.
Creating Metrics#
from dynamo.runtime import DistributedRuntime
drt = DistributedRuntime()
endpoint = drt.namespace("my_namespace").component("my_component").endpoint("my_endpoint")
# Simple metrics
requests_total = endpoint.metrics.create_intcounter(
    "requests_total",
    "Total requests"
)
active_connections = endpoint.metrics.create_intgauge(
    "active_connections",
    "Active connections"
)
latency = endpoint.metrics.create_histogram(
    "latency_seconds",
    "Request latency",
    buckets=[0.001, 0.01, 0.1, 1.0, 10.0]
)
Using Metrics#
# Counters
requests_total.inc()
requests_total.inc_by(5)
# Gauges
active_connections.set(42)
active_connections.inc()
active_connections.dec()
# Histograms
latency.observe(0.023) # 23ms
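A common pattern with histograms is to time a block of code and feed the elapsed seconds to observe(). The sketch below illustrates one way to do that with a context manager; FakeHistogram is a hypothetical stand-in with the same observe() signature as a Dynamo histogram, used here so the example is self-contained:

```python
import time
from contextlib import contextmanager

@contextmanager
def observe_duration(histogram):
    """Record the wall-clock duration of the wrapped block, in seconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        histogram.observe(time.perf_counter() - start)

# Hypothetical stand-in with the same observe() signature
class FakeHistogram:
    def __init__(self):
        self.samples = []
    def observe(self, value):
        self.samples.append(value)

latency = FakeHistogram()
with observe_duration(latency):
    time.sleep(0.01)  # the timed work
print(len(latency.samples))  # 1
```

In real code, `latency` would be the histogram returned by `endpoint.metrics.create_histogram(...)`.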
Vector Metrics with Labels#
# Create vector metrics with label names
requests_by_model = endpoint.metrics.create_intcountervec(
    "requests_by_model",
    "Requests by model",
    ["model_type", "model_size"]
)
memory_by_gpu = endpoint.metrics.create_intgaugevec(
    "gpu_memory_bytes",
    "GPU memory by device",
    ["gpu_id", "memory_type"]
)
# Use with specific label values
requests_by_model.inc({"model_type": "llama", "model_size": "7b"})
memory_by_gpu.set(8192, {"gpu_id": "0", "memory_type": "allocated"})
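Each distinct combination of label values becomes its own time series on /metrics. To illustrate (this helper is not part of the Dynamo API), here is roughly how one labeled sample is rendered in the exposition format:

```python
# Illustrative sketch: how a labeled sample appears on /metrics.
# Not part of the Dynamo API.

def format_sample(name, labels, value):
    """Render one labeled sample line in Prometheus exposition format."""
    body = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{name}{{{body}}} {value}"

line = format_sample("requests_by_model",
                     {"model_type": "llama", "model_size": "7b"}, 1)
print(line)  # requests_by_model{model_size="7b",model_type="llama"} 1
```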
Advanced Features#
Constant labels:
counter = endpoint.metrics.create_intcounter(
    "requests_total",
    "Total requests",
    [("region", "us-west"), ("env", "prod")]
)
Metric introspection:
print(counter.name()) # "my_namespace_my_component_my_endpoint_requests_total"
print(counter.const_labels()) # {"dynamo_namespace": "my_namespace", ...}
print(gauge_vec.variable_labels()) # ["model_type", "model_size"]
Update patterns:
Background thread updates:
import threading
import time
def update_loop():
    while True:
        active_connections.set(compute_current_connections())
        time.sleep(2)
threading.Thread(target=update_loop, daemon=True).start()
Callback-based updates (called before each /metrics scrape):
def update_metrics():
    active_connections.set(compute_current_connections())
endpoint.metrics.register_callback(update_metrics)
Examples#
Example scripts: lib/bindings/python/examples/metrics/
cd ~/dynamo/lib/bindings/python/examples/metrics
DYN_SYSTEM_PORT=8081 ./server_with_loop.py
DYN_SYSTEM_PORT=8081 ./server_with_callback.py