vLLM Prometheus Metrics#
📚 Official Documentation: vLLM Metrics Design
This document describes how vLLM Prometheus metrics are exposed in Dynamo.
Overview#
When running vLLM through Dynamo, vLLM engine metrics are automatically passed through and exposed on Dynamo’s /metrics endpoint (default port 8081). This allows you to access both vLLM engine metrics (prefixed with vllm:) and Dynamo runtime metrics (prefixed with dynamo_*) from a single worker backend endpoint.
For the complete and authoritative list of all vLLM metrics, always refer to the official documentation linked above.
Dynamo runtime metrics are documented in docs/observability/metrics.md.
Metric Reference#
The official documentation includes:
Complete metric definitions with detailed explanations
Counter, Gauge, and Histogram metrics
Metric labels (e.g.,
model_name,finished_reason,scheduling_event)Design rationale and implementation details
Information about v1 metrics migration
Future work and deprecated metrics
Metric Categories#
vLLM provides metrics in the following categories (all prefixed with vllm:):
Request metrics
Performance metrics
Resource usage
Scheduler metrics
Disaggregation metrics (when enabled)
Note: Specific metrics are subject to change between vLLM versions. Always refer to the official documentation or inspect the /metrics endpoint for your vLLM version.
Enabling Metrics in Dynamo#
vLLM metrics are automatically exposed when running vLLM through Dynamo with metrics enabled.
Inspecting Metrics#
To see the actual metrics available in your vLLM version:
1. Launch vLLM with Metrics Enabled#
# Set system metrics port (automatically enables metrics server)
export DYN_SYSTEM_PORT=8081
# Start vLLM worker (metrics enabled by default via --disable-log-stats=false)
python -m dynamo.vllm --model <model_name>
# Wait for engine to initialize
Metrics will be available at: http://localhost:8081/metrics
2. Fetch Metrics via curl#
curl http://localhost:8081/metrics | grep "^vllm:"
3. Example Output#
Note: The specific metrics shown below are examples and may vary depending on your vLLM version. Always inspect your actual /metrics endpoint for the current list.
# HELP vllm:request_success_total Number of successfully finished requests.
# TYPE vllm:request_success_total counter
vllm:request_success_total{finished_reason="length",model_name="meta-llama/Llama-3.1-8B"} 15.0
vllm:request_success_total{finished_reason="stop",model_name="meta-llama/Llama-3.1-8B"} 150.0
# HELP vllm:time_to_first_token_seconds Histogram of time to first token in seconds.
# TYPE vllm:time_to_first_token_seconds histogram
vllm:time_to_first_token_seconds_bucket{le="0.001",model_name="meta-llama/Llama-3.1-8B"} 0.0
vllm:time_to_first_token_seconds_bucket{le="0.005",model_name="meta-llama/Llama-3.1-8B"} 5.0
vllm:time_to_first_token_seconds_count{model_name="meta-llama/Llama-3.1-8B"} 165.0
vllm:time_to_first_token_seconds_sum{model_name="meta-llama/Llama-3.1-8B"} 89.38
Implementation Details#
vLLM v1 uses multiprocess metrics collection via
prometheus_client.multiprocessPROMETHEUS_MULTIPROC_DIR: (optional). By default, Dynamo automatically manages this environment variable, setting it to a temporary directory where multiprocess metrics are stored as memory-mapped files. Each worker process writes its metrics to separate files in this directory, which are aggregated when/metricsis scraped. Users only need to set this explicitly where complete control over the metrics directory is required.Dynamo uses
MultiProcessCollectorto aggregate metrics from all worker processesMetrics are filtered by the
vllm:andlmcache:prefixes before being exposed (when LMCache is enabled)The integration uses Dynamo’s
register_engine_metrics_callback()function with the globalREGISTRYMetrics appear after vLLM engine initialization completes
vLLM v1 metrics are different from v0 - see the official documentation for migration details
See Also#
vLLM Metrics#
Dynamo Metrics#
Dynamo Metrics Guide: See docs/observability/metrics.md for complete documentation on Dynamo runtime metrics
Dynamo Runtime Metrics: Metrics prefixed with
dynamo_*for runtime, components, endpoints, and namespacesImplementation:
lib/runtime/src/metrics.rs(Rust runtime metrics)Metric names:
lib/runtime/src/metrics/prometheus_names.rs(metric name constants)Available at the same
/metricsendpoint alongside vLLM metrics
Integration Code:
components/src/dynamo/common/utils/prometheus.py- Prometheus utilities and callback registration