When running vLLM through Dynamo, vLLM engine metrics are automatically passed through and exposed on Dynamo’s /metrics endpoint (default port 8081). This allows you to access both vLLM engine metrics (prefixed with vllm:) and Dynamo runtime metrics (prefixed with dynamo_*) from a single worker backend endpoint.
For the complete and authoritative list of all vLLM metrics, always refer to the official vLLM Metrics Design documentation.
For LMCache metrics and integration, see the LMCache Integration Guide.
For Dynamo runtime metrics, see the Dynamo Metrics Guide.
For visualization setup instructions, see the Prometheus and Grafana Setup Guide.
This is a single machine example.
For visualizing metrics with Prometheus and Grafana, start the observability stack. See Observability Getting Started for instructions.
Launch a frontend and vLLM backend to test metrics:
Wait for the vLLM worker to start, then send requests and check metrics:
vLLM exposes metrics in Prometheus Exposition Format text at the /metrics HTTP endpoint. All vLLM engine metrics use the vllm: prefix and include labels (e.g., model_name, finished_reason, scheduling_event) to identify the source.
Example Prometheus Exposition Format text:
The specific metrics shown above are examples and may vary depending on your vLLM version. Always inspect your actual /metrics endpoint or refer to the official documentation for the current list.
vLLM provides metrics in the following categories (all prefixed with vllm:):
Specific metrics are subject to change between vLLM versions. Always refer to the official documentation or inspect the /metrics endpoint for your vLLM version.
The official vLLM documentation includes complete metric definitions with:
model_name, finished_reason, scheduling_event)For the complete and authoritative list of all vLLM metrics, see the official vLLM Metrics Design documentation.
When LMCache is enabled with --connector lmcache and DYN_SYSTEM_PORT is set, LMCache metrics (prefixed with lmcache:) are automatically exposed via Dynamo’s /metrics endpoint alongside vLLM and Dynamo metrics.
To access LMCache metrics, both of these are required:
--connector lmcache - Enables LMCache in vLLMDYN_SYSTEM_PORT=8081 - Enables Dynamo’s metrics HTTP endpointExample:
Troubleshooting LMCache-related metrics and logs (including PrometheusLogger instance already created with different metadata and PROMETHEUS_MULTIPROC_DIR warnings) is documented in:
For complete LMCache configuration and metric details, see:
prometheus_client.multiprocessPROMETHEUS_MULTIPROC_DIR: (optional). By default, Dynamo automatically manages this environment variable, setting it to a temporary directory where multiprocess metrics are stored as memory-mapped files. Each worker process writes its metrics to separate files in this directory, which are aggregated when /metrics is scraped. Users only need to set this explicitly where complete control over the metrics directory is required.MultiProcessCollector to aggregate metrics from all worker processesvllm: and lmcache: prefixes before being exposed (when LMCache is enabled)register_engine_metrics_callback() function with the global REGISTRYdynamo_*) are available at the same /metrics endpoint alongside vLLM metrics
lib/runtime/src/metrics.rs (Rust runtime metrics)lib/runtime/src/metrics/prometheus_names.rs (metric name constants)components/src/dynamo/common/utils/prometheus.py - Prometheus utilities and callback registration