When running vLLM through Dynamo, vLLM engine metrics are automatically passed through and exposed on Dynamo’s /metrics endpoint (default port 8081). This allows you to access both vLLM engine metrics (prefixed with vllm:) and Dynamo runtime metrics (prefixed with dynamo_*) from a single worker backend endpoint.
For the complete and authoritative list of all vLLM metrics, always refer to the official vLLM Metrics Design documentation.
For LMCache metrics and integration, see the LMCache Integration Guide.
For Dynamo runtime metrics, see the Dynamo Metrics Guide.
For visualization setup instructions, see the Prometheus and Grafana Setup Guide.
This is a single-machine example.
For visualizing metrics with Prometheus and Grafana, start the observability stack. See Observability Getting Started for instructions.
The launch scripts in examples/backends/vllm/launch/ already enable metrics on port 8081 by default. For example:
Once the deployment is running, send a request and check metrics:
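A scripted check can be sketched in Python using only the standard library. This is a minimal sketch, not part of Dynamo: it assumes the worker is reachable on localhost at the default port 8081, and the helper names `fetch_metrics` and `filter_metrics_by_prefix` are hypothetical.

```python
from urllib.request import urlopen

# Assumption: the worker serves metrics on localhost at the default port 8081.
METRICS_URL = "http://localhost:8081/metrics"


def filter_metrics_by_prefix(text, prefixes=("vllm:",)):
    """Return sample lines whose metric name starts with one of the prefixes."""
    selected = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        if line.startswith(tuple(prefixes)):
            selected.append(line)
    return selected


def fetch_metrics(url=METRICS_URL, timeout=5.0):
    """Download the raw Prometheus exposition text from the endpoint."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With a deployment running, `filter_metrics_by_prefix(fetch_metrics())` returns the vLLM engine samples, and passing `prefixes=("dynamo_",)` selects the Dynamo runtime samples instead.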
vLLM exposes metrics in Prometheus Exposition Format text at the /metrics HTTP endpoint. All vLLM engine metrics use the vllm: prefix and include labels (e.g., model_name, finished_reason, scheduling_event) to identify the source.
Example Prometheus Exposition Format text:
Note: The specific metrics shown above are examples and may vary depending on your vLLM version. Always inspect your actual /metrics endpoint or refer to the official documentation for the current list.
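To show how a sample line in this format breaks down into a metric name, labels, and a value, here is a small standard-library parser. It is a simplified sketch rather than a complete implementation of the exposition format (for instance, escaped characters inside label values are left as-is), and `parse_sample` is a hypothetical helper name. The sample line in the usage note is made up for illustration.

```python
import re

# Matches: metric_name{label="value",...} value   (the label block is optional)
_SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Split one exposition-format sample line into (name, labels, value)."""
    match = _SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    name, raw_labels, raw_value = match.groups()
    # Label pairs become a dict; a line without labels yields an empty dict.
    labels = dict(_LABEL_RE.findall(raw_labels or ""))
    return name, labels, float(raw_value)
```

For example, `parse_sample('vllm:request_success_total{finished_reason="stop",model_name="my-model"} 3.0')` yields the name `vllm:request_success_total`, the two labels as a dictionary, and the value `3.0`.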
vLLM provides metrics in the following categories (all prefixed with vllm:):
Note: Specific metrics are subject to change between vLLM versions. Always refer to the official documentation or inspect the /metrics endpoint for your vLLM version.
The official vLLM documentation includes complete metric definitions with:
Labels (e.g., model_name, finished_reason, scheduling_event)
For the complete and authoritative list of all vLLM metrics, see the official vLLM Metrics Design documentation.
When LMCache is enabled, LMCache metrics (prefixed with lmcache:) are automatically exposed via Dynamo’s /metrics endpoint alongside vLLM and Dynamo metrics.
To try it out, use the LMCache launch script:
Send a request and view LMCache metrics:
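One way to confirm that all three metric families share the endpoint is to count samples per prefix in the scraped text. The sketch below uses only the standard library; the prefixes come from this guide, while the helper name `count_samples_by_family` is hypothetical.

```python
from collections import Counter

# Prefix -> family name; the prefixes are the ones described in this guide.
FAMILIES = {"vllm:": "vLLM", "lmcache:": "LMCache", "dynamo_": "Dynamo"}


def count_samples_by_family(text):
    """Count sample lines per metric family in Prometheus exposition text."""
    counts = Counter()
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        for prefix, family in FAMILIES.items():
            if line.startswith(prefix):
                counts[family] += 1
                break
        else:
            # Anything that matches none of the known prefixes.
            counts["other"] += 1
    return counts
```

If the LMCache count is zero after sending a request, that suggests LMCache metrics are not reaching the endpoint, which is a reasonable point to consult the troubleshooting material referenced below.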
Troubleshooting LMCache-related metrics and logs (including PrometheusLogger instance already created with different metadata and PROMETHEUS_MULTIPROC_DIR warnings) is documented in:
For complete LMCache configuration and metric details, see:
Multiprocess metrics are collected through prometheus_client.multiprocess.
PROMETHEUS_MULTIPROC_DIR (optional): By default, Dynamo automatically manages this environment variable, setting it to a temporary directory where multiprocess metrics are stored as memory-mapped files. Each worker process writes its metrics to separate files in this directory, and the files are aggregated when /metrics is scraped. Set this variable explicitly only if you need complete control over the metrics directory.
Dynamo uses MultiProcessCollector to aggregate metrics from all worker processes.
Metrics are filtered by the vllm: and lmcache: prefixes before being exposed (the lmcache: prefix applies when LMCache is enabled).
Metrics are registered through the register_engine_metrics_callback() function with the global REGISTRY.
Dynamo runtime metrics (prefixed with dynamo_*) are available at the same /metrics endpoint alongside vLLM metrics.
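The directory handling and aggregation described here can be sketched with the public prometheus_client API. This is an illustration under stated assumptions, not Dynamo's actual implementation: the helper names and the temporary-directory prefix are hypothetical, and the example assumes the prometheus_client package is installed.

```python
import os
import tempfile


def ensure_multiproc_dir():
    """Return the multiprocess metrics directory, creating a temporary one if unset."""
    path = os.environ.get("PROMETHEUS_MULTIPROC_DIR")
    if not path:
        # Hypothetical directory prefix, chosen for this example only.
        path = tempfile.mkdtemp(prefix="prom_multiproc_")
        os.environ["PROMETHEUS_MULTIPROC_DIR"] = path
    os.makedirs(path, exist_ok=True)
    return path


def render_aggregated_metrics():
    """Aggregate the per-process metric files and render exposition text."""
    # Imported lazily so the environment variable is set before prometheus_client loads.
    from prometheus_client import CollectorRegistry, generate_latest
    from prometheus_client.multiprocess import MultiProcessCollector

    registry = CollectorRegistry()
    # The collector reads every worker's files from the shared directory.
    MultiProcessCollector(registry, path=ensure_multiproc_dir())
    return generate_latest(registry).decode("utf-8")
```

Using a fresh CollectorRegistry for aggregation keeps the merged multiprocess samples separate from metrics registered directly in the current process; how Dynamo combines these with its global REGISTRY is not shown here.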
lib/runtime/src/metrics.rs (Rust runtime metrics)
lib/runtime/src/metrics/prometheus_names.rs (metric name constants)
components/src/dynamo/common/utils/prometheus.py (Prometheus utilities and callback registration)