GPU Telemetry with AIPerf
This guide shows you how to collect GPU metrics (power, utilization, memory, temperature, etc.) during AIPerf benchmarking. GPU telemetry provides insights into GPU performance and resource usage while running inference workloads.
Overview
This guide covers four setup paths depending on your inference backend and requirements:
Path 1: Dynamo (Built-in DCGM)
If you’re using Dynamo, it comes with DCGM pre-configured on port 9401. No additional setup needed! Just use the --gpu-telemetry flag to enable console display and, optionally, to add more DCGM URL endpoints. URLs can be specified with or without the http:// prefix (e.g., localhost:9400 or http://localhost:9400).
Path 2: Other Inference Servers (Custom DCGM)
If you’re using any other inference backend, you’ll need to set up DCGM separately.
Path 3: Local GPU Monitoring (pynvml)
If you want simple local GPU monitoring without DCGM, use --gpu-telemetry pynvml. This uses NVIDIA’s nvidia-ml-py Python library (commonly known as pynvml) to collect metrics directly from the GPU driver. No HTTP endpoints or additional containers required.
Path 4: AMD ROCm GPUs (amdsmi)
If you’re benchmarking against an inference server running on AMD ROCm GPUs (Instinct MI300X, MI355X, etc.), use --gpu-telemetry amdsmi. This uses the amdsmi Python bindings shipped with ROCm to collect metrics directly from the AMD driver. No HTTP endpoints required. Install the bindings via pip install /opt/rocm/share/amd_smi/amdsmi-*.whl if not already present (they ship with ROCm).
Prerequisites
- NVIDIA GPU with CUDA support, or AMD GPU with ROCm 6.x/7.x
- Docker installed and configured
Understanding GPU Telemetry in AIPerf
AIPerf provides GPU telemetry collection with the --gpu-telemetry flag. Here’s how it works:
How the --gpu-telemetry Flag Works
DCGM mode (default): The default endpoints http://localhost:9400/metrics and http://localhost:9401/metrics are always attempted for telemetry collection, regardless of whether the --gpu-telemetry flag is used. The flag primarily controls whether metrics are displayed on the console and allows you to specify additional custom DCGM exporter endpoints.
pynvml mode: When using --gpu-telemetry pynvml, DCGM endpoints are NOT used. Metrics are collected directly from local GPUs via the nvidia-ml-py library.
amdsmi mode: When using --gpu-telemetry amdsmi, DCGM endpoints are NOT used. Metrics are collected directly from local AMD GPUs via the amdsmi library and emitted under vendor-namespaced amd_* field names (amd_power, amd_gfx_activity, amd_temperature, etc.) rather than NVML-shaped names. On Instinct datacenter parts amd_mm_activity is generally absent (sensor returns 'N/A'); amd_throttle_status is a 0.0/1.0 snapshot per scrape (amdsmi exposes a boolean state, not a duration counter).
To completely disable GPU telemetry collection, use --no-gpu-telemetry.
When specifying custom DCGM exporter URLs, the http:// prefix is optional. URLs like localhost:9400 will automatically be treated as http://localhost:9400. Both formats work identically.
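The prefix rule above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not AIPerf's actual implementation: the function name normalize_dcgm_url is hypothetical, and passing https:// URLs through unchanged is an assumption the guide does not confirm.

```python
def normalize_dcgm_url(url: str) -> str:
    """Prepend http:// when no scheme is given; otherwise return the URL as-is.

    Hypothetical helper for illustration only. Leaving https:// URLs
    untouched is an assumption, not documented AIPerf behavior.
    """
    url = url.strip()
    if url.startswith(("http://", "https://")):
        return url
    return f"http://{url}"


# Both spellings from the text resolve to the same endpoint.
same = normalize_dcgm_url("localhost:9400") == normalize_dcgm_url("http://localhost:9400")
print(same)
```

Note that only the scheme is added; the path is left as the user typed it.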
For simple local GPU monitoring without DCGM setup, use --gpu-telemetry pynvml. This collects metrics directly from the NVIDIA driver using the nvidia-ml-py library. See Path 3: pynvml for details.
Real-Time Dashboard View
Adding dashboard to the --gpu-telemetry flag enables a live terminal UI (TUI) that displays GPU metrics in real-time during your benchmark runs:
1: Using Dynamo
Dynamo includes DCGM out of the box on port 9401 - no extra setup needed!
Setup Dynamo Server
Verify Dynamo is Running
Run AIPerf Benchmark
Sample Output (Successful Run):
The GPU telemetry table displays real-time metrics collected from DCGM during the benchmark. Each GPU is shown with its metrics aggregated across the benchmark duration.
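To make "aggregated across the benchmark duration" concrete, here is a minimal sketch that reduces a stream of per-GPU samples to average, minimum, and maximum values. The sample tuples, the metric names, and the choice of statistics are all assumptions made for illustration; the guide does not specify which aggregates AIPerf computes.

```python
from collections import defaultdict
from statistics import mean


def summarize(samples):
    """Group (gpu_index, metric_name, value) samples and reduce each group.

    Illustrative only: the statistics reported by AIPerf may differ.
    """
    grouped = defaultdict(list)
    for gpu_index, metric_name, value in samples:
        grouped[(gpu_index, metric_name)].append(value)
    return {
        key: {"avg": mean(values), "min": min(values), "max": max(values)}
        for key, values in grouped.items()
    }


# Made-up samples; the metric names are hypothetical labels.
samples = [
    (0, "gpu_power_usage", 250.0),
    (0, "gpu_power_usage", 270.0),
    (0, "gpu_utilization", 90.0),
    (1, "gpu_power_usage", 100.0),
]
summary = summarize(samples)
print(summary[(0, "gpu_power_usage")])
```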
The dashboard keyword enables a live terminal UI for real-time GPU telemetry visualization. Press 5 to maximize the GPU Telemetry panel during the benchmark run.
2: Using Other Inference Servers
This path works with vLLM, SGLang, TRT-LLM, or any inference server. We’ll use vLLM as an example.
Setup vLLM Server with DCGM
The setup includes three steps: creating a custom metrics configuration, starting the DCGM Exporter, and launching the vLLM server.
You can customize the custom_gpu_metrics.csv file by commenting out metrics you don’t need. Lines starting with # are ignored.
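The comment rule can be illustrated with a small parser sketch. It assumes the usual DCGM exporter layout of three comma-separated columns (DCGM field, Prometheus metric type, help text); the sample rows and the function name parse_metrics_csv are illustrative and are not taken from AIPerf.

```python
def parse_metrics_csv(text):
    """Return (field, metric_type, help_text) tuples, skipping comments and blanks.

    Assumes three comma-separated columns per row; the help text may itself
    contain commas, so the split is limited to the first two separators.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # commented-out metrics and blank lines are ignored
        parts = [part.strip() for part in line.split(",", 2)]
        if len(parts) != 3:
            raise ValueError(f"expected 3 columns, got {len(parts)}: {line!r}")
        rows.append(tuple(parts))
    return rows


# Illustrative content: the third metric is disabled by commenting it out.
SAMPLE_CSV = """\
# DCGM FIELD, Prometheus metric type, help message
DCGM_FI_DEV_POWER_USAGE, gauge, Power draw (in W).
DCGM_FI_DEV_GPU_UTIL, gauge, GPU utilization (in %).
# DCGM_FI_DEV_GPU_TEMP, gauge, GPU temperature (in C).
"""

enabled_fields = [row[0] for row in parse_metrics_csv(SAMPLE_CSV)]
print(enabled_fields)
```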
Key Configuration:
- -p 9401:9400 - Forward container’s port 9400 to host’s port 9401 (AIPerf’s default)
- -e DCGM_EXPORTER_INTERVAL=33 - Collect metrics every 33 ms for fine-grained profiling
- -v custom_gpu_metrics.csv:... - Mount your custom metrics configuration
Replace the vLLM command above with your preferred backend (SGLang, TRT-LLM, etc.). The DCGM setup works with any server.
Verify Everything is Running
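One way to check that the exporter is serving data is to fetch its /metrics endpoint and parse the Prometheus text it returns. The sketch below is a hedged illustration rather than part of AIPerf: fetch_metrics and parse_prometheus_text are hypothetical helpers, the sample payload is made up, and the URL assumes the exporter is mapped to host port 9401 as configured above.

```python
import re
import urllib.request

# name{label="value",...} value   (labels optional)
_SAMPLE_LINE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'(\w+)="([^"]*)"')


def fetch_metrics(url="http://localhost:9401/metrics", timeout=5.0):
    """Download the raw exposition text (requires a running exporter)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_prometheus_text(text):
    """Return (metric_name, labels_dict, value) for every sample line."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip HELP/TYPE comments and blank lines
        match = _SAMPLE_LINE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Made-up payload in the exporter's text format, so the parser runs offline.
SAMPLE_TEXT = """\
# HELP DCGM_FI_DEV_GPU_UTIL GPU utilization (in %).
# TYPE DCGM_FI_DEV_GPU_UTIL gauge
DCGM_FI_DEV_GPU_UTIL{gpu="0",Hostname="node1"} 87
DCGM_FI_DEV_POWER_USAGE{gpu="0",Hostname="node1"} 312.5
"""

parsed = parse_prometheus_text(SAMPLE_TEXT)
print(parsed)
```

Against a live exporter, parse_prometheus_text(fetch_metrics()) should return a non-empty list if metrics are being collected.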
Run AIPerf Benchmark
The dashboard keyword enables a live terminal UI for real-time GPU telemetry visualization. Press 5 to maximize the GPU Telemetry panel during the benchmark run.
3: Using pynvml (Local GPU Monitoring)
For simple local GPU monitoring without DCGM infrastructure, AIPerf supports direct GPU metrics collection using NVIDIA’s nvidia-ml-py Python library (commonly known as pynvml). This approach requires no additional containers, HTTP endpoints, or DCGM setup.
Prerequisites
- NVIDIA GPU with driver installed
- nvidia-ml-py package:
pip install nvidia-ml-py
When to Use pynvml
Run AIPerf with pynvml
Add dashboard after pynvml for the real-time terminal UI: --gpu-telemetry pynvml dashboard
Metrics Collected via pynvml
The nvidia-ml-py library (pynvml) collects the following metrics directly from the NVIDIA driver:
Not all metrics are available on all GPU models. AIPerf gracefully handles missing metrics and reports only what the hardware supports.
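For readers curious what direct collection through nvidia-ml-py looks like, the following is a minimal sketch, not AIPerf's collector. It reads four common values per GPU and skips any metric the hardware does not report. The function name read_local_gpu_metrics and the output key names are hypothetical; the pynvml calls are standard NVML bindings. On a machine without the package or an NVIDIA driver it returns an empty list.

```python
def read_local_gpu_metrics():
    """Return one dict of metrics per local NVIDIA GPU (empty list if unavailable)."""
    try:
        import pynvml  # module provided by the nvidia-ml-py package
    except ImportError:
        return []
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        return []  # no driver or no NVML library on this machine
    try:
        # Key names are illustrative; units are converted for readability.
        readers = {
            "power_w": lambda h: pynvml.nvmlDeviceGetPowerUsage(h) / 1000.0,  # mW -> W
            "gpu_util_pct": lambda h: float(pynvml.nvmlDeviceGetUtilizationRates(h).gpu),
            "mem_used_gb": lambda h: pynvml.nvmlDeviceGetMemoryInfo(h).used / 1e9,  # bytes -> GB
            "temp_c": lambda h: float(
                pynvml.nvmlDeviceGetTemperature(h, pynvml.NVML_TEMPERATURE_GPU)
            ),
        }
        results = []
        for index in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(index)
            metrics = {"gpu_index": index}
            for name, read in readers.items():
                try:
                    metrics[name] = read(handle)
                except pynvml.NVMLError:
                    pass  # metric not supported on this GPU: omit it
            results.append(metrics)
        return results
    finally:
        pynvml.nvmlShutdown()


for gpu in read_local_gpu_metrics():
    print(gpu)
```

Catching NVMLError per metric, rather than per device, mirrors the behavior described above: a missing sensor removes one value without discarding the rest of that GPU's readings.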
Comparing DCGM vs pynvml
4: Using amdsmi (Local AMD ROCm GPU Monitoring)
For inference workloads on AMD Instinct GPUs (MI300X, MI355X, etc.), use --gpu-telemetry amdsmi. This collects metrics directly from local AMD GPUs via the amdsmi Python library shipped with ROCm.
When to Use amdsmi
- Benchmarking against vLLM-ROCm, SGLang-ROCm, TGI, or any ROCm-backed inference server running on the same machine as AIPerf.
- Local single-node monitoring with no need for HTTP exporters.
Run AIPerf with amdsmi
Metrics Collected via amdsmi
AMD signals are emitted under their own vendor-namespaced field names (not aliased onto NVML-shaped names) because the underlying sensors do not always measure the same physical quantity (e.g. amdsmi gfx_activity and NVML sm_utilization sample at different scopes).
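The naming and value conventions described for amdsmi mode (amd_* prefixes, sensors that report 'N/A', and a boolean throttle state emitted as 0.0 or 1.0) can be sketched as a small mapping step. The input dictionary below is a hypothetical, already-flattened reading, not the real return shape of any amdsmi call, and to_amd_fields is not an AIPerf function.

```python
def to_amd_fields(raw):
    """Map raw sensor readings to vendor-namespaced float fields.

    - 'N/A' or None readings are omitted (sensor absent on this part).
    - Boolean states become a 0.0/1.0 snapshot.
    - Everything else is converted to float.
    """
    fields = {}
    for name, value in raw.items():
        if value is None or value == "N/A":
            continue
        if isinstance(value, bool):
            fields[f"amd_{name}"] = 1.0 if value else 0.0
        else:
            fields[f"amd_{name}"] = float(value)
    return fields


# Hypothetical reading from one Instinct GPU; the values are made up.
raw_reading = {
    "power": 412,
    "gfx_activity": 93,
    "mm_activity": "N/A",
    "temperature": 61,
    "throttle_status": True,
}
amd_fields = to_amd_fields(raw_reading)
print(amd_fields)
```

Because amd_mm_activity is dropped rather than reported as zero, downstream consumers can tell an absent sensor apart from an idle one.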
Comparing DCGM vs pynvml vs amdsmi
Multi-Node GPU Telemetry Example
For distributed setups with multiple nodes, you can collect GPU telemetry from all nodes simultaneously:
This will collect GPU metrics from:
- http://localhost:9400/metrics (default, always attempted)
- http://localhost:9401/metrics (default, always attempted)
- http://node1:9400 (custom node 1, normalized from node1:9400)
- http://node2:9400 (custom node 2, normalized from node2:9400)
- http://node3:9400/metrics (custom node 3)
All metrics are displayed on the console and saved to the output CSV and JSON files, with GPU indices and hostnames distinguishing metrics from different nodes.
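The way default and custom endpoints combine can be sketched as follows. This is an illustration under stated assumptions: collect_endpoints is a hypothetical helper, and skipping duplicate URLs is an assumption the guide does not confirm. The two defaults are the ones named in this guide.

```python
DEFAULT_ENDPOINTS = [
    "http://localhost:9400/metrics",
    "http://localhost:9401/metrics",
]


def collect_endpoints(custom_urls):
    """Return the defaults followed by normalized custom URLs, in order.

    Only the scheme is added to custom URLs; paths are left as given.
    Deduplication is an assumption made for this sketch.
    """
    endpoints = list(DEFAULT_ENDPOINTS)
    for url in custom_urls:
        if not url.startswith(("http://", "https://")):
            url = f"http://{url}"
        if url not in endpoints:
            endpoints.append(url)
    return endpoints


endpoints = collect_endpoints(["node1:9400", "node2:9400", "http://node3:9400/metrics"])
for endpoint in endpoints:
    print(endpoint)
```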
Customizing Displayed Metrics
You can customize which GPU metrics are displayed in AIPerf by creating a custom metrics CSV file and passing it to --gpu-telemetry:
Custom Metrics CSV Format
The CSV format is identical to DCGM exporter configuration. See the vLLM setup section above (Step 1: Create a custom metrics configuration) for the complete CSV format example with all available DCGM fields.
Behavior: Custom metrics extend (not replace) the 7 core default metrics:
- GPU Power Usage
- Energy Consumption
- GPU Utilization
- GPU Memory Used
- GPU Temperature
- XID Errors
- Power Violation
The file path can be absolute or relative. Use .csv extension so AIPerf can distinguish it from DCGM endpoint URLs.
You can start with the example CSV from the vLLM setup section and customize it by commenting out metrics you don’t need or adding new DCGM metrics.
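Since --gpu-telemetry accepts keywords, endpoint URLs, and a metrics file, the .csv extension is what tells a file path apart from a URL. The sketch below shows one way such values could be sorted. It is not AIPerf's argument parser: classify_gpu_telemetry_args and its output keys are hypothetical, and only the keywords named in this guide (dashboard, pynvml, amdsmi) are recognized.

```python
MODE_KEYWORDS = {"pynvml", "amdsmi"}


def classify_gpu_telemetry_args(values):
    """Sort --gpu-telemetry values into mode, dashboard flag, CSV file, and URLs."""
    parsed = {"mode": "dcgm", "dashboard": False, "metrics_file": None, "urls": []}
    for value in values:
        if value == "dashboard":
            parsed["dashboard"] = True
        elif value in MODE_KEYWORDS:
            parsed["mode"] = value
        elif value.lower().endswith(".csv"):
            parsed["metrics_file"] = value  # absolute or relative path
        else:
            parsed["urls"].append(value)  # treated as a DCGM endpoint
    return parsed


parsed_args = classify_gpu_telemetry_args(
    ["dashboard", "node1:9400", "custom_gpu_metrics.csv"]
)
print(parsed_args)
```

Without the .csv suffix, a bare name such as custom_gpu_metrics would fall through to the URL branch, which is why the extension matters.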