Metrics
The TensorRT Inference Server provides Prometheus metrics indicating GPU and request statistics. By default, these metrics are available at http://localhost:8002/metrics. Use the TRTIS --metrics-port option to expose the metrics on a different port. The following table describes the available metrics.
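As a quick illustration, the following Python sketch fetches the metrics endpoint and prints the raw Prometheus text exposition. It assumes a server is running locally with the default metrics port (8002); the helper name and host are illustrative only.

```python
# Minimal sketch: fetch the Prometheus metrics text from a running server.
# Assumes the server is reachable on localhost at the default metrics port (8002).
import urllib.request

def fetch_metrics(url="http://localhost:8002/metrics"):
    """Return the raw Prometheus text exposition served at `url`."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")

if __name__ == "__main__":
    text = fetch_metrics()
    # Print only non-comment lines (the actual metric samples).
    for line in text.splitlines():
        if line and not line.startswith("#"):
            print(line)
```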
| Category | Metric | Description | Granularity | Frequency |
|---|---|---|---|---|
| GPU Utilization | Power Usage | GPU instantaneous power | Per GPU | Per second |
| GPU Utilization | Power Limit | Maximum GPU power limit | Per GPU | Per second |
| GPU Utilization | Energy Consumption | GPU energy consumption in joules since the server started | Per GPU | Per second |
| GPU Utilization | GPU Utilization | GPU utilization rate (0.0 - 1.0) | Per GPU | Per second |
| GPU Memory | GPU Total Memory | Total GPU memory, in bytes | Per GPU | Per second |
| GPU Memory | GPU Used Memory | Used GPU memory, in bytes | Per GPU | Per second |
| Count | Request Count | Number of inference requests | Per model | Per request |
| Count | Execution Count | Number of inference executions (request count / execution count = average dynamic batch size; see the example below the table) | Per model | Per request |
| Count | Inference Count | Number of inferences performed (one request counts as “batch size” inferences) | Per model | Per request |
| Latency | Request Time | End-to-end inference request handling time | Per model | Per request |
| Latency | Compute Time | Time a request spends executing the inference model (in the framework backend) | Per model | Per request |
| Latency | Queue Time | Time a request spends waiting in the queue | Per model | Per request |
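The relationship noted for Execution Count (average dynamic batch size = request count / execution count) can be checked directly against scraped metrics. The sketch below is one way to do this; it reuses the metrics text fetched in the earlier example, and the metric names shown (nv_inference_request_success, nv_inference_exec_count) are assumptions to be verified against your server's /metrics output.

```python
# Sketch: derive the average dynamic batch size per model from scraped metrics.
# The metric names below are assumptions; check them against your /metrics output.
import re

def parse_samples(text, metric_name):
    """Map the label string of each `metric_name` sample to its numeric value."""
    samples = {}
    pattern = re.compile(r'^%s\{(.*)\}\s+([0-9.eE+-]+)\s*$' % re.escape(metric_name))
    for line in text.splitlines():
        match = pattern.match(line)
        if match:
            samples[match.group(1)] = float(match.group(2))
    return samples

def average_batch_size(text):
    """Request count / execution count per label set, as described in the table."""
    requests = parse_samples(text, "nv_inference_request_success")  # assumed name
    executions = parse_samples(text, "nv_inference_exec_count")     # assumed name
    return {labels: requests[labels] / executions[labels]
            for labels in requests
            if labels in executions and executions[labels] > 0}
```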