Observability
NIM provides Prometheus metrics that report request statistics. You can use these metrics to build dashboards in Grafana. By default, the metrics are available at http://0.0.0.0:9000/v1/metrics.
You can use the following command to retrieve the metrics:
curl -X 'GET' 'http://0.0.0.0:9000/v1/metrics'
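The same request can be made from a script. The following is a minimal sketch using only the Python standard library; the helper names `fetch_metrics` and `sample_lines` are illustrative and not part of NIM, and the URL is the default stated above, which may differ in your deployment.

```python
import urllib.request

# Default endpoint from the text above; adjust host and port for your deployment.
METRICS_URL = "http://0.0.0.0:9000/v1/metrics"


def fetch_metrics(url: str = METRICS_URL, timeout: float = 5.0) -> str:
    """Return the raw Prometheus text exposition served at `url`."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def sample_lines(text: str) -> list[str]:
    """Keep only sample lines, dropping blanks and '#' comment lines (HELP/TYPE)."""
    return [
        line
        for line in text.splitlines()
        if line.strip() and not line.startswith("#")
    ]
```

With a running service, `print("\n".join(sample_lines(fetch_metrics())))` would list the current samples without the `# HELP` and `# TYPE` comment lines.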
The following table describes the available metrics:
| Category | Metric Name | Description |
|---|---|---|
| GPU | gpu_power_usage_watts | GPU instantaneous power, in watts |
| GPU | gpu_power_limit_watts | Maximum GPU power limit, in watts |
| GPU | gpu_total_energy_consumption_joules | GPU total energy consumption, in joules |
| GPU | gpu_utilization | GPU utilization rate (0.0 - 1.0) |
| GPU | gpu_memory_total_bytes | Total GPU memory, in bytes |
| GPU | gpu_memory_used_bytes | Used GPU memory, in bytes |
| Process | process_virtual_memory_bytes | Virtual memory size in bytes |
| Process | process_resident_memory_bytes | Resident memory size in bytes |
| Process | process_start_time_seconds | Start time of the process since Unix epoch in seconds |
| Process | process_cpu_seconds_total | Total user and system CPU time spent in seconds |
| Process | process_open_fds | Number of open file descriptors |
| Process | process_max_fds | Maximum number of open file descriptors |
| Python | python_gc_objects_collected_total | Objects collected during GC |
| Python | python_gc_objects_uncollectable_total | Uncollectable objects found during GC |
| Python | python_gc_collections_total | Number of times this generation was collected |
| Python | python_info | Python platform information |
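The gauges above can be combined into derived values such as the fraction of GPU memory in use or the remaining power headroom. The sketch below parses metrics text that has already been retrieved and computes both. It makes several assumptions: the function names are hypothetical, sample lines carry no trailing timestamp, and when a metric is reported once per GPU only the first sample is used. The labels that NIM attaches to each metric are not specified here, so the lookup matches on the metric name alone.

```python
def parse_samples(text: str) -> dict[str, float]:
    """Map each sample's name (including any label set) to its float value."""
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # Assumes "<name>{labels} <value>" with no trailing timestamp.
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


def metric(samples: dict[str, float], name: str) -> float:
    """Return the first sample named `name`, with or without labels."""
    for key, value in samples.items():
        if key == name or key.startswith(name + "{"):
            return value
    raise KeyError(name)


def gpu_memory_fraction(samples: dict[str, float]) -> float:
    """Used GPU memory divided by total GPU memory."""
    used = metric(samples, "gpu_memory_used_bytes")
    total = metric(samples, "gpu_memory_total_bytes")
    return used / total


def gpu_power_headroom_watts(samples: dict[str, float]) -> float:
    """Power limit minus instantaneous power draw, in watts."""
    limit = metric(samples, "gpu_power_limit_watts")
    usage = metric(samples, "gpu_power_usage_watts")
    return limit - usage
```

For multi-GPU hosts or production dashboards, a PromQL expression in Grafana over the same metric names is likely a better fit than client-side parsing; this sketch is intended for quick checks.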