Observability

NIM exposes Prometheus metrics that report request statistics. You can use these metrics to build dashboards in Grafana. By default, the metrics are available at http://0.0.0.0:9000/v1/metrics.

You can use the following command to retrieve the metrics:

curl -X 'GET' 'http://0.0.0.0:9000/v1/metrics'
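The same request can be made from a script. The following is a minimal sketch, assuming the endpoint returns the standard Prometheus text exposition format. The helper names `parse_metrics` and `fetch_metrics` are hypothetical and not part of NIM, and the parser handles only simple sample lines, which may not cover every line the server emits.

```python
import re
from urllib.request import urlopen

# Matches "name{labels} value" or "name value"; an optional trailing
# timestamp is accepted and ignored.
SAMPLE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)(?:\s+\S+)?$'
)


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from Prometheus text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # Not a simple sample line; ignored in this sketch.
        name, labels, value = match.groups()
        samples.append((name, labels or "", float(value)))
    return samples


def fetch_metrics(url="http://0.0.0.0:9000/v1/metrics", timeout=5.0):
    """Fetch the metrics endpoint and parse the response body."""
    with urlopen(url, timeout=timeout) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

Calling `fetch_metrics()` against a running NIM instance should return one tuple per sample. The label set, if any, is kept as a raw string because the labels NIM attaches are not described here.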

The following table describes the available metrics:

Category  Metric Name                            Description
--------  -------------------------------------  -----------------------------------------------------
GPU       gpu_power_usage_watts                  GPU instantaneous power, in watts
GPU       gpu_power_limit_watts                  Maximum GPU power limit, in watts
GPU       gpu_total_energy_consumption_joules    GPU total energy consumption, in joules
GPU       gpu_utilization                        GPU utilization rate (0.0 - 1.0)
GPU       gpu_memory_total_bytes                 Total GPU memory, in bytes
GPU       gpu_memory_used_bytes                  Used GPU memory, in bytes
Process   process_virtual_memory_bytes           Virtual memory size in bytes
Process   process_resident_memory_bytes          Resident memory size in bytes
Process   process_start_time_seconds             Start time of the process since Unix epoch in seconds
Process   process_cpu_seconds_total              Total user and system CPU time spent in seconds
Process   process_open_fds                       Number of open file descriptors
Process   process_max_fds                        Maximum number of open file descriptors
Python    python_gc_objects_collected_total      Objects collected during GC
Python    python_gc_objects_uncollectable_total  Uncollectable objects found during GC
Python    python_gc_collections_total            Number of times this generation was collected
Python    python_info                            Python platform information
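
Some dashboard values are ratios of two metrics rather than a single metric. The following is a minimal sketch of how such ratios might be derived, assuming the metric values for one GPU have already been collected into a dictionary keyed by metric name. The function name `gpu_summary` and the output keys are hypothetical.

```python
def gpu_summary(values):
    """Compute derived ratios from a mapping of metric name to value.

    `values` is assumed to hold samples for a single GPU. A ratio is
    None when either metric is missing or the denominator is zero.
    """

    def ratio(numerator, denominator):
        num = values.get(numerator)
        den = values.get(denominator)
        if num is None or not den:
            return None
        return num / den

    return {
        # Fraction of GPU memory in use (0.0 - 1.0).
        "memory_used_fraction": ratio(
            "gpu_memory_used_bytes", "gpu_memory_total_bytes"
        ),
        # Instantaneous power draw relative to the power limit.
        "power_used_fraction": ratio(
            "gpu_power_usage_watts", "gpu_power_limit_watts"
        ),
        # Already reported as a 0.0 - 1.0 rate, so passed through as-is.
        "utilization": values.get("gpu_utilization"),
    }
```

In a Grafana panel the same ratios would more typically be written as a query over the two metrics. The function above only illustrates the arithmetic.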