Observability

NVIDIA NIM for Visual Generative AI (Visual GenAI NIM) supports exporting Triton metrics through a Prometheus endpoint.

Triton Metrics

Triton exposes its metrics on port 8002 in Prometheus format.

The following table describes the available metrics at http://localhost:8002/metrics.

| Category | Metric | Metric Name | Description |
|---|---|---|---|
| Count | Success Request Count | nv_inference_request_success | Number of successful inference requests |
| Count | Failure Request Count | nv_inference_request_failure | Number of failed inference requests |
| Count | Total Request Count | nv_inference_count | Number of inferences performed |
| Count | Request Duration | nv_inference_request_duration_us | Cumulative inference request duration in microseconds |
| Count | Queue Duration | nv_inference_queue_duration_us | Cumulative inference queuing duration in microseconds |
| Count | Inference Duration | nv_inference_compute_infer_duration_us | Cumulative compute inference duration in microseconds |
| Gauge | GPU Utilization | nv_gpu_utilization | GPU utilization rate [0.0 - 1.0) |
| Gauge | Total GPU Memory | nv_gpu_memory_total_bytes | GPU total memory |
| Gauge | Used GPU Memory | nv_gpu_memory_used_bytes | GPU used memory |
| Gauge | CPU Utilization | nv_cpu_utilization | CPU utilization rate [0.0 - 1.0] |
| Gauge | Total CPU Memory | nv_cpu_memory_total_bytes | CPU total memory (RAM) |
| Gauge | Used CPU Memory | nv_cpu_memory_used_bytes | CPU used memory (RAM) |
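The duration metrics are cumulative counters, so a mean latency has to be derived by dividing a duration total by a request count. The following Python sketch shows one way to do that from the raw endpoint output. It assumes each sample carries `model` and `version` labels, which is typical for Triton but not stated above. The function names are illustrative, and the parser handles only the simple `name{labels} value` line shape.

```python
import re
import urllib.request
from collections import defaultdict

METRICS_URL = "http://localhost:8002/metrics"  # endpoint described in this section

# Matches lines such as: nv_inference_count{model="m",version="1"} 42
_SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
_LABEL = re.compile(r'(\w+)="([^"]*)"')


def fetch_metrics(url=METRICS_URL, timeout=5.0):
    """Download the Prometheus text exposition from a running NIM."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_metrics(text):
    """Return {metric_name: {frozenset(label pairs): value}}."""
    metrics = defaultdict(dict)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE comments
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue
        labels = frozenset(_LABEL.findall(match.group("labels") or ""))
        metrics[match.group("name")][labels] = float(match.group("value"))
    return metrics


def mean_request_latency_us(metrics):
    """Cumulative request duration divided by successful requests.

    Returns {(model, version): mean microseconds}; series with no
    successful requests are skipped to avoid dividing by zero.
    """
    durations = metrics.get("nv_inference_request_duration_us", {})
    successes = metrics.get("nv_inference_request_success", {})
    result = {}
    for labels, total_us in durations.items():
        count = successes.get(labels, 0.0)
        if count > 0:
            tags = dict(labels)
            result[(tags.get("model", ""), tags.get("version", ""))] = total_us / count
    return result
```

With a NIM running locally, `mean_request_latency_us(parse_metrics(fetch_metrics()))` would return the lifetime mean request latency per model. Because the counters never reset while the server runs, this is an average since startup rather than a recent value.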

Prometheus

To install Prometheus for scraping metrics from NIM, download a Prometheus release appropriate for your system. The following example uses v2.52.0 for Linux on AMD64.

wget https://github.com/prometheus/prometheus/releases/download/v2.52.0/prometheus-2.52.0.linux-amd64.tar.gz
tar -xvzf prometheus-2.52.0.linux-amd64.tar.gz
cd prometheus-2.52.0.linux-amd64/

Edit the Prometheus configuration file to scrape the NIM metrics endpoint. Make sure the targets field points to localhost:8002.

vim prometheus.yml

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's the NIM Triton metrics endpoint.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:8002"]

Next, run the Prometheus server:

./prometheus --config.file=./prometheus.yml

In a browser, open http://localhost:9090/targets?search= to check that the Prometheus server detected the NIM target. You can also click the NIM target URL to explore the generated metrics.
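The same check can be scripted against the Prometheus HTTP API instead of the browser. The sketch below is a minimal example that uses the instant-query endpoint `/api/v1/query` and the built-in `up` metric. The helper names are illustrative, and the job label `prometheus` is taken from the configuration above.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"  # default Prometheus address used in this guide


def build_query_url(promql, base_url=PROMETHEUS_URL):
    """Return the instant-query URL for a PromQL expression."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def healthy_instances(response):
    """Return instances whose `up` sample equals 1 in a decoded query response."""
    if response.get("status") != "success":
        return []
    return [
        series["metric"].get("instance", "")
        for series in response["data"]["result"]
        if float(series["value"][1]) == 1.0  # sample values arrive as strings
    ]


def query_prometheus(promql, base_url=PROMETHEUS_URL, timeout=5.0):
    """Run an instant query against a live Prometheus server."""
    url = build_query_url(promql, base_url)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

With Prometheus running, `healthy_instances(query_prometheus('up{job="prometheus"}'))` should include `localhost:8002` once the first scrape of the NIM endpoint has succeeded.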

Grafana

You can use Grafana to build dashboards for NIM metrics. Install a Grafana version appropriate for your system. The following example uses v11.0.0 for Linux on AMD64.

wget https://dl.grafana.com/oss/release/grafana-11.0.0.linux-amd64.tar.gz
tar -zxvf grafana-11.0.0.linux-amd64.tar.gz

Run the Grafana server:

cd grafana-v11.0.0/
./bin/grafana-server

To access the Grafana dashboard, point your browser to http://localhost:3000. Log in with the default credentials:

username: admin
password: admin

First, add Prometheus as a data source so that Grafana can query the NIM metrics. Click the Data Source button, select Prometheus, and set the Prometheus URL to http://localhost:9090. After you save the configuration, a success message appears. You are then ready to create a dashboard with metrics from NIM.
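When building panels, the cumulative counters are usually wrapped in PromQL `rate()` so that the panel shows recent behavior rather than totals since startup. The Python sketch below generates a few such query strings that could be pasted into a Grafana panel. The panel titles, the `5m` window, and the helper name are illustrative choices, not values prescribed by NIM.

```python
def mean_duration_expr(duration_metric, window="5m"):
    """PromQL for mean microseconds per successful request over a window.

    Divides the per-second growth of a cumulative duration counter by the
    per-second growth of the success counter; series are matched on their
    shared labels.
    """
    return (
        f"rate({duration_metric}[{window}])"
        f" / rate(nv_inference_request_success[{window}])"
    )


# Hypothetical panel titles mapped to PromQL built from the metrics in this guide.
PANEL_QUERIES = {
    "Successful requests per second": "rate(nv_inference_request_success[5m])",
    "Failed requests per second": "rate(nv_inference_request_failure[5m])",
    "Mean request duration (us)": mean_duration_expr("nv_inference_request_duration_us"),
    "Mean queue duration (us)": mean_duration_expr("nv_inference_queue_duration_us"),
    "GPU memory in use (fraction)": "nv_gpu_memory_used_bytes / nv_gpu_memory_total_bytes",
    "GPU utilization": "nv_gpu_utilization",
}

for title, expr in PANEL_QUERIES.items():
    print(f"{title}: {expr}")
```

The gauges can be plotted directly, which is why `nv_gpu_utilization` appears without `rate()`.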