Observability - NVIDIA Docs

NIM exposes Prometheus metrics that report request statistics. You can use these metrics to build Grafana dashboards. By default, the metrics are available at http://localhost:8000/metrics.
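As a quick check outside Prometheus, the endpoint can also be read directly. The following is a minimal Python sketch, not part of NIM: the helper names `fetch_metrics` and `parse_metrics`, the sample text, and the `model_name` label are assumptions made for illustration, and the parser handles only plain sample lines of the Prometheus text format.

```python
from urllib.request import urlopen

# Default NIM metrics endpoint named in the text above.
METRICS_URL = "http://localhost:8000/metrics"


def fetch_metrics(url=METRICS_URL, timeout=5.0):
    """Download the raw Prometheus text exposition (requires a running NIM)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_metrics(text):
    """Map each series (name plus labels) to its float value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        end = line.rfind("}")
        if end != -1:
            # Labels present: the value follows the closing brace.
            series, rest = line[: end + 1], line[end + 1:].split()
        else:
            series, *rest = line.split()
        samples[series] = float(rest[0])  # an optional timestamp is ignored
    return samples


# Hypothetical sample; the label name and the values are invented.
SAMPLE = """\
# HELP num_requests_running Number of requests currently running on GPU
# TYPE num_requests_running gauge
num_requests_running{model_name="example-model"} 2.0
num_requests_waiting{model_name="example-model"} 0.0
gpu_cache_usage_perc{model_name="example-model"} 0.25
"""

parsed = parse_metrics(SAMPLE)
```

With a NIM container running, `parse_metrics(fetch_metrics())` would return the live values in the same shape.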

The following table describes the available metrics.

Category	Metric	Metric Name	Description	Granularity	Frequency
KV Cache	GPU Cache Usage	`gpu_cache_usage_perc`	GPU KV-cache usage. 1 means 100 percent usage	Per model	Per iteration
Count	Running Count	`num_requests_running`	Number of requests currently running on GPU	Per model	Per iteration
Count	Waiting Count	`num_requests_waiting`	Number of requests waiting to be processed	Per model	Per iteration
Count	Max Request Count	`num_request_max`	Max number of concurrently running requests	Per model	Per iteration
Count	Total Prompt Token Count	`prompt_tokens_total`	Number of prefill tokens processed	Per model	Per iteration
Count	Total Generation Token Count	`generation_tokens_total`	Number of generation tokens processed	Per model	Per iteration
Latency	Time to First Token	`time_to_first_token_seconds`	Histogram of time to first token in seconds	Per model	Per request
Latency	Time per Output Token	`time_per_output_token_seconds`	Histogram of time per output token in seconds	Per model	Per request
Latency	End to End	`e2e_request_latency_seconds`	Histogram of end to end request latency in seconds	Per model	Per request
Count	Prompt Token Count	`request_prompt_tokens`	Histogram of number of prefill tokens processed	Per model	Per request
Count	Generation Token Count	`request_generation_tokens`	Histogram of number of generation tokens processed	Per model	Per request
Count	Finished Request Count	`request_success_total`	Number of finished requests, with label indicating finish reason	Per model	Per request
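The histogram metrics in the table follow the usual Prometheus convention of exposing `_bucket`, `_sum`, and `_count` series, so a mean can be derived as sum divided by count. The sketch below illustrates that arithmetic in Python; the function name `histogram_mean` and the sample values are hypothetical, and the code assumes sample lines carry no trailing timestamp.

```python
def histogram_mean(metrics_text, name):
    """Return sum/count for a Prometheus histogram, or None if it has no samples.

    Values are added up across all label sets of the histogram.
    """
    total = 0.0
    count = 0.0
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Assumes "series value" with no trailing timestamp.
        series, value = line.rsplit(None, 1)
        base = series.split("{", 1)[0]  # drop the label block, if any
        if base == name + "_sum":
            total += float(value)
        elif base == name + "_count":
            count += float(value)
    return total / count if count else None


# Invented sample: 8 requests that took 2.0 seconds in total to first token.
SAMPLE = """\
time_to_first_token_seconds_bucket{le="0.1"} 3.0
time_to_first_token_seconds_bucket{le="0.5"} 7.0
time_to_first_token_seconds_bucket{le="+Inf"} 8.0
time_to_first_token_seconds_sum 2.0
time_to_first_token_seconds_count 8.0
"""

mean_ttft = histogram_mean(SAMPLE, "time_to_first_token_seconds")
```

This is a mean since process start. For windowed averages or percentiles, the same series are normally queried through Prometheus rather than computed by hand.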

Prometheus

To install Prometheus for scraping metrics from NIM, download the latest Prometheus version appropriate for your system.

wget https://github.com/prometheus/prometheus/releases/download/v2.52.0/prometheus-2.52.0.linux-amd64.tar.gz
tar -xvzf prometheus-2.52.0.linux-amd64.tar.gz
cd prometheus-2.52.0.linux-amd64/

Edit the Prometheus configuration file to scrape from the NIM endpoint. Make sure the targets field points to localhost:8000.

vi prometheus.yml

# A scrape configuration containing exactly one endpoint to scrape:
# here it is the NIM metrics endpoint.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:8000"]

Next, run the Prometheus server:

./prometheus --config.file=./prometheus.yml

Open http://localhost:9090/targets?search= in a browser to check that the Prometheus server detected the NIM target. You can also click the NIM target URL link to explore the generated metrics.
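The same check can be scripted against the Prometheus HTTP API, which lists scrape targets at `/api/v1/targets`. The following Python sketch is one way to do it; the helper names `fetch_targets` and `target_health` and the sample response are assumptions made for illustration, and only the response fields used here (`activeTargets`, `scrapeUrl`, `health`) are relied on.

```python
import json
from urllib.request import urlopen

# Default Prometheus address used in this guide.
PROMETHEUS_URL = "http://localhost:9090"


def fetch_targets(base_url=PROMETHEUS_URL, timeout=5.0):
    """Fetch the target list from a running Prometheus server."""
    with urlopen(f"{base_url}/api/v1/targets", timeout=timeout) as resp:
        return json.load(resp)


def target_health(targets_json, scrape_host="localhost:8000"):
    """Return the health of the first active target whose scrape URL
    contains scrape_host, or None if no such target is listed."""
    for target in targets_json["data"]["activeTargets"]:
        if scrape_host in target.get("scrapeUrl", ""):
            return target.get("health")
    return None


# Trimmed, invented response showing only the fields used above.
SAMPLE_RESPONSE = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"scrapeUrl": "http://localhost:8000/metrics", "health": "up"}
        ],
        "droppedTargets": [],
    },
}

nim_health = target_health(SAMPLE_RESPONSE)
```

Against a live server, `target_health(fetch_targets())` should report `up` once Prometheus has scraped the NIM endpoint successfully.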

Grafana

You can use Grafana to build dashboards for NIM metrics. Install the latest Grafana version appropriate for your system.

wget https://dl.grafana.com/oss/release/grafana-11.0.0.linux-amd64.tar.gz
tar -zxvf grafana-11.0.0.linux-amd64.tar.gz

Run the Grafana server:

cd grafana-v11.0.0/
./bin/grafana-server

To access the Grafana dashboard, point your browser to http://localhost:3000. Log in with the default credentials:

username: admin
password: admin

The first step is to configure the data source that Grafana reads metrics from. Click the “Data Source” button, select Prometheus, and specify the Prometheus URL, http://localhost:9090. After you save the configuration, a success message should appear. You are then ready to create a dashboard with metrics from NIM, or you can try this example dashboard.
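Before building panels, it can help to try candidate queries against Prometheus directly. The Python sketch below uses the Prometheus instant-query endpoint `/api/v1/query`; the helper names, the example PromQL expressions, and the sample response are illustrative assumptions and are not taken from the example dashboard.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

PROMETHEUS_URL = "http://localhost:9090"

# Example PromQL expressions built from metric names in the table above.
# Generation throughput in tokens per second over the last minute:
TOKENS_PER_SECOND = "rate(generation_tokens_total[1m])"
# Estimated 95th percentile of time to first token over five minutes:
TTFT_P95 = (
    "histogram_quantile(0.95, "
    "sum by (le) (rate(time_to_first_token_seconds_bucket[5m])))"
)


def build_query_url(promql, base_url=PROMETHEUS_URL):
    """Build the instant-query URL for a PromQL expression."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def run_query(promql, base_url=PROMETHEUS_URL, timeout=5.0):
    """Execute the query against a running Prometheus server."""
    with urlopen(build_query_url(promql, base_url), timeout=timeout) as resp:
        return json.load(resp)


def vector_values(response):
    """Extract (labels, value) pairs from an instant-vector result."""
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response}")
    return [
        (item["metric"], float(item["value"][1]))
        for item in response["data"]["result"]
    ]


# Invented response in the shape Prometheus returns for an instant vector.
SAMPLE_RESPONSE = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"job": "prometheus"}, "value": [1700000000.0, "42.5"]}
        ],
    },
}

values = vector_values(SAMPLE_RESPONSE)
```

An expression that returns sensible values here can be pasted into a Grafana panel that uses the Prometheus data source.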