Observability for Nemotron Safety Guard NIM#

NIM provides Prometheus metrics indicating request statistics. You can use these metrics to create dashboards with Grafana. By default, these metrics are available at http://localhost:8000/v1/metrics.
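As a quick check that the endpoint is serving data, the following minimal Python sketch parses Prometheus text-format output into a dictionary. The sample payload, the `model_name` label, and the helper names `parse_metrics` and `fetch_metrics` are illustrative assumptions, not output captured from NIM. Real output may carry different labels or name prefixes. The regular expression is a simplification and does not handle label values that contain `}`.

```python
import re
import urllib.request

# Matches lines such as: metric_name{label="x"} 1.5   or   metric_name 1.5
_SAMPLE_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)')


def parse_metrics(text):
    """Parse Prometheus text exposition format into {(name, labels): value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = _SAMPLE_LINE.match(line)
        if match is None:
            continue
        name, labels, value = match.groups()
        try:
            samples[(name, labels or "")] = float(value)
        except ValueError:
            continue  # ignore values that are not numeric
    return samples


def fetch_metrics(url="http://localhost:8000/v1/metrics"):
    """Fetch raw metrics text from the NIM endpoint (requires a running NIM)."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")


# Hypothetical sample payload, for illustration only.
SAMPLE = """\
# HELP num_requests_running Number of requests currently running on GPU.
# TYPE num_requests_running gauge
num_requests_running{model_name="example-model"} 2.0
gpu_cache_usage_perc{model_name="example-model"} 0.25
"""

parsed = parse_metrics(SAMPLE)
for (name, labels), value in sorted(parsed.items()):
    print(name, labels, value)
```

Against a running NIM, `parse_metrics(fetch_metrics())` would apply the same parsing to live data.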

The following table describes the available metrics.

Category	Metric	Metric Name	Description	Granularity	Frequency
KV Cache	GPU Cache Usage	`gpu_cache_usage_perc`	GPU KV-cache usage. 1 means 100 percent usage	Per model	Per iteration
Count	Running Count	`num_requests_running`	Number of requests currently running on GPU	Per model	Per iteration
	Waiting Count	`num_requests_waiting`	Number of requests waiting to be processed	Per model	Per iteration
	Max Request Count	`num_request_max`	Max number of requests that can be run concurrently by the model	Per model	Per iteration
	Total Prompt Token Count	`prompt_tokens_total`	Number of prefill tokens processed	Per model	Per iteration
	Total Generation Token Count	`generation_tokens_total`	Number of generation tokens processed	Per model	Per iteration
Latency	Time to First Token	`time_to_first_token_seconds`	Histogram of time to first token in seconds	Per model	Per request
	Time per Output Token	`time_per_output_token_seconds`	Histogram of time per output token in seconds	Per model	Per request
	End to End	`e2e_request_latency_seconds`	Histogram of end to end request latency in seconds	Per model	Per request
Count	Prompt Token Count	`request_prompt_tokens`	Histogram of number of prefill tokens processed	Per model	Per request
	Generation Token Count	`request_generation_tokens`	Histogram of number of generation tokens processed	Per model	Per request
	Finished Request Count	`request_finish_total`	Number of finished requests, with label indicating finish reason	Per model	Per request
	Success Request Count	`request_success_total`	Number of successful requests (finish reason “stop” or “length”)	Per model	Per request
	Failure Request Count	`request_failure_total`	Number of failed requests (any other finish reason)	Per model	Per request
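Prometheus histograms expose `_sum` and `_count` series alongside their buckets, so a mean latency can be derived by dividing one by the other. The success and failure counters can be combined into a success ratio in the same way. The sketch below assumes the two counters together cover all finished requests, as the descriptions above suggest. The function names and input values are hypothetical. Because these series are cumulative, the results are lifetime averages. For a recent window, compute the same ratios over `rate()` in Prometheus instead.

```python
def mean_from_histogram(total_sum, total_count):
    """Mean observation of a Prometheus histogram from its _sum and _count series."""
    if total_count == 0:
        return None  # no observations recorded yet
    return total_sum / total_count


def success_ratio(success_total, failure_total):
    """Fraction of finished requests that succeeded."""
    finished = success_total + failure_total
    if finished == 0:
        return None  # no finished requests yet
    return success_total / finished


# Hypothetical values, as they might be read from
# time_to_first_token_seconds_sum / time_to_first_token_seconds_count,
# request_success_total, and request_failure_total.
mean_ttft = mean_from_histogram(total_sum=12.5, total_count=50)
ratio = success_ratio(success_total=95, failure_total=5)
print(mean_ttft, ratio)
```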

Prometheus#

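To install Prometheus for scraping metrics from NIM, download the latest Prometheus release appropriate for your system. The commands below use version 2.52.0 for Linux on amd64 as an example; substitute the version and platform you need.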

$ wget https://github.com/prometheus/prometheus/releases/download/v2.52.0/prometheus-2.52.0.linux-amd64.tar.gz
$ tar -xvzf prometheus-2.52.0.linux-amd64.tar.gz
$ cd prometheus-2.52.0.linux-amd64/

Edit the Prometheus configuration file, prometheus.yml, to scrape from the NIM endpoint. Make sure the targets field is set to localhost:8000.

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's the NIM metrics endpoint.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics' in Prometheus.
    # NIM serves metrics at '/v1/metrics' by default, so set the path explicitly.
    # scheme defaults to 'http'.
    metrics_path: "/v1/metrics"

    static_configs:
      - targets: ["localhost:8000"]

Afterward, run the Prometheus server:

$ ./prometheus --config.file=./prometheus.yml

Use a browser to verify that the NIM target was detected by the Prometheus server: http://localhost:9090/targets?search=. You can also click on the NIM target URL link to explore generated metrics.
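The same check can be scripted against the Prometheus HTTP API, which lists scrape targets at `/api/v1/targets`. The following is a minimal sketch. The helper names are hypothetical, and the sample response is an abbreviated, made-up payload that keeps only the `scrapeUrl` and `health` fields of each active target.

```python
import json
import urllib.request


def summarize_targets(payload):
    """Return (scrape URL, health) pairs from a /api/v1/targets response."""
    active = payload.get("data", {}).get("activeTargets", [])
    return [(t.get("scrapeUrl"), t.get("health")) for t in active]


def fetch_targets(base_url="http://localhost:9090"):
    """Query the Prometheus HTTP API (requires a running Prometheus server)."""
    with urllib.request.urlopen(base_url + "/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


# Hypothetical, abbreviated response, for illustration only.
SAMPLE_RESPONSE = json.loads("""
{
  "status": "success",
  "data": {
    "activeTargets": [
      {"scrapeUrl": "http://localhost:8000/v1/metrics", "health": "up"}
    ]
  }
}
""")

targets = summarize_targets(SAMPLE_RESPONSE)
print(targets)
```

A target whose health is `up` was scraped successfully on the last attempt. With a live server, `summarize_targets(fetch_targets())` returns the real list.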

Grafana#

You can use Grafana to visualize NIM metrics. Install the latest Grafana version for your system. The commands below use version 11.0.0 for Linux on amd64 as an example.

$ wget https://dl.grafana.com/oss/release/grafana-11.0.0.linux-amd64.tar.gz
$ tar -zxvf grafana-11.0.0.linux-amd64.tar.gz

Start the Grafana server:

$ cd grafana-v11.0.0/
$ ./bin/grafana-server

To access the Grafana dashboard, go to http://localhost:3000. Log in using the defaults:

username: admin
password: admin

First, configure the data source in Grafana. Click Data Source, select Prometheus, and set the URL to http://localhost:9090. After saving the configuration, you should see a success message.
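The data source can also be created without the UI by sending a POST request to Grafana's `/api/datasources` HTTP API. The sketch below only builds the request. It assumes the default `admin`/`admin` credentials and local URLs used on this page. The function name and the data source name `Prometheus` are illustrative choices.

```python
import base64
import json
import urllib.request


def build_datasource_request(grafana_url="http://localhost:3000",
                             prometheus_url="http://localhost:9090",
                             user="admin", password="admin"):
    """Build (but do not send) a request that registers Prometheus in Grafana."""
    body = {
        "name": "Prometheus",
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",  # the Grafana backend proxies queries to Prometheus
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        grafana_url + "/api/datasources",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,
        },
        method="POST",
    )


request = build_datasource_request()
print(request.get_method(), request.full_url)
# To send it against a running Grafana server:
# urllib.request.urlopen(request)
```

Change the default password after the first login and update the credentials used here accordingly.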