Observability for Nemotron Safety Guard NIM
NIM provides Prometheus metrics indicating request statistics. You can use these metrics to create dashboards with Grafana. By default, these metrics are available at http://localhost:8000/v1/metrics.
The following table describes the available metrics.
| Category | Metric | Metric Name | Description | Granularity | Frequency |
|---|---|---|---|---|---|
| KV Cache | GPU Cache Usage |  | GPU KV-cache usage. A value of 1 means 100 percent usage | Per model | Per iteration |
| Count | Running Count |  | Number of requests currently running on GPU | Per model | Per iteration |
| Count | Waiting Count |  | Number of requests waiting to be processed | Per model | Per iteration |
| Count | Max Request Count |  | Maximum number of requests that the model can run concurrently | Per model | Per iteration |
| Count | Total Prompt Token Count |  | Number of prefill tokens processed | Per model | Per iteration |
| Count | Total Generation Token Count |  | Number of generation tokens processed | Per model | Per iteration |
| Latency | Time to First Token |  | Histogram of time to first token in seconds | Per model | Per request |
| Latency | Time per Output Token |  | Histogram of time per output token in seconds | Per model | Per request |
| Latency | End to End |  | Histogram of end-to-end request latency in seconds | Per model | Per request |
| Count | Prompt Token Count |  | Histogram of number of prefill tokens processed | Per model | Per request |
| Count | Generation Token Count |  | Histogram of number of generation tokens processed | Per model | Per request |
| Count | Finished Request Count |  | Number of finished requests, with a label indicating the finish reason | Per model | Per request |
| Count | Success Request Count |  | Number of successful requests. Requests with finish reason “stop” or “length” are counted | Per model | Per request |
| Count | Failure Request Count |  | Number of failed requests. Requests with any other finish reason are counted | Per model | Per request |
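The latency metrics in the table are histograms, so a common first step is to derive an average from the `_sum` and `_count` series that Prometheus histograms expose. The following is a minimal sketch, not part of NIM: it parses the text returned by the metrics endpoint with a simplified regular expression and computes that average. The metric name `example_latency_seconds` and the `model_name` label are hypothetical placeholders; read the real names from the endpoint output.

```python
import re
from urllib.request import urlopen

# One sample line: name, optional {labels}, value, optional timestamp.
# Simplified on purpose; it does not cover every corner of the format.
_SAMPLE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$')
_LABEL = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Parse Prometheus text exposition into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = _SAMPLE.match(line)
        if match:
            name, raw_labels, value = match.groups()
            labels = dict(_LABEL.findall(raw_labels or ""))
            samples.append((name, labels, float(value)))
    return samples


def histogram_mean(samples, base_name):
    """Average since process start: <base_name>_sum / <base_name>_count."""
    total = sum(v for n, _, v in samples if n == base_name + "_sum")
    count = sum(v for n, _, v in samples if n == base_name + "_count")
    return total / count if count else None


def fetch_metrics(url="http://localhost:8000/v1/metrics"):
    """Download the raw metrics text from the default NIM endpoint."""
    with urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")


# Hypothetical sample text; the metric and label names are placeholders.
SAMPLE_TEXT = """\
# TYPE example_latency_seconds histogram
example_latency_seconds_bucket{model_name="demo",le="0.5"} 3
example_latency_seconds_bucket{model_name="demo",le="+Inf"} 4
example_latency_seconds_sum{model_name="demo"} 1.0
example_latency_seconds_count{model_name="demo"} 4
"""

if __name__ == "__main__":
    samples = parse_samples(SAMPLE_TEXT)
    print(histogram_mean(samples, "example_latency_seconds"))
```

Against a running NIM, pass the result of `fetch_metrics()` to `parse_samples` instead of `SAMPLE_TEXT`. For dashboards, the same ratio is usually computed in Prometheus over a time window rather than since process start.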
Prometheus
To install Prometheus for scraping metrics from NIM, download a Prometheus release appropriate for your system. The following commands use version 2.52.0 for Linux on AMD64.
$ wget https://github.com/prometheus/prometheus/releases/download/v2.52.0/prometheus-2.52.0.linux-amd64.tar.gz
$ tar -xvzf prometheus-2.52.0.linux-amd64.tar.gz
$ cd prometheus-2.52.0.linux-amd64/
Edit the Prometheus configuration file, prometheus.yml, to scrape from the NIM endpoint.
Make sure the targets field is set to localhost:8000 and that metrics_path matches the NIM metrics path, which is /v1/metrics by default.
# A scrape configuration containing exactly one endpoint to scrape:
# here, the NIM metrics endpoint.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # Prometheus defaults metrics_path to '/metrics'.
    # NIM serves metrics at '/v1/metrics' by default, so set the path explicitly.
    metrics_path: "/v1/metrics"
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:8000"]
Afterward, run the Prometheus server:
$ ./prometheus --config.file=./prometheus.yml
Use a browser to verify that the NIM target was detected by the Prometheus server: http://localhost:9090/targets?search=. You can also click on the NIM target URL link to explore generated metrics.
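The same check can be scripted. The sketch below assumes the Prometheus server runs at its default address and uses the Prometheus HTTP API endpoint `/api/v1/targets`, which reports each active target together with a `health` field. The sample payload is a trimmed, hypothetical response used only to show the shape of the data.

```python
import json
from urllib.request import urlopen


def target_health(payload):
    """Map each active target's scrape URL to its reported health."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return {t["scrapeUrl"]: t["health"] for t in targets}


def fetch_targets(prometheus_url="http://localhost:9090"):
    """Query the Prometheus HTTP API for the current scrape targets."""
    with urlopen(prometheus_url + "/api/v1/targets", timeout=5) as response:
        return json.load(response)


# Hypothetical, trimmed response; real responses carry more fields.
SAMPLE_PAYLOAD = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"scrapeUrl": "http://localhost:8000/v1/metrics", "health": "up"},
        ],
        "droppedTargets": [],
    },
}

if __name__ == "__main__":
    for url, health in target_health(SAMPLE_PAYLOAD).items():
        print(url, health)
```

With the server running, `target_health(fetch_targets())` should list the NIM endpoint, and a health value other than `up` suggests the target address or metrics path needs another look.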
Grafana
You can use Grafana to visualize NIM metrics. Install a Grafana release appropriate for your system. The following commands use version 11.0.0 for Linux on AMD64.
$ wget https://dl.grafana.com/oss/release/grafana-11.0.0.linux-amd64.tar.gz
$ tar -zxvf grafana-11.0.0.linux-amd64.tar.gz
Start the Grafana server:
$ cd grafana-v11.0.0/
$ ./bin/grafana-server
To access the Grafana dashboard, go to http://localhost:3000. Log in using the defaults:
username: admin
password: admin
First, configure the data source in Grafana.
Click Data Source, select Prometheus, and set the URL to http://localhost:9090.
After saving the configuration, you should see a success message.
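As an alternative to the UI steps, the data source can be registered through Grafana's HTTP API. The sketch below builds a `POST /api/datasources` request and assumes basic authentication with the default admin credentials shown above. The data source name `NIM Prometheus` is a hypothetical choice, and the request is only sent if you call `create_datasource`.

```python
import base64
import json
from urllib.request import Request, urlopen


def build_datasource_request(grafana_url="http://localhost:3000",
                             prometheus_url="http://localhost:9090",
                             user="admin", password="admin"):
    """Build (but do not send) a request that adds a Prometheus data source."""
    payload = {
        "name": "NIM Prometheus",  # hypothetical display name
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",  # the Grafana server forwards queries to Prometheus
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return Request(
        grafana_url + "/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,
        },
        method="POST",
    )


def create_datasource(request):
    """Send the request and return Grafana's decoded JSON reply."""
    with urlopen(request, timeout=5) as response:
        return json.load(response)


if __name__ == "__main__":
    req = build_datasource_request()
    print(req.get_method(), req.full_url)
```

Separating the request builder from the network call keeps the payload easy to inspect before anything is sent. If you changed the admin password after the first login, pass the new credentials to `build_datasource_request`.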