date 2025-01-13
Observability#
NIM exposes Prometheus metrics that report request statistics. You can use these metrics to build Grafana dashboards. By default, the metrics are available at
http://localhost:8000/metrics.
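As a quick check that the endpoint is serving data, the metrics text can be fetched and parsed with a few lines of Python. This is a minimal sketch, assuming the default URL above and the standard Prometheus text exposition format. The helper names `fetch_metrics` and `parse_metrics` are illustrative, and the parser keeps labels as a raw string and ignores escaping rules.

```python
import urllib.request


def fetch_metrics(url="http://localhost:8000/metrics", timeout=5.0):
    """Download the raw metrics text (the URL is the default from this page)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_metrics(text):
    """Turn exposition-format text into (name, raw_labels, value) tuples.

    Simplified on purpose: comment lines (# HELP / # TYPE) are skipped,
    labels are returned unparsed, and an optional trailing timestamp
    is ignored.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            name, rest = line.split("{", 1)
            raw_labels, value_part = rest.rsplit("}", 1)
        else:
            name, value_part = line.split(None, 1)
            raw_labels = ""
        # The first token after the labels is the sample value.
        value = float(value_part.split()[0])
        samples.append((name, raw_labels, value))
    return samples
```

Calling `parse_metrics(fetch_metrics())` against a running NIM returns one tuple per sample. For anything beyond a quick check, a full parser such as the one shipped in the `prometheus_client` package is likely a better fit.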
The following table describes the available metrics.
| Category | Metric | Metric Name | Description | Granularity | Frequency |
|---|---|---|---|---|---|
| KV Cache | GPU Cache Usage | | GPU KV-cache usage. 1 means 100 percent usage | Per model | Per iteration |
| Count | Running Count | | Number of requests currently running on GPU | Per model | Per iteration |
| Count | Waiting Count | | Number of requests waiting to be processed | Per model | Per iteration |
| Count | Max Request Count | | Max number of concurrently running requests | Per model | Per iteration |
| Count | Total Prompt Token Count | | Number of prefill tokens processed | Per model | Per iteration |
| Count | Total Generation Token Count | | Number of generation tokens processed | Per model | Per iteration |
| Latency | Time to First Token | | Histogram of time to first token in seconds | Per model | Per request |
| Latency | Time per Output Token | | Histogram of time per output token in seconds | Per model | Per request |
| Latency | End to End | | Histogram of end to end request latency in seconds | Per model | Per request |
| Count | Prompt Token Count | | Histogram of number of prefill tokens processed | Per model | Per request |
| Count | Generation Token Count | | Histogram of number of generation tokens processed | Per model | Per request |
| Count | Finished Request Count | | Number of finished requests, with label indicating finish reason | Per model | Per request |
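The latency and per-request token metrics in the table are histograms. Assuming they follow the usual Prometheus histogram layout (cumulative bucket counts keyed by an upper bound, ending in a `+Inf` bucket), a percentile can be estimated by linear interpolation inside the bucket that contains the target rank. The sketch below illustrates that idea. The function name `estimate_quantile` and the bucket values in the usage note are made up for illustration and are not taken from NIM.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate the q-quantile (0 <= q <= 1) from cumulative histogram buckets.

    buckets: iterable of (upper_bound, cumulative_count) pairs, where the
    last bound is math.inf and its count is the total number of samples.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(buckets)
    total = ordered[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in ordered:
        if count >= rank:
            # No finite upper bound to interpolate towards in the +Inf
            # bucket, and nothing to interpolate over in an empty bucket.
            if math.isinf(bound) or count == prev_count:
                return prev_bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound
```

For example, with hypothetical time-to-first-token buckets `[(0.1, 10), (0.5, 60), (1.0, 90), (math.inf, 100)]`, the median falls in the 0.1 to 0.5 second bucket. The estimate is only as fine-grained as the bucket boundaries, so treat it as an approximation.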
Prometheus#
To scrape metrics from NIM with Prometheus, download the latest Prometheus release appropriate for your system. The commands below use v2.52.0 for Linux on amd64 as an example.
wget https://github.com/prometheus/prometheus/releases/download/v2.52.0/prometheus-2.52.0.linux-amd64.tar.gz
tar -xvzf prometheus-2.52.0.linux-amd64.tar.gz
cd prometheus-2.52.0.linux-amd64/
Edit the Prometheus configuration file to scrape the NIM endpoint. Make sure the targets field points to localhost:8000.
vi prometheus.yml
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's the NIM metrics endpoint.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:8000"]
Next, run the Prometheus server:
./prometheus --config.file=./prometheus.yml
In a browser, open http://localhost:9090/targets?search= to check that the Prometheus server detected the NIM target.
You can also click on the NIM target URL link to explore generated metrics.
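The same check can be scripted against the Prometheus HTTP API, which lists scrape targets at `/api/v1/targets`. The following is a minimal sketch, assuming Prometheus is running on its default port 9090 as above. The helper names `fetch_targets` and `target_health` are illustrative.

```python
import json
import urllib.request


def fetch_targets(base_url="http://localhost:9090", timeout=5.0):
    """Return the decoded JSON response of the Prometheus targets API."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=timeout) as resp:
        return json.load(resp)


def target_health(payload):
    """Map each active target's scrape URL to its reported health.

    Missing fields fall back to an empty URL and "unknown" so that a
    partial response does not raise.
    """
    targets = payload.get("data", {}).get("activeTargets", [])
    return {t.get("scrapeUrl", ""): t.get("health", "unknown") for t in targets}
```

With the configuration above, `target_health(fetch_targets())` should contain an entry for `http://localhost:8000/metrics`. A health value other than `up` suggests that NIM is not reachable from the Prometheus host.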
Grafana#
You can use Grafana to build dashboards from NIM metrics. Install the latest Grafana version appropriate for your system. The commands below use v11.0.0 for Linux on amd64 as an example.
wget https://dl.grafana.com/oss/release/grafana-11.0.0.linux-amd64.tar.gz
tar -zxvf grafana-11.0.0.linux-amd64.tar.gz
Run the Grafana server:
cd grafana-v11.0.0/
./bin/grafana-server
To access the Grafana dashboard, point your browser to
http://localhost:3000. Log in with the default credentials:
username: admin
password: admin
The first step is to configure the data source that Grafana queries for metrics. Click the “Data Source” button, select Prometheus, and specify the Prometheus URL
localhost:9090. After you save the configuration, you should see a success message. You are then ready to create a dashboard with metrics from NIM, or you can try this example dashboard.
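The data source can also be registered without the UI, through Grafana's HTTP API (`POST /api/datasources`). The sketch below only builds the request. It assumes the default `admin`/`admin` credentials are still valid, that basic authentication is enabled, and that Prometheus is reachable at `http://localhost:9090`. The data source name and the helper name `build_datasource_request` are illustrative choices.

```python
import base64
import json
import urllib.request


def build_datasource_request(
    grafana_url="http://localhost:3000",
    prometheus_url="http://localhost:9090",
    user="admin",
    password="admin",
):
    """Build (but do not send) a request that adds Prometheus as a data source."""
    body = {
        "name": "Prometheus",  # display name in Grafana (arbitrary)
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",  # the Grafana server queries Prometheus itself
        "isDefault": True,
    }
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{grafana_url}/api/datasources",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
```

To send the request, pass the result to `urllib.request.urlopen`. If the admin password was changed at first login, supply the new credentials instead of the defaults.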