Observability
NIM exposes Prometheus metrics that report request statistics. You can use these metrics to build Grafana dashboards. By default, the metrics are available at http://localhost:8000/metrics.
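As a quick way to inspect the endpoint without Prometheus, the sketch below downloads the metrics page and parses it into a dictionary. It assumes the endpoint returns the standard Prometheus text exposition format without timestamps. The helper names `fetch_metrics_text` and `parse_samples` are illustrative and not part of NIM.

```python
import urllib.request

# Default address taken from the text above; change it if NIM listens elsewhere.
METRICS_URL = "http://localhost:8000/metrics"


def fetch_metrics_text(url: str = METRICS_URL, timeout: float = 5.0) -> str:
    """Download the raw Prometheus text exposition from the endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_samples(text: str) -> dict:
    """Map each sample (metric name plus labels) to its float value.

    Comment lines (# HELP, # TYPE) and blank lines are skipped. This is a
    simplified parser: it assumes each sample line ends with the value and
    carries no trailing timestamp.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the last space: everything before it is the series key.
        key, _, value = line.rpartition(" ")
        samples[key] = float(value)
    return samples
```

With a running NIM instance, `parse_samples(fetch_metrics_text())` should return one entry per exposed series. The exact series names and labels depend on the deployment, so check the raw output first.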
The following table describes the available metrics.
Category | Metric | Metric Name | Description | Granularity | Frequency |
---|---|---|---|---|---|
KV Cache | GPU Cache Usage | gpu_cache_usage_perc | GPU KV-cache usage. 1 means 100 percent usage | Per model | Per iteration |
Count | Running Count | num_requests_running | Number of requests currently running on GPU | Per model | Per iteration |
Count | Waiting Count | num_requests_waiting | Number of requests waiting to be processed | Per model | Per iteration |
Count | Max Request Count | num_request_max | Max number of concurrently running requests | Per model | Per iteration |
Count | Total Prompt Token Count | prompt_tokens_total | Number of prefill tokens processed | Per model | Per iteration |
Count | Total Generation Token Count | generation_tokens_total | Number of generation tokens processed | Per model | Per iteration |
Latency | Time to First Token | time_to_first_token_seconds | Histogram of time to first token in seconds | Per model | Per request |
Latency | Time per Output Token | time_per_output_token_seconds | Histogram of time per output token in seconds | Per model | Per request |
Latency | End to End | e2e_request_latency_seconds | Histogram of end to end request latency in seconds | Per model | Per request |
Count | Prompt Token Count | request_prompt_tokens | Histogram of number of prefill tokens processed | Per model | Per request |
Count | Generation Token Count | request_generation_tokens | Histogram of number of generation tokens processed | Per model | Per request |
Count | Finished Request Count | request_success_total | Number of finished requests, with label indicating finish reason | Per model | Per request |
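Several of the metrics above are histograms. In the Prometheus exposition format, a histogram is exported as cumulative `_bucket` series with an `le` upper-bound label, plus `_sum` and `_count` series. The sketch below shows how a mean and an approximate quantile can be derived from those values. It is a simplified illustration in the spirit of PromQL's `histogram_quantile()`, not a reimplementation of it, and the bucket values in the usage note are made up.

```python
import math


def histogram_mean(total_sum: float, total_count: float) -> float:
    """Average observation, e.g. mean time to first token in seconds."""
    return total_sum / total_count if total_count else math.nan


def histogram_quantile(q: float, buckets: list) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) pairs.

    Interpolates linearly inside the bucket that contains the target rank.
    The lowest bucket is assumed to start at 0.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return lower_bound
            span = count - lower_count
            if span == 0:
                return lower_bound
            fraction = (rank - lower_count) / span
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound
```

For example, with hypothetical buckets `[(0.1, 2), (0.5, 8), (1.0, 10), (math.inf, 10)]` the estimated median is 0.3 seconds, because the fifth of ten observations falls halfway through the 0.1 to 0.5 bucket.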
To install Prometheus for scraping metrics from NIM, download a Prometheus release appropriate for your system. The commands below use v2.52.0 for Linux on amd64.
wget https://github.com/prometheus/prometheus/releases/download/v2.52.0/prometheus-2.52.0.linux-amd64.tar.gz
tar -xvzf prometheus-2.52.0.linux-amd64.tar.gz
cd prometheus-2.52.0.linux-amd64/
Edit the Prometheus configuration file to scrape from the NIM endpoint. Make sure the targets field points to localhost:8000.
vi prometheus.yml
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's the NIM metrics endpoint.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:8000"]
Next, run the Prometheus server:
./prometheus --config.file=./prometheus.yml
Open http://localhost:9090/targets?search= in a browser to check that the Prometheus server detected the NIM target.
You can also click on the NIM target URL link to explore generated metrics.
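The same check can be scripted against the Prometheus HTTP API, which lists scrape targets at `/api/v1/targets`. The sketch below is a minimal example that assumes Prometheus runs at localhost:9090 and scrapes NIM at localhost:8000, as configured above. The helper names are illustrative.

```python
import json
import urllib.request

# Assumed Prometheus address, matching the URL used in the text above.
PROMETHEUS_URL = "http://localhost:9090"


def fetch_targets(base_url: str = PROMETHEUS_URL, timeout: float = 5.0) -> dict:
    """Return the decoded JSON body of GET /api/v1/targets."""
    url = f"{base_url}/api/v1/targets"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def target_health(targets_response: dict, host_port: str = "localhost:8000"):
    """Return the health of the first active target scraping host_port.

    Prometheus reports health as "up", "down", or "unknown". Returns None
    when no active target matches.
    """
    active = targets_response.get("data", {}).get("activeTargets", [])
    for target in active:
        if host_port in target.get("scrapeUrl", ""):
            return target.get("health")
    return None
```

With both servers running, `target_health(fetch_targets())` should return `"up"` once Prometheus has completed a successful scrape of the NIM endpoint.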
You can use Grafana to create dashboards for NIM metrics. Install a Grafana release appropriate for your system. The commands below use v11.0.0 for Linux on amd64.
wget https://dl.grafana.com/oss/release/grafana-11.0.0.linux-amd64.tar.gz
tar -zxvf grafana-11.0.0.linux-amd64.tar.gz
Run the Grafana server:
cd grafana-v11.0.0/
./bin/grafana-server
To access the Grafana dashboard, point your browser to http://localhost:3000. You will need to log in using the default credentials:
username: admin
password: admin
The first step is to configure the data source that Grafana reads metrics from. Click the “Data Source” button, select Prometheus, and specify the Prometheus URL localhost:9090. After you save the configuration, you should see a success message. You are then ready to create a dashboard with metrics from NIM, or you can try this example dashboard.
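When building panels, each one needs a PromQL expression. The sketch below generates a few candidate expressions from the metric names in the table above. It assumes the names appear in Prometheus exactly as listed; if the /metrics output shows a prefix or different labels, adjust the names accordingly. The panel titles, the `panel_queries` helper, and the default window and quantile are illustrative choices.

```python
def panel_queries(window: str = "5m", quantile: float = 0.95) -> dict:
    """Return example PromQL expressions keyed by a suggested panel title."""

    def latency(metric: str) -> str:
        # Histogram quantiles are computed from the per-bucket rate.
        return (
            f"histogram_quantile({quantile}, "
            f"sum by (le) (rate({metric}_bucket[{window}])))"
        )

    return {
        # Gauges can be plotted directly.
        "GPU KV-cache usage (0 to 1)": "gpu_cache_usage_perc",
        "Running requests": "num_requests_running",
        "Waiting requests": "num_requests_waiting",
        # Counters are turned into per-second rates.
        "Prompt tokens per second": f"rate(prompt_tokens_total[{window}])",
        "Generation tokens per second": f"rate(generation_tokens_total[{window}])",
        # Histograms are summarized as a quantile.
        "Time to first token": latency("time_to_first_token_seconds"),
        "End-to-end latency": latency("e2e_request_latency_seconds"),
    }
```

Printing `panel_queries()` gives expressions that can be pasted into the query editor of a Grafana panel backed by the Prometheus data source.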