Observability

NIM exposes Prometheus metrics that report request statistics. You can use these metrics to build dashboards in Grafana. By default, the metrics are available at http://localhost:8000/metrics.

curl -X 'GET' 'http://localhost:8000/metrics'
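The same request can be made from a script. The following is a minimal sketch, using only the Python standard library, that downloads the metrics text and splits it into samples. The function names `fetch_metrics` and `parse_metrics` are illustrative and not part of NIM, and the parser handles only the simple `name{labels} value` line shape of the Prometheus text format.

```python
import re
import urllib.request

# Matches one sample line of the Prometheus text format, for example:
#   metric_name{label="value"} 1.0
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)')


def parse_metrics(text):
    """Return a list of (name, labels_string, value) tuples from metrics text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and "# HELP" / "# TYPE" comments
        match = SAMPLE_RE.match(line)
        if match:
            name, labels, value = match.groups()
            samples.append((name, labels or "", float(value)))
    return samples


def fetch_metrics(url="http://localhost:8000/metrics", timeout=5):
    """Download the raw metrics text (URL assumes the default NIM endpoint)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With a NIM instance running locally, `parse_metrics(fetch_metrics())` returns one tuple per exposed sample.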

The following table describes the available metrics.

| Category | Metric | Metric Name | Description | Granularity | Frequency |
|---|---|---|---|---|---|
| KV Cache | GPU Cache Usage | gpu_cache_usage_perc | GPU KV-cache usage. 1 means 100 percent usage | Per model | Per iteration |
| Count | Running Count | num_requests_running | Number of requests currently running on GPU | Per model | Per iteration |
| Count | Waiting Count | num_requests_waiting | Number of requests waiting to be processed | Per model | Per iteration |
| Count | Max Request Count | num_request_max | Max number of concurrently running requests | Per model | Per iteration |
| Count | Total Prompt Token Count | prompt_tokens_total | Number of prefill tokens processed | Per model | Per iteration |
| Count | Total Generation Token Count | generation_tokens_total | Number of generation tokens processed | Per model | Per iteration |
| Latency | Time to First Token | time_to_first_token_seconds | Histogram of time to first token in seconds | Per model | Per request |
| Latency | Time per Output Token | time_per_output_token_seconds | Histogram of time per output token in seconds | Per model | Per request |
| Latency | End to End Request Latency | e2e_request_latency_seconds | Histogram of end to end request latency in seconds | Per model | Per request |
| Latency | Vision Encoder Latency | vision_encoder_latency_seconds | Histogram of vision encoder latency in seconds | Per model | Per request |
| Count | Prompt Token Count | request_prompt_tokens | Histogram of number of prefill tokens processed | Per model | Per request |
| Count | Generation Token Count | request_generation_tokens | Histogram of number of generation tokens processed | Per model | Per request |
| Count | Finished Request Count | request_finish_total | Number of finished requests, with label indicating finish reason | Per model | Per request |
| Count | Successful Request Count | request_success_total | Number of successful requests | Per model | Per request |
| Count | Failed Request Count | request_failure_total | Number of failed requests | Per model | Per request |
| Count | Image Count | request_image_count | Histogram of the number of images per request | Per model | Per request |
| Image | Image Size | image_size_pixels | Histogram of image sizes in pixels | Per model | Per request |
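Several of the metrics above are histograms. In the Prometheus text format, a histogram is exposed as cumulative `_bucket` series (with an `le` upper-bound label) plus `_sum` and `_count` series. As a rough sketch of how such data can be summarized, the helpers below compute a mean and an interpolated quantile from values you have already extracted. The function names are illustrative, and the interpolation is similar in spirit to PromQL's `histogram_quantile()` rather than an exact reimplementation.

```python
import math


def histogram_mean(total_sum, total_count):
    """Average observation, computed from a histogram's _sum and _count."""
    if total_count == 0:
        return math.nan
    return total_sum / total_count


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Interpolates linearly inside the bucket that contains the target rank.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                return lower_bound  # fall back to the highest finite bound
            if count == lower_count:
                return upper_bound
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound
```

For example, the buckets for `time_to_first_token_seconds` could be passed as `[(0.1, 2), (0.5, 6), (1.0, 10), (float("inf"), 10)]`, where each count is the cumulative number of requests at or below that bound. These bucket values are made up for illustration.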

Prometheus

To scrape metrics from NIM with Prometheus, download the latest Prometheus release appropriate for your system. The commands below use v2.52.0 for Linux on amd64 as an example.

wget https://github.com/prometheus/prometheus/releases/download/v2.52.0/prometheus-2.52.0.linux-amd64.tar.gz
tar -xvzf prometheus-2.52.0.linux-amd64.tar.gz
cd prometheus-2.52.0.linux-amd64/

Edit the Prometheus configuration file prometheus.yml to scrape from the NIM endpoint. Make sure the targets field points to localhost:8000.

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's the NIM metrics endpoint.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:8000"]

Run the Prometheus server:

./prometheus --config.file=./prometheus.yml

Open a browser and go to http://localhost:9090/targets?search= to check that the Prometheus server detected the NIM target. You can also click the NIM target URL to explore the generated metrics.
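The same check can be scripted against the Prometheus HTTP API, which lists scrape targets at `/api/v1/targets`. The sketch below is one possible way to do it with the Python standard library. The function names are illustrative, and the address `localhost:8000` assumes the configuration shown earlier.

```python
import json
import urllib.request


def find_target_health(targets_payload, address="localhost:8000"):
    """Return the health of the active target whose scrape URL contains `address`.

    `targets_payload` is the decoded JSON body from /api/v1/targets.
    Returns None when no active target matches.
    """
    active = targets_payload.get("data", {}).get("activeTargets", [])
    for target in active:
        if address in target.get("scrapeUrl", ""):
            return target.get("health")
    return None


def fetch_targets(prometheus_url="http://localhost:9090", timeout=5):
    """Query the Prometheus HTTP API for the current scrape targets."""
    url = f"{prometheus_url}/api/v1/targets"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

With Prometheus running, `find_target_health(fetch_targets())` should return `"up"` once the NIM endpoint has been scraped successfully.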

Grafana

You can use Grafana to build dashboards from NIM metrics. Install the latest Grafana release appropriate for your system. The commands below use v11.0.0 for Linux on amd64 as an example.

wget https://dl.grafana.com/oss/release/grafana-11.0.0.linux-amd64.tar.gz
tar -zxvf grafana-11.0.0.linux-amd64.tar.gz
cd grafana-v11.0.0/

Run the Grafana server:

./bin/grafana-server

To access the Grafana dashboard, point your browser to http://localhost:3000 and log in with the default credentials:

username: admin
password: admin

The first step is to configure the data source that Grafana queries for metrics. Click the “Data Source” button, select Prometheus, and specify the Prometheus URL http://localhost:9090. After saving the configuration, you should see a success message.
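The data source can also be registered without the UI through Grafana's HTTP API (`POST /api/datasources`). The following sketch only builds the request, using the default credentials and URLs from this page. The function name and the data source name "Prometheus" are illustrative choices, and your Grafana instance may require different credentials or an API token instead.

```python
import base64
import json
import urllib.request


def build_datasource_request(grafana_url="http://localhost:3000",
                             prometheus_url="http://localhost:9090",
                             user="admin", password="admin"):
    """Build a POST request that registers Prometheus as a Grafana data source."""
    payload = {
        "name": "Prometheus",       # display name, chosen for illustration
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",          # Grafana's backend performs the queries
        "isDefault": True,
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{grafana_url}/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen(...)` while Grafana is running.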

Now you are ready to create a dashboard with metrics from NIM. This Grafana tutorial provides step-by-step instructions on building a dashboard from scratch. Alternatively, you can get started quickly with this example dashboard.

NIM Dashboard Example
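When building panels from scratch, each panel is backed by a PromQL expression. The sketch below shows two expressions that could be used for the metrics listed earlier, along with a helper that builds an instant-query URL for the Prometheus HTTP API. It assumes the metric names appear exactly as in the table, which may not hold if your deployment adds a prefix, and the function names and time windows are illustrative.

```python
from urllib.parse import urlencode


def p95_latency_query(metric="time_to_first_token_seconds", window="5m"):
    """PromQL for the 95th percentile of a histogram metric over `window`."""
    return (
        f"histogram_quantile(0.95, "
        f"sum by (le) (rate({metric}_bucket[{window}])))"
    )


def token_rate_query(metric="generation_tokens_total", window="1m"):
    """PromQL for the per-second rate of a counter over `window`."""
    return f"sum(rate({metric}[{window}]))"


def instant_query_url(expression, prometheus_url="http://localhost:9090"):
    """URL that evaluates `expression` through the Prometheus HTTP API."""
    return f"{prometheus_url}/api/v1/query?{urlencode({'query': expression})}"
```

The strings returned by `p95_latency_query()` and `token_rate_query()` can be pasted into a Grafana panel's query editor, or evaluated directly by opening `instant_query_url(...)` in a browser.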

Vision metrics

Vision-related metrics (for example, Vision Encoder Latency) are available only with the TRT-LLM backend; the vLLM backend collects only end-to-end metrics. If NIM runs TRT-LLM-optimized engines, consider starting with this example dashboard instead.
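To find out whether a running instance exposes the vision metrics, you can inspect the text returned by the metrics endpoint. The helper below is a small sketch that reports which expected metric names have no sample in that text. The function name is illustrative, and the default list contains only the one metric this section names explicitly.

```python
def missing_metrics(metrics_text,
                    expected_names=("vision_encoder_latency_seconds",)):
    """Return the expected metric names that have no sample in the scrape text."""
    sample_names = set()
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and "# HELP" / "# TYPE" comments
        # The metric name ends at the first "{" or whitespace.
        parts = line.split("{", 1)[0].split()
        if parts:
            sample_names.add(parts[0])
    # Substring match so histogram suffixes (_bucket, _sum, _count) still count.
    return [
        expected for expected in expected_names
        if not any(expected in name for name in sample_names)
    ]
```

An empty list means every expected metric was found. A non-empty list on a deployment that serves vision requests suggests the backend does not collect those metrics.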