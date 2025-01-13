NIM exposes Prometheus metrics that report request and KV-cache statistics. You can use these metrics to build dashboards, for example in Grafana. By default, the metrics are available at `http://localhost:8000/metrics`.
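To spot-check a single metric without setting up Prometheus, you can fetch the endpoint with curl and filter the text output. The sketch below is illustrative only: it assumes curl and a POSIX awk are installed, and `nim_metric` is a hypothetical helper name, not part of NIM. It reads Prometheus text exposition format on stdin, skips the `# HELP` and `# TYPE` comment lines, and prints the value of every sample of the named metric.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of NIM): print the value of every sample of one
# metric from Prometheus text-format input on stdin, one value per line.
nim_metric() {
  awk -v name="$1" '
    /^#/ { next }                        # skip "# HELP" and "# TYPE" comment lines
    {
      series = $0
      sub(/[ {].*/, "", series)          # metric name ends at the first "{" or space
      rest = $0
      if (index(rest, "}") > 0) {
        sub(/^.*[}]/, "", rest)          # drop the label set, e.g. {model_name="..."}
      } else {
        sub(/^[^ ]+/, "", rest)          # no labels: drop the bare metric name
      }
      split(rest, f, " ")                # f[1] = value, f[2] = optional timestamp
      if (series == name) print f[1]
    }
  '
}

# Example (requires a running NIM):
# curl -s http://localhost:8000/metrics | nim_metric num_requests_running
```

Matching on the exact metric name, rather than a substring, keeps a query for `num_requests_running` from also picking up any longer name that starts with the same text. The parser assumes label values contain no `}` character.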

The following table describes the available metrics.

| Category | Metric | Metric Name | Description | Granularity | Frequency |
|---|---|---|---|---|---|
| KV Cache | GPU Cache Usage | `gpu_cache_usage_perc` | GPU KV-cache usage; 1 means 100 percent usage | Per model | Per iteration |
| Count | Running Count | `num_requests_running` | Number of requests currently running on GPU | Per model | Per iteration |
| Count | Waiting Count | `num_requests_waiting` | Number of requests waiting to be processed | Per model | Per iteration |
| Count | Max Request Count | `num_request_max` | Maximum number of concurrently running requests | Per model | Per iteration |
| Count | Total Prompt Token Count | `prompt_tokens_total` | Number of prefill tokens processed | Per model | Per iteration |
| Count | Total Generation Token Count | `generation_tokens_total` | Number of generation tokens processed | Per model | Per iteration |
| Latency | Time to First Token | `time_to_first_token_seconds` | Histogram of time to first token, in seconds | Per model | Per request |
| Latency | Time per Output Token | `time_per_output_token_seconds` | Histogram of time per output token, in seconds | Per model | Per request |
| Latency | End to End | `e2e_request_latency_seconds` | Histogram of end-to-end request latency, in seconds | Per model | Per request |
| Count | Prompt Token Count | `request_prompt_tokens` | Histogram of number of prefill tokens processed | Per model | Per request |
| Count | Generation Token Count | `request_generation_tokens` | Histogram of number of generation tokens processed | Per model | Per request |
| Count | Finished Request Count | `request_success_total` | Number of finished requests, with a label indicating the finish reason | Per model | Per request |
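Several metrics in the table above are histograms. A Prometheus histogram is exposed as `<name>_bucket`, `<name>_sum`, and `<name>_count` series, so the mean value since the server started is `_sum / _count`. The sketch below computes that mean from the raw endpoint output. It assumes NIM exposes the histograms under the names in the table with the standard `_sum` and `_count` suffixes, and `nim_hist_mean` is a hypothetical helper name. Samples from all label sets (for example, several models) are added together before dividing.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of NIM): mean value of a Prometheus histogram
# since the server started, computed as <name>_sum / <name>_count.
# Samples from all label sets are added together before dividing.
nim_hist_mean() {
  awk -v name="$1" '
    /^#/ { next }                        # skip "# HELP" and "# TYPE" comment lines
    {
      series = $0
      sub(/[ {].*/, "", series)          # metric name ends at the first "{" or space
      rest = $0
      if (index(rest, "}") > 0) {
        sub(/^.*[}]/, "", rest)          # drop the label set
      } else {
        sub(/^[^ ]+/, "", rest)          # no labels: drop the bare metric name
      }
      split(rest, f, " ")                # f[1] = sample value
      if (series == (name "_sum"))   sum += f[1]
      if (series == (name "_count")) cnt += f[1]
    }
    END {
      if (cnt > 0) { printf "%.6f\n", sum / cnt }
      else { print "no samples for " name > "/dev/stderr"; exit 1 }
    }
  '
}

# Example (requires a running NIM): mean time to first token, in seconds.
# curl -s http://localhost:8000/metrics | nim_hist_mean time_to_first_token_seconds
```

A since-start mean hides recent changes. Once Prometheus is scraping NIM, a windowed average is usually more useful, for example the PromQL expression `rate(time_to_first_token_seconds_sum[5m]) / rate(time_to_first_token_seconds_count[5m])`.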

## Prometheus

To install Prometheus for scraping metrics from NIM, download the latest Prometheus release for your system. The commands below use version 2.52.0 for Linux on amd64 as an example; adjust the version and platform in the URL and file names as needed.

```bash
wget https://github.com/prometheus/prometheus/releases/download/v2.52.0/prometheus-2.52.0.linux-amd64.tar.gz
tar -xvzf prometheus-2.52.0.linux-amd64.tar.gz
cd prometheus-2.52.0.linux-amd64/
```

Edit the Prometheus configuration file so that it scrapes the NIM endpoint. Make sure the `targets` field points to `localhost:8000`.

```bash
vi prometheus.yml
```

```yaml
# A scrape configuration containing exactly one endpoint to scrape:
# here, the NIM metrics endpoint.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any time series scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:8000"]
```

Next, run the Prometheus server:

```bash
./prometheus --config.file=./prometheus.yml
```

Use a browser to check that the Prometheus server detected the NIM target at http://localhost:9090/targets?search= . You can also click the NIM target URL to explore the generated metrics.
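The browser check can also be scripted, which is convenient when Prometheus runs on a headless machine. The sketch below assumes curl is installed and Prometheus listens on `localhost:9090`; `wait_for_prometheus` is a hypothetical helper name. It polls the Prometheus `/-/ready` management endpoint until the server is ready, then runs the instant query `up` through the `/api/v1/query` HTTP API.

```bash
#!/usr/bin/env bash
# Hypothetical helper: wait until a Prometheus server is ready, then print the
# result of the instant query "up" (1 = last scrape succeeded, 0 = it failed).
# Arguments: base URL (default http://localhost:9090) and a rough timeout in
# seconds (default 30), counted as the number of one-second retries.
wait_for_prometheus() {
  local url="${1:-http://localhost:9090}"
  local timeout="${2:-30}"
  local waited=0
  # /-/ready returns HTTP 200 once Prometheus is ready to serve traffic.
  # -f: non-zero exit on HTTP errors; -s: no progress output; -m 2: cap each try at 2s.
  until curl -sf -m 2 "${url}/-/ready" > /dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Prometheus at ${url} not ready after ${timeout} retries" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  # -G sends the --data-urlencode parameter as a GET query string.
  curl -sG "${url}/api/v1/query" --data-urlencode 'query=up'
  echo   # end the output with a newline for readability
}

# Example: wait_for_prometheus http://localhost:9090 30
```

In the JSON response, a result whose `instance` label is `localhost:8000` with a value of `"1"` means the last scrape of NIM succeeded; `"0"` means it failed.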