Observability#

NVIDIA NIM for Cosmos WFM (World Foundation Models) provides observability features that allow you to monitor system performance and resource usage. This section covers how to access metrics and set up monitoring tools.

Metrics Endpoint#

NIM exposes Prometheus metrics for request statistics and system performance. These metrics can be used to create dashboards in monitoring tools like Grafana.

By default, metrics are available at the following endpoint:

curl -X 'GET' 'http://0.0.0.0:8000/v1/metrics'
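
The endpoint returns plain text in the Prometheus exposition format, so the output can be filtered with standard shell tools. The following is a minimal sketch rather than a NIM feature: the helper name `metric_samples` is hypothetical, and the usage line assumes the default address shown above.

```shell
# Hypothetical helper (not part of NIM): print the sample lines for one
# metric from Prometheus exposition-format text read on stdin.
# Comment lines (# HELP, # TYPE) are skipped; labeled samples such as
# name{label="x"} are kept.
metric_samples() {
    awk -v name="$1" '
        /^#/ { next }                 # skip HELP/TYPE comment lines
        {
            key = $1
            sub(/[{].*/, "", key)     # drop any {label="..."} suffix
            if (key == name) print
        }
    '
}

# Example usage against a running NIM (assumes the default address above):
#   curl -s http://0.0.0.0:8000/v1/metrics | metric_samples gpu_utilization
```

Matching on the exact metric name, rather than using a plain `grep`, avoids picking up other metrics that merely share a prefix.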

Available Metrics#

The following table describes the system metrics available through the metrics endpoint:

Category	Metric	Metric Name	Description
Python	GC Objects Collected	python_gc_objects_collected	Number of objects collected during garbage collection
Python	GC Objects Uncollectable	python_gc_objects_uncollectable	Number of objects that could not be collected during garbage collection
Python	GC Collections	python_gc_collections_total	Number of times a garbage collection generation was collected
Process	Virtual Memory	process_virtual_memory_bytes	Virtual memory size used for the process
Process	Resident Memory	process_resident_memory_bytes	Physical memory size used for the process
Process	CPU Time	process_cpu_seconds_total	Total CPU time used for the process
GPU	Power Usage	gpu_power_usage_watts	Current power consumption of the GPU
GPU	Power Limit	gpu_power_limit_watts	Maximum power limit configured for the GPU
GPU	Energy Consumption	gpu_total_energy_consumption	GPU energy consumption
GPU	GPU Utilization	gpu_utilization	GPU compute utilization percentage
GPU	Memory Total	gpu_memory_total_bytes	Total memory available on the GPU
GPU	Memory Used	gpu_memory_used_bytes	Memory currently in use on the GPU
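
Some of these metrics are most useful in combination. As an illustration, the sketch below derives a GPU memory utilization percentage from `gpu_memory_used_bytes` and `gpu_memory_total_bytes`. The helper name `gpu_memory_percent` is hypothetical. It assumes each sample line ends with the value (no trailing timestamp), and it sums across all samples if the metrics carry per-GPU labels, which this page does not specify.

```shell
# Hypothetical helper (not part of NIM): compute GPU memory utilization
# as a percentage from exposition-format text read on stdin.
# Assumption: the value is the last field on each sample line.
gpu_memory_percent() {
    awk '
        /^#/ { next }                 # skip HELP/TYPE comment lines
        {
            key = $1
            sub(/[{].*/, "", key)     # drop any {label="..."} suffix
            if (key == "gpu_memory_used_bytes")  used  += $NF
            if (key == "gpu_memory_total_bytes") total += $NF
        }
        END {
            if (total > 0) {
                printf "%.1f\n", 100 * used / total
            } else {
                print "no gpu_memory_total_bytes samples found" > "/dev/stderr"
                exit 1
            }
        }
    '
}

# Example usage (assumes the default metrics address):
#   curl -s http://0.0.0.0:8000/v1/metrics | gpu_memory_percent
```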

Note

For more detailed inference-level metrics, you can access the Triton metrics endpoint at http://0.0.0.0:8002/metrics. For more information on these metrics, refer to the Triton Metrics documentation.

Setting Up Monitoring#

This section provides instructions for setting up Prometheus and Grafana to monitor your NIM for Cosmos WFM deployment. Follow these steps to install and configure Prometheus for scraping metrics from NIM:

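Download a Prometheus release for your system. The commands below use v2.52.0 for Linux amd64; substitute a newer release or a different platform as needed: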

wget https://github.com/prometheus/prometheus/releases/download/v2.52.0/prometheus-2.52.0.linux-amd64.tar.gz
tar -xvzf prometheus-2.52.0.linux-amd64.tar.gz
cd prometheus-2.52.0.linux-amd64/

Configure Prometheus to scrape metrics from the NIM endpoint by editing the prometheus.yml file:

# A scrape configuration containing exactly one endpoint to scrape
scrape_configs:
  - job_name: "nim-metrics"
    # NIM serves metrics at /v1/metrics; Prometheus defaults to /metrics
    metrics_path: "/v1/metrics"
    static_configs:
      - targets: ["localhost:8000"]

Start the Prometheus server:

./prometheus --config.file=./prometheus.yml

Verify the setup by opening a web browser and navigating to http://localhost:9090/targets. You should see the NIM target listed with a status of “UP”.
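
The same check can be scripted through the Prometheus HTTP API, which exposes target status at `/api/v1/targets`. The sketch below counts healthy targets in that response. The helper name `count_up_targets` is hypothetical, and it relies on the compact JSON that Prometheus returns instead of a full JSON parser.

```shell
# Hypothetical helper: count the targets Prometheus reports as healthy.
# Reads the JSON body of GET /api/v1/targets on stdin and counts
# occurrences of "health":"up".
count_up_targets() {
    grep -o '"health":"up"' | wc -l | tr -d ' '
}

# Example usage (assumes Prometheus on its default port 9090):
#   curl -s http://localhost:9090/api/v1/targets | count_up_targets
```

A result of `0` after a few scrape intervals suggests that the target address or metrics path in prometheus.yml needs to be checked.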

Grafana Setup#

Follow these steps to set up Grafana for visualizing NIM metrics:

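Download and install a Grafana release for your system. The commands below use v11.0.0 for Linux amd64; substitute a newer release or a different platform as needed: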

wget https://dl.grafana.com/oss/release/grafana-11.0.0.linux-amd64.tar.gz
tar -zxvf grafana-11.0.0.linux-amd64.tar.gz
cd grafana-v11.0.0/

Start the Grafana server:
```
./bin/grafana-server
```
Access the Grafana web interface by opening a browser and navigating to http://localhost:3000. Log in using the default credentials:
```
Username: admin
Password: admin
```
Configure Prometheus as a data source:
1. Navigate to Connections > Data sources in the Grafana sidebar.
2. Click Add data source and select “Prometheus”.
3. Set the URL to http://localhost:9090.
4. Click Save & test to verify the connection.
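
As an alternative to the UI steps, the data source can also be created through Grafana's HTTP API (`POST /api/datasources`). This is a sketch under assumptions: the helper name `datasource_payload` is hypothetical, the data source name `Prometheus` is arbitrary, and the usage line assumes the default admin credentials are still in place.

```shell
# Hypothetical helper: build the JSON payload for Grafana's
# POST /api/datasources call.
# $1: Prometheus URL (defaults to the one used in the steps above)
datasource_payload() {
    url="${1:-http://localhost:9090}"
    printf '{"name":"Prometheus","type":"prometheus","url":"%s","access":"proxy"}\n' "$url"
}

# Example usage (assumes Grafana on port 3000 with default credentials):
#   datasource_payload | curl -s -u admin:admin \
#       -H 'Content-Type: application/json' \
#       -X POST --data @- http://localhost:3000/api/datasources
```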

Creating Dashboards#

Once you have Prometheus and Grafana set up, you can create dashboards to visualize NIM metrics:

In Grafana, click on “Dashboards” in the sidebar and then New > New Dashboard.
Click Add visualization.
Select your Prometheus data source.
Use the query builder to select metrics such as gpu_utilization, gpu_memory_used_bytes, or process_cpu_seconds_total.
Configure the visualization settings and add the panel to your dashboard.
Repeat the above steps for additional metrics you want to monitor.
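
Before building a panel, it can help to confirm that a query returns data by sending it to the Prometheus instant query API (`/api/v1/query`). The sketch below wraps that call. The helper name `prom_query` is hypothetical, and the default base URL assumes the Prometheus setup described earlier.

```shell
# Hypothetical helper: run a PromQL instant query against Prometheus.
# $1: PromQL expression
# $2: Prometheus base URL (defaults to the one used in the steps above)
prom_query() {
    # -G sends the URL-encoded data as a query string on a GET request.
    curl -s -G "${2:-http://localhost:9090}/api/v1/query" \
        --data-urlencode "query=$1"
}

# Example usage:
#   prom_query 'gpu_utilization'
#   prom_query 'rate(process_cpu_seconds_total[5m])'
```

Because `process_cpu_seconds_total` is a counter, wrapping it in `rate()` as in the second example yields per-second CPU usage, which is usually more informative in a panel than the raw total.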

For more detailed instructions on building Grafana dashboards, refer to the Grafana Fundamentals tutorial.

Tip

Refer to the troubleshooting page if you are encountering issues with metrics collection or visualization.

Observability extras for `Cosmos3-Generator`#

In addition to the metrics endpoint described above, Cosmos3-Generator exposes the following HTTP endpoints for inspection and health-checking:

Method	Path	Description
`GET`	`/v1/health/live`	Liveness probe. Returns 200 once the HTTP layer is up; does not wait for warmup.
`GET`	`/v1/health/ready`	Readiness probe. Returns 200 only after first-run engine compilation and warmup complete.
`GET`	`/v1/metrics`	Prometheus metrics (same endpoint described in the section above).
`GET`	`/v1/metadata`	Returns the selected profile and a `checkpoint` field. The field holds the default profile reference, or the `NIM_FT_CHECKPOINT` path when a custom checkpoint is used (see Bring your own checkpoint for Cosmos3-Generator).
`GET`	`/v1/models`	OpenAI-compatible model list. Returns the model currently being served.
`GET`	`/v1/manifest`	Contents of the in-container model manifest. Useful for confirming which profile and artifact bundle were resolved at boot time.
`GET`	`/v1/version`	Product release version and server API version.
`GET`	`/v1/license`	Bundled license info.
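
Because `/v1/health/ready` returns 200 only after compilation and warmup finish, a deployment script can poll it before sending traffic. The following is a minimal sketch: the helper name `wait_until_ready` and its default attempt count and interval are assumptions, not values from NIM.

```shell
# Hypothetical helper: poll the readiness probe until it returns HTTP 200
# or the attempt budget runs out.
# $1: base URL (default http://0.0.0.0:8000)
# $2: maximum attempts (default 60, an arbitrary choice)
# $3: seconds to wait between attempts (default 5, an arbitrary choice)
wait_until_ready() {
    base_url="${1:-http://0.0.0.0:8000}"
    max_attempts="${2:-60}"
    interval="${3:-5}"
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # -o /dev/null discards the body; -w prints only the status code.
        code=$(curl -s -o /dev/null -w '%{http_code}' \
            "$base_url/v1/health/ready") || code=000
        if [ "$code" = "200" ]; then
            echo "ready after $attempt attempt(s)"
            return 0
        fi
        if [ "$attempt" -lt "$max_attempts" ]; then
            sleep "$interval"
        fi
        attempt=$((attempt + 1))
    done
    echo "not ready after $max_attempts attempt(s)" >&2
    return 1
}

# Example usage:
#   wait_until_ready http://0.0.0.0:8000 120 10 && echo "safe to send requests"
```

Polling the readiness probe rather than the liveness probe matters here, since `/v1/health/live` reports 200 as soon as the HTTP layer is up.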

Logging knobs#

NIM_LOG_LEVEL (default INFO) — Python logger level for the NIM service. Set to DEBUG for verbose request-handling logs.
TLLM_LOG_LEVEL (default ERROR) — TRT-LLM engine logger. Set to INFO to see engine load progress, autotune events, and execution traces. Independent of NIM_LOG_LEVEL.