Observability#

NVIDIA NIM for Cosmos WFM (World Foundation Models) provides observability features that allow you to monitor system performance and resource usage. This section covers how to access metrics and set up monitoring tools.

Metrics Endpoint#

NIM exposes Prometheus metrics for request statistics and system performance. These metrics can be used to create dashboards in monitoring tools like Grafana.

By default, metrics are available at the following endpoint:

curl -X 'GET' 'http://0.0.0.0:8000/v1/metrics'
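The same request can be made from a script when you want to inspect individual samples. The following is a minimal sketch that uses only the Python standard library. The helper names `parse_metrics` and `fetch_metrics` are illustrative and not part of NIM, and the parser assumes the endpoint emits the Prometheus text format without trailing timestamps.

```python
import urllib.request

# Endpoint taken from the curl example above; adjust host and port for your deployment.
METRICS_URL = "http://0.0.0.0:8000/v1/metrics"


def parse_metrics(text):
    """Parse Prometheus text exposition into {sample name (with labels): value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # Assumption: no trailing timestamp, so the value is the last field.
        name, _, value = line.rpartition(" ")
        try:
            samples[name.strip()] = float(value)
        except ValueError:
            continue  # ignore lines that do not end in a numeric value
    return samples


def fetch_metrics(url=METRICS_URL, timeout=5.0):
    """GET the metrics endpoint and return the parsed samples."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

Calling `fetch_metrics()` against a running NIM returns a dictionary keyed by sample name, which you can filter by prefix (for example `gpu_` or `process_`).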

Available Metrics#

The following table describes the system metrics available through the metrics endpoint:

| Category | Metric | Metric Name | Description |
|---|---|---|---|
| Python | GC Objects Collected | python_gc_objects_collected | Number of objects collected during garbage collection |
| Python | GC Objects Uncollectable | python_gc_objects_uncollectable | Number of objects that could not be collected during garbage collection |
| Python | GC Collections | python_gc_collections_total | Number of times each garbage collection generation was collected |
| Process | Virtual Memory | process_virtual_memory_bytes | Virtual memory size used by the process, in bytes |
| Process | Resident Memory | process_resident_memory_bytes | Physical (resident) memory size used by the process, in bytes |
| Process | CPU Time | process_cpu_seconds_total | Total CPU time used by the process, in seconds |
| GPU | Power Usage | gpu_power_usage_watts | Current power consumption of the GPU, in watts |
| GPU | Power Limit | gpu_power_limit_watts | Maximum power limit configured for the GPU, in watts |
| GPU | Energy Consumption | gpu_total_energy_consumption | Total energy consumed by the GPU |
| GPU | GPU Utilization | gpu_utilization | GPU compute utilization percentage |
| GPU | Memory Total | gpu_memory_total_bytes | Total memory available on the GPU, in bytes |
| GPU | Memory Used | gpu_memory_used_bytes | Memory currently in use on the GPU, in bytes |

Note

For more detailed inference-level metrics, you can access the Triton metrics endpoint at http://0.0.0.0:8002/metrics. For more information on these metrics, refer to the Triton Metrics documentation.
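Several of the metrics in the table above are most useful in combination: the two GPU memory gauges give a usage fraction, the two power gauges give headroom, and process_cpu_seconds_total is a counter whose rate of change matters more than its absolute value. The sketch below shows that arithmetic. The function names are illustrative, and the input values are assumed to be readings you have already collected from the metrics endpoint.

```python
def memory_fraction(used_bytes, total_bytes):
    """Fraction of GPU memory in use: gpu_memory_used_bytes / gpu_memory_total_bytes."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes


def power_headroom_watts(usage_watts, limit_watts):
    """Watts remaining before the GPU reaches its configured power limit."""
    return limit_watts - usage_watts


def counter_rate(prev_value, curr_value, interval_seconds):
    """Average per-second increase of a counter such as process_cpu_seconds_total."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = curr_value - prev_value
    if delta < 0:
        # A counter that went backwards was reset (for example, by a restart),
        # so treat the current value as the increase since the reset.
        delta = curr_value
    return delta / interval_seconds
```

For example, two readings of process_cpu_seconds_total taken 60 seconds apart that differ by 30 correspond to an average of 0.5 CPU cores in use over that interval.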

Setting Up Monitoring#

This section provides instructions for setting up Prometheus and Grafana to monitor your NIM for Cosmos WFM deployment. Follow these steps to install and configure Prometheus for scraping metrics from NIM:

  1. Download a Prometheus release for your system. This example uses v2.52.0 for Linux on amd64; substitute a newer release or a different platform as needed:

    wget https://github.com/prometheus/prometheus/releases/download/v2.52.0/prometheus-2.52.0.linux-amd64.tar.gz
    tar -xvzf prometheus-2.52.0.linux-amd64.tar.gz
    cd prometheus-2.52.0.linux-amd64/
    
  2. Configure Prometheus to scrape metrics from the NIM endpoint by editing the prometheus.yml file. Prometheus scrapes the /metrics path by default, so set metrics_path to the NIM path /v1/metrics:

    # A scrape configuration containing exactly one endpoint to scrape
    scrape_configs:
      - job_name: "nim-metrics"
        metrics_path: "/v1/metrics"
        static_configs:
          - targets: ["localhost:8000"]
    
  3. Start the Prometheus server:

    ./prometheus --config.file=./prometheus.yml
    
  4. Verify the setup by opening a web browser and navigating to http://localhost:9090/targets. You should see the NIM target listed with a status of “UP”.
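The check in step 4 can also be scripted, because Prometheus reports target state through its HTTP API at /api/v1/targets. The following sketch reads that response and returns the health of each target in the nim-metrics job. The helper names are illustrative, and the Prometheus address is assumed to be the default used in the steps above.

```python
import json
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"  # default address from the steps above


def target_health(targets_response, job="nim-metrics"):
    """Return {scrape URL: health} for the active targets that belong to `job`."""
    active = targets_response.get("data", {}).get("activeTargets", [])
    return {
        target.get("scrapeUrl", ""): target.get("health", "unknown")
        for target in active
        if target.get("labels", {}).get("job") == job
    }


def fetch_targets(base_url=PROMETHEUS_URL, timeout=5.0):
    """GET /api/v1/targets from Prometheus and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=timeout) as resp:
        return json.load(resp)
```

A healthy setup yields the value "up" for the NIM scrape URL when you call `target_health(fetch_targets())`.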

Grafana Setup#

Follow these steps to set up Grafana for visualizing NIM metrics:

  1. Download and unpack a Grafana release for your system. This example uses v11.0.0 for Linux on amd64; substitute a newer release or a different platform as needed:

    wget https://dl.grafana.com/oss/release/grafana-11.0.0.linux-amd64.tar.gz
    tar -zxvf grafana-11.0.0.linux-amd64.tar.gz
    cd grafana-v11.0.0/
    
  2. Start the Grafana server:

    ./bin/grafana-server
    
  3. Access the Grafana web interface by opening a browser and navigating to http://localhost:3000. Log in using the default credentials:

    Username: admin
    Password: admin
    
  4. Configure Prometheus as a data source:

    1. Navigate to Connections > Data sources in the Grafana sidebar.

    2. Click Add data source and select “Prometheus”.

    3. Set the URL to http://localhost:9090.

    4. Click Save & test to verify the connection.
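If you prefer to script the data source step instead of clicking through the UI, Grafana accepts the same settings through its HTTP API (POST /api/datasources). The sketch below only builds the request. The helper names are illustrative, and it assumes the default admin credentials and local addresses from the steps above, which you should change for any shared deployment.

```python
import base64
import json
import urllib.request

GRAFANA_URL = "http://localhost:3000"  # default address from the steps above


def datasource_payload(prometheus_url="http://localhost:9090", name="Prometheus"):
    """JSON body describing a Prometheus data source that Grafana queries server-side."""
    return {
        "name": name,
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",  # the Grafana server, not the browser, contacts Prometheus
        "isDefault": True,
    }


def build_request(payload, user="admin", password="admin", base_url=GRAFANA_URL):
    """Build (but do not send) the POST request that creates the data source."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{base_url}/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen(build_request(datasource_payload()))` while Grafana is running.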

Creating Dashboards#

Once you have Prometheus and Grafana set up, you can create dashboards to visualize NIM metrics:

  1. In Grafana, click Dashboards in the sidebar, then select New > New Dashboard.

  2. Click Add visualization.

  3. Select your Prometheus data source.

  4. Use the query builder to select metrics such as gpu_utilization, gpu_memory_used_bytes, or process_cpu_seconds_total.

  5. Configure the visualization settings and add the panel to your dashboard.

  6. Repeat the above steps for additional metrics you want to monitor.
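Before building a panel, it can help to confirm that a query returns data by sending it to the Prometheus instant-query API (/api/v1/query). The sketch below lists a few candidate panel expressions built from the metric names in this section and shows how to run one. The panel titles and helper names are illustrative, and the memory ratio assumes both gauges carry matching labels.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"  # default address from the setup steps

# Candidate PromQL expressions, keyed by an illustrative panel title.
PANEL_QUERIES = {
    "GPU utilization": "gpu_utilization",
    "GPU memory used (fraction)": "gpu_memory_used_bytes / gpu_memory_total_bytes",
    "CPU seconds per second (5m)": "rate(process_cpu_seconds_total[5m])",
}


def query_url(expr, base_url=PROMETHEUS_URL):
    """URL for an instant query, with the expression percent-encoded."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def instant_values(response):
    """Extract [(labels, value)] from an instant-query response of type vector."""
    results = response.get("data", {}).get("result", [])
    return [(r.get("metric", {}), float(r["value"][1])) for r in results]


def run_query(expr, base_url=PROMETHEUS_URL, timeout=5.0):
    """Send the query to Prometheus and return the parsed samples."""
    with urllib.request.urlopen(query_url(expr, base_url), timeout=timeout) as resp:
        return instant_values(json.load(resp))
```

An empty list from `run_query` usually means Prometheus has not scraped that metric yet, so the corresponding Grafana panel would also be empty.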

For more detailed instructions on building Grafana dashboards, refer to the Grafana Fundamentals tutorial.

Tip

Refer to the troubleshooting page if you encounter issues with metrics collection or visualization.

Observability extras for Cosmos3-Generator#

In addition to the metrics endpoint described above, Cosmos3-Generator exposes the following HTTP endpoints for inspection and health-checking:

| Method | Path | Description |
|---|---|---|
| GET | /v1/health/live | Liveness probe. Returns 200 once the HTTP layer is up; does not wait for warmup. |
| GET | /v1/health/ready | Readiness probe. Returns 200 only after first-run engine compilation and warmup complete. |
| GET | /v1/metrics | Prometheus metrics (same endpoint described in the section above). |
| GET | /v1/metadata | Selected profile and checkpoint field. The checkpoint field holds the default profile reference, or the NIM_FT_CHECKPOINT path when you bring your own checkpoint (see Bring your own checkpoint for Cosmos3-Generator). |
| GET | /v1/models | OpenAI-compatible model list. Returns the model currently being served. |
| GET | /v1/manifest | Contents of the in-container model manifest. Useful for confirming which profile and artifact bundle were resolved at boot time. |
| GET | /v1/version | Product release version and server API version. |
| GET | /v1/license | Bundled license info. |
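Because the readiness probe returns 200 only after compilation and warmup, a client or deployment script can poll it before sending traffic. The following is a minimal sketch of such a loop. The helper names, the retry count, and the delay are illustrative choices rather than NIM defaults, and the base URL is assumed to match the metrics example earlier on this page.

```python
import time
import urllib.error
import urllib.request

BASE_URL = "http://0.0.0.0:8000"  # assumed to match the metrics endpoint host and port


def http_status(url, timeout=5.0):
    """Return the HTTP status code for GET `url`, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with a non-2xx status
    except (urllib.error.URLError, OSError):
        return None  # connection refused, timeout, DNS failure, and so on


def wait_until_ready(probe=http_status, base_url=BASE_URL, attempts=60,
                     delay=10.0, sleep=time.sleep):
    """Poll /v1/health/ready until it returns 200; give up after `attempts` tries."""
    url = f"{base_url}/v1/health/ready"
    for attempt in range(attempts):
        if probe(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)  # wait before the next probe
    return False
```

The `probe` and `sleep` parameters exist so the loop can be exercised without a running server. First-run engine compilation can take a while, so size `attempts` and `delay` generously for your hardware.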

Logging knobs#

  • NIM_LOG_LEVEL (default INFO) — Python logger level for the NIM service. Set to DEBUG for verbose request-handling logs.

  • TLLM_LOG_LEVEL (default ERROR) — TRT-LLM engine logger. Set to INFO to see engine load progress, autotune events, and execution traces. Independent of NIM_LOG_LEVEL.
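As an illustration of setting both variables together for a debugging session, the sketch below builds the environment overrides and renders them as `-e KEY=VALUE` arguments. It assumes the NIM container is launched with `docker run`; the helper names are hypothetical, and the rest of the launch command (image, ports, volumes) is intentionally left out.

```python
def debug_logging_env(nim_level="DEBUG", tllm_level="INFO"):
    """Environment overrides for verbose service logs and engine progress logs."""
    return {"NIM_LOG_LEVEL": nim_level, "TLLM_LOG_LEVEL": tllm_level}


def docker_env_flags(env):
    """Render an environment mapping as `-e KEY=VALUE` arguments for `docker run`."""
    flags = []
    for key in sorted(env):
        flags += ["-e", f"{key}={env[key]}"]
    return flags
```

Because the two loggers are independent, you can raise only one of them, for example `debug_logging_env(nim_level="INFO")` to keep the service quiet while still watching engine load progress.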