Observability#
This guide explains how to monitor the health, queue depth, and throughput of NIM for BGR. You can read operational metrics directly from the built-in status endpoint, or integrate with Prometheus and Grafana for continuous monitoring.
BGR-Specific Metrics#
NIM for BGR exposes operational metrics through the GET /v1/status endpoint. These metrics provide visibility into the batched geometry relaxation workload.
Aggregate Metrics#
Aggregate metrics track the combined geometry relaxation tasks, API requests, and overall queue depths across all active GPU workers:
| Metric | Type | Description |
|---|---|---|
| | counter | Total geometry relaxation tasks received |
| | counter | Total tasks completed |
| | gauge | Current number of atoms queued for processing |
| | gauge | Current number of tasks waiting in the queue |
| | counter | Total API requests received |
| | counter | Total API requests completed |
| | gauge | API requests currently being processed |
| | gauge | Maximum atoms per batch for the current GPU configuration |
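Counters only increase, so throughput has to be derived from two samples taken some time apart, while the difference between a "received" counter and its matching "completed" counter gives the work still outstanding. The following is a minimal sketch of both calculations. The function names and arguments are hypothetical helpers that take plain numbers; they do not refer to actual field names in the status response.

```python
def counter_rate(prev_value, prev_time, curr_value, curr_time):
    """Per-second rate between two samples of a monotonically increasing counter.

    Times are in seconds (for example, values from time.monotonic()).
    """
    elapsed = curr_time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = curr_value - prev_value
    if delta < 0:
        # Assumption: a counter that went backwards was reset by a restart,
        # so the current value is treated as the count since that restart.
        delta = curr_value
    return delta / elapsed


def outstanding(received, completed):
    """Work received but not yet completed, clamped at zero."""
    return max(received - completed, 0)


# Two hypothetical samples of a "tasks completed" counter taken 30 s apart.
tasks_per_second = counter_rate(100, 0.0, 160, 30.0)
print(f"throughput: {tasks_per_second:.1f} tasks/s")
print(f"outstanding tasks: {outstanding(175, 160)}")
```

Gauges need no such treatment, because each sample already describes the current state.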
Per-Worker Metrics#
Each GPU worker reports the following metrics:
| Metric | Type | Description |
|---|---|---|
| | string | GPU device identifier (for example, |
| | counter | Tasks received by this worker |
| | counter | Tasks completed by this worker |
| | gauge | Atoms queued on this worker |
| | gauge | Tasks queued on this worker |
| | gauge | Current batch size |
| | gauge | Maximum batch size for this GPU |
| | string | GPU model name |
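One way to use the per-worker gauges is to compare the current batch size with the maximum batch size to see how full each GPU's batches are. The sketch below assumes the worker entries have already been extracted into dictionaries. The keys `device`, `current_batch_size`, and `max_batch_size` are hypothetical placeholders, so substitute the field names from the API Reference. It also assumes both sizes are reported in the same unit.

```python
def batch_utilization(workers):
    """Map each worker's device identifier to current/maximum batch size.

    `workers` is a list of dicts using hypothetical keys; a worker that
    reports a maximum of zero is given a utilization of 0.0.
    """
    utilization = {}
    for worker in workers:
        maximum = worker["max_batch_size"]
        current = worker["current_batch_size"]
        utilization[worker["device"]] = current / maximum if maximum else 0.0
    return utilization


# Hypothetical sample data, not real output from the service.
sample_workers = [
    {"device": "worker-a", "current_batch_size": 750, "max_batch_size": 1000},
    {"device": "worker-b", "current_batch_size": 0, "max_batch_size": 1000},
]
for device, ratio in batch_utilization(sample_workers).items():
    print(f"{device}: {ratio:.0%} of maximum batch size")
```

A worker that stays near zero while others are full may point to uneven task distribution.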
Query Status#
Query the status endpoint to retrieve metrics:
curl -s http://localhost:8000/v1/status | python3 -m json.tool
For the full response schema and an example, refer to the GET /v1/status endpoint in the API Reference.
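The same query can be made from Python when the metrics feed a script rather than a terminal. This is a minimal sketch using only the standard library. It assumes the service is reachable at http://localhost:8000, as in the curl command above, and it makes no assumption about the response fields beyond the body being JSON. The function names are illustrative.

```python
import json
import urllib.request


def fetch_status(base_url="http://localhost:8000", timeout=5.0):
    """Call GET /v1/status and return the decoded JSON body."""
    with urllib.request.urlopen(f"{base_url}/v1/status", timeout=timeout) as resp:
        return json.load(resp)


def format_status(status):
    """Pretty-print a decoded status payload, similar to `python3 -m json.tool`."""
    return json.dumps(status, indent=4, sort_keys=True)
```

With the service running, `print(format_status(fetch_status()))` prints the status payload with its keys sorted.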
Prometheus#
NIM for BGR exposes Prometheus metrics for request statistics at http://localhost:8000/v1/metrics.
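Before configuring a scraper, it can help to check what the endpoint returns. The sketch below parses the Prometheus text exposition format into a dictionary. It is a simplified parser that skips comment lines and ignores optional timestamps, not a replacement for an official client library. The metric names in the sample are made up for illustration and are not the names NIM for BGR emits.

```python
def parse_metrics(text):
    """Parse Prometheus text exposition format into {series: value}.

    The series key keeps any label set verbatim, for example
    'name{label="x"}'. HELP/TYPE comments and timestamps are ignored.
    """
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        brace = line.rfind("}")
        if brace != -1:
            # Labeled sample: everything up to the closing brace is the series.
            series, rest = line[: brace + 1], line[brace + 1 :].split()
        else:
            parts = line.split()
            series, rest = parts[0], parts[1:]
        if rest:
            samples[series] = float(rest[0])
    return samples


# Hypothetical sample text, not real output from the service.
sample = """\
# HELP example_requests_total Example counter.
# TYPE example_requests_total counter
example_requests_total 42
example_queue_depth{worker="a"} 3.5 1700000000
"""
print(parse_metrics(sample))
```

The text to parse can be obtained with `urllib.request.urlopen("http://localhost:8000/v1/metrics").read().decode()` while the service is running.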
To install Prometheus and scrape metrics:
Download the latest Prometheus release for your system:
wget https://github.com/prometheus/prometheus/releases/download/v2.52.0/prometheus-2.52.0.linux-amd64.tar.gz
tar -xvzf prometheus-2.52.0.linux-amd64.tar.gz
cd prometheus-2.52.0.linux-amd64/
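Edit the prometheus.yml file to scrape the NIM for BGR endpoint. Because the metrics are served at /v1/metrics rather than the Prometheus default of /metrics, set metrics_path explicitly:
scrape_configs:
  - job_name: "nim-bgr"
    metrics_path: "/v1/metrics"
    static_configs:
      - targets: ["localhost:8000"]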
Start the Prometheus server:
./prometheus --config.file=./prometheus.yml
Verify the target in Prometheus by navigating to http://localhost:9090/targets?search=.
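The target check can also be scripted against the Prometheus HTTP API, which lists scrape targets at /api/v1/targets. The following sketch assumes Prometheus is listening on its default port 9090 and that the scrape job is named nim-bgr, as in the configuration above. The helper names are illustrative.

```python
import json
import urllib.request


def fetch_targets(prometheus_url="http://localhost:9090", timeout=5.0):
    """Return the decoded response of the Prometheus /api/v1/targets endpoint."""
    url = f"{prometheus_url}/api/v1/targets"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def job_health(targets_payload, job="nim-bgr"):
    """Map each active target's scrape URL to its health for one job."""
    active = targets_payload.get("data", {}).get("activeTargets", [])
    return {
        target["scrapeUrl"]: target["health"]
        for target in active
        if target.get("labels", {}).get("job") == job
    }
```

With Prometheus running, `job_health(fetch_targets())` returns a dictionary in which a healthy target has the value "up". A value of "down" usually means the scrape URL or path is wrong or the service is not reachable.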
Grafana#
Visualize metrics with Grafana.
Install the latest Grafana release for your system:
wget https://dl.grafana.com/oss/release/grafana-11.0.0.linux-amd64.tar.gz
tar -zxvf grafana-11.0.0.linux-amd64.tar.gz
cd grafana-v11.0.0/
Start the Grafana server:
./bin/grafana-server
Access the dashboard by navigating to http://localhost:3000, then log in with the default credentials:
Username: admin
Password: admin
Configure the Prometheus data source:
Click Data Source.
Select Prometheus.
Set the URL to http://localhost:9090.
Save the configuration.
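The data source can also be created without the UI by posting to the Grafana HTTP API at /api/datasources. The sketch below builds that request with the standard library. It assumes the default admin/admin credentials and the local URLs used in this guide; the data source name "Prometheus" and the function names are arbitrary choices. Change the credentials before using this outside a local test setup.

```python
import base64
import json
import urllib.request


def build_datasource_request(
    grafana_url="http://localhost:3000",
    prometheus_url="http://localhost:9090",
    user="admin",
    password="admin",
):
    """Build a POST request that registers Prometheus as a Grafana data source."""
    body = {
        "name": "Prometheus",       # display name, an arbitrary choice
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",          # Grafana's backend queries Prometheus
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{grafana_url}/api/datasources",
        data=json.dumps(body).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


def create_datasource(request, timeout=5.0):
    """Send the request and return Grafana's decoded JSON reply."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

With Grafana running, `create_datasource(build_datasource_request())` registers the data source. Grafana rejects the request if a data source with the same name already exists.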