Observability#
BMD-Specific Metrics#
NIM for BMD exposes operational metrics through the GET /v1/status endpoint. These metrics provide visibility into the molecular dynamics simulation workload.
Aggregate Metrics#
Aggregate metrics track the combined molecular dynamics tasks, API requests, and overall queue depths across all active GPU workers:
| Metric | Type | Description |
|---|---|---|
| | counter | Total molecular dynamics simulation tasks received |
| | counter | Total tasks completed |
| | gauge | Current number of atoms queued for processing |
| | gauge | Current number of tasks waiting in the queue |
| | counter | Total API requests received |
| | counter | Total API requests completed |
| | gauge | API requests currently being processed |
| | gauge | Maximum atoms per batch for the current GPU configuration |
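The counters and gauges above can be combined into derived figures such as throughput. The following is a minimal sketch, not part of NIM for BMD: it assumes the counter values have already been read from the status response (the functions take plain numbers because the field names are not reproduced here), and it assumes counters restart from zero when the service restarts. Both function names are hypothetical.

```python
def counter_rate(previous, current, seconds):
    """Per-second increase between two samples of a monotonic counter.

    Assumption: a smaller current value means the counter was reset
    (for example, after a restart), so the count since the reset is used.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    delta = current if current < previous else current - previous
    return delta / seconds


def outstanding(received, completed):
    """Items received but not yet completed, clamped at zero.

    Assumption: every received item is eventually counted as completed;
    if failures are tracked separately, this overestimates the backlog.
    """
    return max(received - completed, 0)
```

For example, sampling a "tasks completed" counter twice, 30 seconds apart, and passing both readings to `counter_rate` gives an approximate completion rate in tasks per second.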
Per-Worker Metrics#
Each GPU worker reports the following metrics:
| Metric | Type | Description |
|---|---|---|
| | string | GPU device identifier (for example, |
| | counter | Tasks received by this worker |
| | counter | Tasks completed by this worker |
| | gauge | Atoms queued on this worker |
| | gauge | Tasks queued on this worker |
| | gauge | Current batch size |
| | gauge | Maximum batch size for this GPU |
| | string | GPU model name |
Query Status#
Query the status endpoint to retrieve metrics:
curl -s http://localhost:8000/v1/status | python3 -m json.tool
For the full response schema and an example, refer to the GET /v1/status endpoint in the API Reference.
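The same query can be made from a script. The sketch below is an illustration under stated assumptions rather than an official client: it assumes only that the endpoint returns JSON, and it makes no assumption about field names. The helper names `fetch_status` and `flatten` are hypothetical; `flatten` turns whatever nested structure comes back into dotted paths so that the values are easy to print or compare.

```python
import json
import urllib.request

# Same URL as the curl command above.
STATUS_URL = "http://localhost:8000/v1/status"


def fetch_status(url=STATUS_URL, timeout=5.0):
    """Return the decoded JSON body of GET /v1/status."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def flatten(obj, prefix=""):
    """Flatten nested dicts and lists into {"dotted.path": leaf_value}."""
    flat = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            flat.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            flat.update(flatten(value, f"{prefix}{index}."))
    else:
        # Leaf value: drop the trailing dot added by the parent level.
        flat[prefix[:-1]] = obj
    return flat
```

With the service running, `flatten(fetch_status())` yields one entry per reported value, which can then be printed or logged line by line.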
Prometheus#
NIM for BMD exposes Prometheus metrics for request statistics at http://localhost:8000/v1/metrics.
To install Prometheus and scrape metrics:
Download a Prometheus release for your system. The following commands use v2.52.0 for Linux on amd64:
$ wget https://github.com/prometheus/prometheus/releases/download/v2.52.0/prometheus-2.52.0.linux-amd64.tar.gz
$ tar -xvzf prometheus-2.52.0.linux-amd64.tar.gz
$ cd prometheus-2.52.0.linux-amd64/
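Edit the prometheus.yml file to scrape the NIM for BMD endpoint. Prometheus scrapes the /metrics path by default, so set metrics_path to the /v1/metrics path that NIM for BMD serves:
scrape_configs:
  - job_name: "nim-bmd"
    metrics_path: /v1/metrics
    static_configs:
      - targets: ["localhost:8000"]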
Start the Prometheus server:
$ ./prometheus --config.file=./prometheus.yml
Verify the target in Prometheus by navigating to http://localhost:9090/targets?search=.
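The target can also be checked without a browser through the Prometheus HTTP API, which lists scrape targets at /api/v1/targets. The following is a minimal sketch, assuming Prometheus runs on its default port 9090 and the job is named "nim-bmd" as in the configuration above; the function names are hypothetical.

```python
import json
import urllib.request


def fetch_targets(base_url="http://localhost:9090", timeout=5.0):
    """Return the decoded JSON body of the Prometheus /api/v1/targets endpoint."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=timeout) as response:
        return json.load(response)


def target_health(payload, job="nim-bmd"):
    """Map each active target's scrape URL to its health for one job.

    Prometheus reports health as "up", "down", or "unknown".
    """
    targets = payload.get("data", {}).get("activeTargets", [])
    return {
        target.get("scrapeUrl"): target.get("health")
        for target in targets
        if target.get("labels", {}).get("job") == job
    }
```

Calling `target_health(fetch_targets())` against a running Prometheus server returns a dictionary keyed by scrape URL; a value of "up" indicates that the last scrape of that target succeeded.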
Grafana#
Visualize metrics with Grafana.
Install a Grafana release for your system. The following commands use v11.0.0 for Linux on amd64:
$ wget https://dl.grafana.com/oss/release/grafana-11.0.0.linux-amd64.tar.gz
$ tar -zxvf grafana-11.0.0.linux-amd64.tar.gz
$ cd grafana-v11.0.0/
Start the Grafana server:
$ ./bin/grafana-server
Access the dashboard by navigating to http://localhost:3000 and log in with the default credentials:
Username: admin
Password: admin
Configure the Prometheus data source:
Click Data Source.
Select Prometheus.
Set the URL to http://localhost:9090.
Save the configuration.
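The data source steps above can also be scripted through the Grafana HTTP API, which accepts new data sources at POST /api/datasources. The following is a minimal sketch, assuming the default admin/admin credentials and the local URLs used on this page; the function name and the data source name "Prometheus" are illustrative choices, and the request is only built here, not sent.

```python
import base64
import json
import urllib.request


def build_datasource_request(
    grafana_url="http://localhost:3000",
    prometheus_url="http://localhost:9090",
    user="admin",
    password="admin",
    name="Prometheus",
):
    """Build (but do not send) a request that registers Prometheus in Grafana."""
    payload = {
        "name": name,
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",  # Grafana's backend queries Prometheus on behalf of the browser
    }
    # HTTP basic authentication with the Grafana login.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{grafana_url}/api/datasources",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
```

To apply it, pass the result to `urllib.request.urlopen` while Grafana is running. Change the default password before using this outside a local test setup.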