Observability#
Platform Monitoring#
Platform-level monitoring, including GPU (DCGM), SR-IOV network, Precision Time Protocol, and NMOS Registry dashboards, is provided by Holoscan for Media and documented in Platform Monitoring in the Holoscan for Media user guide.
The following sections cover monitoring specific to the NIM.
Health Probes#
The Helm charts and the operator configure HTTP probes on the application health port (default 8000):
| Probe | Endpoint | Purpose |
|---|---|---|
| Startup | | Checks whether the container has started. |
| Liveness | | Checks whether the process is alive. |
| Readiness | | Checks whether the service is ready to serve traffic. |
Check health from inside the cluster:
kubectl exec -it deploy/<nim-deployment> -- curl -sS localhost:8000/v1/health/ready
When the NIM is ready, the response body is as follows:
{"object":"health.response","message":"ASD NIM is ready","status":"ready"}
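A script that consumes this response can check the status field rather than match the whole body. The following is a minimal Python sketch, assuming the body has the shape shown above; the function name `is_ready` is illustrative and not part of the NIM.

```python
import json


def is_ready(body: str) -> bool:
    """Return True when a health response body reports status 'ready'."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False  # not JSON, for example an error page from a proxy
    # Only a JSON object with status == "ready" counts as ready.
    return isinstance(payload, dict) and payload.get("status") == "ready"


# Body copied from the readiness response shown above.
sample = '{"object":"health.response","message":"ASD NIM is ready","status":"ready"}'
print(is_ready(sample))  # True
```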
Port-forward for local debugging:
kubectl port-forward deploy/<nim-deployment> 8000:8000
curl -sS http://127.0.0.1:8000/v1/health/ready
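When scripting against the port-forward, it can help to poll the readiness endpoint until the NIM responds. The sketch below is one way to do that in Python, under the assumption that a ready NIM answers with HTTP 200 (the source shows only the response body). The helper names `http_status` and `wait_until_ready` are hypothetical.

```python
import time
import urllib.error
import urllib.request


def http_status(url: str, timeout_s: float = 2.0) -> int:
    """Return the HTTP status code for a GET, or 0 if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # server answered with a non-2xx status
    except (urllib.error.URLError, OSError):
        return 0  # connection refused, timeout, or DNS failure


def wait_until_ready(url, attempts=30, interval_s=2.0,
                     probe=http_status, sleep=time.sleep):
    """Poll `url` until it returns HTTP 200 or the attempts run out."""
    for attempt in range(attempts):
        if probe(url) == 200:  # assumption: 200 means ready
            return True
        if attempt < attempts - 1:
            sleep(interval_s)  # wait before the next attempt
    return False
```

With the port-forward active, `wait_until_ready("http://127.0.0.1:8000/v1/health/ready")` returns `True` once the endpoint answers with 200, or `False` after roughly a minute with the defaults. The `probe` and `sleep` parameters exist only so the loop can be exercised without a live server.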
HTTP Endpoints#
The NIM exposes the following HTTP endpoints on port 8000. Use port-forward to access them locally:
kubectl port-forward deploy/<nim-deployment> 8000:8000
| Endpoint | Description |
|---|---|
| | Liveness check: returns |
| | Readiness check: returns |
| | Prometheus metrics (NIM process and GPU). For details, refer to Prometheus Metrics. |
| | Returns the NIM release and API version: |
| | Returns the loaded model URL and selected profile checksum. Useful for confirming which model is active without inspecting the container filesystem. |
| | Returns full profile resolution details, including GPU type and workspace hash for each available profile. |
| | Lists the models currently loaded in the NIM. |
| | Returns license information for the running NIM. |
| | Swagger UI: interactive API documentation. |
| | Full OpenAPI schema for all endpoints. |
Useful Queries#
Confirm which model and profile are loaded:
curl -s http://localhost:8000/v1/metadata | jq '.modelInfo[0].modelUrl, .selectedModelProfileId'
Check the NIM release and API version:
curl -s http://localhost:8000/v1/version
List all available endpoints:
curl -s http://localhost:8000/openapi.json | jq '.paths | keys'
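The model and profile lookup can also be done in a script where `jq` is not available. This Python sketch reads the same two fields that the `jq` filter above selects (`modelInfo[0].modelUrl` and `selectedModelProfileId`). The function name and the sample payload values are hypothetical; a real `/v1/metadata` response may carry additional fields.

```python
import json


def summarize_metadata(body: str):
    """Return (model URL, selected profile ID) from a /v1/metadata body."""
    payload = json.loads(body)
    models = payload.get("modelInfo") or []
    # Mirror the jq filter: take the first entry in modelInfo, if any.
    model_url = models[0].get("modelUrl") if models else None
    return model_url, payload.get("selectedModelProfileId")


# Hypothetical payload; real values depend on the deployment.
sample = json.dumps({
    "modelInfo": [{"modelUrl": "example://model"}],
    "selectedModelProfileId": "abc123",
})
print(summarize_metadata(sample))
```

Missing fields come back as `None` instead of raising, which keeps the check usable in a smoke test.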
Prometheus Metrics#
The NIM exposes Prometheus metrics at http://<pod-ip>:8000/v1/metrics. These metrics can be integrated into the Grafana monitoring stack provided by Holoscan for Media; refer to Platform Monitoring.
Retrieve metrics from inside the cluster:
kubectl exec -it deploy/<nim-deployment> -- curl -sS localhost:8000/v1/metrics
The following metrics are available:
| Category | Metric Name | Description |
|---|---|---|
| GPU | | GPU instantaneous power, in watts. |
| GPU | | Maximum GPU power limit, in watts. |
| GPU | | GPU total energy consumption, in joules. |
| GPU | | GPU utilization rate (0.0–1.0). |
| GPU | | Total GPU memory, in bytes. |
| GPU | | Used GPU memory, in bytes. |
| Process | | Virtual memory size, in bytes. |
| Process | | Resident memory size, in bytes. |
| Process | | Start time of the process since Unix epoch, in seconds. |
| Process | | Total user and system CPU time, in seconds. |
| Process | | Number of open file descriptors. |
| Process | | Maximum number of open file descriptors. |
| Python | | Objects collected during garbage collection. |
| Python | | Uncollectable objects found during garbage collection. |
| Python | | Number of times this generation was collected. |
| Python | | Python platform information. |
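The metrics endpoint returns the Prometheus text exposition format, which is straightforward to read in a script for ad hoc checks. The following Python sketch parses that format into a dictionary. The metric names in the sample are the defaults of the Prometheus Python client and are assumed here for illustration only; confirm the actual names against the output of `/v1/metrics`.

```python
def parse_metrics(text: str) -> dict:
    """Parse Prometheus text exposition into {sample name with labels: value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "}" in line:
            # Label values may contain spaces, so split after the closing brace.
            end = line.rindex("}") + 1
            key, rest = line[:end], line[end:].split()
        else:
            key, *rest = line.split()
        if rest:
            samples[key] = float(rest[0])  # rest[1], if present, is a timestamp
    return samples


# Hypothetical scrape output; names and values are assumptions.
SAMPLE = """\
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 1.5e+09
python_gc_collections_total{generation="0"} 42.0
"""

for name, value in parse_metrics(SAMPLE).items():
    print(name, value)
```

This is a simplified parser for quick inspection. For production scraping, a Prometheus server or an official client library handles escaping and histogram types more completely.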
Triton Inference Server Metrics#
The NIM container also runs Triton Inference Server locally. Triton metrics are exposed on port 9002 (/metrics) inside the pod:
kubectl exec deploy/<nim-deployment> -- curl -sS localhost:9002/metrics | head
Note
Expose port 9002 only if your security model allows it. Use a service, PodMonitor, or scrape configuration as appropriate for your cluster.
Key Triton metrics include the following:
Request metrics: Counts of successful and failed inference requests.
Inference metrics: Request queue times, compute times, and overall request durations.
Model metrics: Model loading times, execution counts, and batch statistics.
Memory metrics: GPU and CPU memory usage for inference operations.
Cache metrics: Response cache hit and miss rates (when caching is enabled).
For comprehensive documentation on all available Triton metrics, refer to Metrics in the Triton Inference Server guide.
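Triton request counters are cumulative, so a rate or ratio needs the difference between two scrapes rather than a single reading. The Python sketch below computes the share of failed requests between two snapshots. It assumes the Triton counter names `nv_inference_request_success` and `nv_inference_request_failure` and that each snapshot has already been reduced to one value per counter; the function name and sample numbers are hypothetical.

```python
def failure_ratio(before: dict, after: dict,
                  ok_name: str = "nv_inference_request_success",
                  fail_name: str = "nv_inference_request_failure") -> float:
    """Return failed / total requests between two counter snapshots."""
    ok = after[ok_name] - before[ok_name]
    failed = after[fail_name] - before[fail_name]
    total = ok + failed
    # No traffic in the interval: report 0.0 instead of dividing by zero.
    # Note: counter resets (for example after a pod restart) are not handled.
    return failed / total if total > 0 else 0.0


# Hypothetical snapshots taken from two scrapes of port 9002.
earlier = {"nv_inference_request_success": 100, "nv_inference_request_failure": 5}
later = {"nv_inference_request_success": 190, "nv_inference_request_failure": 15}
print(failure_ratio(earlier, later))
```

In a Prometheus and Grafana setup, the same calculation is normally expressed as a query over the scraped series; the sketch only illustrates why deltas are needed.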
Operator Controller Metrics#
The operator controller-manager exposes controller-runtime metrics through the manager metrics service. The service is served over TLS on the port set by the operator chart defaults. Refer to the operator Helm chart values.yaml for the metrics port and service configuration.
Logs#
Pod Logs#
Stream logs from the NIM pod using the deployment name:
kubectl logs deploy/<nim-deployment> --follow
To retrieve logs from a specific container within the pod, add --container <container-name>. List all containers in a pod with the following command:
kubectl get pod <pod-name> -o jsonpath='{.spec.containers[*].name}'
On Red Hat OpenShift, replace kubectl with oc.
Persistent Log Files#
When the nimLogs persistent volume claim is enabled, the NIM writes time-stamped log files to the configured mount path (default /workspace/nim-logs). This allows log retention beyond the lifetime of the pod and is useful for post-mortem analysis.
Enable via Helm:
nimLogs:
  enabled: true
  size: "5Gi"
  mountPath: "/workspace/nim-logs"
Alternatively, set the same fields in the operator custom resource under spec.parameters.nimLogs.
To inspect log files directly from a running pod:
kubectl exec deploy/<nim-deployment> -- ls /workspace/nim-logs
For the full set of nimLogs options, refer to Configuration Reference.
NMOS#
For NMOS deployments, verify device registrations using the NMOS Registry independently of the NIM health endpoints. NMOS Registry monitoring and dashboards are covered in Platform Monitoring in the Holoscan for Media user guide.