Observability

Platform Monitoring

Platform-level monitoring, including GPU (DCGM), SR-IOV network, Precision Time Protocol, and NMOS Registry dashboards, is provided by Holoscan for Media and documented in Platform Monitoring in the Holoscan for Media user guide.

The following sections cover monitoring specific to the NIM.

Health Probes

Both the Helm charts and the operator configure HTTP probes on the application health port (default 8000):

Probe	Endpoint	Purpose
Startup	`/v1/health/live`	Checks whether the container has started.
Liveness	`/v1/health/live`	Checks whether the process is alive.
Readiness	`/v1/health/ready`	Checks whether the service is ready to serve traffic.

Check health from inside the cluster:

kubectl exec -it deploy/<studio-voice-deployment> -- curl -sS localhost:8000/v1/health/ready

When the NIM is ready, the response body is as follows:

{"object":"health.response","message":"NIM is ready","status":"ready"}

Port-forward for local debugging:

kubectl port-forward deploy/<studio-voice-deployment> 8000:8000
curl -sS http://127.0.0.1:8000/v1/health/ready

On Red Hat OpenShift, replace kubectl with oc.
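The readiness check can also be scripted, for example to gate a test run on the NIM being ready. The following is a minimal sketch, assuming the port-forward shown earlier is active so the endpoint answers on 127.0.0.1:8000. The helper names (`is_ready`, `fetch`, `wait_until_ready`) and the retry defaults are illustrative and not part of the NIM.

```python
import json
import time
import urllib.error
import urllib.request

# Assumption: a port-forward to the deployment is active on local port 8000.
READY_URL = "http://127.0.0.1:8000/v1/health/ready"


def is_ready(body: str) -> bool:
    """Return True when a health response body reports status "ready"."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "ready"


def fetch(url: str, timeout: float = 2.0) -> str:
    """Return the response body, or an empty string on a network or HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError):
        return ""


def wait_until_ready(url: str = READY_URL, attempts: int = 30,
                     delay: float = 2.0, fetcher=fetch) -> bool:
    """Poll the readiness endpoint until it reports ready or attempts run out."""
    for attempt in range(attempts):
        if is_ready(fetcher(url)):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # hypothetical retry interval
    return False
```

Calling `wait_until_ready()` returns `True` as soon as the body reports `"status":"ready"`, and `False` after the configured number of attempts. The `fetcher` parameter exists only so the polling logic can be exercised without a cluster.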

HTTP Endpoints

The NIM exposes the following HTTP endpoints on port 8000. All accept GET.

Endpoint	Purpose
`/v1/health/live`	Liveness probe. Returns `200` when the process is running.
`/v1/health/ready`	Readiness probe. Returns `200` with `{"object":"health.response","message":"NIM is ready","status":"ready"}` when the model is loaded.
`/v1/metrics`	NIM-level Prometheus exposition (refer to Prometheus Metrics).
`/v1/version`	Container API and release versions; for example, `{"release":"1.0","api":"3.1.0"}`.
`/v1/metadata`	Loaded model URL and license SHA, including `modelInfo[].modelUrl` (for example, `ngc://nim/nvidia/maxine-studio-voice:2.1.0`) and `licenseInfo.sha`. Use to confirm the exact model version at runtime without inspecting `model_manifest.yaml`.
`/v1/manifest`	Full container manifest contents (license headers and model entries).
`/v1/models`	Loaded model identifiers in OpenAI-style format; for example, `{"data":[{"id":"maxine-studio-voice", ...}]}`.
`/v1/license`	License file content with SHA and size.
`/docs`	Swagger UI for the NIM HTTP API.
`/openapi.json`	OpenAPI schema for the NIM HTTP API.

Retrieve any of these endpoints from inside the pod; for example:

kubectl exec -it deploy/<studio-voice-deployment> -- curl -sS localhost:8000/v1/metadata

On Red Hat OpenShift, replace kubectl with oc.
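To confirm the model version programmatically, the `/v1/metadata` response can be reduced to the two fields described in the table. The following sketch assumes only the `modelInfo[].modelUrl` and `licenseInfo.sha` fields; the rest of the payload shape, the placeholder SHA value, and the `summarize_metadata` helper are hypothetical.

```python
import json


def summarize_metadata(payload: dict) -> dict:
    """Collect model URLs and the license SHA from a /v1/metadata payload."""
    model_urls = [
        entry["modelUrl"]
        for entry in (payload.get("modelInfo") or [])
        if isinstance(entry, dict) and "modelUrl" in entry
    ]
    license_sha = (payload.get("licenseInfo") or {}).get("sha")
    return {"model_urls": model_urls, "license_sha": license_sha}


# Hypothetical payload: only the field names come from the endpoint description,
# and the SHA is a placeholder rather than a real value.
sample = json.loads(
    '{"modelInfo": [{"modelUrl": "ngc://nim/nvidia/maxine-studio-voice:2.1.0"}],'
    ' "licenseInfo": {"sha": "<license-sha>"}}'
)
summary = summarize_metadata(sample)
print(summary["model_urls"])
```

In practice, the JSON text would come from the `curl` command shown earlier rather than from an inline sample.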

Prometheus Metrics

The NIM exposes Prometheus metrics at http://<pod-ip>:8000/v1/metrics. These metrics can be integrated into the Grafana monitoring stack provided by Holoscan for Media; refer to Platform Monitoring.

Retrieve metrics from inside the cluster:

kubectl exec -it deploy/<studio-voice-deployment> -- curl -sS localhost:8000/v1/metrics

The following metrics are available:

Category	Metric Name	Description
GPU	`gpu_power_usage_watts`	GPU instantaneous power, in watts.
GPU	`gpu_power_limit_watts`	Maximum GPU power limit, in watts.
GPU	`gpu_total_energy_consumption_joules`	GPU total energy consumption, in joules.
GPU	`gpu_utilization`	GPU utilization rate (0.0–1.0).
GPU	`gpu_memory_total_bytes`	Total GPU memory, in bytes.
GPU	`gpu_memory_used_bytes`	Used GPU memory, in bytes.
Process	`process_virtual_memory_bytes`	Virtual memory size, in bytes.
Process	`process_resident_memory_bytes`	Resident memory size, in bytes.
Process	`process_start_time_seconds`	Start time of the process since Unix epoch, in seconds.
Process	`process_cpu_seconds_total`	Total user and system CPU time, in seconds.
Process	`process_open_fds`	Number of open file descriptors.
Process	`process_max_fds`	Maximum number of open file descriptors.
Python	`python_gc_objects_collected_total`	Objects collected during garbage collection.
Python	`python_gc_objects_uncollectable_total`	Uncollectable objects found during garbage collection.
Python	`python_gc_collections_total`	Number of times this generation was collected.
Python	`python_info`	Python platform information.
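For quick checks without a Prometheus server, the text returned by `/v1/metrics` can be parsed directly. The following is a simplified sketch rather than a complete exposition-format parser: it discards labels and timestamps, and it assumes that the used and total memory series are listed in the same GPU order. The `gpu_index` label and the sample values are hypothetical.

```python
def parse_metrics(text: str) -> dict:
    """Parse Prometheus text exposition into {metric_name: [values]}.

    Labels are discarded and timestamps are ignored.
    """
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "}" in line:
            # Label values may contain spaces, so cut at the closing brace.
            series, _, rest = line.rpartition("}")
            name = series.split("{", 1)[0]
        else:
            name, _, rest = line.partition(" ")
        fields = rest.split()
        if not fields:
            continue
        try:
            value = float(fields[0])
        except ValueError:
            continue
        metrics.setdefault(name, []).append(value)
    return metrics


def gpu_memory_fraction(metrics: dict) -> list:
    """Return used/total GPU memory per series, assuming matching series order."""
    used = metrics.get("gpu_memory_used_bytes", [])
    total = metrics.get("gpu_memory_total_bytes", [])
    return [u / t for u, t in zip(used, total) if t > 0]


# Hypothetical sample; real output carries whatever labels the NIM emits.
SAMPLE = """\
# TYPE gpu_memory_total_bytes gauge
gpu_memory_total_bytes{gpu_index="0"} 8000000000
gpu_memory_used_bytes{gpu_index="0"} 2000000000
gpu_utilization{gpu_index="0"} 0.5
"""

print(gpu_memory_fraction(parse_metrics(SAMPLE)))
```

For anything beyond ad hoc inspection, a Prometheus scrape of the endpoint is the more robust choice, because it preserves labels and history.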

Triton Inference Server Metrics

The NIM container also runs Triton Inference Server locally. Triton metrics are exposed on port 9002 (/metrics) inside the pod:

kubectl exec -it deploy/<studio-voice-deployment> -- curl -sS localhost:9002/metrics | head

Note

Expose port 9002 only if your security model allows it. Use a service, PodMonitor, or scrape configuration as appropriate for your cluster.

Studio Voice invokes Triton through its in-process C++ Backend Interface API, not through HTTP/gRPC. As a result, the Triton request-level counters (nv_inference_request_success, nv_inference_request_failure, nv_inference_count, nv_inference_exec_count) remain at 0 even while the NIM is actively processing audio. Use the GPU-level metrics instead to determine whether inference is happening.

Metric Source	Useful Metric	Increments Under Load?
Triton (port `9002`)	`nv_gpu_utilization`	Yes
Triton (port `9002`)	`nv_gpu_power_usage`	Yes
Triton (port `9002`)	`nv_gpu_memory_used_bytes`	Yes
Triton (port `9002`)	`nv_inference_request_success`, `nv_inference_request_failure`, `nv_inference_count`, `nv_inference_exec_count`	No (in-process backend API)
NIM (port `8000`, `/v1/metrics`)	`gpu_utilization`	Yes
NIM (port `8000`, `/v1/metrics`)	`gpu_power_usage_watts`	Yes

Note

A liveness alert for Studio Voice should be based on nv_gpu_utilization (or gpu_utilization on port 8000) crossing an idle-to-load threshold, not on rate(nv_inference_request_success[...]). The request-level rule produces false positives on every healthy Studio Voice deployment.

For the full Triton metrics surface (memory, model load, cache, and so on, which are still exposed and useful), refer to Metrics in the Triton Inference Server guide.
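The alerting guidance above amounts to classifying a window of GPU utilization samples as idle or loaded instead of watching request counters that never move. The following sketch illustrates that logic under stated assumptions: the `is_under_load` helper, the 0.05 threshold, and the 50% busy fraction are hypothetical starting points that need tuning against the idle baseline of the actual deployment.

```python
def is_under_load(samples, threshold=0.05, min_fraction=0.5):
    """Return True when enough utilization samples exceed the idle threshold.

    samples: GPU utilization readings in the 0.0-1.0 range, such as values of
    gpu_utilization or nv_gpu_utilization collected over a time window.
    threshold and min_fraction are illustrative defaults, not recommended values.
    """
    if not samples:
        return False
    busy = sum(1 for value in samples if value > threshold)
    return busy / len(samples) >= min_fraction


# Hypothetical windows of samples.
print(is_under_load([0.0, 0.4, 0.6, 0.5]))    # mostly above the threshold
print(is_under_load([0.0, 0.01, 0.0, 0.02]))  # idle baseline
```

Requiring a fraction of the window to be busy, rather than a single sample, keeps short utilization spikes from being read as sustained inference load.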

Operator Controller Metrics

The operator controller-manager exposes controller-runtime metrics over TLS on the manager metrics service. The port follows the operator chart defaults; refer to the operator Helm chart values.yaml for the metrics port and service configuration.

Logs

Pod Logs

Stream logs from the NIM pod using the deployment name:

kubectl logs deploy/<studio-voice-deployment> --follow

To retrieve logs from a specific container within the pod, add --container <container-name>. List all containers in a pod with the following command:

kubectl get pod <pod-name> -o jsonpath='{.spec.containers[*].name}'

On Red Hat OpenShift, replace kubectl with oc.

Persistent Log Files

When the nimLogs persistent volume claim is enabled, the NIM writes time-stamped log files to the configured mount path (default /workspace/nim-logs). This allows log retention beyond the lifetime of the pod and is useful for post-mortem analysis.

Enable via Helm:

nimLogs:
  enabled: true
  size: "5Gi"
  mountPath: "/workspace/nim-logs"

Alternatively, enable it through the operator custom resource under spec.parameters.nimLogs: use path for the mount path and pvc.{enabled,size,storageClassName} for the backing claim.

To inspect log files directly from a running pod:

kubectl exec deploy/<studio-voice-deployment> -- ls /workspace/nim-logs

For the full set of nimLogs options, refer to Common Helm Configuration.

NMOS

For NMOS deployments, verify device registrations using the NMOS Registry independently of the NIM health endpoints. NMOS Registry monitoring and dashboards are covered in Platform Monitoring in the Holoscan for Media user guide.