Observability#

Platform Monitoring#

Platform-level monitoring—GPU (DCGM), SR-IOV network, Precision Time Protocol, and NMOS Registry dashboards—is provided by Holoscan for Media and documented in Platform Monitoring in the Holoscan for Media user guide.

The following sections cover monitoring specific to the NIM.


Health Probes#

The Helm charts and the operator both configure HTTP probes on the application health port (default 8000):

| Probe | Endpoint | Purpose |
| --- | --- | --- |
| Startup | /v1/health/live | Checks whether the container has started. |
| Liveness | /v1/health/ready | Checks whether the process is alive. |
| Readiness | /v1/health/ready | Checks whether the service is ready to serve traffic. |

Check health from inside the cluster:

kubectl exec -it deploy/<studio-voice-deployment> -- curl -sS localhost:8000/v1/health/ready

When the NIM is ready, the response body is as follows:

{"object":"health.response","message":"NIM is ready","status":"ready"}

Port-forward for local debugging:

kubectl port-forward deploy/<studio-voice-deployment> 8000:8000
curl -sS http://127.0.0.1:8000/v1/health/ready

On Red Hat OpenShift, replace kubectl with oc.
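
When scripting against a port-forwarded pod, it can help to wait for readiness before sending traffic. The following is a minimal Python sketch, not part of the NIM tooling: the function names `is_ready` and `wait_until_ready`, the timeout, and the polling interval are illustrative assumptions. It relies only on the response body shown above, in which `status` is `ready`.

```python
import json
import time
import urllib.error
import urllib.request


def is_ready(body: str) -> bool:
    """Return True if a /v1/health/ready response body reports status 'ready'."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "ready"


def wait_until_ready(url: str = "http://127.0.0.1:8000/v1/health/ready",
                     timeout_s: float = 300.0,
                     interval_s: float = 5.0) -> bool:
    """Poll the readiness endpoint until it reports ready or the timeout expires.

    The default URL assumes the port-forward shown above is running.
    The timeout and interval values are arbitrary illustrative defaults.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200 and is_ready(resp.read().decode("utf-8")):
                    return True
        except (urllib.error.URLError, OSError):
            # Endpoint not reachable yet, or it returned a non-2xx status.
            pass
        time.sleep(interval_s)
    return False
```

Calling `wait_until_ready()` after starting the port-forward blocks until the endpoint reports ready, or returns False once the timeout expires.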

HTTP Endpoints#

The NIM exposes the following HTTP endpoints on port 8000. All accept GET.

| Endpoint | Purpose |
| --- | --- |
| /v1/health/live | Liveness probe. Returns 200 when the process is running. |
| /v1/health/ready | Readiness probe. Returns 200 with {"object":"health.response","message":"NIM is ready","status":"ready"} when the model is loaded. |
| /v1/metrics | NIM-level Prometheus exposition (refer to Prometheus Metrics). |
| /v1/version | Container API and release versions; for example, {"release":"1.0","api":"3.1.0"}. |
| /v1/metadata | Loaded model URL and license SHA, including modelInfo[].modelUrl (for example, ngc://nim/nvidia/maxine-studio-voice:2.1.0) and licenseInfo.sha. Use to confirm the exact model version at runtime without inspecting model_manifest.yaml. |
| /v1/manifest | Full container manifest contents (license headers and model entries). |
| /v1/models | Loaded model identifiers in OpenAI-style format; for example, {"data":[{"id":"maxine-studio-voice", ...}]}. |
| /v1/license | License file content with SHA and size. |
| /docs | Swagger UI for the NIM HTTP API. |
| /openapi.json | OpenAPI schema for the NIM HTTP API. |

Retrieve any of these endpoints from inside the pod; for example:

kubectl exec -it deploy/<studio-voice-deployment> -- curl -sS localhost:8000/v1/metadata

On Red Hat OpenShift, replace kubectl with oc.
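
To confirm the model version programmatically, the `/v1/metadata` response can be parsed for the tag at the end of each `modelUrl`. The following Python sketch assumes only the `modelInfo[].modelUrl` field described above. The sample payload is a reduced illustration, not a captured response, and `model_versions` is a hypothetical helper name.

```python
import json

# Reduced, illustrative payload: only the fields named in the table above.
# A real /v1/metadata response may carry additional fields.
SAMPLE_METADATA = json.dumps({
    "modelInfo": [
        {"modelUrl": "ngc://nim/nvidia/maxine-studio-voice:2.1.0"},
    ],
    "licenseInfo": {"sha": "<sha>"},
})


def model_versions(metadata_body: str) -> dict:
    """Map each model name to the version tag found in modelInfo[].modelUrl."""
    metadata = json.loads(metadata_body)
    versions = {}
    for entry in metadata.get("modelInfo", []):
        url = entry.get("modelUrl", "")
        # Take the last path segment, then split "name:tag" on the final colon.
        name, sep, tag = url.rsplit("/", 1)[-1].rpartition(":")
        if sep:  # Skip URLs that carry no ":tag" suffix.
            versions[name] = tag
    return versions
```

Passing the body returned by the `curl` command above to `model_versions` yields a mapping from model name to version tag, which a deployment check can compare against the expected release.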

Prometheus Metrics#

The NIM exposes Prometheus metrics at http://<pod-ip>:8000/v1/metrics. These metrics can be integrated into the Grafana monitoring stack provided by Holoscan for Media; refer to Platform Monitoring.

Retrieve metrics from inside the cluster:

kubectl exec -it deploy/<studio-voice-deployment> -- curl -sS localhost:8000/v1/metrics

The following metrics are available:

| Category | Metric Name | Description |
| --- | --- | --- |
| GPU | gpu_power_usage_watts | GPU instantaneous power, in watts. |
| GPU | gpu_power_limit_watts | Maximum GPU power limit, in watts. |
| GPU | gpu_total_energy_consumption_joules | GPU total energy consumption, in joules. |
| GPU | gpu_utilization | GPU utilization rate (0.0–1.0). |
| GPU | gpu_memory_total_bytes | Total GPU memory, in bytes. |
| GPU | gpu_memory_used_bytes | Used GPU memory, in bytes. |
| Process | process_virtual_memory_bytes | Virtual memory size, in bytes. |
| Process | process_resident_memory_bytes | Resident memory size, in bytes. |
| Process | process_start_time_seconds | Start time of the process since Unix epoch, in seconds. |
| Process | process_cpu_seconds_total | Total user and system CPU time, in seconds. |
| Process | process_open_fds | Number of open file descriptors. |
| Process | process_max_fds | Maximum number of open file descriptors. |
| Python | python_gc_objects_collected_total | Objects collected during garbage collection. |
| Python | python_gc_objects_uncollectable_total | Uncollectable objects found during garbage collection. |
| Python | python_gc_collections_total | Number of times this generation was collected. |
| Python | python_info | Python platform information. |
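
For quick checks without a full Prometheus stack, the text returned by `/v1/metrics` can be parsed directly. The following Python sketch is a deliberately minimal reader of the Prometheus text exposition format, not a complete parser: it drops labels and timestamps and keeps only sample values. The sample text, including the `gpu` label, is invented for illustration, and the helper names are hypothetical.

```python
from typing import Dict, List, Optional

# Invented sample in Prometheus text format; real output and labels may differ.
SAMPLE = """\
# HELP gpu_memory_used_bytes Used GPU memory, in bytes.
# TYPE gpu_memory_used_bytes gauge
gpu_memory_used_bytes{gpu="0"} 4.0e9
gpu_memory_total_bytes{gpu="0"} 1.6e10
gpu_utilization 0.42
"""


def parse_metrics(text: str) -> Dict[str, List[float]]:
    """Collect sample values per metric name, ignoring labels and timestamps."""
    samples: Dict[str, List[float]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # Skip blank lines and HELP/TYPE comments.
        if "}" in line:
            # Labeled sample: the value follows the closing brace.
            head, _, rest = line.rpartition("}")
            name = head.split("{", 1)[0]
        else:
            name, _, rest = line.partition(" ")
        fields = rest.split()
        if not fields:
            continue
        try:
            value = float(fields[0])
        except ValueError:
            continue  # Ignore lines this minimal reader cannot interpret.
        samples.setdefault(name, []).append(value)
    return samples


def gpu_memory_fraction(samples: Dict[str, List[float]]) -> Optional[float]:
    """Return used/total GPU memory summed over all samples, or None if unknown."""
    used = sum(samples.get("gpu_memory_used_bytes", []))
    total = sum(samples.get("gpu_memory_total_bytes", []))
    return used / total if total > 0 else None
```

Because labels are discarded, multiple samples of the same metric (for example, one per GPU) are summed by `gpu_memory_fraction`. For production use, a maintained Prometheus client or scraper is a better choice than this reader.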

Triton Inference Server Metrics#

The NIM container also runs Triton Inference Server locally. Triton metrics are exposed on port 9002 (/metrics) inside the pod:

kubectl exec -it deploy/<studio-voice-deployment> -- curl -sS localhost:9002/metrics | head

Note

Expose port 9002 only if your security model allows it. Use a service, PodMonitor, or scrape configuration as appropriate for your cluster.

Studio Voice invokes Triton through its in-process C++ Backend Interface API, not through HTTP/gRPC. As a result, the Triton request-level counters (nv_inference_request_success, nv_inference_request_failure, nv_inference_count, nv_inference_exec_count) remain at 0 even while the NIM is actively processing audio. Use the GPU-level metrics instead to determine whether inference is happening.

| Metric Source | Useful Metric | Increments Under Load? |
| --- | --- | --- |
| Triton (port 9002) | nv_gpu_utilization | Yes |
| Triton (port 9002) | nv_gpu_power_usage | Yes |
| Triton (port 9002) | nv_gpu_memory_used_bytes | Yes |
| Triton (port 9002) | nv_inference_request_success / _count / _exec_count | No (in-process backend API) |
| NIM (port 8000, /v1/metrics) | gpu_utilization | Yes |
| NIM (port 8000, /v1/metrics) | gpu_power_usage_watts | Yes |

Note

A liveness alert for Studio Voice should be based on nv_gpu_utilization (or gpu_utilization on port 8000) crossing an idle-to-load threshold, not on rate(nv_inference_request_success[...]). Because the request-level counters stay at 0, a rule based on them fires on every healthy Studio Voice deployment.
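
The idle-to-load idea can be sketched in a few lines of Python. This is an illustration of the threshold logic only, not an alerting rule shipped with the NIM: the 0.10 threshold and the function name `load_transitions` are assumptions, and a suitable threshold depends on the idle utilization of your GPUs. The input is a time-ordered list of `gpu_utilization` (or `nv_gpu_utilization`) samples in the 0.0 to 1.0 range.

```python
from typing import List, Sequence


def load_transitions(samples: Sequence[float], threshold: float = 0.10) -> List[int]:
    """Return the indices at which utilization rises from below the threshold
    to at or above it, that is, each idle-to-load crossing.

    The default threshold of 0.10 is an illustrative assumption.
    """
    crossings = []
    for i in range(1, len(samples)):
        if samples[i - 1] < threshold <= samples[i]:
            crossings.append(i)
    return crossings


def saw_load(samples: Sequence[float], threshold: float = 0.10) -> bool:
    """Return True if any sample in the window is at or above the threshold."""
    return any(value >= threshold for value in samples)
```

A monitoring job could evaluate `saw_load` over a recent window of samples while audio is expected to be flowing, and raise an alert when it returns False.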

For the full Triton metrics surface (memory, model load, cache, and so on, which are still exposed and useful), refer to Metrics in the Triton Inference Server guide.

Operator Controller Metrics#

The operator controller-manager exposes controller-runtime metrics over TLS on the manager metrics service. The port follows the operator chart defaults; refer to the operator Helm chart values.yaml for the metrics port and service configuration.

Logs#

Pod Logs#

Stream logs from the NIM pod using the deployment name:

kubectl logs deploy/<studio-voice-deployment> --follow

To retrieve logs from a specific container within the pod, add --container <container-name>. List all containers in a pod with the following command:

kubectl get pod <pod-name> -o jsonpath='{.spec.containers[*].name}'

On Red Hat OpenShift, replace kubectl with oc.

Persistent Log Files#

When the nimLogs persistent volume claim is enabled, the NIM writes time-stamped log files to the configured mount path (default /workspace/nim-logs). This allows log retention beyond the lifetime of the pod and is useful for post-mortem analysis.

Enable via Helm:

nimLogs:
  enabled: true
  size: "5Gi"
  mountPath: "/workspace/nim-logs"

Alternatively, enable it in the operator custom resource under spec.parameters.nimLogs: use path for the mount path and pvc.{enabled,size,storageClassName} for the backing claim.

To inspect log files directly from a running pod:

kubectl exec deploy/<studio-voice-deployment> -- ls /workspace/nim-logs

For the full set of nimLogs options, refer to Common Helm Configuration.

NMOS#

For NMOS deployments, verify device registrations using the NMOS Registry independently of the NIM health endpoints. NMOS Registry monitoring and dashboards are covered in Platform Monitoring in the Holoscan for Media user guide.