Observability#

Platform Monitoring#

Holoscan for Media provides platform-level monitoring: GPU (DCGM), SR-IOV network, Precision Time Protocol, and NMOS Registry dashboards. For details, refer to Platform Monitoring in the Holoscan for Media user guide.

The following sections cover monitoring specific to the NIM.

Health Probes#

The Helm charts and the operator configure the following HTTP probes on the application health port (default 8000):

Probe	Endpoint	Purpose
Startup	`/v1/health/live`	Checks whether the container has started.
Liveness	`/v1/health/ready`	Checks whether the process is alive.
Readiness	`/v1/health/ready`	Checks whether the service is ready to serve traffic.

Check health from inside the cluster:

kubectl exec -it deploy/<nim-deployment> -- curl -sS localhost:8000/v1/health/ready

Port-forward for local debugging:

kubectl port-forward deploy/<nim-deployment> 8000:8000
curl -sS http://127.0.0.1:8000/v1/health/ready
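
When scripting against the port-forward above, it can help to wait for readiness before sending traffic. The following Python sketch polls the readiness endpoint until it returns 200. The function names, attempt count, and polling interval are illustrative assumptions, not part of the NIM or its charts.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status code of a GET request, or 0 if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered with a non-2xx status
    except OSError:
        return 0  # connection refused, timeout, DNS failure, and similar


def wait_until_ready(
    url: str,
    attempts: int = 30,        # assumed default, tune for your deployment
    interval_s: float = 2.0,   # assumed default, tune for your deployment
    probe: Callable[[str], int] = http_status,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `url` until it returns 200 or the attempts run out."""
    for attempt in range(attempts):
        if probe(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(interval_s)
    return False
```

With the port-forward active, `wait_until_ready("http://127.0.0.1:8000/v1/health/ready")` returns `True` once the endpoint answers 200. The `probe` and `sleep` parameters exist only so that the loop can be exercised without a network.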

HTTP Endpoints#

The NIM exposes the following HTTP endpoints on port 8000. Use port-forward to access them locally:

kubectl port-forward deploy/<nim-deployment> 8000:8000

Endpoint	Description
`GET /v1/health/live`	Liveness check—returns 200 when the container is running.
`GET /v1/health/ready`	Readiness check—returns 200 when the NIM is ready to process streams.
`GET /v1/metrics`	Prometheus metrics (NIM process and GPU). For details, refer to Prometheus Metrics.
`GET /v1/version`	Returns the NIM release and API version: `{"release":"1.0.0","api":"3.1.0"}`.
`GET /v1/metadata`	Returns the loaded model URL and selected profile checksum. Useful for confirming which model is active without inspecting the container filesystem.
`GET /v1/manifest`	Returns full profile resolution details, including GPU type and workspace hash for each available profile.
`GET /v1/models`	Lists the models currently loaded in the NIM.
`GET /v1/license`	Returns license information for the running NIM.
`GET /docs`	Swagger UI—interactive API documentation.
`GET /openapi.json`	Full OpenAPI schema for all endpoints.

Useful Queries#

Confirm which model and profile are loaded:

curl -s http://localhost:8000/v1/metadata | jq '.modelInfo[0].modelUrl, .selectedModelProfileId'

Check the NIM release and API version:

curl -s http://localhost:8000/v1/version

List all available endpoints:

curl -s http://localhost:8000/openapi.json | jq '.paths | keys'
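
Where `jq` is not available, the first query can be approximated in Python. This sketch reads the same two fields that the `jq` filter above selects (`modelInfo[0].modelUrl` and `selectedModelProfileId`). The function name and the sample payload are hypothetical, and the placeholder values are not real NIM output.

```python
import json
from typing import Any, Dict


def summarize_metadata(metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Pick the fields that the jq filter reads from a /v1/metadata response."""
    models = metadata.get("modelInfo") or []
    first = models[0] if models else {}
    return {
        "modelUrl": first.get("modelUrl"),
        "selectedModelProfileId": metadata.get("selectedModelProfileId"),
    }


# Illustrative payload only: the values are placeholders, not real NIM output.
sample = json.loads(
    '{"modelInfo": [{"modelUrl": "<model-url>"}],'
    ' "selectedModelProfileId": "<profile-id>"}'
)
print(summarize_metadata(sample))
```

Missing fields come back as `None` rather than raising, which keeps the helper usable while the NIM is still starting up.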

Prometheus Metrics#

The NIM exposes Prometheus metrics at http://<pod-ip>:8000/v1/metrics. These metrics can be integrated into the Grafana monitoring stack provided by Holoscan for Media; refer to Platform Monitoring.

Retrieve metrics from inside the cluster:

kubectl exec -it deploy/<nim-deployment> -- curl -sS localhost:8000/v1/metrics

The following metrics are available:

Category	Metric Name	Description
GPU	`gpu_power_usage_watts`	GPU instantaneous power, in watts.
GPU	`gpu_power_limit_watts`	Maximum GPU power limit, in watts.
GPU	`gpu_total_energy_consumption_joules`	GPU total energy consumption, in joules.
GPU	`gpu_utilization`	GPU utilization rate (0.0–1.0).
GPU	`gpu_memory_total_bytes`	Total GPU memory, in bytes.
GPU	`gpu_memory_used_bytes`	Used GPU memory, in bytes.
Process	`process_virtual_memory_bytes`	Virtual memory size, in bytes.
Process	`process_resident_memory_bytes`	Resident memory size, in bytes.
Process	`process_start_time_seconds`	Start time of the process since Unix epoch, in seconds.
Process	`process_cpu_seconds_total`	Total user and system CPU time, in seconds.
Process	`process_open_fds`	Number of open file descriptors.
Process	`process_max_fds`	Maximum number of open file descriptors.
Python	`python_gc_objects_collected_total`	Objects collected during garbage collection.
Python	`python_gc_objects_uncollectable_total`	Uncollectable objects found during garbage collection.
Python	`python_gc_collections_total`	Number of times this generation was collected.
Python	`python_info`	Python platform information.
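
For a quick check without a full Prometheus stack, the metrics text can be parsed directly. The following Python sketch reads Prometheus text-format output and derives the fraction of GPU memory in use from `gpu_memory_used_bytes` and `gpu_memory_total_bytes`. The sample values and the `device_id` label are assumptions for illustration; the labels the NIM actually attaches may differ. The parser is deliberately simplified: it keeps only the first sample per metric name, so it does not distinguish between multiple GPUs.

```python
import re
from typing import Dict, Optional

# Matches `name value` or `name{labels} value`; comment lines start with '#'.
SAMPLE_LINE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(\{[^}]*\})?\s+(\S+)')


def parse_metrics(text: str) -> Dict[str, float]:
    """Map each metric name to the value of the first sample seen for it."""
    values: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue
        name, _labels, raw = match.groups()
        try:
            values.setdefault(name, float(raw))
        except ValueError:
            continue  # skip samples whose value is not numeric
    return values


def gpu_memory_fraction(metrics: Dict[str, float]) -> Optional[float]:
    """Return used/total GPU memory, or None if either metric is unavailable."""
    used = metrics.get("gpu_memory_used_bytes")
    total = metrics.get("gpu_memory_total_bytes")
    if used is None or not total:
        return None
    return used / total


# Illustrative input: values and the device_id label are made up.
sample = """\
# TYPE gpu_memory_used_bytes gauge
gpu_memory_used_bytes{device_id="0"} 4294967296.0
gpu_memory_total_bytes{device_id="0"} 17179869184.0
gpu_utilization{device_id="0"} 0.5
"""
print(gpu_memory_fraction(parse_metrics(sample)))  # 0.25
```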

Triton Inference Server Metrics#

The NIM container also runs Triton Inference Server locally. Triton metrics are exposed on port 9002 (/metrics) inside the pod:

kubectl exec -it deploy/<nim-deployment> -- curl -sS localhost:9002/metrics | grep nv_inference

On Red Hat OpenShift, replace kubectl with oc.

Note

Expose port 9002 only if your security model allows it. Use a service, PodMonitor, or scrape configuration as appropriate for your cluster.

Triton Latency Metrics#

The service exposes Triton latency metrics in microseconds (us).

Metrics#

Metric	Description
`nv_inference_request_summary_us`	End-to-end request latency
`nv_inference_queue_summary_us`	Queue latency
`nv_inference_compute_infer_summary_us`	Model inference latency

Supported Quantiles#

Quantile	`quantile` Label	Description
P50	`0.5`	50% of requests completed within this latency.
P95	`0.95`	95% of requests completed within this latency.
P99	`0.99`	99% of requests completed within this latency.

Example Metric#

nv_inference_request_summary_us{model="LipsyncU16",version="1",quantile="0.95"} 13685

PromQL Examples#

Average request latency (ms):

(
  nv_inference_request_summary_us_sum
  /
  nv_inference_request_summary_us_count
) / 1000

P95 request latency (ms):

nv_inference_request_summary_us{
  model="LipsyncU16",
  quantile="0.95"
} / 1000

P99 request latency (ms):

nv_inference_request_summary_us{
  model="LipsyncU16",
  quantile="0.99"
} / 1000

For all Triton metrics, refer to Metrics in the Triton Inference Server user guide.
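
The same conversions can be applied to raw `/metrics` output when Prometheus is not available. This Python sketch extracts the request-latency quantiles for one model and converts them from microseconds to milliseconds, and it mirrors the `_sum / _count` average from the first query. The function names are hypothetical. The sample line reuses the example metric shown earlier in this section.

```python
import re
from typing import Dict, Optional

# Matches only the quantile samples, not the _sum and _count series.
QUANTILE_LINE = re.compile(r'^nv_inference_request_summary_us\{([^}]*)\}\s+(\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def request_quantiles_ms(text: str, model: str) -> Dict[str, float]:
    """Return {quantile label: latency in ms} for one model's request summary."""
    result: Dict[str, float] = {}
    for line in text.splitlines():
        match = QUANTILE_LINE.match(line.strip())
        if match is None:
            continue
        labels = dict(LABEL.findall(match.group(1)))
        if labels.get("model") == model and "quantile" in labels:
            result[labels["quantile"]] = float(match.group(2)) / 1000.0
    return result


def average_ms(sum_us: float, count: float) -> Optional[float]:
    """Mirror the _sum / _count query; None when no requests were recorded."""
    if count <= 0:
        return None
    return sum_us / count / 1000.0


sample = 'nv_inference_request_summary_us{model="LipsyncU16",version="1",quantile="0.95"} 13685'
print(request_quantiles_ms(sample, "LipsyncU16"))  # {'0.95': 13.685}
```

Like the PromQL average, `average_ms` covers every request since the counters were last reset, so it reacts slowly to recent changes in latency.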

Operator Controller Metrics#

The operator controller-manager exposes controller-runtime metrics through the manager metrics service. In the shipped nvidia-lipsync-h4m-operator chart, this service uses HTTPS on port 8443. If your release overrides the bind address or TLS settings, refer to values.yaml and templates/service-metrics.yaml in the operator Helm chart.

Logs#

Pod Logs#

Stream logs from the NIM pod using the deployment name:

kubectl logs deploy/<nim-deployment> --follow

To retrieve logs from a specific container within the pod, add --container <container-name>. List all containers in a pod with the following command:

kubectl get pod <pod-name> -o jsonpath='{.spec.containers[*].name}'

On Red Hat OpenShift, replace kubectl with oc.

Persistent Log Files#

When the serverLogs persistent volume claim is enabled, the chart mounts a PVC at /var/log/lipsync and the NIM writes time-stamped log files there. This allows log retention beyond the lifetime of the pod and is useful for post-mortem analysis.

Enable via Helm:

serverLogs:
  enabled: true
  size: "5Gi"

Alternatively, enable it in the operator custom resource under spec.parameters.serverLogs.

To inspect log files directly from a running pod:

kubectl exec deploy/<nim-deployment> -- ls /var/log/lipsync

For the full set of serverLogs options, refer to Common Helm Configuration.

NMOS#

For NMOS deployments, verify device registrations using the NMOS Registry independently of the NIM health endpoints. NMOS Registry monitoring and dashboards are covered in Platform Monitoring in the Holoscan for Media user guide.