Observability

Platform Monitoring

Platform-level monitoring, which covers GPU (DCGM), SR-IOV network, Precision Time Protocol, and NMOS Registry dashboards, is provided by Holoscan for Media and documented in Platform Monitoring in the Holoscan for Media user guide.

The following sections cover monitoring specific to the NIM.


Health Probes

The Helm charts and the operator configure HTTP probes on the application health port (default 8000):

| Probe | Endpoint | Purpose |
| --- | --- | --- |
| Startup | /v1/health/live | Checks whether the container has started. |
| Liveness | /v1/health/ready | Checks whether the process is alive. |
| Readiness | /v1/health/ready | Checks whether the service is ready to serve traffic. |

Check health from inside the cluster:

kubectl exec -it deploy/<nim-deployment> -- curl -sS localhost:8000/v1/health/ready

Port-forward for local debugging:

kubectl port-forward deploy/<nim-deployment> 8000:8000
curl -sS http://127.0.0.1:8000/v1/health/ready
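Deployment scripts often need to block until the NIM is ready rather than check once. The following is a minimal sketch, assuming the port-forward above is active and that the readiness endpoint answers with a non-2xx status (or refuses the connection) while the NIM is still starting. The function name wait_for_ready and its default retry values are illustrative and are not part of the NIM or the chart.

```shell
# wait_for_ready: poll a readiness URL until it succeeds or attempts run out.
# Hypothetical helper; arguments are: URL, max attempts (default 30),
# delay between attempts in seconds (default 2).
wait_for_ready() {
  url=$1
  attempts=${2:-30}
  delay=${3:-2}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP error statuses such as 503.
    if curl -fs -o /dev/null --max-time 5 "$url"; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "not ready after $attempts attempt(s)" >&2
  return 1
}
```

For example, wait_for_ready http://127.0.0.1:8000/v1/health/ready 60 5 polls up to 60 times with a 5-second pause between attempts and returns a non-zero status if the NIM never becomes ready.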

HTTP Endpoints

The NIM exposes the following HTTP endpoints on port 8000. Use port-forward to access them locally:

kubectl port-forward deploy/<nim-deployment> 8000:8000

| Endpoint | Description |
| --- | --- |
| GET /v1/health/live | Liveness check: returns 200 when the container is running. |
| GET /v1/health/ready | Readiness check: returns 200 when the NIM is ready to process streams. |
| GET /v1/metrics | Prometheus metrics (NIM process and GPU). For details, refer to Prometheus Metrics. |
| GET /v1/version | Returns the NIM release and API version: {"release":"1.0.0","api":"3.1.0"}. |
| GET /v1/metadata | Returns the loaded model URL and selected profile checksum. Useful for confirming which model is active without inspecting the container filesystem. |
| GET /v1/manifest | Returns full profile resolution details, including GPU type and workspace hash for each available profile. |
| GET /v1/models | Lists the models currently loaded in the NIM. |
| GET /v1/license | Returns license information for the running NIM. |
| GET /docs | Swagger UI: interactive API documentation. |
| GET /openapi.json | Full OpenAPI schema for all endpoints. |
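As a quick smoke test, the endpoints in the table can be requested in a loop and their status codes printed. This is a sketch only: it assumes the port-forward above is active and that every listed endpoint answers 200 on a healthy NIM, which the table states explicitly only for the two health endpoints. The function name check_endpoints is illustrative, and /docs is left out because it serves an HTML page rather than API data.

```shell
# check_endpoints: request each documented GET endpoint and print its HTTP status.
# Hypothetical helper; the optional argument is the base URL.
check_endpoints() {
  base=${1:-http://127.0.0.1:8000}
  failed=0
  for endpoint in /v1/health/live /v1/health/ready /v1/metrics /v1/version \
                  /v1/metadata /v1/manifest /v1/models /v1/license /openapi.json; do
    # -w prints only the status code; curl reports 000 if the connection fails.
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$base$endpoint")
    echo "$code $endpoint"
    [ "$code" = "200" ] || failed=1
  done
  # Non-zero exit status if any endpoint did not return 200.
  return "$failed"
}
```

Running check_endpoints with no arguments targets http://127.0.0.1:8000, and the exit status can gate a later step in a script.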

Useful Queries

Confirm which model and profile are loaded:

curl -s http://localhost:8000/v1/metadata | jq '.modelInfo[0].modelUrl, .selectedModelProfileId'

Check the NIM release and API version:

curl -s http://localhost:8000/v1/version

List all available endpoints:

curl -s http://localhost:8000/openapi.json | jq '.paths | keys'

Prometheus Metrics

The NIM exposes Prometheus metrics at http://<pod-ip>:8000/v1/metrics. These metrics can be integrated into the Grafana monitoring stack provided by Holoscan for Media; refer to Platform Monitoring.

Retrieve metrics from inside the cluster:

kubectl exec -it deploy/<nim-deployment> -- curl -sS localhost:8000/v1/metrics

The following metrics are available:

| Category | Metric Name | Description |
| --- | --- | --- |
| GPU | gpu_power_usage_watts | GPU instantaneous power, in watts. |
| GPU | gpu_power_limit_watts | Maximum GPU power limit, in watts. |
| GPU | gpu_total_energy_consumption_joules | GPU total energy consumption, in joules. |
| GPU | gpu_utilization | GPU utilization rate (0.0–1.0). |
| GPU | gpu_memory_total_bytes | Total GPU memory, in bytes. |
| GPU | gpu_memory_used_bytes | Used GPU memory, in bytes. |
| Process | process_virtual_memory_bytes | Virtual memory size, in bytes. |
| Process | process_resident_memory_bytes | Resident memory size, in bytes. |
| Process | process_start_time_seconds | Start time of the process since Unix epoch, in seconds. |
| Process | process_cpu_seconds_total | Total user and system CPU time, in seconds. |
| Process | process_open_fds | Number of open file descriptors. |
| Process | process_max_fds | Maximum number of open file descriptors. |
| Python | python_gc_objects_collected_total | Objects collected during garbage collection. |
| Python | python_gc_objects_uncollectable_total | Uncollectable objects found during garbage collection. |
| Python | python_gc_collections_total | Number of times this generation was collected. |
| Python | python_info | Python platform information. |
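For a quick check without Grafana, the two GPU memory gauges can be combined into one percentage directly from the scraped text. The sketch below assumes the standard Prometheus text exposition format with the sample value as the last field on each line (no trailing timestamp). If the metrics carry per-GPU labels, the values are summed across GPUs. The function name gpu_memory_percent is illustrative.

```shell
# gpu_memory_percent: read Prometheus text on stdin and print used GPU memory
# as a percentage of total GPU memory. Hypothetical helper.
gpu_memory_percent() {
  awk '
    /^#/ { next }                      # skip HELP and TYPE comment lines
    {
      name = $1
      sub(/[{].*/, "", name)           # drop any {label="..."} suffix
      if (name == "gpu_memory_used_bytes")  used  += $NF
      if (name == "gpu_memory_total_bytes") total += $NF
    }
    END {
      if (total > 0) {
        printf "%.1f\n", 100 * used / total
      } else {
        print "no gpu_memory_total_bytes sample found" > "/dev/stderr"
        exit 1
      }
    }
  '
}
```

Pipe a scrape into it, for example: kubectl exec deploy/<nim-deployment> -- curl -sS localhost:8000/v1/metrics | gpu_memory_percent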

Triton Inference Server Metrics

The NIM container also runs Triton Inference Server locally. Triton metrics are exposed on port 9002 (/metrics) inside the pod:

kubectl exec -it deploy/<nim-deployment> -- curl -sS localhost:9002/metrics | grep nv_inference

On Red Hat OpenShift, replace kubectl with oc.

Note

Expose port 9002 only if your security model allows it. Use a service, PodMonitor, or scrape configuration as appropriate for your cluster.

Triton Latency Metrics

The service exposes Triton latency metrics in microseconds (us).

Metrics

| Metric | Description |
| --- | --- |
| nv_inference_request_summary_us | End-to-end request latency |
| nv_inference_queue_summary_us | Queue latency |
| nv_inference_compute_infer_summary_us | Model inference latency |

Supported Quantiles

| Quantile | Description |
| --- | --- |
| P50 | 50% of requests completed within this latency. |
| P95 | 95% of requests completed within this latency. |
| P99 | 99% of requests completed within this latency. |

Example Metric

nv_inference_request_summary_us{model="LipsyncU16",version="1",quantile="0.95"} 13685
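To read a single quantile straight from the Triton metrics output, the matching series can be filtered and converted from microseconds to milliseconds. This is a sketch that assumes each sample sits on one line with no spaces inside the label set, as in the example metric. The function name latency_quantile_ms is illustrative.

```shell
# latency_quantile_ms: read Prometheus text on stdin and print, in milliseconds,
# the samples of one summary metric for one quantile. Hypothetical helper.
# Arguments: metric name, quantile label value (for example 0.95).
latency_quantile_ms() {
  metric=$1
  quantile=$2
  awk -v metric="$metric" -v q="quantile=\"$quantile\"" '
    # Match "<metric>{" exactly so the _sum and _count series are skipped.
    index($1, metric "{") == 1 && index($1, q) > 0 {
      printf "%s %.3f\n", $1, $NF / 1000   # microseconds to milliseconds
    }
  '
}
```

For example: kubectl exec deploy/<nim-deployment> -- curl -sS localhost:9002/metrics | latency_quantile_ms nv_inference_request_summary_us 0.95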

PromQL Examples

Average request latency since Triton started (ms):

(
  nv_inference_request_summary_us_sum
  /
  nv_inference_request_summary_us_count
) / 1000

P95 request latency (ms):

nv_inference_request_summary_us{
  model="LipsyncU16",
  quantile="0.95"
} / 1000

P99 request latency (ms):

nv_inference_request_summary_us{
  model="LipsyncU16",
  quantile="0.99"
} / 1000
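When no Prometheus server is scraping the pod yet, the average-latency calculation can be approximated from a single scrape by dividing the _sum sample by the _count sample for each label set. The sketch below assumes one sample per line with no spaces inside the label set. The result is cumulative since the counters were last reset, not a windowed rate. The function name avg_request_latency_ms is illustrative.

```shell
# avg_request_latency_ms: read Triton metrics text on stdin and print the
# average request latency in milliseconds per label set. Hypothetical helper.
avg_request_latency_ms() {
  awk '
    /^nv_inference_request_summary_us_sum/ {
      key = $1; sub(/^[^{]*/, "", key)     # keep only the {labels} part
      sum[key] = $NF
    }
    /^nv_inference_request_summary_us_count/ {
      key = $1; sub(/^[^{]*/, "", key)
      count[key] = $NF
    }
    END {
      for (key in count)
        if (count[key] > 0)
          printf "%s %.3f\n", key, sum[key] / count[key] / 1000
    }
  '
}
```

For example: kubectl exec deploy/<nim-deployment> -- curl -sS localhost:9002/metrics | avg_request_latency_ms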

For all Triton metrics, refer to Metrics in the Triton Inference Server user guide.

Operator Controller Metrics

The operator controller-manager exposes controller-runtime metrics on the manager metrics service (HTTPS on port 8443 in the shipped nvidia-lipsync-h4m-operator chart). If your release overrides the bind address or TLS settings, refer to values.yaml and templates/service-metrics.yaml in the operator Helm chart.

Logs

Pod Logs

Stream logs from the NIM pod using the deployment name:

kubectl logs deploy/<nim-deployment> --follow

To retrieve logs from a specific container within the pod, add --container <container-name>. List all containers in a pod with the following command:

kubectl get pod <pod-name> -o jsonpath='{.spec.containers[*].name}'

On Red Hat OpenShift, replace kubectl with oc.

Persistent Log Files

When the serverLogs persistent volume claim is enabled, the chart mounts a PVC at /var/log/lipsync and the NIM writes time-stamped log files there. This allows log retention beyond the lifetime of the pod and is useful for post-mortem analysis.

Enable via Helm:

serverLogs:
  enabled: true
  size: "5Gi"

Alternatively, enable it in the operator custom resource under spec.parameters.serverLogs.

To inspect log files directly from a running pod:

kubectl exec deploy/<nim-deployment> -- ls /var/log/lipsync

For the full set of serverLogs options, refer to Common Helm Configuration.

NMOS

For NMOS deployments, verify device registrations using the NMOS Registry independently of the NIM health endpoints. NMOS Registry monitoring and dashboards are covered in Platform Monitoring in the Holoscan for Media user guide.