Metadata Collector | NVIDIA NVSentinel Documentation

Overview

The Metadata Collector module collects GPU metadata using NVIDIA NVML (Management Library) and writes it to a shared file. Other modules read this file to enrich health events with GPU serial numbers, UUIDs, and topology information. This component will also expose the pod-to-GPU mapping as an annotation on each pod requesting GPUs. This document covers all Helm configuration options for system administrators.

Configuration Reference

Module Enable/Disable

Controls whether the metadata-collector module is deployed in the cluster.

1 global:
2   metadataCollector:
3     enabled: true

Resources

Defines CPU and memory resource requests and limits for the metadata-collector init container.

1 metadata-collector:
2   resources:
3     limits:
4       cpu: 500m
5       memory: 256Mi
6     requests:
7       cpu: 100m
8       memory: 128Mi

Runtime Class

Specifies the container runtime class for GPU device access.

1 metadata-collector:
2   runtimeClassName: "nvidia"

Parameters

runtimeClassName

Runtime class name that provides GPU device access. Required for NVML to query GPU information.

Common values:

nvidia - NVIDIA container runtime (default)
nvidia-legacy - Legacy NVIDIA runtime
Empty string - Uses default cluster runtime. Used for CRIO environments