The State Manager is a Go library (commons/pkg/statemanager) that manages the dgxc.nvidia.com/nvsentinel-state node label lifecycle across NVSentinel modules. It provides a state machine implementation with transition validation and observability for coordinating Fault Quarantine, Node Drainer, and Fault Remediation operations.
Coordinates node lifecycle state across three modules operating on the same node:
- quarantined state
- draining → drain-succeeded or drain-failed
- remediating → remediation-succeeded or remediation-failed

Provides:
*Terminal until a healthy event triggers label removal.
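As a rough illustration of transition validation over the states named above, the following minimal sketch shows one way such a check could look. It is not the library's actual API: the `State` type, the constant names, the `allowed` table, and `ValidateTransition` are all hypothetical. The transition table is an assumption inferred from the state names. In particular, the sketch assumes that label removal (modeled as the empty state) is permitted from any labeled state, and it does not model any shortcut for nodes with no pods to drain.

```go
package main

import "fmt"

// State is a hypothetical type for the value of the
// dgxc.nvidia.com/nvsentinel-state node label.
type State string

// The string values come from the state names in this document;
// the Go identifiers are illustrative only.
const (
	StateNone                 State = "" // label absent (assumed representation)
	StateQuarantined          State = "quarantined"
	StateDraining             State = "draining"
	StateDrainSucceeded       State = "drain-succeeded"
	StateDrainFailed          State = "drain-failed"
	StateRemediating          State = "remediating"
	StateRemediationSucceeded State = "remediation-succeeded"
	StateRemediationFailed    State = "remediation-failed"
)

// allowed is an assumed forward-transition table, not the library's real one.
var allowed = map[State][]State{
	StateNone:           {StateQuarantined},
	StateQuarantined:    {StateDraining},
	StateDraining:       {StateDrainSucceeded, StateDrainFailed},
	StateDrainSucceeded: {StateRemediating},
	StateRemediating:    {StateRemediationSucceeded, StateRemediationFailed},
}

// ValidateTransition returns nil if moving from one state to another is
// permitted by the assumed table, and a descriptive error otherwise.
func ValidateTransition(from, to State) error {
	// Assumption: a healthy event may remove the label from any labeled state.
	if to == StateNone && from != StateNone {
		return nil
	}
	for _, next := range allowed[from] {
		if next == to {
			return nil
		}
	}
	return fmt.Errorf("invalid transition %q -> %q", from, to)
}

func main() {
	// A transition present in the assumed table.
	fmt.Println(ValidateTransition(StateDraining, StateDrainSucceeded))
	// Skipping the drain phase is rejected by the assumed table.
	fmt.Println(ValidateTransition(StateQuarantined, StateRemediating))
}
```

Keeping the permitted transitions in a single table makes the rules easy to audit and lets every module share the same check before it writes the label.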
Successful remediation:
No pods to drain:
Failed drain:
Canceled drain (healthy event):
Failed remediation: