Terminology
This section defines key terms and concepts used throughout the NVDebug documentation.
Core Concepts
- BMC (Baseboard Management Controller)
A specialized service processor that provides out-of-band management capabilities for server hardware.
- HMC (Hardware Management Controller)
A specialized controller used in HGX systems for hardware management and monitoring.
- Log Collector
A defined type of log collection process. Refer to the command reference for detailed collector information.
- Log Collector Group
A grouping mechanism for log collections, such as Redfish, IPMI, SSH, Host, and HealthCheck. Each Log Collector Group contains at least one Log Collector ID.
- Log Collector ID
A unique identifier assigned to each log collection instance (for example, R1, I1, S1, H1). Also referred to as a CID (Collector ID).
- NodeType
The type of node in a system, which can be one of the following:
Compute: Standard compute nodes
SwitchTray: NVSwitch tray nodes
PowerShelf: Power management nodes
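The relationship between Log Collector IDs and their groups can be illustrated with a minimal Python sketch. It assumes, based only on the examples above (R1, I1, S1, H1), that a CID is a single uppercase letter followed by a number. The function names `parse_cid` and `group_by_prefix` are hypothetical and are not part of NVDebug; the sketch deliberately does not map letters to group names, because that mapping is not defined in this section.

```python
import re
from collections import defaultdict

# Assumption: a CID is one uppercase letter followed by a positive integer,
# inferred only from the documented examples R1, I1, S1, H1.
CID_PATTERN = re.compile(r"([A-Z])(\d+)")


def parse_cid(cid):
    """Split a collector ID such as 'R1' into its prefix letter and index."""
    match = CID_PATTERN.fullmatch(cid)
    if match is None:
        raise ValueError(f"unrecognized collector ID: {cid!r}")
    return match.group(1), int(match.group(2))


def group_by_prefix(cids):
    """Bucket collector IDs by prefix letter, preserving input order."""
    groups = defaultdict(list)
    for cid in cids:
        prefix, _ = parse_cid(cid)
        groups[prefix].append(cid)
    return dict(groups)
```

For example, `group_by_prefix(["R1", "R2", "I1"])` buckets the two `R` collectors together and keeps `I1` separate, mirroring the rule that each group contains at least one collector ID.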
Platform-Specific Terms
- ARM64
The 64-bit ARM architecture used in NVIDIA Grace-based platforms.
- DGX (Data Center GPU)
NVIDIA’s AI supercomputer platform.
- HGX-HMC (Hyperscale GPU-HMC)
NVIDIA’s hyperscale GPU platform for data centers.
- MGX (Modular GPU)
NVIDIA’s modular GPU platform.
- X86_64
The 64-bit x86 architecture used in NVIDIA GPU platforms.
- PowerShelf
A power management system for large-scale deployments.
Collection Terms
- Collection Level
The scope of log collection:
Default: All necessary collectors
-V: Default + Increased Log Collection Level
-VV: Default + Increased Log Collection Level + Additional Collectors
- Execution Summary
A report showing the status of each collector (Complete, Partial, Error, or Skipped).
- Log Sanitization
The process of removing sensitive information (IP addresses, usernames, and passwords) from logs.
- Port Forwarding
A network technique used to access BMC services through SSH tunnels.
- Preflight Check
A validation step that checks configuration and connectivity before full collection.
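To make the Log Sanitization entry above concrete, the following Python sketch redacts the three kinds of sensitive data it names. This is an illustration only and not NVDebug's actual implementation: the regular expressions, the `<IP>` and `<REDACTED>` placeholders, and the function name `sanitize_line` are all assumptions chosen for the example.

```python
import re

# Assumption: IPv4 addresses in dotted-quad form. This simple pattern also
# matches other four-part dotted numbers (for example, some version strings).
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

# Assumption: credentials appear as "key=value" or "key: value" pairs.
CREDENTIAL = re.compile(r"(?i)\b(username|user|password)(\s*[=:]\s*)(\S+)")


def sanitize_line(line):
    """Return the line with IP addresses and credential values masked."""
    line = IPV4.sub("<IP>", line)
    # Keep the key and separator, replace only the value.
    line = CREDENTIAL.sub(r"\1\2<REDACTED>", line)
    return line
```

Calling `sanitize_line("login user=admin password: hunter2 from 10.0.0.5")` returns `"login user=<REDACTED> password: <REDACTED> from <IP>"`. A production sanitizer would likely need to handle IPv6 addresses and other credential formats as well.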