Base Command Manager
Overview
NVIDIA Base Command Manager (BCM) is a cluster management solution for AI and HPC workloads. It provides centralized management of NVIDIA DGX systems, GPU clusters, and heterogeneous computing environments. BCM simplifies datacenter operations by automating the provisioning, monitoring, and management of large-scale AI infrastructure.
Key Concepts
Cluster Management
BCM provides unified management of:
- Compute nodes - DGX systems, GPU servers, and traditional compute nodes
- Storage systems - Network-attached storage and distributed file systems
- Network infrastructure - High-speed interconnects and network switches
- User management - Authentication, authorization, and resource quotas
- Job scheduling - Integration with SLURM, Kubernetes, and other schedulers
Resource Orchestration
BCM orchestrates resources through:
- Automated provisioning - Bare metal and container-based deployment
- Configuration management - Centralized system configuration and updates
- Monitoring and alerting - Real-time health monitoring and proactive maintenance
- Backup and recovery - Automated backup strategies and disaster recovery
BCM Architecture
Core Components
BCM Head Node:
- Central management server running BCM software
- Web-based management interface
- REST API for programmatic access
- Database for configuration and monitoring data
BCM Compute Nodes:
- Managed compute resources (DGX systems, GPU servers)
- BCM agent software for communication with head node
- Automated configuration and monitoring capabilities
BCM Storage:
- Centralized configuration and user data storage
- Backup and recovery management
- Shared file systems and data management
DPS Integration
DPS (Domain Power Service) is integrated into BCM as a plugin to provide power management and optimization capabilities for the infrastructure BCM manages.
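The entity file that BCM generates for DPS (see the example under Entity Generation) is plain JSON, so it can be sanity-checked with a short script before it is used. The following is a minimal sketch in Python, assuming the file follows the layout of that example. The function name `check_entities` and the choice to treat every field of the example as required are illustrative assumptions, not part of DPS or BCM.

```python
import json

# Fields taken from the example entity in this document. Treating all of
# them as required is an assumption made for this sketch.
REQUIRED_ENTITY_FIELDS = ("name", "type", "model", "redfish")
REQUIRED_REDFISH_FIELDS = ("@odata.type", "@odata.id", "id", "url", "secret_name")


def check_entities(document):
    """Return a list of human-readable problems found in an entity document."""
    entities = document.get("entities") if isinstance(document, dict) else None
    if not isinstance(entities, list):
        return ["top-level 'entities' list is missing"]

    problems = []
    seen_names = set()
    for index, entity in enumerate(entities):
        if not isinstance(entity, dict):
            problems.append(f"entity #{index}: not a JSON object")
            continue
        label = entity.get("name", f"entity #{index}")

        # Report every top-level field the example carries but this entity lacks.
        for field in REQUIRED_ENTITY_FIELDS:
            if field not in entity:
                problems.append(f"{label}: missing '{field}'")

        # Check the nested Redfish block only when it is present and an object.
        redfish = entity.get("redfish")
        if isinstance(redfish, dict):
            for field in REQUIRED_REDFISH_FIELDS:
                if field not in redfish:
                    problems.append(f"{label}: missing 'redfish.{field}'")

        # Entity names are assumed to be unique within one file.
        if label in seen_names:
            problems.append(f"{label}: duplicate entity name")
        seen_names.add(label)
    return problems


if __name__ == "__main__":
    sample = json.loads("""
    {
      "entities": [
        {
          "name": "dgx001",
          "type": "ComputerSystem",
          "model": "DGX_H100",
          "redfish": {
            "@odata.type": "#ComputerSystem.v1_23_0.ComputerSystem",
            "@odata.id": "/dgx001",
            "id": "dgx001",
            "url": "https://dgx001-bmc.example.com",
            "secret_name": "dgx001"
          }
        }
      ]
    }
    """)
    for problem in check_entities(sample) or ["no problems found"]:
        print(problem)
```

To check a generated file instead of the inline sample, load it with `json.load` and pass the result to `check_entities`. The function returns an empty list when nothing is flagged, which makes it easy to use as a gate in a provisioning script.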
Entity Generation
BCM can automatically generate DPS entities from its cluster inventory:
# Generate DPS entities from BCM cluster
dpsctl bcm import \
  --url https://bcm-headnode:8443 \
  --username admin \
  --password secret123

Generated Entity Example:
{
  "entities": [
    {
      "name": "dgx001",
      "type": "ComputerSystem",
      "model": "DGX_H100",
      "redfish": {
        "@odata.type": "#ComputerSystem.v1_23_0.ComputerSystem",
        "@odata.id": "/dgx001",
        "id": "dgx001",
        "url": "https://dgx001-bmc.example.com",
        "secret_name": "dgx001"
      }
    }
  ]
}

Further Reading
- NVIDIA Base Command Manager - Official BCM product page
- Base Command Platform Documentation - Complete platform documentation
- BCM Integration with DPS - Entity generation from BCM
- DPS BCM Commands - DPS CLI integration with BCM
- SLURM Integration - Job scheduling with SLURM