Release notes for NVIDIA Base Command™ Manager (BCM) 11.33.0#
Released: 29 May 2026
General#
New Features#
Added support for BaseOS 7.5.0
Added support for running Harbor and Kubernetes on the same node
Added support for deploying Harbor v2.14 in cm-container-registry-setup
Updated cm-tftboot GRUB images to 2.14 to support booting over a VLAN
Updated Network Operator to 25.10
Updated Pyxis to 0.23.0 and Enroot to 4.1.2
Updated Slurm 25.05 to 25.05.8
Updated Slurm 25.11 to 25.11.6
Updated Ubuntu 22.04 to 22.04.5
Updated Ubuntu 24.04 to 24.04.4
Updated cm-nvfwupd to 2.1.1
Updated cm-nvidia-container-toolkit to 1.19
Updated cm-nvhpc to 26.3
Updated containerd to 2.2.1
Updated CUDA 13.1 to 13.1.1
Updated golang-go-latest to 1.25.10
Updated lib-prometheus to 1.26.1
Reduced the rescue rootfs size to 320 MB in cm-clone-install to allow booting on certain Dell servers
Reduced the head node installer rootfs size to less than 300 MB to allow it to load on certain Dell servers
Excluded /var/lib/munge/munged.seed from software image synchronization
Fixed Issues#
Fixed copying of missing subscription files when creating RHEL 9 images with cm-create-image
Known Issues#
When updating a BCM Ubuntu 22.04 cluster with Slurm from BCM 11.32.0 or earlier to BCM 11.33.0, Slurm services may fail to start because the updated Slurm packages do not automatically install the required dependency package cm-hwloc2-local. As a workaround, install cm-hwloc2-local on the head node and in the software images (or compute nodes) used by Slurm.
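The workaround above could be scripted roughly as follows. This is a minimal sketch, not an official BCM procedure. Only the package name cm-hwloc2-local comes from the note. The image path /cm/images/default-image, the helper names, and the use of plain chroot with apt-get are assumptions (BCM also ships cm-chroot-sw-img for entering a software image). The script defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess

PACKAGE = "cm-hwloc2-local"  # dependency named in the known issue


def build_commands(image_dirs):
    """Return one apt-get command for the head node plus one per software image."""
    install = ["apt-get", "install", "-y", PACKAGE]
    commands = [install]  # first entry: run directly on the head node
    for image in image_dirs:
        # Assumption: a plain chroot into the image directory is sufficient here.
        commands.append(["chroot", image] + install)
    return commands


def apply_workaround(image_dirs, dry_run=True):
    """Print each command; also execute it when dry_run is False (needs root)."""
    for cmd in build_commands(image_dirs):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical image path; list the images actually used by Slurm nodes.
    apply_workaround(["/cm/images/default-image"])
```

Review the printed commands first, then call apply_workaround with dry_run=False as root on the head node.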
CMDaemon#
New Features#
Added the ability to start and stop the nvdebug process per device from cmsh via RPC
dhcpd can now be prevented from running on head nodes by adding the dhcpd_override_needs_to_be_running extra-value with value false
Fixed Issues#
Fixed a possible Topograph failure on Topograph certificate regeneration
Fixed automatic nginx.conf propagation when toggling the ingressproxyenable flag in KubeCluster configuration
Fixed a Kubernetes metric collection script vulnerability that permitted arbitrary code execution as root by regular users with kubectl create permissions
Fixed removal of module files on Kubernetes uninstallation
Fixed an issue where slurm.takeover.sh returned an error when debug was not enabled
Fixed empty Slurm topology generation on the first Slurm start after cm-wlm-setup
Fixed an issue where workload manager prolog and epilog scripts were not running in ABS order
Fixed the possibility of providing a malicious comment string to a Slurm job that is parsed by the gpu-workload-power-profiles prolog when profile validation is disabled
Fixed an issue where prolog and epilog scripts were not run when a non-Slurm workload manager was used
Fixed slurmctld startup after Slurm redeployment when the accounting node is in a different location
COD#
New Features#
COD-AWS: Streamlined dataflow from the COD tool to the head node; the latest COD tool version is required
COD-AWS: Print VPC and subnet IDs during cluster creation; logging has also been improved
COD-GCP: Show image source blob dates instead of the GCP image resource date
COD-GCP: Improved BCM node identification to reduce the risk of orphaned cloud instances when instance creation times out but eventually completes; BCM now identifies an instance as soon as it starts and enters netboot, which also speeds up bulk instance creation
COD-AWS: Added support for RoCEv2 secondary networks (--create-secondary-network/--create-secondary-subnet and --use-existing-secondary-network-id/--use-existing-secondary-subnet-id); BCM creates or reuses secondary networks and subnets and configures them accordingly
COD-Azure: When deploying in a pre-existing resource group, added validations to ensure the resource group is specified for dependent pre-existing resources, moved resource group validation earlier so cluster create fails before the summary screen, and deprecated the --existing-rg flag; --resource-group <name> now indicates deployment in an existing resource group, consistent with other CSP logic
Added image size listing for all COD flavors by specifying the size field in the --columns list
Fixed Issues#
COD-GCP: After cm-cod-gcp cluster stop, cm-cod-gcp cluster list now shows instance status as “stopped” instead of “terminated”
Fixed --store-head-node-ip so it only stores the IP address to file (it previously also stored the cluster name)
Fixed --version always showing trunk
COD-GCP: Retry HTTP 503 (Service Unavailable) errors
Fixed a runtime error when creating two or more clusters simultaneously when head node images are created from blobs
COD-AWS: Fixed --on-error-undo not working after SSH/CMDaemon wait timeout
COD-Azure: Fixed --on-error-undo not working after SSH/CMDaemon wait timeout
COD-AWS: Fixed CMDaemon initialization failure due to inability to fetch availability zones in the me-south-1 region
Fixed unreadable serial console output during node install on cloud nodes; node install should now reliably output plain text to the serial console
COD-GCP HA: Fixed compute nodes failing to boot if the original head node is unavailable
COD/CX-Azure: Fixed excessive memory usage in a Python script that caused OOM errors on smaller head nodes
Fixed an OCI disk size validation bug where the check incorrectly always used the first size configured after CMDaemon startup
cm-kubernetes-setup#
New Features#
Run:ai must now be deployed via the new cm-runai-setup wizard; the Run:ai installation option in cm-kubernetes-setup is now disabled
Added Kubernetes Gateway API support in cm-kubernetes-setup, replacing the legacy Ingress API and Ingress NGINX Controller with Kgateway and MetalLB (L2 or BGP mode, now enabled by default for Gateway IP allocation)
Replaced Ingress proxy with Gateway proxy on the head node: external traffic on the selected head node port (443 by default) is forwarded to the Gateways of the target Kubernetes cluster via TLS SNI; NodePorts 30080 and 30443 are no longer used
Added a Gateway configuration menu in cm-kubernetes-setup to configure Gateway TLS certificates, reconfigure the Gateway proxy, install Gateway components and replacements for legacy BCM ingresses, and remove the legacy Ingress NGINX controller and BCM Ingress resources
Existing clusters deployed with Ingress NGINX Controller can migrate to Gateway API without recreating the Kubernetes cluster; see the documented migration path in the BCM manual
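To illustrate the SNI-based forwarding described above, here is a minimal sketch of the routing decision only: given the server name from a TLS ClientHello, pick the Gateway address of the matching Kubernetes cluster. The hostnames, addresses, and function names are hypothetical, and this is not the head node proxy's actual implementation or configuration.

```python
from fnmatch import fnmatchcase

# Hypothetical routing table: SNI pattern -> (Gateway IP, port) of a cluster.
ROUTES = {
    "*.k8s-a.example.com": ("10.141.255.10", 443),
    "*.k8s-b.example.com": ("10.141.255.20", 443),
}


def pick_gateway(server_name, routes=ROUTES):
    """Return the backend for the first pattern matching the SNI name, else None."""
    # Normalize case and drop a trailing dot before matching.
    name = server_name.lower().rstrip(".")
    for pattern, backend in routes.items():
        if fnmatchcase(name, pattern):
            return backend
    return None


if __name__ == "__main__":
    for sni in (
        "dashboard.k8s-a.example.com",
        "grafana.k8s-b.example.com",
        "other.example.org",
    ):
        print(sni, "->", pick_gateway(sni))
```

The sketch shows how one head node port can front several clusters, since the server name alone selects the target.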
Deprecated or Removed Features#
Ingress NGINX Controller-based Ingress resources are replaced by Gateway API resources
Fixed Issues#
Fixed the GPU operator option “latest” in cm-kubernetes-setup
Fixed NGINX failures on head nodes when Kubernetes is uninstalled with multiple Kubernetes clusters running
cm-wlm-setup#
New Features#
Added the ability to configure Nebius as a Topograph provider
Improved Slurm setup performance for clusters with tens of thousands of nodes
cm-wlm-setup now preserves Pyxis configuration with the --reinstall-pyxis option
Fixed Issues#
Fixed OpenPBS setup with cm-wlm-setup
Fixed Slurm setup with a compute node as storage when cm-mariadb is preinstalled in that node’s software image
cm-setup#
New Features#
Applied the NVIDIA design palette to TUI wizard screens in cm-setup based tools
Base View#
New Features#
Added support for deploying Kgateway from the Kubernetes wizard in Base View for Kubernetes Gateway API-based ingress routing
Added support for deploying MetalLB from the Kubernetes wizard in Base View as the load balancer for externally exposed Kgateway services
Updated the Kubernetes wizard to use “Gateway proxy” terminology in place of “Ingress Proxy” to align with the new Gateway API-based architecture
Added support for uninstalling Heimdall from the Mission Control wizard in Base View
Enabled the AHR and AJR wizards in the license information page (previously shown as “Coming Soon”)
Deprecated or Removed Features#
Removed support for installing the Ingress NGINX Controller from the Kubernetes wizard in Base View; use Kgateway with MetalLB instead
Fixed Issues#
Fixed JupyterLab vulnerabilities allowing privilege escalation via argument injection in the PyPI Extension Manager and account takeover via stored cross-site scripting
Fixed cross-site scripting vulnerabilities by updating dompurify to 3.4.1