Release notes for NVIDIA Base Command™ Manager (BCM) 11.25.08
Released: 1 October 2025
General
New Features
Updated Slurm 25.05 to 25.05.2
Updated Slurm 24.11 to 24.11.6
Updated Topograph to 3.2.0
Updated etcd to 3.5.22
Updated cm-nvhpc to 25.5
Added cm-kubeadm-manage-joins helper script
Fixed Issues
Fixed potential killing of slurmdbd by systemd
NVIDIA Mission Control
New Features
Support for NVIDIA Mission Control (NMC) 2.0
Updated Run:ai to 2.22
Increased the Run:ai self-hosted Helm chart install timeout from 20 to 40 minutes (20 minutes was not always enough)
Increased the maximum timeout for Run:ai cluster installation (helm install) to 20 minutes (from the Helm default)
Do not check if Kyverno is disabled during AHR (Autonomous Hardware Recovery) setup
Run:ai wizard now includes the correct mpi / kubeflow CRDs
Fixed Issues
The Run:ai cluster installation stage now always installs on the correct Kubernetes cluster (when multiple Kubernetes clusters run on a single BCM cluster)
Long hostnames could cause TUI dialogs in the Run:ai configuration dialog to render incorrectly
Run:ai self-hosted installation allows custom version selection and latest patch release versions
CMDaemon
New Features
Standardize on an array result for rest/sysinfo
Introduced the KubernetesCertsExpiration healthcheck, which warns when node certificates expire within 30 days; it is also added to existing deployments
CMDaemon runs scontrol reconfig once for combined slurm.conf and topology.conf updates
BCM can set node-role.kubernetes.io/runai-gpu-worker=true for GPU nodes (and the CPU worker variant)
Extended v1/rest/status to include the rack and system name
Added an opt-in BCM GlobalConfig setting to disable writing root kubeconfig files on Kubernetes worker nodes
Fixed Issues
Fixed an issue where changing the cluster UUID was not reflected in BCM
Fixed an issue where labeled entities were not listed in the Base View monitoring view
Ensure hostname changes are genuine host name changes, and restore etcd:etcd ownership after restoring the PKI backup (done before and after kubeadm reset)
Fixed a regression in the KubernetesChildNode healthcheck (and others)
The cm-kubeadm-manage helper script no longer overwrites etcd certificates managed by BCM
Fixed regressions in Kubernetes metrics collection scripts
kubeadm kubeconfig user commands are now invoked only when actually needed
COD
New Features
ALL: Change default BCM version to 11.0.
ALL: Create inbound rules for both protocols (TCP/UDP) if the user didn’t explicitly request a protocol.
AWS: Preserve internal IP address when recovering broken AWS HA head node.
AWS: Create placement group with spread strategy for better HA head node separation.
Azure: Added support for NAT gateway for outbound connectivity.
GCP: Add cloudsetting network_performance_config.
GCP: Always delete empty subnets in BCM created networks.
OCI: Ensure the VM architecture is compatible with the requested image.
Fixed Issues
ALL: Improve error on missing architecture config.
ALL: Preserve SSH host keys on cm-cloud-ha-setup.
ALL: HA setup checks that head node IP is within CIDR.
ALL: Harmonize --cluster-tags and --head-node-tags cluster create flags.
ALL: Updated image search defaults and validation. By default, only the latest image is listed. The --all-revisions flag overrides --latest; when both --latest and --all-revisions are given, an informational message is logged.
Azure: Fixed an error that could be raised when powering on multiple nodes created several VHDs at the same time.
GCP: Fix cryptic invalid refresh token grant errors.
GCP: Set correct instance hostname on cluster create.
GCP: Reduce API requests on cluster delete.
GCP: Fix timeout creating HA shared storage.
GCP: Optimize cluster list.
GCP: Improve reliability of cluster delete.
OCI: Fixed a cluster create hang when a non-existent region is specified.
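The HA setup check that the head node IP lies within the network CIDR can be illustrated with Python's standard ipaddress module (a sketch of the validation idea, not BCM's code; the function name is hypothetical):

```python
import ipaddress

def ip_in_cidr(ip: str, cidr: str) -> bool:
    """Return True if the given IP address falls inside the CIDR block."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr, strict=False)
```

Running such a check before HA setup fails fast on a misconfigured head node address instead of surfacing the mismatch later during failover configuration.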
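The documented precedence between the image-listing flags (--all-revisions overrides --latest, with an informational message when both are given) can be sketched as; the function name and return shape are hypothetical:

```python
def resolve_image_listing(latest: bool, all_revisions: bool):
    """Resolve image-listing flags per the documented rules:
    default is latest-only; --all-revisions overrides --latest;
    when both flags are passed, an informational message is produced."""
    info = None
    if latest and all_revisions:
        info = "info: --all-revisions overrides --latest; listing all revisions"
    mode = "all" if all_revisions else "latest"
    return mode, info
```

Separating flag resolution from the actual image query keeps the precedence rule trivially testable.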
cm-kubernetes-setup
New Features
Started using containerd v2 by default for new Kubernetes setups
Updated Kubeflow training operator to 1.9.2
Removed the NetQ/Kubernetes version mapping validation, for NetQ version 4.15.0 only
The wizard now allows choosing ‘toolkit.enabled=true’, delegating the toolkit installation and containerd configuration to the NVIDIA GPU Operator
Fixed Issues
Fixed the kube_get_available_nodes remote_request RPC call to also consider head nodes as Kubernetes workers
CAPI templates (stored in the kubernetes submode) can no longer be uninstalled via cm-kubernetes-setup
Fixed showing the MetalLB/BGP screen in cm-kubernetes-setup when the Tigera operator is selected
cm-wlm-setup
Fixed Issues
Fixed adding pyxis plugin to secondary software images on multi-arch/multi-distro setups