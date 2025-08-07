To enable the UFM Infra feature, UFM HA must be installed in a new mode (`external-storage`) using a new product (`enterprise-multinode`).

Additionally, NFS must be configured as follows:

Select a dedicated NFS server to host the shared directories.

Create a shared directory on the NFS server for UFM configuration and logs.

Install the NFS client on each UFM node if not already present.
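On the NFS server, the shared directory must be exported to every UFM node. The sketch below uses a hypothetical `make_exports_line` helper to print an `/etc/exports` entry for a set of client hosts. The export options (`rw,sync,no_root_squash`) are an assumption, not a documented UFM requirement: `no_root_squash` lets root on the UFM nodes run the `chown` commands used later in this procedure, so review it against your site's security policy.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build an /etc/exports line that shares one directory
# with each listed UFM node. EXPORT_OPTS defaults to an ASSUMED option set.
make_exports_line() {
  local dir=$1
  shift
  local opts=${EXPORT_OPTS:-rw,sync,no_root_squash}
  local line=$dir client
  for client in "$@"; do
    line+=" ${client}(${opts})"
  done
  printf '%s\n' "$line"
}

# Example (hypothetical export path and host names), run as root on the NFS
# server: append the entry, then re-export everything.
#   make_exports_line /shared_folder ufm-node1 ufm-node2 >> /etc/exports
#   exportfs -ra
```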

If your firewall rules block non-standard ports, open the ports that the high-availability services use to communicate with each other on the HA nodes. To do so, run the following commands:

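```
# Permanently allow the high-availability service, then reload to apply it
firewall-cmd --permanent --add-service=high-availability
firewall-cmd --reload

# Alternatively, allow it in the runtime configuration only
# (this rule is lost on the next reload or reboot)
# firewall-cmd --add-service=high-availability
```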
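If you script this step across the HA nodes, a small wrapper can skip it on hosts where firewalld is not running. In this sketch, `open_ha_ports` and the `FIREWALL_CMD` override are hypothetical names; the override exists only so the logic can be exercised with a stub instead of the real `firewall-cmd`.

```shell
#!/usr/bin/env bash
# Sketch: open the firewalld "high-availability" service on one HA node.
# FIREWALL_CMD is a hypothetical override hook (defaults to firewall-cmd).
open_ha_ports() {
  local fw=${FIREWALL_CMD:-firewall-cmd}
  # `firewall-cmd --state` exits 0 only when firewalld is running.
  if ! "$fw" --state >/dev/null 2>&1; then
    echo "firewalld is not running; no rules to add"
    return 0
  fi
  # Persist the rule, then reload so the running configuration picks it up.
  "$fw" --permanent --add-service=high-availability || return 1
  "$fw" --reload
}
```

Run `open_ha_ports` as root on each HA node.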

**Note**: At this stage, apply step 2 (Mount the UFM directory) only on the master machine. The directory is mounted on the other nodes later.

1. Create the UFM directory (the mount point for the shared files):

   ```
   mkdir -p /opt/ufm/shared_files
   ```

2. Mount the UFM directory.

   If using NFS 4.2:

   ```
   mount -t nfs4 -o context="system_u:object_r:container_file_t:s0" <server>:/shared_folder /opt/ufm/shared_files
   ```

   If using NFS 3:

   ```
   mount -t nfs -o vers=3,context="system_u:object_r:container_file_t:s0" <server>:/shared_folder /opt/ufm/shared_files
   ```

   Ensure that the NFS version and mount options are compatible with the NFS server.

3. Verify that the HA packages `pcs`, `pacemaker`, and `corosync` are installed, and install any that are missing.

4. Follow the HA installation steps in Run the HA Installation.
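The two mount variants above differ only in the filesystem type and the `vers=3` option. The sketch below wraps them in a hypothetical `mount_ufm_share` function that takes the NFS version, the server, and the export path (the `<server>` and `/shared_folder` placeholders); setting the hypothetical `DRY_RUN=1` variable prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch: mount the shared UFM export with the SELinux container context.
# mount_ufm_share and DRY_RUN are hypothetical names used for illustration.
SELINUX_CTX="system_u:object_r:container_file_t:s0"
UFM_SHARED_DIR=/opt/ufm/shared_files

mount_ufm_share() {
  local version=$1 server=$2 export_path=$3
  local -a cmd
  case "$version" in
    4.2) cmd=(mount -t nfs4 -o "context=${SELINUX_CTX}") ;;
    3)   cmd=(mount -t nfs -o "vers=3,context=${SELINUX_CTX}") ;;
    *)   echo "unsupported NFS version: $version" >&2; return 1 ;;
  esac
  cmd+=("${server}:${export_path}" "$UFM_SHARED_DIR")

  if [[ ${DRY_RUN:-0} == 1 ]]; then
    # Print the command instead of executing it.
    printf '%s\n' "${cmd[*]}"
    return 0
  fi
  mkdir -p "$UFM_SHARED_DIR"
  "${cmd[@]}"
}
```

For example, `DRY_RUN=1 mount_ufm_share 3 nfs1 /shared_folder` shows the NFS 3 command for a server named `nfs1` without mounting anything.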

Follow the HA installation instructions at UFM High-Availability Installation and Configuration.
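Before running the installation script, the `pcs`, `pacemaker`, and `corosync` packages must be present. On an RPM-based host (an assumption; substitute your distribution's package query otherwise), `rpm -q NAME` exits non-zero when a package is not installed, so a hypothetical `missing_ha_packages` helper can list what still needs installing:

```shell
#!/usr/bin/env bash
# Sketch: list the HA packages that are not installed on this node.
# ASSUMES an RPM-based host; PKG_QUERY is a hypothetical override hook so a
# stub can stand in for rpm.
missing_ha_packages() {
  local query=${PKG_QUERY:-rpm} pkg
  local -a missing=()
  for pkg in pcs pacemaker corosync; do
    # `rpm -q NAME` exits non-zero when NAME is not installed.
    "$query" -q "$pkg" >/dev/null 2>&1 || missing+=("$pkg")
  done
  if (( ${#missing[@]} > 0 )); then
    printf '%s\n' "${missing[@]}"
  fi
}

# Usage: empty output means all three packages are installed.
#   missing_ha_packages
```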

When running the HA installation script, use the following command:

```
./install.sh -p enterprise-multinode -l /opt/ufm/shared_files
```

The `-l` flag must always point to the shared directory path, `/opt/ufm/shared_files`.

You do not need to provide the DRBD disk argument to the installation script.
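Because `-l` must point at the shared NFS directory, it can help to confirm on the master node that `/opt/ufm/shared_files` is actually an NFS mount, and not a plain local directory, before running `install.sh`. This sketch reads `/proc/mounts` with `awk`; the function names are hypothetical, and the optional second argument (another mounts table) exists only to make the logic testable.

```shell
#!/usr/bin/env bash
# Print the filesystem type mounted at MOUNT_POINT, reading a /proc/mounts-style
# table (field 2 = mount point, field 3 = fs type). The last match wins, so a
# mount stacked on top of another is reported. Hypothetical helper.
mount_fstype() {
  local mount_point=$1 table=${2:-/proc/mounts}
  awk -v mp="$mount_point" '$2 == mp { fs = $3 }
    END { if (fs != "") print fs; else exit 1 }' "$table"
}

# Succeed only if the path is mounted over NFS (nfs or nfs4).
require_nfs_share() {
  local fs
  fs=$(mount_fstype "$1" "${2:-/proc/mounts}") || {
    echo "$1 is not a mount point" >&2
    return 1
  }
  case "$fs" in
    nfs|nfs4) return 0 ;;
    *) echo "$1 is mounted as $fs, expected nfs or nfs4" >&2; return 1 ;;
  esac
}

# Usage on the master node:
#   require_nfs_share /opt/ufm/shared_files &&
#     ./install.sh -p enterprise-multinode -l /opt/ufm/shared_files
```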

Check the firewall status:

```
systemctl status firewalld
```

If firewalld is active, configure the firewall:

```
# Permanently open port 8443/tcp in firewalld
firewall-cmd --permanent --add-port=8443/tcp
# Reload the firewalld configuration
firewall-cmd --reload
```

Create the UFM directory:

```
mkdir -p /opt/ufm
```

Create the UFM group:

```
groupadd ufmadm -g 733
```

Create the UFM user:

```
useradd -d /opt/ufm -m -u 733 -g ufmadm ufmadm
```

Set directory ownership:

```
chown -R ufmadm:ufmadm /opt/ufm
chown -R ufmadm:ufmadm /opt/ufm/shared_files
```

Configure SubUID and SubGID:

```
echo "ufmadm:100000:65536" >> /etc/subuid
echo "ufmadm:100000:65536" >> /etc/subgid
```

Enable login linger for the UFM user:

```
loginctl enable-linger ufmadm
```

Configure rootless Podman storage:

```
sudo -u ufmadm mkdir -p /opt/ufm/.config/containers
cat <<EOF | sudo -u ufmadm tee /opt/ufm/.config/containers/storage.conf > /dev/null
[storage]
driver = "overlay"
runroot = "/run/user/733"
EOF
```

Create the Podman UFM socket:

```
cat <<EOF > /usr/lib/systemd/system/podman-ufm.socket
[Unit]
Description=Podman API Socket For Nvidia UFM

[Socket]
SocketUser=ufmadm
SocketGroup=ufmadm
ListenStream=%t/podman-ufm/podman-ufm.sock
SocketMode=0660

[Install]
WantedBy=sockets.target
EOF
```

Create the Podman UFM service:

```
cat <<EOF > /usr/lib/systemd/system/podman-ufm.service
[Unit]
Description=Podman API Service for Nvidia UFM
Requires=podman-ufm.socket
After=podman-ufm.socket
StartLimitIntervalSec=0

[Service]
Delegate=true
Type=exec
User=ufmadm
Group=ufmadm
KillMode=process
Environment=LOGGING="--log-level=info"
ExecStart=/usr/bin/podman \$LOGGING system service
LimitMEMLOCK=infinity

[Install]
WantedBy=default.target
EOF
```

Create the Podman cleanup service:
```
cat <<EOF > /usr/lib/systemd/system/podman-ufm-cleanup.service
[Unit]
Description=podman-ufm-cleanup - clean stuck rootless containers at boot
After=podman-ufm.service
Before=ufm-enterprise.service

[Service]
Type=oneshot
User=ufmadm
Group=ufmadm
ExecStart=/usr/bin/podman system migrate

[Install]
WantedBy=multi-user.target
EOF
```

Enable and start the Podman services:

```
systemctl daemon-reload
systemctl enable --now podman-ufm.socket
systemctl enable --now podman-ufm.service
systemctl enable --now podman-ufm-cleanup.service
```

Create udev rules for InfiniBand devices:

```
cat <<EOF > /etc/udev/rules.d/70-umad.rules
KERNEL=="umad*", SUBSYSTEM=="infiniband_mad", MODE="0600", OWNER="ufmadm", GROUP="ufmadm"
KERNEL=="issm*", SUBSYSTEM=="infiniband_mad", MODE="0600", OWNER="ufmadm", GROUP="ufmadm"
EOF
udevadm control --reload-rules
udevadm trigger
```

Clean and create the UFM directories:

```
rm -rf /opt/ufm/systemd
sudo -u ufmadm mkdir -p /opt/ufm/ufm_plugins_data
sudo -u ufmadm mkdir -p /opt/ufm/systemd
sudo -u ufmadm mkdir -p /opt/ufm/etc/apache2
```

Load the UFM image and extract its version. Set `UFM_IMAGE_FILE` to the path of the UFM image file first:

```
UFM_IMAGE_FILE="<PATH TO UFM IMAGE FILE>"
# Extract the UFM version from the file name
# (e.g., ufm_6.22.0-7.ubuntu24.x86_64-docker.img.gz -> 6_22_0_7)
UFM_VERSION=$(basename "$UFM_IMAGE_FILE" | sed 's/ufm_\([0-9][^.]*\.[^.]*\.[^.]*-[^.]*\)\.ubuntu.*/\1/' | tr '.-' '_')
echo "UFM Version: $UFM_VERSION"
# Load the UFM image
sudo -u ufmadm podman load -i "$UFM_IMAGE_FILE"
```

Create the version-specific directory and soft link:

```
# Create the version-specific directory in shared storage
sudo -u ufmadm mkdir -p /opt/ufm/shared_files/ufm-${UFM_VERSION}
# Remove the existing files link, if it exists
rm -f /opt/ufm/files
# Create a soft link to the version-specific directory
sudo -u ufmadm ln -s /opt/ufm/shared_files/ufm-${UFM_VERSION} /opt/ufm/files
# Verify the soft link
ls -la /opt/ufm/files
```

Run the UFM installer:
```
sudo -u ufmadm podman run -it --rm --name=ufm_installer \
  -v /run/podman-ufm/podman-ufm.sock:/var/run/docker.sock \
  -v /opt/ufm/:/installation/ufm_files/ \
  -v /opt/ufm/files:/installation/ufm_files/files \
  -v /opt/ufm/systemd:/etc/systemd_files/ \
  mellanox/ufm-enterprise:latest \
  --install \
  --fabric-interface ib0 \
  --rootless \
  --plugin-path /opt/ufm/ufm_plugins_data \
  --ufm-user ufmadm \
  --ufm-group ufmadm \
  --ufm-infra
```

**Note**: Replace `ib0` with your InfiniBand interface name if it is not `ib0`. All other UFM install flags are supported and can be added to the command.

Load the given Redis image (only if you are not using an external Redis):

```
sudo -u ufmadm podman load -i "<PATH TO GIVEN REDIS IMAGE>"
```

Load the Fast API plugin image. Set `FAST_API_VERSION` to the Fast API plugin version before running the command:

```
sudo -u ufmadm podman run --hostname $HOSTNAME --rm --name=ufm_plugin_mgmt --entrypoint="" \
  -v /run/podman-ufm/podman-ufm.sock:/var/run/docker.sock \
  -v /opt/ufm/files:/opt/ufm/shared_config_files \
  -v /dev/log:/dev/log \
  -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
  -v /lib/modules:/lib/modules:ro \
  -v /opt/ufm/ufm_plugins_data:/opt/ufm/ufm_plugins_data \
  -e UFM_CONTEXT=ufm-infra \
  mellanox/ufm-enterprise:latest \
  /opt/ufm/scripts/manage_ufm_plugins.sh add -p fast_api -t ${FAST_API_VERSION} -c ufm-infra
```

Install the service files:

```
mv /opt/ufm/systemd/ufm-enterprise.service /etc/systemd/system/ufm-enterprise.service
mv /opt/ufm/systemd/ufm-infra.service /etc/systemd/system/ufm-infra.service
systemctl daemon-reload
```
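The SubUID and SubGID step appends to `/etc/subuid` and `/etc/subgid` unconditionally, so repeating the procedure (for example, during an upgrade) adds duplicate entries. A hypothetical `append_once` helper, sketched below, writes a line only if that exact line is not already in the file:

```shell
#!/usr/bin/env bash
# Sketch: append LINE to FILE only if FILE does not already contain it.
# append_once is a hypothetical helper name.
append_once() {
  local line=$1 file=$2
  # -q: quiet, -x: match the whole line, -F: fixed string (no regex).
  # A missing file makes grep fail, so the line is appended and the file created.
  if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
    printf '%s\n' "$line" >> "$file"
  fi
}

# Usage (as root), equivalent to the two echo commands but safe to re-run:
#   append_once "ufmadm:100000:65536" /etc/subuid
#   append_once "ufmadm:100000:65536" /etc/subgid
```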

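If the image file name does not follow the `ufm_<version>.ubuntu…` pattern, the `sed` expression in the version-extraction step passes the name through unchanged, so `UFM_VERSION` ends up holding a mangled file name that then becomes part of the shared directory path. The sketch below wraps the same `sed`/`tr` pipeline in a hypothetical `ufm_version_from_image` function that rejects any result other than underscore-separated digits.

```shell
#!/usr/bin/env bash
# Sketch: derive the UFM version (e.g. 6_22_0_7) from the image file name and
# fail if the name does not match the expected pattern.
# ufm_version_from_image is a hypothetical helper name.
ufm_version_from_image() {
  local version
  version=$(basename "$1" \
    | sed 's/ufm_\([0-9][^.]*\.[^.]*\.[^.]*-[^.]*\)\.ubuntu.*/\1/' \
    | tr '.-' '_')
  # With no sed match, the result still contains letters from the file name;
  # accept only digits separated by underscores.
  if [[ ! $version =~ ^[0-9]+(_[0-9]+)+$ ]]; then
    echo "cannot parse UFM version from: $1" >&2
    return 1
  fi
  printf '%s\n' "$version"
}

# Usage:
#   UFM_VERSION=$(ufm_version_from_image "$UFM_IMAGE_FILE") || exit 1
```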
To start UFM as a standalone instance, run:

```
systemctl daemon-reload
systemctl start ufm-infra
systemctl start ufm-enterprise
```

**Note**: When UFM runs under HA, do not start any services manually.
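After starting UFM as a standalone instance, you can poll systemd until both units report active instead of checking by hand. In the sketch below, `wait_for_units`, `WAIT_TIMEOUT`, and the `SYSTEMCTL` override are hypothetical names; `systemctl is-active --quiet UNIT` exits 0 only while the unit is active.

```shell
#!/usr/bin/env bash
# Sketch: wait until each given systemd unit is active, up to WAIT_TIMEOUT
# seconds per unit (default 60). SYSTEMCTL is a hypothetical override hook.
wait_for_units() {
  local ctl=${SYSTEMCTL:-systemctl} timeout=${WAIT_TIMEOUT:-60}
  local unit elapsed
  for unit in "$@"; do
    elapsed=0
    # `systemctl is-active --quiet UNIT` exits 0 only when UNIT is active.
    until "$ctl" is-active --quiet "$unit"; do
      if (( elapsed >= timeout )); then
        echo "$unit is not active after ${timeout}s" >&2
        return 1
      fi
      sleep 1
      elapsed=$((elapsed + 1))
    done
    echo "$unit is active"
  done
}

# Usage (standalone instance only):
#   wait_for_units ufm-infra ufm-enterprise
```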