Installing UFM Infra Using Rootless with Podman
Prerequisites
Download the UFM and plugins bundle tar file to `/tmp`. Extract the contents using the command:

```
tar -xvf <bundle tar>
```
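As a quick sanity check before extracting, the archive contents can be listed with `tar -tf`. A minimal, self-contained sketch using a sample archive (the real bundle filename will differ):

```shell
#!/bin/sh
# List a tar archive's contents without extracting it, using a sample archive
tmpdir=$(mktemp -d)
echo demo > "$tmpdir/component.img"
tar -cf "$tmpdir/bundle.tar" -C "$tmpdir" component.img
listing=$(tar -tf "$tmpdir/bundle.tar")   # list contents without extracting
echo "$listing"
rm -rf "$tmpdir"
```

Running `tar -tf` against the downloaded bundle in `/tmp` shows the components before they are unpacked.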
This archive (tar file) includes the following components:

- Relevant UFM container image
- Relevant FAST-API container image
- Relevant Infra container image (for internal Redis usage). Refer to Redis-Related Configuration for more information.
- Default plugin bundle for UFM
- UFM-HA package
To enable the UFM Infra feature, UFM HA must be installed in a new mode (`external-storage`), using a new product (`enterprise-multinode`).
Additionally, NFS must be configured as follows:
NFS Setup Prerequisites
- Select a dedicated NFS server to host the shared directories.
- Create a shared directory on the NFS server for UFM configuration and logs.
- Install the NFS client on each UFM node if not already present.
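On the NFS server, the shared directory is made available through `/etc/exports`. A sketch entry, assuming a hypothetical export path `/shared_folder` and an illustrative UFM-node subnet:

```
# /etc/exports on the NFS server (path and subnet are illustrative)
/shared_folder 192.168.1.0/24(rw,sync,no_root_squash)
```

After editing `/etc/exports`, run `exportfs -ra` on the server to apply the export; the exact options should match your site's security policy.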
Enable HA ports in Firewall
If you have firewall rules that block non-standard ports, open the ports required by the high-availability services so they can communicate between the HA nodes. To do so, run:

```
firewall-cmd --permanent --add-service=high-availability
# or
firewall-cmd --add-service=high-availability
# and then reload the rules
firewall-cmd --reload
```
Create and Mount the UFM Directory
At this stage, apply step 2 (Mount the UFM directory) only on the master machine; the mount is performed on the other nodes later.
Create the UFM directory:

```
mkdir -p /opt/ufm/files/
```
Mount the UFM directory.

If using NFS 4.2:

```
mount -t nfs4 -o context="system_u:object_r:container_file_t:s0" <server>:/shared_folder /opt/ufm/shared_files
```

If using NFS 3:

```
mount -t nfs -o vers=3,context="system_u:object_r:container_file_t:s0" <server>:/shared_folder /opt/ufm/shared_files
```
Ensure the NFS version and mount options are compatible with the NFS server.
Verify that the following HA packages are installed: `pcs`, `pacemaker`, and `corosync`. Install them if they are missing.

Follow the HA installation steps in Run the HA Installation.
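A quick way to check all three packages at once on a RHEL-family system (`rpm -q` is assumed here; substitute `dpkg -s` on Debian-based hosts):

```shell
#!/bin/sh
# Report install status for each required HA package
checked=0
for pkg in pcs pacemaker corosync; do
  if rpm -q "$pkg" >/dev/null 2>&1; then
    echo "$pkg: installed"
  else
    echo "$pkg: MISSING"
  fi
  checked=$((checked + 1))
done
```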
Run the HA Installation
Follow the HA installation instructions at UFM High-Availability Installation and Configuration.
When running the HA installation script, use the following command:

```
./install.sh -p enterprise-multinode -l /opt/ufm/shared_files
```
The `-l` flag must always point to the shared directory path: `/opt/ufm/shared_files`. There is no need to provide the DRBD disk argument to the installation script.
Installation Instructions
Check firewall status:

```
systemctl status firewalld
```

Configure the firewall (if active):

```
# Permanently add port 8443 to firewalld
firewall-cmd --permanent --add-port=8443/tcp
# Reload firewalld config
firewall-cmd --reload
```

Create UFM directory:

```
mkdir -p /opt/ufm
```
Create UFM group:

```
groupadd ufmadm -g 733
```

Create a UFM user:

```
useradd -d /opt/ufm -m -u 733 -g ufmadm ufmadm
```

Set directory ownership:

```
chown -R ufmadm:ufmadm /opt/ufm
chown -R ufmadm:ufmadm /opt/ufm/shared_files
```
Configure SubUID and SubGID:

```
echo "ufmadm:100000:65536" >> /etc/subuid
echo "ufmadm:100000:65536" >> /etc/subgid
```

Enable login linger for the UFM user:

```
loginctl enable-linger ufmadm
```
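Each `/etc/subuid` and `/etc/subgid` entry has the form `user:start:count`, so the entries above grant `ufmadm` 65,536 subordinate IDs starting at 100000. A small sketch that parses such an entry (a sample string, not the live file) and computes the top of the range:

```shell
#!/bin/sh
# Parse a subuid-style entry (user:start:count) and compute the last ID in the range
entry="ufmadm:100000:65536"
user=$(echo "$entry" | cut -d: -f1)
start=$(echo "$entry" | cut -d: -f2)
count=$(echo "$entry" | cut -d: -f3)
last=$((start + count - 1))   # highest subordinate ID available to the user
echo "$user owns subordinate IDs $start-$last"
```

The range must not overlap with ranges assigned to other users in `/etc/subuid`.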
Configure rootless Podman storage:

```
sudo -u ufmadm mkdir -p /opt/ufm/.config/containers
cat <<EOF | sudo -u ufmadm tee /opt/ufm/.config/containers/storage.conf > /dev/null
[storage]
driver = "overlay"
runroot = "/run/user/733"
EOF
```

Create Podman UFM socket:
```
cat <<EOF > /usr/lib/systemd/system/podman-ufm.socket
[Unit]
Description=Podman API Socket For Nvidia UFM

[Socket]
SocketUser=ufmadm
SocketGroup=ufmadm
ListenStream=%t/podman-ufm/podman-ufm.sock
SocketMode=0660

[Install]
WantedBy=sockets.target
EOF
```

Create Podman UFM service:
```
cat <<EOF > /usr/lib/systemd/system/podman-ufm.service
[Unit]
Description=Podman API Service for Nvidia UFM
Requires=podman-ufm.socket
After=podman-ufm.socket
StartLimitIntervalSec=0

[Service]
Delegate=true
Type=exec
User=ufmadm
Group=ufmadm
KillMode=process
Environment=LOGGING="--log-level=info"
ExecStart=/usr/bin/podman \$LOGGING system service
LimitMEMLOCK=infinity

[Install]
WantedBy=default.target
EOF
```

Create Podman cleanup service:
```
cat <<EOF > /usr/lib/systemd/system/podman-ufm-cleanup.service
[Unit]
Description=podman-ufm-cleanup - clean stuck rootless containers at boot
After=podman-ufm.service
Before=ufm-enterprise.service

[Service]
Type=oneshot
User=ufmadm
Group=ufmadm
ExecStart=/usr/bin/podman system migrate

[Install]
WantedBy=multi-user.target
EOF
```
Enable and start Podman services:

```
systemctl daemon-reload
systemctl enable --now podman-ufm.socket
systemctl enable --now podman-ufm.service
systemctl enable --now podman-ufm-cleanup.service
```
Create Udev Rules for InfiniBand Devices
```
cat <<EOF > /etc/udev/rules.d/70-umad.rules
KERNEL=="umad*", SUBSYSTEM=="infiniband_mad", MODE="0600", OWNER="ufmadm", GROUP="ufmadm"
KERNEL=="issm*", SUBSYSTEM=="infiniband_mad", MODE="0600", OWNER="ufmadm", GROUP="ufmadm"
EOF
udevadm control --reload-rules
udevadm trigger
```

Clean and create UFM directories:

```
rm -rf /opt/ufm/systemd
sudo -u ufmadm mkdir -p /opt/ufm/ufm_plugins_data
sudo -u ufmadm mkdir -p /opt/ufm/systemd
sudo -u ufmadm mkdir -p /opt/ufm/etc/apache2
```
Load UFM image and extract version:

```
# Extract UFM version from filename
# (e.g., ufm_6.22.0-7.ubuntu24.x86_64-docker.img.gz -> 6_22_0_7)
UFM_VERSION=$(basename "$UFM_IMAGE_FILE" \
  | sed 's/ufm_\([0-9][^.]*\.[^.]*\.[^.]*-[^.]*\)\.ubuntu.*/\1/' \
  | tr '.-' '_')
echo "UFM Version: $UFM_VERSION"

# Load the UFM image
sudo -u ufmadm podman load -i "$UFM_IMAGE_FILE"
```
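The sed/tr version-extraction pipeline above can be sanity-checked offline against the example filename from its comment:

```shell
#!/bin/sh
# Verify the version extraction against the documented example filename
UFM_IMAGE_FILE="ufm_6.22.0-7.ubuntu24.x86_64-docker.img.gz"
UFM_VERSION=$(basename "$UFM_IMAGE_FILE" \
  | sed 's/ufm_\([0-9][^.]*\.[^.]*\.[^.]*-[^.]*\)\.ubuntu.*/\1/' \
  | tr '.-' '_')
echo "$UFM_VERSION"   # expected: 6_22_0_7
```

The `tr '.-' '_'` step maps both dots and dashes to underscores, producing the directory-safe version string used below.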
Create version-specific directory and soft link:

```
# Create version-specific directory in shared storage
sudo -u ufmadm mkdir -p /opt/ufm/shared_files/ufm-${UFM_VERSION}
# Remove the existing files link if it exists
rm -f /opt/ufm/files
# Create soft link to the version-specific directory
sudo -u ufmadm ln -s /opt/ufm/shared_files/ufm-${UFM_VERSION} /opt/ufm/files
# Verify the soft link
ls -la /opt/ufm/files
```

Run UFM installer:

```
sudo -u ufmadm podman run -it --rm --name=ufm_installer \
  -v /run/podman-ufm/podman-ufm.sock:/var/run/docker.sock \
  -v /opt/ufm/:/installation/ufm_files/ \
  -v /opt/ufm/files:/installation/ufm_files/files \
  -v /opt/ufm/systemd:/etc/systemd_files/ \
  mellanox/ufm-enterprise:latest \
  --install \
  --fabric-interface ib0 \
  --rootless \
  --plugin-path /opt/ufm/ufm_plugins_data \
  --ufm-user ufmadm \
  --ufm-group ufmadm \
  --ufm-infra
```

**Note**: Replace `ib0` with your actual InfiniBand interface name if it is not the default.

**Note**: All other UFM install flags are supported and can be added to the command.
Load the Redis image (if you are not using external Redis):

```
sudo -u ufmadm podman load -i "<PATH TO GIVEN REDIS IMAGE>"
```
Load the Fast API plugin image:

```
sudo -u ufmadm podman run --hostname $HOSTNAME --rm --name=ufm_plugin_mgmt --entrypoint="" \
  -v /run/podman-ufm/podman-ufm.sock:/var/run/docker.sock \
  -v /opt/ufm/files:/opt/ufm/shared_config_files \
  -v /dev/log:/dev/log \
  -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
  -v /lib/modules:/lib/modules:ro \
  -v /opt/ufm/ufm_plugins_data:/opt/ufm/ufm_plugins_data \
  -e UFM_CONTEXT=ufm-infra \
  mellanox/ufm-enterprise:latest \
  /opt/ufm/scripts/manage_ufm_plugins.sh add -p fast_api -t ${FAST_API_VERSION} -c ufm-infra
```

Install service files:

```
mv /opt/ufm/systemd/ufm-enterprise.service /etc/systemd/system/ufm-enterprise.service
mv /opt/ufm/systemd/ufm-infra.service /etc/systemd/system/ufm-infra.service
systemctl daemon-reload
```
To start UFM as a standalone instance, run:

```
systemctl daemon-reload
systemctl start ufm-infra
systemctl start ufm-enterprise
```
Running in HA Mode
Do not manually start any services.
Ensure UFM and UFM-HA are installed on all nodes as described in the above sections.
Mount /opt/ufm/files on all standby nodes as described in step 2 (Mount the UFM directory).
On one node, edit the HA configuration file `/etc/ufm_ha/ha_nodes.cfg` and fill in the parameters for each node:

```
[Node.1]
# valid role options: master/standby
role = master
# Mandatory
primary_ip =
# Mandatory if dual_link = true
secondary_ip =

[Node.2]
role = standby
primary_ip =
secondary_ip =

[Node.3]
role = standby
primary_ip =
secondary_ip =
```

Ensure the file sync mode is set to `external-storage`, and that the shared file system is mounted prior to HA configuration:

```
[FileSync]
# valid options are: drbd/external-storage
# in case of external-storage the user MUST mount the file system PRIOR to HA configuration
mode = external-storage
```

Copy the edited file to all nodes at the same path.
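For illustration, a filled-in three-node configuration might look like the following (all addresses are hypothetical; use your nodes' actual primary and secondary link IPs):

```
[Node.1]
role = master
primary_ip = 192.168.10.1
secondary_ip = 192.168.20.1

[Node.2]
role = standby
primary_ip = 192.168.10.2
secondary_ip = 192.168.20.2

[Node.3]
role = standby
primary_ip = 192.168.10.3
secondary_ip = 192.168.20.3
```

The `secondary_ip` fields are only required when `dual_link = true`.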
Configure the cluster, starting from standby nodes and ending with the master node:
```
ufm_ha_cluster config -p <password>
```
**Note**: Use the same password on all nodes.
After finishing the configuration on all nodes, run:
```
ufm_ha_cluster status
```
Start the cluster:
```
ufm_ha_cluster start
```
Check cluster status again to ensure all services have started successfully.