Date: 2025-12-18

This section describes how to deploy DOCA SNAP Virtio-fs as a container.

Note DOCA SNAP Virtio-fs does not come pre-installed with the BFB bundle.

Note The default virtio-blk emulation provider is set to DPU. In this mode, SNAP Virtio-fs and SNAP Virtio-blk can operate simultaneously only if they are assigned to different DPA execution units (EUs). This separation is achieved by setting the environment variable dpu_helper_core_mask=0x1fffe, which is configured in the set_environment_variables.sh script included in the SNAP Virtio-fs package.
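To see which units a mask such as 0x1fffe selects, the set bits can be listed with plain shell arithmetic. This is a minimal sketch that assumes the mask is an ordinary bitmask in which bit N selects unit N; the helper name mask_to_bits is hypothetical and not part of the SNAP package.

```shell
# Print the bit positions set in a mask (accepts 0x-prefixed hex).
# Assumption: bit N of the mask selects unit N.
mask_to_bits() {
    mask=$(( $1 ))
    bit=0
    out=""
    while [ "$mask" -gt 0 ]; do
        if [ $(( mask & 1 )) -eq 1 ]; then
            out="${out:+$out }$bit"
        fi
        mask=$(( mask >> 1 ))
        bit=$(( bit + 1 ))
    done
    echo "$out"
}

mask_to_bits 0x1fffe
```

Under that assumption, 0x1fffe selects units 1 through 16 and leaves unit 0 unset.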

To install the BFB on BlueField:

[host] sudo bfb-install --rshim <rshimN> --bfb <image_path.bfb>

For more information, refer to "Installing Full DOCA Image on DPU" in the NVIDIA DOCA Installation Guide for Linux.

To update the BlueField firmware:

[dpu] sudo /opt/mellanox/mlnx-fw-updater/mlnx_fw_updater.pl --force-fw-update

For more information, refer to "Upgrading Firmware" in the NVIDIA DOCA Installation Guide for Linux.

Note Firmware configuration may expose new emulated PCIe functions, which can be later used by the host's OS. As such, the user must make sure all exposed PCIe functions (static/hotplug) are backed by a supporting virtio-fs software configuration. Otherwise, these functions would malfunction and host behavior would be anomalous.

Clear the firmware configuration before implementing the required configuration:

[dpu] mst start
[dpu] mlxconfig -d /dev/mst/mt41692_pciconf0 reset

Verify the firmware configuration:

[dpu] mlxconfig -d /dev/mst/mt41692_pciconf0 query

Output example:

mlxconfig -d /dev/mst/mt41692_pciconf0 -e query | grep VIRTIO_FS
Configurations:                            Default    Current    Next Boot
*   VIRTIO_FS_EMULATION_ENABLE             False(0)   True(1)    True(1)
    VIRTIO_FS_EMULATION_NUM_VF             0          0          0
*   VIRTIO_FS_EMULATION_NUM_PF             0          2          2
    VIRTIO_FS_EMU_SUBSYSTEM_VENDOR_ID      6900       6900       6900
    VIRTIO_FS_EMULATION_SUBSYSTEM_ID       4186       4186       4186
*   VIRTIO_FS_EMULATION_NUM_MSIX           2          3          3

The output provides 5 columns (listed from left to right):

Non-default configuration marker (*)

Firmware configuration name

Default firmware value

Current firmware value

Firmware value after reboot (shows a configuration update pending system reboot)

To enable storage emulation options, BlueField must be set to work in internal CPU model:

[dpu] mlxconfig -d /dev/mst/mt41692_pciconf0 s INTERNAL_CPU_MODEL=1 PF_BAR2_ENABLE=0

Warning PF_BAR2_ENABLE is a deprecated option and must be explicitly disabled.

To enable the firmware configuration with a virtio-fs emulation PF:

[dpu] mlxconfig -d /dev/mst/mt41692_pciconf0 s VIRTIO_FS_EMULATION_ENABLE=1 VIRTIO_FS_EMULATION_NUM_PF=1 VIRTIO_FS_EMULATION_NUM_MSIX=3

Note For a complete list of the DOCA SNAP Virtio-fs firmware configuration options, refer to "Appendix – BlueField Firmware Configuration".

Note Power cycle is required to apply firmware configuration changes.
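Before power cycling, it can help to list which settings are still pending, that is, whose "Next Boot" value differs from "Current" in the query output. The following is a sketch only: the function name pending_fw_changes is hypothetical, and it assumes the whitespace-separated column layout of the output example in this section, with values that contain no spaces.

```shell
# Read `mlxconfig ... -e query` output on stdin and print the names of
# parameters whose "Next Boot" value differs from "Current".
# Assumption: columns are [*] NAME DEFAULT CURRENT NEXT_BOOT, space separated.
pending_fw_changes() {
    awk '
        $1 == "*" { name = $2; cur = $4; nxt = $5 }
        $1 != "*" { name = $1; cur = $3; nxt = $4 }
        NF >= 4 && name ~ /^[A-Z0-9_]+$/ && cur != nxt { print name }
    '
}
```

A possible invocation on the DPU would be `mlxconfig -d /dev/mst/mt41692_pciconf0 -e query | pending_fw_changes`; an empty result suggests no change is waiting for a power cycle.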

RoCE communication is blocked on the default interfaces of the BlueField OS (the ECPFs, typically mlx5_0 and mlx5_1). If RoCE traffic is required, scalable functions (SFs), which are network functions that support RoCE transport, must be added.

To enable RDMA/RoCE:

[dpu] mlxconfig -d /dev/mst/mt41692_pciconf0 s PER_PF_NUM_SF=1
[dpu] mlxconfig -d /dev/mst/mt41692_pciconf0 s PF_SF_BAR_SIZE=8 PF_TOTAL_SF=2
[dpu] mlxconfig -d /dev/mst/mt41692_pciconf0.1 s PF_SF_BAR_SIZE=8 PF_TOTAL_SF=2

Note This is not required when working over TCP or RDMA over InfiniBand.

Note When using an OS with a 64KB page size, PF_SF_BAR_SIZE=10 must be configured instead of 8.
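The choice between the two PF_SF_BAR_SIZE values can be derived from the kernel page size. This is a small sketch: the helper name sf_bar_size_for_page is hypothetical, and treating every page size other than 64KB as the default case (8) is an assumption.

```shell
# Map a kernel page size in bytes to the PF_SF_BAR_SIZE value from the notes above.
# Assumption: any page size other than 64KB uses the default value of 8.
sf_bar_size_for_page() {
    if [ "$1" -eq 65536 ]; then
        echo 10
    else
        echo 8
    fi
}

# Example: use the page size of the running kernel.
sf_bar_size_for_page "$(getconf PAGESIZE)"
```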





DOCA SNAP Virtio-fs supports up to 2000 total VFs on virtio-fs. The VFs may be spread between up to 4 virtio-fs PFs.

Common example:

[dpu] mlxconfig -d /dev/mst/mt41692_pciconf0 s SRIOV_EN=1 PER_PF_NUM_SF=1 LINK_TYPE_P1=2 LINK_TYPE_P2=2 PF_TOTAL_SF=1 PF_SF_BAR_SIZE=8

Virtio-fs 250 VFs example (2 queues per VF):

[dpu] mlxconfig -d /dev/mst/mt41692_pciconf0 s VIRTIO_FS_EMULATION_ENABLE=1 VIRTIO_FS_EMULATION_NUM_VF=125 VIRTIO_FS_EMULATION_NUM_PF=2 VIRTIO_FS_EMULATION_NUM_MSIX=5 VIRTIO_FS_EMULATION_NUM_VF_MSIX=6
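A planned VF layout can be checked against the stated limits (at most 4 virtio-fs PFs and 2000 VFs in total) before writing it to firmware. This sketch assumes that VIRTIO_FS_EMULATION_NUM_VF is a per-PF count, which is inferred from the 250-VF example (125 VFs × 2 PFs) rather than stated explicitly; check_vf_plan is a hypothetical helper.

```shell
# Check a VF plan against the limits stated above.
# usage: check_vf_plan <num_pf> <vf_per_pf>
# Assumption: VIRTIO_FS_EMULATION_NUM_VF is a per-PF value.
check_vf_plan() {
    num_pf=$1
    vf_per_pf=$2
    total=$(( num_pf * vf_per_pf ))
    if [ "$num_pf" -gt 4 ] || [ "$total" -gt 2000 ]; then
        echo "invalid: $num_pf PFs, $total VFs"
        return 1
    fi
    echo "ok: $num_pf PFs, $total VFs"
}

check_vf_plan 2 125
```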

When PCIe switch emulation is enabled, BlueField can support up to PCI_SWITCH_EMULATION_NUM_PORT-1 hot-plugged virtio-fs functions. These PCIe functions are shared among all BlueField users and applications and may hold hot-plugged devices of type NVMe, virtio-blk, virtio-fs, and more (e.g., virtio-net).

To enable PCIe switch emulation and configure 31 hot-plugged ports to be used, run:

[dpu] mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_SWITCH_EMULATION_ENABLE=1 PCI_SWITCH_EMULATION_NUM_PORT=32

PCI_SWITCH_EMULATION_NUM_PORT equals 1 plus the number of hot-plugged PCIe functions.
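The relation above can be turned into the matching mlxconfig arguments for a desired number of hot-plugged functions. This is an illustrative sketch; the helper name switch_emulation_args is hypothetical.

```shell
# Print the mlxconfig settings for a given number of hot-plugged PCIe functions.
# PCI_SWITCH_EMULATION_NUM_PORT = number of hot-plugged functions + 1
switch_emulation_args() {
    echo "PCI_SWITCH_EMULATION_ENABLE=1 PCI_SWITCH_EMULATION_NUM_PORT=$(( $1 + 1 ))"
}

switch_emulation_args 31
```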

Note On AMD machines, hotplug is not guaranteed to work and enabling PCI_SWITCH_EMULATION_ENABLE may impact SR-IOV capabilities.

The Data Path Accelerator (DPA) is an auxiliary processor designed to offload and accelerate data-path operations. It consists of a cluster of 16 cores, each containing 16 Execution Units (EUs).

Total Capacity: 256 EUs (16 cores × 16 EUs)

SNAP Allocation: 170 EUs are specifically available to SNAP (which runs a DPA application to accelerate the virtio-fs protocol).

The YAML-Based DPA Execution Unit Management Tool is the default mechanism for controlling DPA EUs. For the standard setup, refer to the DPA Resource Management Default Configuration.

If other DPA applications (e.g., virtio-net) are running concurrently with SNAP, you must explicitly configure the DPA resource YAML file to allocate specific EUs to each application.

Note For more details, see Single Point of Resource Distribution.

SNAP supports reserving DPA EUs for virtio-fs controllers. By default, all available EUs (0–170) are shared among all DPA applications on the system, including virtio-fs and virtio-blk.

DPA EU allocation is managed via a YAML-based resource file. For more details, see the documentation on Single Point of Resource Distribution.

This method centralizes and enforces consistent EU allocation across applications.

Requirements:

The application name in the YAML file must match the name of SNAP's DPA application: doca_devemu_virtio_dpa_app (for virtio-fs).

At least one EU must be allocated for the virtio-fs DPA application.

EU IDs must be in the range 1–170 (EU 0 is reserved).

EU allocations must not overlap between applications.

EU groups are not supported.

SNAP's DPA application must run on the ROOT partition. EUs allocated to other partitions are not available.
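Two of the requirements above, the 1–170 ID range and the no-overlap rule, are easy to check before running the resource tool. The helpers below are a hypothetical sketch that operates on inclusive ranges passed as numbers; they do not parse the YAML file itself.

```shell
# Succeeds if the inclusive range [lo, hi] lies within EU IDs 1-170.
# usage: eu_range_valid <lo> <hi>
eu_range_valid() {
    [ "$1" -ge 1 ] && [ "$2" -le 170 ] && [ "$1" -le "$2" ]
}

# Succeeds if the inclusive ranges [a_lo, a_hi] and [b_lo, b_hi] share an EU.
# usage: eu_ranges_overlap <a_lo> <a_hi> <b_lo> <b_hi>
eu_ranges_overlap() {
    [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}

# Example: a virtio-fs range of 17-169 next to a helper range of 1-16.
if eu_range_valid 17 169 && ! eu_ranges_overlap 17 169 1 16; then
    echo "allocation looks consistent"
fi
```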

Default YAML input format for SNAP:

---
version: 25.04
---
DPA_APPS:
  doca_devemu_virtio_dpa_app:
    - partition: ROOT
      affinity_EUs: [17-169]
  dpa_helper:
    - partition: ROOT
      affinity_EUs: [1-16]
  dpa_virtq_split:
    - partition: ROOT
      affinity_EUs: [1-169]
  dpa_nvme:
    - partition: ROOT
      affinity_EUs: [1-169]





Generate the output YAML file using the dpa-resource-mgmt tool:

dpa-resource-mgmt config -d mlx5_0 -f ~/DPA_RESOURCE_INPUT.yaml

Set the DEVEMU_DPA_RESOURCES_FILE_PATH environment variable to point to the generated YAML file:

export DEVEMU_DPA_RESOURCES_FILE_PATH=~/ROOT.yaml

Note If running in a container, ensure the YAML file is exposed to the container (e.g., using a shared folder like /etc/nvda_snap).

Notes:

Do not manually edit the YAML file generated by dpa-resource-mgmt.

Each DPA EU supports up to 128 queues (threads).

SNAP DPA applications only operate on the ROOT partition.
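Given the limit of 128 queues per EU, a lower bound on the number of EUs for a deployment follows from a ceiling division. This is a rough sketch with a hypothetical helper name; it gives the minimum implied by the limit only, not a performance recommendation.

```shell
# Minimum number of EUs needed for a given queue count, at 128 queues per EU.
min_eus_for_queues() {
    echo $(( ($1 + 127) / 128 ))
}

# Example: 250 VFs with 2 queues each is 500 queues.
min_eus_for_queues 500
```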

The NVIDIA DOCA host package includes an optimized and extended version of the Virtio-fs driver; it provides better performance and additional features when compared to the upstream package.

Optimized performance using Virtio-fs multi-queue with a better locking design.

Support for a notification queue, allowing for a more accurate cached view of the filesystem by the host when remote changes happen.

Support for GPU Direct Storage, allowing for zero-copy transfers between storage devices and GPUs.

Note The Virtio-fs DOCA host package supports FLR only with Ubuntu 25.04. DOCA Host optimized virtiofs module support starts at kernel version 6.6.

Optimized virtio-fs driver support is available for:

Ubuntu 24.04

Ubuntu 25.04 (with FLR)

RHEL 10

CentOS 10 (same package as RHEL 10)

OpenEuler 24.03

Note Debian does not support the optimized driver. Debian users must fall back on the inbox drivers (no FLR, no notification queue).





To install DOCA host, install the following package:

Install the DOCA repo:

[host] apt install doca-host_<version>-ubuntu2404_amd64.deb

Update the package cache list:

[host] apt update

Install NVIDIA virtio-fs:

[host] apt install virtiofs-dkms

Note Install OFED version 25.04-0.2.3.0 or later.

The virtio-fs DOCA host driver replaces your system version of the virtio-fs driver. It does not affect your system version of FUSE. Existing FUSE-based applications should run unaffected.

Once installed, configure your DPU to run a SNAP Virtio-fs service and reboot the host. The host system will be stuck during the boot process until the SNAP Virtio-fs service becomes available.

You can confirm that you are running the Virtio-fs DOCA Host driver by looking for mentions of it in your kernel log:

[host] dmesg | grep 'virtio-fs'
virtio-fs: Loading NVIDIA-virtiofs +mq +lockless +nvq

You can now mount your Virtio-fs drive as usual:

[host] mount -t virtiofs <tagname> /mnt/virtiofs/





Beginning with package virtiofs-dkms version 25.07-OFED.25.07.0.2.3.1, the sysfs file used to display CPU mappings is writable. This means users can now manually assign CPUs to queues to adjust the default mapping.

Additionally, a new file called irq_affinity is now available under procfs for each queue. This file allows the user to direct that queue’s interrupts to one or more CPUs. Note that this file is write-only; to verify its effect, the user must check the system’s interrupts manually.

These features allow users to experiment with and optimize CPU-to-queue mappings.

Example usage:

Assign CPUs 4, 5, and 6 to queue 4 (removing them from any previous assignment)

[host] echo 4,5,6 > /sys/fs/virtiofs/5/mqs/4/cpu_list

Direct queue 4’s interrupts to CPUs 4, 5, and 6

[host] echo 4,5,6 > /sys/fs/virtiofs/5/mqs/4/irq_affinity





If you wish to remove the Virtio-fs DOCA Host driver, you can do so with the following command. It will also remove packages automatically installed as dependencies:

[host] apt purge --autoremove virtiofs-dkms

The DOCA SNAP Virtio-fs container is available on the DOCA SNAP Virtio-fs NVIDIA NGC page.

To deploy DOCA SNAP Virtio-fs container on top of BlueField, the following procedure is required:

Prepare the setup and download the DOCA SNAP Virtio-fs resources for container deployment. See section "Preparation Steps" for details.

Adjust doca_vfs.yaml for advanced configuration if needed, according to section "Adjusting YAML Configuration".

Deploy the container. The image is automatically pulled from NGC. See section "Spawning DOCA SNAP Virtio-fs Container" for details.

Allocate 8GiB hugepages for the DOCA SNAP Virtio-fs container according to the DPU OS's Hugepagesize value:

Query the Hugepagesize value:

[dpu] grep Hugepagesize /proc/meminfo

In Ubuntu 22 and Ubuntu 24, the value should be 2048KB. In Ubuntu 24 with 64k page size, the value should be 524288KB.

For an OS with 2048KB hugepages, use the doca-hugepages tool to configure the requested hugepages:

[dpu] doca-hugepages config --app snap --size 2048 --num 4096

For an OS with 524288KB hugepages, use the doca-hugepages tool to configure the requested hugepages:

[dpu] doca-hugepages config --app snap --size 524288 --num 16

Reload the hugepages configuration for all applications based on the current database settings:

[dpu] doca-hugepages reload
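The --num values above follow from dividing the 8GiB target by the hugepage size. A minimal sketch of that arithmetic, using a hypothetical helper name:

```shell
# Number of hugepages needed to reach 8 GiB, given Hugepagesize in kB.
hugepages_for_8gib() {
    echo $(( 8 * 1024 * 1024 / $1 ))
}

hugepages_for_8gib 2048
hugepages_for_8gib 524288
```

The two calls print 4096 and 16, matching the --num values used with doca-hugepages above.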

Note If live upgrade is utilized in this deployment, it is necessary to allocate twice the amount of resources listed above for the upgraded container.

Warning If other applications are running concurrently within the setup and are consuming hugepages, make sure to allocate a number of hugepages appropriate to accommodate all applications.





The folder /etc/virtiofs is used by the container for automatic configuration after deployment.

Note The default YAML configuration only mounts the /etc/virtiofs folder for exposure and sharing between the container and the BlueField. This folder is used to expose configuration files or local file backends (e.g., AIO fsdev) from the DPU to the container.

The .yaml configuration file for the DOCA SNAP Virtio-fs container, doca_vfs.yaml, is uploaded to DOCA NGC.

Note Internet connectivity is necessary to download DOCA SNAP Virtio-fs resources.





The .yaml file can easily be edited for advanced configuration.

The DOCA SNAP Virtio-fs .yaml file is configured by default to support Ubuntu setups (i.e., Hugepagesize = 2048 kB) by using hugepages-2Mi. To support other setups, edit the hugepages section according to the relevant Hugepagesize value for the BlueField OS. For example, to support CentOS 8.x or configure Hugepagesize to 512MB:

limits:
  hugepages-512Mi: "<number-of-hugepages>Gi"

The following example edits the .yaml file to request 6GiB of memory for the DOCA SNAP Virtio-fs container:

resources:
  requests:
    memory: "6Gi"
  limits:
    memory: "6Gi"

Note On Ubuntu 24.04, DOCA SNAP Virtio-fs with a high number of queues requires more memory than the default configuration provides.

The following example edits the .yaml file to request 8 CPU cores for the DOCA SNAP Virtio-fs container:

resources:
  requests:
    cpu: "8"
  limits:
    cpu: "8"
env:
  - name: APP_ARGS
    value: "-m 0xff"

Note If all BlueField-3 cores are requested, the user must verify no other containers are in conflict over CPU resources.
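The -m value in APP_ARGS is a hexadecimal core mask, so it should stay in step with the requested CPU count. The sketch below builds a mask for the first N cores; the helper name core_mask is hypothetical, and selecting cores 0 through N-1 is an assumption that matches the 8-core, 0xff example.

```shell
# Build a hex mask selecting the first N cores (cores 0..N-1).
core_mask() {
    printf '0x%x\n' $(( (1 << $1) - 1 ))
}

core_mask 8
```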

To automatically configure the DOCA SNAP Virtio-fs container upon deployment:

Add the spdk_rpc_init.conf file under /etc/virtiofs/. File example:

fsdev_ aio0 /etc/virtiofs/test
virtio_fs_transport_create -t DOCA
virtio_fs_transport_start -t DOCA
virtio_fs_device_create --transport-name DOCA --dev-name vfsdev0 --tag docatag --fsdev aio0 --num-request-queues 1 --queue-size 32 --driver-platform x86_64
virtio_fs_doca_device_modify --dev-name vfsdev0 --manager mlx5_0 --vuid "MT2251XZ02WZVFSS0D0F3"
virtio_fs_device_start --dev-name vfsdev0

Edit the .yaml file accordingly (uncomment):

env:
  - name: SPDK_RPC_INIT_CONF
    value: "/etc/virtiofs/spdk_rpc_init.conf"

Note It is the user's responsibility to make sure the DOCA SNAP Virtio-fs configuration matches the firmware configuration. That is, an emulated controller must be opened on all existing (static/hotplug) emulated PCIe functions (either through automatic or manual configuration). A PCIe function without a supporting controller is considered malfunctioning, and host behavior with it is anomalous.



Run the Kubernetes tool:

[dpu] systemctl restart containerd
[dpu] systemctl restart kubelet
[dpu] systemctl enable kubelet
[dpu] systemctl enable containerd

Copy the updated doca_vfs.yaml file to the /etc/kubelet.d directory.

Kubelet automatically pulls the container image from NGC described in the YAML file and spawns a pod executing the container.

cp doca_vfs.yaml /etc/kubelet.d/

The DOCA SNAP Virtio-fs Service starts initialization immediately, which may take a few seconds.

To check whether DOCA SNAP Virtio-fs is operational or still initializing, send the spdk_get_version RPC using spdk_rpc.py.
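Because initialization takes a few seconds, a readiness check is usually wrapped in a retry loop. The following generic sketch retries any command until it succeeds; wait_until is a hypothetical helper, and the attempt count and delay are illustrative values.

```shell
# Retry a command until it succeeds or the attempt budget runs out.
# usage: wait_until <attempts> <delay_seconds> <command...>
wait_until() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        # Pause only if another attempt remains.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$(( i + 1 ))
    done
    return 1
}
```

For example, `wait_until 30 1 spdk_rpc.py spdk_get_version` would poll once per second for up to 30 attempts, assuming spdk_rpc.py is on the PATH of the environment where the service runs.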

View currently active pods, and their IDs (it might take up to 20 seconds for the pod to start):

crictl pods

Example output:

POD ID          CREATED              STATE   NAME
0379ac2c4f34c   About a minute ago   Ready   virtiofs

View currently active containers and their IDs:

crictl ps

View all containers, including stopped ones, and their IDs:

crictl ps -a

Examine the logs of a given container (virtio-fs logs):

crictl logs <container_id>

Examine the kubelet logs if something does not work as expected:

journalctl -u kubelet

The container log file is saved automatically by Kubelet under /var/log/containers.

To stop the container, remove the .yaml file from /etc/kubelet.d/.

To start the container, copy the .yaml file to the same path:

cp doca_vfs.yaml /etc/kubelet.d

To restart the container (with SIGTERM), use the -t (timeout) option:

crictl stop -t 10 <container-id>

To restart the SNAP service without restarting the entire container, the user can either use the supervisorctl tool or manually terminate the SNAP service process on the DPU. Different termination signals trigger different behaviors. For example, using pkill with the -9 option sends a SIGKILL, which forcefully stops the process:

pkill -9 -f virtiofs

Info After containers in a pod exit, the Kubelet restarts them using an exponential back-off strategy (e.g., 10s, 20s, 40s), with the delay capped at five minutes. If a container runs successfully for 10 minutes, the Kubelet resets the restart back-off timer for that container.
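The restart schedule described above can be modeled with a few lines of shell to estimate how long a crashing container waits before its next start. This sketch reproduces only the schedule as stated in the text (10s, doubling, capped at 300s); it is not taken from the Kubelet implementation, and backoff_delay is a hypothetical helper.

```shell
# Delay in seconds before the Nth restart (N starts at 1):
# 10, 20, 40, ... doubling each time and capped at 300 (five minutes).
backoff_delay() {
    delay=10
    n=1
    while [ "$n" -lt "$1" ] && [ "$delay" -lt 300 ]; do
        delay=$(( delay * 2 ))
        n=$(( n + 1 ))
    done
    if [ "$delay" -gt 300 ]; then
        delay=300
    fi
    echo "$delay"
}

backoff_delay 3
```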

Info The termination of the virtiofs service may take time, as it must release all allocated resources. The duration depends on the scale of the use case and whether other applications are sharing resources with SNAP. Kubelet may display errors if the container termination timeout is shorter than the actual time required for cleanup.

The DOCA SNAP virtio-fs container, along with its associated packages, natively supports DOCA SNAP-4, which is implemented as an SPDK subsystem module. This design enables the concurrent operation of virtio-fs, virtio-blk, and NVMe as a unified service. Additionally, DOCA SNAP is integrated as part of the DOCA SNAP virtio-fs deployment.

DOCA SNAP deployment sets snap, snap_nvme, and snap_vblk as SPDK subsystems which can be disabled as needed.

Info Refer to DOCA SNAP-4 Service Guide documentation for more information.

Note DOCA SNAP RPCs can be used as an SPDK plugin, which is the recommended method for running RPCs with the SPDK RPC script. Users may need to set the PYTHONPATH environment variable to include the path to snap_rpc.py. The following command creates a virtio-blk controller using the DOCA SNAP RPC plugin, specifying --pf_id 0 and using Null0 as the block device:

spdk_rpc.py --plugin snap_rpc virtio_blk_controller_create --pf_id 0 --bdev Null0

For further details on using RPC plugins, refer to the SPDK official documentation.



