
SNAP-4 Service Release Notes

These release notes provide information about the NVIDIA® BlueField®-3 SNAP software, including changes and new features, known software issues, and bug fixes.

Key features in NVIDIA SNAP 4.6.0:

  • Improved application crash recovery downtime when using the virtio-blk protocol

  • SNAP recovery JSON configuration files for virtio-blk protocol

  • Introduction of a new plugins concept (plugin advanced feature - link)

  • Bug fixes

Release notes in NVIDIA SNAP 4.6.0:

  • DPU Virtio-blk data path provider as default.

Ref #

Issue

4244085

Description: Deleting a VF controller on the DPU side causes SNAP to crash when the driver is unbound from the device on the host

Keywords:

Discovered in version: 4.6.0

4212912

Description: serial_number is not correct when using virtio_blk_controller_bdev_attach

Keywords:

Discovered in version: 4.6.0

4196047

Description: Failed to get MSIX_EQ when performing hot plug/unplug

Keywords:

Discovered in version: 4.6.0

4176109

Description: The MSI-X number is less than the queue number, and there is no way to modify it

Keywords:

Discovered in version: 4.6.0

SNAP Issues

The following are known limitations of this NVMe/virtio-blk SNAP software version.

Ref #

Issue

3631346

Description: When using dynamic MSIX with NVMe protocol, the free_queues PF controller property (as described in section "SR-IOV Dynamic MSIX Management") is not valid and always shows 0.

Workaround: Ignore the value and assume the free_queues pool is large enough.

Keywords: NVMe; MSIX

Discovered in version: 4.4.1

-

Description: The SPDK bdev_uring is not supported. It will be supported in a future release.

Workaround: N/A

Keywords: NVMe

Discovered in version: 4.4.0

3817040

Description: When running nvme_controller_suspend RPC with the --timeout parameter, if timeout expires, the device is no longer operational and cannot be resumed.

Workaround: Destroy and re-create the controller (see the sketch below).

Keywords: NVMe

Discovered in version: 4.4.0
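
A minimal recovery sketch, assuming a controller named NVMeCtrl1 on an already-created subsystem; the nvme_controller_create parameters below are placeholders and must match the controller's original configuration:

    # Destroy the non-operational controller (controller name is a placeholder)
    snap_rpc.py nvme_controller_destroy -c NVMeCtrl1
    # Re-create it with the parameters originally used (values are placeholders)
    snap_rpc.py nvme_controller_create --nqn nqn.2021-06.mlnx.snap:example --pf_id 0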

3809646

Description: When working with the new DPA provider, if a DMA is sent followed by an interrupt to the DPA, the DPA may wake up before the DMA is written to the buffer, causing it to miss events.

Workaround: Add a software-based periodic wake-up mechanism.

Keywords: NVMe

Discovered in version: 4.4.0

3773346

Description: In virtio-blk controller configuration, when running with SPDK NVMe-oF initiator as a backend, an unaligned size_max value may cause memory corruption.

Workaround: The size_max and seg_max values must be powers of 2.

Keywords: Virtio-blk; NVMe-oF; spdk

Discovered in version: 4.3.1

3745842

Description: When running with NVMe/TCP SPDK block device as a backend, SNAP cannot work over more than 8 cores.

Workaround: Use an Arm core mask that uses only 8 cores.

Keywords: NVMe; TCP; SPDK

Discovered in version: 4.3.1

-

Description: The container image may become corrupted, resulting in the container status showing as exited with the error message /usr/bin/supervisord: exec format error.

Workaround: Remove the YAML from kubelet, use crictl images to list the images and crictl rmi <image-id> to remove the corrupted image, run systemctl restart containerd and systemctl restart kubelet, then copy the YAML file back to kubelet (see the sketch below).

Keywords: NGC; container image

Discovered in version: 4.3.1
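
A condensed sketch of the above sequence, assuming the standard BlueField kubelet drop-in directory /etc/kubelet.d/ and a pod spec named doca_snap.yaml (both are assumptions; adjust paths, file name, and image ID to your system):

    # Temporarily remove the SNAP pod spec from kubelet
    mv /etc/kubelet.d/doca_snap.yaml /tmp/
    # List images and remove the corrupted one
    crictl images
    crictl rmi <image-id>
    # Restart the container runtime and kubelet
    systemctl restart containerd
    systemctl restart kubelet
    # Copy the pod spec back so the image is pulled again
    mv /tmp/doca_snap.yaml /etc/kubelet.d/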

3757171

Description: When running virtio-blk emulation with large I/Os (>128K) and SPDK's NVMf initiator as a backend, I/Os may fail in the SPDK layer due to bad alignment.

Workaround: The size_max value of the virtio_blk_controller_create RPC must be set and must be a power of 2 (see the example below).

Keywords: SPDK, virtio-blk, size_max

Discovered in version: 4.3.1
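
For example, the controller could be created with a power-of-2 size_max (all other parameter values here are placeholders and depend on the deployment):

    # size_max must be set explicitly and be a power of 2
    snap_rpc.py virtio_blk_controller_create --pf_id 0 --bdev nvme0n1 --size_max 65536 --seg_max 32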

3689918, 3753637

Description: SNAP container bring-up takes a long time when configured with a large number of emulations, possibly taking longer than the default NVMe driver timeout.

Workaround: Increase the NVMe driver I/O timeout to 300 seconds (instead of the default 30); see the example below.

Keywords: NVMe; recovery; kernel driver

Discovered in version: 4.3.0
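
One common way to raise the timeout on a Linux host (a sketch; the exact mechanism depends on the host distribution) is the nvme_core module parameter:

    # Runtime change (affects devices probed afterwards)
    echo 300 > /sys/module/nvme_core/parameters/io_timeout
    # Or persistently, by adding the following to the kernel command line:
    #   nvme_core.io_timeout=300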

3264154

Description: NVMeTCP XLIO is not supported when running 64K page size kernels on the DPU Arm cores (such is the case with CentOS 8.x, Rocky 8.x, or openEuler 20.x).

Workaround: N/A

Keywords: Page size; NVMeTCP XLIO

Discovered in version: 4.1.0

-

Description: NVMe over RDMA full offload is not supported.

Workaround: N/A

Keywords: NVMe over RDMA; support

Discovered in version: 4.0.0

4110943

Description: Hot-unplugging a hotplugged Virtio BLK device is not allowed unless a Virtio BLK controller has previously been created for the device.

Workaround: Create a Virtio BLK controller on the device before performing the hot-unplug.

Keywords: hotplug, hotunplug

Discovered in version: 4.5.0

4104709

Description: Some legacy operating systems (e.g., Rocky Linux with kernel 4.18) issue virtio-blk zero-length I/Os during boot (e.g., during EDD probing).

Workaround: Set the VIRTIO_BLK_ZERO_LEN_IO_FAIL=1 environment variable to configure SNAP to fail zero-length I/Os in virtio-blk (see the example below).

Keywords: virtio-blk, zero-length I/Os

Reported in version: 4.6.0
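
A sketch of setting the variable; for the containerized SNAP service it would typically go into the environment section of the SNAP pod YAML (an assumption; adjust to how SNAP is launched in your deployment):

    # When running the SNAP service from a shell (illustrative)
    export VIRTIO_BLK_ZERO_LEN_IO_FAIL=1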

-

Description: When using both NVMe and Virtio-blk protocols in SNAP, their data providers may share the same DPA harts, potentially causing NVMe configuration or performance issues. This is especially relevant when Virtio-blk is in DPU mode and NVMe is in DPA mode.

Workaround: Use the dpa_helper_core_mask and dpa_nvme_core_mask environment variables as DPA core masks, for example dpa_helper_core_mask=0x0000FFFF and dpa_nvme_core_mask=0xFFFF0000.

Keywords: Virtio-blk and NVMe, DPA core mask.

Reported in version: 4.6.0

4094152

Description: When one (or more) of the NVMe PCI functions is undergoing FLR, the nvme_controller_list RPC may fail.

Workaround: Retry the nvme_controller_list RPC until it succeeds (see the retry sketch below).

Keywords: FLR

Reported in version: 4.6.0
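
A minimal retry sketch in shell (the attempt count and sleep interval are arbitrary):

    # Retry nvme_controller_list until it succeeds, up to 10 attempts
    for i in $(seq 1 10); do
        snap_rpc.py nvme_controller_list && break
        sleep 1
    done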


OS/vendor Issues

Info

The following are not BlueField SNAP limitations.

Ref #

Issue

-

Description: Some old Windows OS NVMe drivers have buggy usage of SGL support.

Workaround: Disable SGL support when using Windows OS by setting --quirks bit 4 to 1 in the snap_rpc.py nvme_controller_create RPC (see the example below).

Keywords: Windows; NVMe

Reported in version: 4.4.0
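
For example, assuming bit 4 corresponds to the mask 0x10 (the remaining parameters are placeholders):

    # Disable SGL support for old Windows NVMe drivers (quirks bit 4 = 0x10)
    snap_rpc.py nvme_controller_create --nqn nqn.2021-06.mlnx.snap:example --pf_id 0 --quirks 0x10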

2879262

Description: When the virtio-blk kernel driver cannot find enough MSI-X vectors to satisfy all its opened virtqueues, it falls back to assigning a single MSI-X vector to all virtqueues, which negatively impacts performance. In addition, when a large number (e.g., 64) of virtqueues are associated with a single MSI-X vector, the kernel may enter a soft lockup (kernel bug) and I/O will hang.

Workaround: Always keep num_queues < num_msix. Best practice is not to set --num_queues at all when creating virtio-blk controllers, so that the best-suited value is chosen automatically based on the available MSI-X.

Keywords: Virtio-blk; kernel driver; MSI-X

Reported in version: 4.3.0

-

Description: If PCIe devices are inserted before the hot-plug driver is loaded on the host, hot-plug drivers in kernel versions earlier than 4.19 do not enable the slot even if it is occupied (i.e., presence is detected in the slot status register). That is, only the presence state of the slot is changed by firmware, and the PCIe slot is not enabled by the kernel after host bootup. As a result, the PCIe device cannot be seen with lspci on the host side, and the BDF on the controller is 0.

Workaround: Add pciehp.pciehp_force=1 to the boot command line on the host (see the example below).

Keywords: Virtio-blk; kernel driver; hot-plug

Reported in version: 4.2.1
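
On a GRUB-based host, one way to add the parameter (a sketch; adjust to your distribution's boot loader):

    # In /etc/default/grub, append the parameter to the kernel command line
    GRUB_CMDLINE_LINUX="... pciehp.pciehp_force=1"
    # Then regenerate the GRUB configuration and reboot, e.g.:
    #   update-grub                               (Debian/Ubuntu)
    #   grub2-mkconfig -o /boot/grub2/grub.cfg    (RHEL/CentOS)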

-

Description: RedHat/CentOS 7.x does not handle "online" (post driver probe) namespace additions/removals correctly.

Workaround: Use --quirks=0x2 option in snap_rpc.py nvme_controller_create.

Keywords: NVMe; CentOS; RedHat; kernel

Reported in version: 4.1.0

-

Description: Some Windows drivers have experimental support for "online" (post driver probe) namespace additions/removal, although such support is not communicated to the device.

Workaround: Use --quirks=0x1 option in snap_rpc.py nvme_controller_create.

Keywords: NVMe; Windows

Reported in version: 4.1.0

-

Description: VMware ESXi supports "online" (post driver probe) namespace additions/removal only if "Namespace Management" is supported by the controller.

Workaround: Use --quirks=0x8 option in snap_rpc.py nvme_controller_create.

Keywords: NVMe, ESXi

Reported in version: 4.1.0

-

Description: Ubuntu 22.04 does not support 500 VFs.

Workaround: N/A

Keywords: Virtio-blk; kernel driver; Ubuntu 22.04

Reported in version: 4.1.0

-

Description: Virtio-blk Linux kernel driver does not handle PCIe FLR events.

Workaround: N/A

Keywords: Virtio-blk; kernel driver

Reported in version: 4.0.0

3679373

Description: Virtio-blk spdk driver (vfio-pci based) does not handle PCIe FLR events.

Workaround: N/A

Keywords: Virtio-blk; SPDK driver

Reported in version: 4.3.0

-

Description: The new virtio-blk Linux kernel driver (starting from kernel 4.18) does not support hot-unplug during traffic. Since the kernel may self-generate spontaneous I/Os, on rare occasions the issue may occur even when no traffic is explicitly being run.

Workaround: N/A

Keywords: Virtio-blk; kernel driver

Reported in version: 4.0.0

-

Description: SPDK NVMf/RDMA initiator fails to connect to a kernel NVMf/RDMA remote target.

Workaround: Set spdk_rpc.py bdev_nvme_set_options --io-queue-requests=128 in the SPDK configuration (see the example below).

Keywords: SPDK, NVMf, RDMA, kernel

Reported in version: 4.3.1
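
For example, the option is applied before attaching the remote controller (the attach parameters below are placeholders):

    # Lower the per-queue request count before connecting to the kernel target
    spdk_rpc.py bdev_nvme_set_options --io-queue-requests=128
    # Then attach the NVMe-oF controller as usual (placeholder values)
    spdk_rpc.py bdev_nvme_attach_controller -b Nvme0 -t rdma -a 1.1.1.1 -f ipv4 -s 4420 -n nqn.2016-06.io.spdk:cnode1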

-

Description: The Windows OS virtio-blk driver expects at least 64K of data to be available for a single I/O request.

Workaround: Configure the seg_max and size_max parameters to satisfy the requirement seg_max * size_max > 64K (see the example below).

Keywords: Windows, virtio-blk

Reported in version: 4.3.1
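
For example, 4096 * 32 = 128K satisfies the requirement (the other parameter values are placeholders):

    # size_max * seg_max = 4096 * 32 = 128K > 64K
    snap_rpc.py virtio_blk_controller_create --pf_id 0 --bdev nvme0n1 --size_max 4096 --seg_max 32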

-

Description: Some old Windows OS versions have a malfunctioning inbox virtio-blk driver and require a third-party virtio-blk driver to be pre-installed to operate properly.

Workaround: Use the verified third-party driver published by Fedora (link).

Keywords: Windows, virtio-blk

Reported in version: 4.3.1

-

Description: When using hotplugged PCIe devices, after all devices are plugged, the host must be rebooted for Windows to detect all devices (some Windows versions will perform reboot automatically).

Workaround: N/A

Keywords: Hotplug, Windows.

Reported in version: 4.5.0

3748674

Description: On most modern Linux distributions, unplugging a PCIe function from the host while there are inflight I/Os can cause the virtio-blk driver to hang.

Workaround: N/A

Keywords: Hotplug, Linux.

Reported in version: 4.5.0

4158322

Description: Windows OS does not support online PCIe rescan, requiring a system reboot after each PCIe hotplug operation. Without a reboot, the OS cannot detect subsequent hotplug operations, causing additional hotplug attempts to fail.

Workaround: Wait for the Windows OS to complete the reboot process before submitting another virtio_blk_controller_hotplug RPC to expose an additional hotplug device.

Keywords: Windows OS, hotplug, response timeout

Reported in version: 4.5.0

4206444

Description: The Linux kernel driver does not restrict seg_max based on the maximum queue_size reported by the device. This can lead to a WARN_ON_ONCE() trigger in the kernel, resulting in driver misbehavior and potential system hangs.

Workaround: Ensure that all virtio-blk controllers are configured with a seg_max value smaller than (queue_size - 2); see the example below.

Keywords: Virtio-blk; kernel driver; seg_max; queue_size

Reported in version: 4.6.0
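
For example, with a controller queue_size of 256, any seg_max up to 254 satisfies the constraint (the parameter values below are placeholders, and the queue size is assumed to be the controller default):

    # seg_max (128) is well below queue_size (256) - 2
    snap_rpc.py virtio_blk_controller_create --pf_id 0 --bdev nvme0n1 --seg_max 128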

-

Description: When using SR-IOV with VFs sharing the same driver as their PF (sriov_driver_autoprobe=1), unbinding the driver may take a long time, and some admin commands may time out.

Workaround: N/A

Keywords: SRIOV; driver.

Discovered in version: 3.8.0-8


© Copyright 2025, NVIDIA. Last updated on May 5, 2025.