NVIDIA BlueField-3 SNAP for NVMe and Virtio-blk v4.3.1
1.0

Known Issues

The following are known limitations of this NVMe/virtio-blk SNAP software version.

Ref #

Issue

3689918

Description: SNAP container bring-up takes a long time when configured with a large number of emulations, possibly taking longer than the default NVMe driver timeout.

Workaround: Increase the NVMe driver IO timeout to 300 seconds (the default is 30).

Keywords: NVMe; recovery; kernel driver

Discovered in version: 4.3.0
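
A minimal sketch for checking the timeout in the workaround for issue 3689918 on a Linux host. It assumes the in-box Linux NVMe driver, which exposes its IO timeout in seconds as the nvme_core module parameter io_timeout. The function names are hypothetical and chosen for illustration.

```python
from pathlib import Path

# Assumption: the host runs the in-box Linux NVMe driver, which exposes its
# IO timeout (in seconds) through this nvme_core module parameter.
IO_TIMEOUT_PARAM = Path("/sys/module/nvme_core/parameters/io_timeout")
REQUIRED_TIMEOUT_S = 300  # value recommended by the workaround


def read_io_timeout(param_path: Path = IO_TIMEOUT_PARAM) -> int:
    """Return the NVMe IO timeout, in seconds, stored in param_path."""
    return int(param_path.read_text().strip())


def timeout_is_sufficient(current_s: int, required_s: int = REQUIRED_TIMEOUT_S) -> bool:
    """True when the configured timeout meets the recommended value."""
    return current_s >= required_s


if __name__ == "__main__":
    if IO_TIMEOUT_PARAM.exists():
        current = read_io_timeout()
        state = "OK" if timeout_is_sufficient(current) else "too low"
        print(f"nvme_core io_timeout = {current}s ({state})")
    else:
        print("nvme_core is not loaded; nothing to check")
```

If the reported value is too low, one common way to raise it is the nvme_core.io_timeout=300 kernel boot parameter. This is an assumption about the host setup, not a SNAP requirement.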

3264154

Description: NVMeTCP XLIO is not supported when running 64K page size kernels on the DPU Arm cores (as is the case with CentOS 8.x, Rocky 8.x, or openEuler 20.x).

Workaround: N/A

Keywords: Page size; NVMeTCP XLIO

Discovered in version: 4.1.0

Description: NVMe over RDMA full offload is not supported.

Workaround: N/A

Keywords: NVMe over RDMA; support

Discovered in version: 4.0.0

Description: SNAP is not supported on a host with Windows OS.

Workaround: N/A

Keywords: Windows; OS; support

Discovered in version: 4.0.0

3757171

Description: When running virtio-blk emulation with large IOs (>128K) and SPDK's nvmf initiator as a backend, IOs may fail in the SPDK layer due to bad alignment.

Workaround: Set the size_max value of the virtio_blk_controller_create RPC explicitly, and make it a power of 2.

Keywords: SPDK, virtio-blk, size_max

Discovered in version: 4.3.1
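
A short sketch of the power-of-2 requirement in the workaround for issue 3757171. It only validates or rounds a candidate size_max value. The helper names are hypothetical, and the value is treated as a plain integer because this page does not state its unit.

```python
def is_power_of_two(value: int) -> bool:
    """True for 1, 2, 4, 8, ...; False for zero, negatives, and all others."""
    return value > 0 and (value & (value - 1)) == 0


def next_power_of_two(value: int) -> int:
    """Smallest power of two that is greater than or equal to value."""
    if value < 1:
        raise ValueError("value must be a positive integer")
    return 1 << (value - 1).bit_length()


if __name__ == "__main__":
    candidate = 196608  # illustrative value only, not a recommended setting
    if not is_power_of_two(candidate):
        candidate = next_power_of_two(candidate)
    print(f"size_max candidate: {candidate}")
```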

-

Description: The container image becomes corrupted, resulting in the container status showing as "exited". The error message "/usr/bin/supervisord: exec format error" is displayed.

Workaround: Remove the YAML file from kubelet, run "crictl images" to list the images, and run "crictl rmi <image-id>" to remove the corrupted image. Then run "systemctl restart containerd" and "systemctl restart kubelet", and copy the YAML file back to kubelet.

Keywords: NGC, container image

Discovered in version: 4.3.1
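
The command sequence in the workaround above can be sketched as a small dry-run helper. This is an illustration only: the function names are hypothetical, the image ID must first be looked up with "crictl images", and removing and re-copying the YAML file is left as a manual step because this page does not give the kubelet manifest path.

```python
import subprocess


def recovery_commands(image_id: str) -> list:
    """Commands from the workaround, in order, for a known image ID."""
    if not image_id:
        raise ValueError("image_id is required; find it with 'crictl images'")
    return [
        ["crictl", "rmi", image_id],
        ["systemctl", "restart", "containerd"],
        ["systemctl", "restart", "kubelet"],
    ]


def run_recovery(image_id: str, dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in recovery_commands(image_id):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failure


if __name__ == "__main__":
    # Placeholder ID: replace with the value shown by 'crictl images'.
    run_recovery("<image-id>", dry_run=True)
```

Keeping dry_run enabled by default lets the sequence be reviewed before anything is removed or restarted.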

3745842

Description: When running with an NVMeTCP SPDK block device as a backend, SNAP cannot work over more than 8 cores.

Workaround: Use an Arm core mask that selects no more than 8 cores.

Keywords: NVMe, TCP, SPDK

Discovered in version: 4.3.1
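
A minimal sketch for building and checking a core mask that respects the 8-core limit of issue 3745842. The helper names are hypothetical, and how the mask is passed to SNAP is outside the scope of this sketch.

```python
MAX_CORES = 8  # limit stated in the issue description


def core_mask(cores) -> str:
    """Hexadecimal bitmask with one bit set per core index."""
    mask = 0
    for core in cores:
        if core < 0:
            raise ValueError("core indices must be non-negative")
        mask |= 1 << core
    return hex(mask)


def cores_in_mask(mask: str) -> int:
    """Number of cores selected by a hexadecimal mask string."""
    return bin(int(mask, 16)).count("1")


def mask_within_limit(mask: str, limit: int = MAX_CORES) -> bool:
    """True when the mask selects at least one core and at most limit."""
    return 0 < cores_in_mask(mask) <= limit


if __name__ == "__main__":
    mask = core_mask(range(8))  # cores 0 through 7
    print(mask, mask_within_limit(mask))
```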

Note

The following are not BlueField SNAP limitations.

Ref #

Issue

2879262

Description: When the virtio-blk kernel driver cannot find enough MSI-X vectors to satisfy all its opened virtqueues, it falls back to assigning a single MSI-X vector to all virtqueues, which negatively impacts performance. In addition, when a large number (e.g., 64) of virtqueues are associated with a single MSI-X vector, the kernel may enter a soft-lockup (kernel bug) and the IO will hang.

Workaround: Always keep num_queues < num_msix. Best practice is to leave --num_queues unset when creating virtio-blk controllers so that the best-suited value is chosen automatically based on the available MSI-X vectors.

Keywords: Virtio-blk; kernel driver; MSI-X

Reported in version: 4.3.0
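
The rule in the workaround for issue 2879262 can be sketched as a pre-check that runs before a virtio-blk controller is created. The function name is hypothetical, and the "--num_queues=<n>" spelling of the option is an assumption about how the argument is passed.

```python
from typing import List, Optional


def queue_args(num_msix: int, num_queues: Optional[int] = None) -> List[str]:
    """Extra controller-creation arguments that keep num_queues < num_msix.

    Returns an empty list when num_queues is None, which follows the best
    practice of letting the value be chosen automatically.
    """
    if num_queues is None:
        return []
    if num_queues < 1:
        raise ValueError("num_queues must be a positive integer")
    if num_queues >= num_msix:
        raise ValueError(
            f"num_queues ({num_queues}) must be smaller than num_msix ({num_msix})"
        )
    return [f"--num_queues={num_queues}"]


if __name__ == "__main__":
    print(queue_args(num_msix=64))                 # best practice: no option
    print(queue_args(num_msix=64, num_queues=32))  # explicit but valid
```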

-

Description: If PCIe devices are inserted before the hot-plug driver is loaded on the host, the hot-plug driver in kernel versions earlier than 4.19 does not enable the slot even if the slot is occupied (i.e., presence is detected in the slot status register). That is, firmware changes only the presence state of the slot, but the kernel does not enable the PCIe slot after host bootup. As a result, the PCIe device does not appear in lspci on the host, and the BDF is 0 on the controller.

Workaround: Add pciehp.pciehp_force=1 to the boot command line on host.

Keywords: Virtio-blk; kernel driver; hot-plug

Reported in version: 4.2.1
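
A small sketch for confirming that the boot parameter from the workaround above is active on a Linux host. It reads /proc/cmdline, which holds the command line the running kernel was booted with. The function name is hypothetical.

```python
from pathlib import Path

REQUIRED_PARAM = "pciehp.pciehp_force=1"  # taken from the workaround


def has_boot_param(cmdline: str, param: str = REQUIRED_PARAM) -> bool:
    """True when param appears as a whole token in a kernel command line."""
    return param in cmdline.split()


if __name__ == "__main__":
    cmdline = Path("/proc/cmdline").read_text()
    print("pciehp_force set:", has_boot_param(cmdline))
```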

-

Description: Red Hat/CentOS 7.x does not handle "online" (post driver probe) namespace additions/removals correctly.

Workaround: Use --quirks=0x2 option in snap_rpc.py nvme_controller_create.

Keywords: NVMe; CentOS; RedHat; kernel

Reported in version: 4.1.0

-

Description: Some Windows drivers have experimental support for "online" (post driver probe) namespace additions/removals, although such support is not communicated to the device.

Workaround: Use --quirks=0x1 option in snap_rpc.py nvme_controller_create.

Keywords: NVMe; Windows

Reported in version: 4.1.0

-

Description: VMware ESXi supports "online" (post driver probe) namespace additions/removals only if "Namespace Management" is supported by the controller.

Workaround: Use --quirks=0x8 option in snap_rpc.py nvme_controller_create.

Keywords: NVMe, ESXi

Reported in version: 4.1.0
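
The three --quirks workarounds above (Red Hat/CentOS 7.x, Windows, VMware ESXi) can be collected in a small lookup, sketched here. The dictionary keys and the function name are hypothetical labels. Only the quirk values come from this page, and whether values can be combined is not stated here, so the sketch returns one value per host OS.

```python
# Values taken from the workarounds on this page; the keys are
# hypothetical labels chosen for this sketch.
HOST_QUIRKS = {
    "rhel_centos_7": 0x2,
    "windows": 0x1,
    "vmware_esxi": 0x8,
}


def quirks_option(host_os: str) -> str:
    """Return the --quirks option documented for the given host OS label."""
    try:
        value = HOST_QUIRKS[host_os]
    except KeyError:
        raise ValueError(f"no documented quirk for host OS {host_os!r}") from None
    return f"--quirks={value:#x}"


if __name__ == "__main__":
    for name in HOST_QUIRKS:
        print(name, quirks_option(name))
```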

-

Description: Ubuntu 22.04 does not support 500 VFs.

Workaround: N/A

Keywords: Virtio-blk; kernel driver; Ubuntu 22.04

Reported in version: 4.1.0

Description: Virtio-blk Linux kernel driver does not handle PCIe FLR events.

Workaround: N/A

Keywords: Virtio-blk; kernel driver

Reported in version: 4.0.0

3679373

Description: Virtio-blk SPDK driver (vfio-pci based) does not handle PCIe FLR events.

Workaround: N/A

Keywords: Virtio-blk; SPDK driver

Reported in version: 4.3.0

Description: The new virtio-blk Linux kernel driver (starting with kernel 4.18) does not support hot-unplug during traffic. Because the kernel may generate spontaneous IOs on its own, the issue may occur on rare occasions even when no traffic is explicitly running.

Workaround: N/A

Keywords: Virtio-blk; kernel driver

Reported in version: 4.0.0

Description: SPDK NVMf/RDMA initiator fails to connect to kernel NVMf/RDMA remote target.

Workaround: Set spdk_rpc.py bdev_nvme_set_options --io-queue-requests=128 in the SPDK configuration.

Keywords: SPDK, NVMf, RDMA, kernel

Reported in version: 4.3.1

© Copyright 2023, NVIDIA. Last updated on Feb 8, 2024.