NVIDIA BlueField-2 SNAP for NVMe and Virtio-blk v3.8.0

Known Issues

The following are known limitations of this NVMe/virtio-blk SNAP software version.

Ref #

Issue

Description: NVMeTCP XLIO is currently not supported when running 64K page size kernels on the DPU Arm cores (as is the case for CentOS 8.x, Rocky 8.x, or openEuler 20.x).

Workaround: N/A

Keywords: 64K page size; NVMeTCP XLIO

Discovered in version: 3.6.0

Description: When running with virtio-blk and virtio-net protocols in parallel, performance may be negatively impacted.

Workaround: N/A

Keywords: Performance

Discovered in version: 3.7.2

2957317

Description: Due to an upstream kernel bug that exists in some Linux kernel distributions, the command emulation_device_detach times out, which causes any inflight traffic to hang.

Workaround: Ensure that all inflight traffic on the device is stopped before performing a hot-unplug.

Keywords: PCIe Hotplug

Discovered in version: 3.6.0

3046440

Description: NVMe full-offload mode does not work properly on first-generation BlueField SoCs.

Workaround: N/A

Keywords: NVMe full-offload mode

Discovered in version: 3.6.0

2879262

Description: Due to a kernel bug that exists in some Linux kernel distributions, configuring a large number of virtio queues along with a small number of MSI-X vectors may cause a kernel soft lockup (in addition to significant performance degradation).

Workaround: Keep the virtio-blk controller's --num_queues value (passed to snap_rpc.py controller_virtio_blk_create) smaller than the value of VIRTIO_BLK_EMULATION_NUM_MSIX (which is configured through mlxconfig).

Keywords: Virtio-blk; kernel hang

Discovered in version: 3.6.0
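
The workaround above can be checked before a controller is created. The following is a minimal sketch, not part of the SNAP tooling: the function name `check_virtio_blk_queues` is hypothetical, and the caller is assumed to have already read the planned --num_queues value and the VIRTIO_BLK_EMULATION_NUM_MSIX value (for example, from mlxconfig output).

```python
def check_virtio_blk_queues(num_queues: int, num_msix: int) -> list[str]:
    """Return warnings for a planned virtio-blk controller configuration.

    num_queues: value intended for --num_queues (assumed to be known by the caller).
    num_msix:   configured VIRTIO_BLK_EMULATION_NUM_MSIX value (assumed to be known).
    """
    if num_queues < 1 or num_msix < 1:
        raise ValueError("num_queues and num_msix must be positive integers")

    warnings = []
    # The workaround asks for num_queues to be strictly smaller than num_msix.
    if num_queues >= num_msix:
        warnings.append(
            f"--num_queues ({num_queues}) is not smaller than "
            f"VIRTIO_BLK_EMULATION_NUM_MSIX ({num_msix}); "
            "the host kernel may hit a soft lockup on affected distributions"
        )
    return warnings


if __name__ == "__main__":
    for message in check_virtio_blk_queues(num_queues=64, num_msix=4):
        print("WARNING:", message)
```

An empty list means the planned values satisfy the recommendation; the check itself does not query the device or change any setting.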

Description: SPDK multipath is supported only with NVMe over RDMA (and not with NVMe over TCP).

Workaround: N/A

Keywords: SPDK; NVMe

Discovered in version: 3.6.0

3055119

Description: The Windows driver does not work with the virtio-blk SNAP-Direct feature.

Workaround: To disable the feature when working with Windows OS, set VIRTIO_BLK_SNAP_ZCOPY=0 in /etc/default/mlnx_snap.

Keywords: Windows

Discovered in version: 3.5.0
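
For scripted deployments, the setting in the workaround above could be applied programmatically. The sketch below assumes that /etc/default/mlnx_snap consists of shell-style KEY=VALUE lines; the helper name `set_env_var` is hypothetical. It only transforms text, so reading and writing the file (and any service restart) is left to the caller.

```python
def set_env_var(text: str, key: str, value: str) -> str:
    """Return `text` with KEY=value set exactly once.

    An existing assignment (optionally prefixed with `export`) is replaced in
    place; duplicates are dropped; commented-out lines are left untouched.
    If no assignment exists, one is appended at the end.
    """
    out = []
    replaced = False
    for line in text.splitlines():
        stripped = line.strip()
        is_plain = stripped.startswith(f"{key}=")
        is_export = stripped.startswith(f"export {key}=")
        if is_plain or is_export:
            if not replaced:
                prefix = "export " if is_export else ""
                out.append(f"{prefix}{key}={value}")
                replaced = True
            continue  # skip the old assignment and any duplicates
        out.append(line)
    if not replaced:
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "# SNAP settings\nVIRTIO_BLK_SNAP_ZCOPY=1\n"
    print(set_env_var(sample, "VIRTIO_BLK_SNAP_ZCOPY", "0"), end="")
```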

Description: NVMe multipath features are not available when using SNAP in a full-offload mode configuration.

Workaround: N/A

Keywords: NVMe full-offload mode; multipath

Discovered in version: 3.4.0

Description: After each PCIe device hot-plug, a matching controller must be immediately opened. Specifically, hot-unplugging the device before a controller is created may cause the host kernel driver to malfunction on some Linux distributions.

Workaround: N/A

Keywords: Hot-plug; controller

Discovered in version: 3.3.0

Description: SR-IOV on hot-plugged PFs is not supported.

Workaround: N/A

Keywords: PCIe Hotplug

Discovered in version: 3.2.0

Description: Any emulated PCIe device exposed to the host must have a matching controller opened on it in the mlnx_snap service before its kernel driver is loaded. This also applies to virtio-net devices.

Workaround: N/A

Keywords: VF; PF; virtio-net; kernel driver

Discovered in version: 3.1.0

Description: It is not possible to attach block devices using the same NSID to different NVMe controllers that are linked to the same NVMe subsystem. For example, the following commands result in an error because both controllers are attached with NSID 1:

snap_rpc.py controller_nvme_namespace_attach NvmeEmu2pf0 spdk Null0 1
snap_rpc.py controller_nvme_namespace_attach NvmeEmu2pf1 spdk Null1 1

Workaround: N/A

Keywords: Block device; controller

Discovered in version: 3.0.0
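
A planned set of namespace attachments can be screened for this NSID collision before any RPC is issued. The following is a minimal sketch under stated assumptions: the function name `find_nsid_conflicts` and the subsystem label `subsys0` are hypothetical, and the controller-to-subsystem mapping is assumed to be supplied by the caller.

```python
from collections import defaultdict


def find_nsid_conflicts(attachments, subsystem_of):
    """Find NSIDs reused by different controllers within one NVMe subsystem.

    attachments:  iterable of (controller_name, bdev_name, nsid) tuples.
    subsystem_of: dict mapping controller_name -> subsystem label
                  (assumed to be known by the caller).
    Returns a dict {(subsystem, nsid): [controller_name, ...]} containing
    only the entries where more than one distinct controller uses the NSID.
    """
    users = defaultdict(list)
    for controller, _bdev, nsid in attachments:
        users[(subsystem_of[controller], nsid)].append(controller)
    return {
        key: controllers
        for key, controllers in users.items()
        if len(set(controllers)) > 1
    }


if __name__ == "__main__":
    planned = [
        ("NvmeEmu2pf0", "Null0", 1),
        ("NvmeEmu2pf1", "Null1", 1),
    ]
    # Hypothetical mapping: both controllers linked to the same subsystem.
    mapping = {"NvmeEmu2pf0": "subsys0", "NvmeEmu2pf1": "subsys0"}
    for (subsystem, nsid), controllers in find_nsid_conflicts(planned, mapping).items():
        print(f"NSID {nsid} in {subsystem} is used by: {', '.join(controllers)}")
```

Giving the second attachment a different NSID, or linking the controllers to different subsystems, makes the function return an empty dict.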

Description: mlnx_snap NVMe controller supports an admin queue with a maximum size of 1024 towards the host.

Workaround: N/A

Keywords: Admin queue; controller

Discovered in version: 3.0.0

Description: The DPU expansion ROM includes NVMe and virtio-blk UEFI drivers certified by NVIDIA, which should be used by the BIOS. Any other BIOS drivers are not guaranteed to work properly.

Workaround: N/A

Keywords: BIOS; certified drivers

Discovered in version: 3.0.0

Description: Legacy interrupts are not supported.

Workaround: N/A

Keywords: Block device; controller

Discovered in version: 3.0.0

Note

The following are not BlueField SNAP limitations.

Ref #

Issue

3543249

Description: When using hotplugged PCIe devices, after all devices are plugged, the host must be rebooted for Windows to detect all devices.

Workaround: N/A

Keywords: Hotplug

Discovered in version: 3.7.4

3521378

Description: For the emulation_device_detach RPC command to succeed, it is recommended to use directio=1 (O_DIRECT) with a virtio-blk controller created on a hot-plugged emulation device.

Note

If directio=0 is used, the IO must be stopped manually. Otherwise, emulation_device_detach may fail.

Workaround: N/A

Keywords: Virtio-blk; RPC

Discovered in version: 3.7.4

2957317

Description: Setting virtio-blk emulation on bare metal results in a server crash.

Workaround: Set the seg_max flag of the virtio-blk controller to at least 16 (default is 1) using the following RPC:

controller_virtio_blk_create … --seg_max 16

Keywords: Virtio-blk; bare-metal; seg_max

Discovered in version: 3.7.2

3056533

Description: When using the NVMe driver in Windows, if an I/O does not complete for more than 120 seconds, Windows starts ignoring the NVMe device and its disks disappear.

Workaround: N/A

Keywords: NVMe device disappears

Discovered in version: 3.6.1

N/A

Description: There is a known Windows driver issue in which the driver may crash when multiple namespaces are attached simultaneously. Users must attach namespaces one by one and verify that each namespace is discovered by the OS before attaching the next one.

Workaround: N/A

Keywords: Attaching multiple namespaces simultaneously

Discovered in version: 3.4.0
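
The attach-then-verify sequence described above can be sketched as a small loop. This is an illustration only: `attach_one_by_one` and its `attach` and `is_discovered` callbacks are hypothetical, and how a namespace is attached on the DPU or detected on the Windows host is left entirely to the caller.

```python
import time


def attach_one_by_one(namespaces, attach, is_discovered,
                      timeout_s=30.0, poll_s=0.5,
                      sleep=time.sleep, clock=time.monotonic):
    """Attach namespaces sequentially, waiting for each to be discovered.

    attach(ns):        caller-supplied function that attaches one namespace.
    is_discovered(ns): caller-supplied function returning True once the host
                       OS reports the namespace.
    Raises TimeoutError if a namespace is not discovered within timeout_s,
    so that no further namespace is attached on top of an unverified one.
    """
    attached = []
    for ns in namespaces:
        attach(ns)
        deadline = clock() + timeout_s
        while not is_discovered(ns):
            if clock() >= deadline:
                raise TimeoutError(f"namespace {ns!r} was not discovered in time")
            sleep(poll_s)
        attached.append(ns)
    return attached


if __name__ == "__main__":
    # Stand-in callbacks that succeed immediately, for demonstration only.
    done = attach_one_by_one([1, 2, 3],
                             attach=lambda ns: None,
                             is_discovered=lambda ns: True)
    print("attached:", done)
```

Stopping at the first timeout is a deliberate choice here: continuing would attach a new namespace before the previous one is confirmed, which is the situation the description warns against.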

N/A

Description: There is a known Windows NVMe driver bug that causes Windows initiators to crash if the NVMe driver is started while no target is up and ready. Therefore, users running Windows OS on top of the emulated NVMe device must make sure that the mlnx_snap NVMe controller is connected to the remote target before starting the driver on the host side.

Workaround: N/A

Keywords: Windows initiators crash

Discovered in version: 3.1.0

N/A

Description: There is a known Windows driver bug due to which namespace hotplug is not supported. On newer Windows builds, NVMe controller quirks must be set to 0x5. For more information, see section "Controller Parameters".

Workaround: N/A

Keywords: Namespaces hotplug

Discovered in version: 3.1.0

Ref #

Issue

3066750

Description: The driver does not support PCIe function level reset (FLR). Running FLR during I/O causes the I/O (and the kernel) to hang.

Workaround: N/A

Keywords: PCIe function; hang

Discovered in version: 3.6.1

2879262

Description: When working with a large number of virtqueues (≥ 64) over a single MSIX, the host kernel might experience soft lockup. Specifically, setting --num_queues to a high number, which is also higher than the configured --num_msix value, might cause this issue.

Workaround:

Keywords: Kernel; hang; virtqueues

Discovered in version: 3.6.1

2957317

Description: In Linux kernel version 5.4.0-91-generic and above, the command emulation_device_detach times out if I/O traffic is running.

Workaround: N/A

Keywords: Command time out

Discovered in version: 3.6.1

Ref #

Issue

3231721

Description: When using emulation_device_attach RPC to hot plug a virtio-blk transitional device, the capacity and block size attributes must be provided for this hot-plugged virtio-blk transitional device.

Workaround: Use the --bdev_type spdk and --bdev spdk_bdev options to provide a bdev to the hot-plugged virtio-blk transitional device when using emulation_device_attach RPC.

Keywords: Hot plugging virtio-blk transitional device

Discovered in version: 3.7.0

Description: Legacy/transitional drivers do not require syncing with the device upon driver initialization. Therefore, it is highly recommended that the SNAP controller is opened on the PCIe function before the driver becomes operational. If the driver becomes operational before the controller is opened, controller configuration options are very limited.

Workaround: N/A

Keywords: Legacy; SNAP controller; SNAP driver

Discovered in version: 3.7.0

Description: Legacy/transitional device support naturally includes backends with a 512B block size. Backends with any other block size (e.g., 4K) can be used only when the SNAP controller is opened before the driver is activated.

Workaround: N/A

Keywords: Legacy; backend block size

Discovered in version: 3.7.0

© Copyright 2024, NVIDIA. Last updated on May 21, 2024.