SNAP-4 Service Release Notes
This page describes new features, known issues, and bug fixes for NVIDIA BlueField-3 SNAP software.
New features in NVIDIA SNAP 4.7.0:
Added support for the official Virtio specification for Live Migration (beta).
Added NVMe admin command pass-through.
Added NVMe queue core allocation.
Added support for an NVMe JSON crash recovery file (beta).
Added the ability for the emulation manager to manage multiple PCI links.
Bug fixes
Ref # | Issue |
4255353 | Description: Information about the deleted |
Keywords: Hotplug | |
Discovered in version: 4.7.0 | |
4357139 | Description: During a host restart, the DPU performs DMA I/O read/write operations, which can result in incorrect memory access. |
Keywords: DMA, host restart | |
Discovered in version: 4.7.0 |
SNAP Issues
Ref # | Issue |
3631346 | Description: When using dynamic MSI-X with the NVMe protocol, the |
Workaround: Ignore the value and assume the | |
Keywords: NVMe; MSIX | |
Discovered in version: 4.4.1 | |
- | Description: The SPDK |
Workaround: N/A | |
Keywords: NVMe | |
Discovered in version: 4.4.0 | |
3817040 | Description: When running |
Workaround: Destroy and re-create the controller. | |
Keywords: NVMe | |
Discovered in version: 4.4.0 | |
3809646 | Description: When working with a new DPA provider, if an interrupt is sent to the DPA immediately after a DMA operation, the DPA may wake up before the DMA data is fully written to the buffer, causing it to miss events. |
Workaround: Add a software-based periodic wake-up mechanism. | |
Keywords: NVMe | |
Discovered in version: 4.4.0 | |
3773346 | Description: When configuring the virtio-blk controller, using an unaligned |
Workaround: | |
Keywords: Virtio-blk; NVMe-oF; spdk | |
Discovered in version: 4.3.1 | |
3745842 | Description: When using the NVMe/TCP SPDK block device as a backend, SNAP is limited to working with no more than 8 cores. |
Workaround: Use an Arm core mask that selects no more than 8 cores. | |
Keywords: NVMe; TCP; SPDK | |
Discovered in version: 4.3.1 | |
- | Description: Container images may become corrupted, resulting in a container status of |
Workaround: Remove the YAML from kubelet, use | |
Keywords: NGC; container image | |
Discovered in version: 4.3.1 | |
3757171 | Description: When running virtio-blk emulation with large IOs (>128K) and SPDK's nvmf initiator as a backend, IOs might fail in the SPDK layer due to poor alignment. |
Workaround: | |
Keywords: SPDK, virtio-blk, size_max | |
Discovered in version: 4.3.1 | |
3689918 3753637 | Description: The SNAP container takes a long time to start up when configured with a large number of emulations, potentially exceeding the default NVMe driver timeout. |
Workaround: Increase the NVMe driver IO timeout from 30 to 300 seconds. | |
Keywords: NVMe; recovery; kernel driver | |
Discovered in version: 4.3.0 | |
3264154 | Description: NVMeTCP XLIO is not supported when running 64K page size kernels on the DPU Arm cores (as is the case for CentOS 8.x, Rocky 8.x, or openEuler 20.x). |
Workaround: N/A | |
Keywords: Page size; NVMeTCP XLIO | |
Discovered in version: 4.1.0 | |
- | Description: NVMe over RDMA full offload is not supported. |
Workaround: N/A | |
Keywords: NVMe over RDMA; support | |
Discovered in version: 4.0.0 | |
4110943 | Description: Hot-unplugging a hotplugged Virtio BLK device is not allowed unless a Virtio BLK controller has previously been created for the device. |
Workaround: Create a Virtio BLK controller on the device before hot-unplugging it. | |
Keywords: hotplug, hotunplug | |
Discovered in version: 4.5.0 | |
4104709 | Description: Some legacy operating systems (for example, Rocky Linux with kernel 4.18) issue zero-length virtio-blk I/Os during boot (for example, during EDD probing). |
Workaround: Set the VIRTIO_BLK_ZERO_LEN_IO_FAIL=1 environment variable to configure SNAP to fail zero-length virtio-blk I/Os. | |
Keywords: virtio-blk, zero-length I/Os | |
Reported in version: 4.6.0 | |
- | Description: When both the NVMe and Virtio-blk protocols are used in SNAP, their data providers may share the same DPA HARTs, potentially causing NVMe configuration or performance issues. This is especially relevant when Virtio-blk is in DPU mode and NVMe is in DPA mode. |
Workaround: Use | |
Keywords: Virtio-blk and NVMe, DPA core mask. | |
Reported in version: 4.6.0 | |
- | Description: Linux kernel's |
Workaround: Use the | |
Keywords: virtio_blk, linux, kernel, size_max | |
Reported in version: 4.7.0 | |
4396707 | Description: SPDK's |
Workaround: When using the SPDK | |
Keywords: virtio_blk, size_max, SPDK | |
Reported in version: 4.7.0 | |
4409344 | Description: When a live update is performed too quickly (for example, by an automated script), the destination process may not yet have created all necessary resources when it is prompted to handshake with the source process. |
Workaround: Add a | |
Keywords: virtio_blk, nvme, live update | |
Reported in version: 4.7.0 | |
4412341 | Description: When a high number of virtio-blk VFs (512 or more) is used on a single PF, a sudden hypervisor crash (or abrupt warm reboot) may cause the hypervisor to hang due to the long FLR processing time. |
Workaround: Split the opened VFs across more PFs, and gracefully shut down VMs before performing a hard OS reset. | |
Keywords: virtio-blk, FLR, SRIOV | |
Reported in version: 4.7.0 |
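The workarounds for issues 3745842 (at most 8 cores with an NVMe/TCP SPDK backend) and 4412341 (512 or more virtio-blk VFs on one PF) both reduce to simple arithmetic. The following Python sketch is illustrative only: the helper names `core_mask`, `split_vfs`, and `min_pfs` are hypothetical, and it assumes the Arm core mask is a hexadecimal bitmask with one bit per core (SPDK-style, for example `0xFF` for cores 0-7). It only computes values; it does not configure SNAP or name the SNAP parameter that consumes the mask.

```python
import math

# Hypothetical helpers illustrating two SNAP workarounds; not a SNAP API.
# Assumption: Arm core masks are hex bitmasks, one bit per core (e.g. 0xFF = cores 0-7).
MAX_NVME_TCP_CORES = 8    # issue 3745842: NVMe/TCP SPDK backend supports at most 8 cores
VF_HANG_THRESHOLD = 512   # issue 4412341: 512 or more VFs on one PF may hang on FLR


def core_mask(cores):
    """Return a hex bitmask string with one bit set per listed Arm core index."""
    selected = sorted(set(cores))
    if len(selected) > MAX_NVME_TCP_CORES:
        raise ValueError(f"at most {MAX_NVME_TCP_CORES} cores are supported")
    mask = 0
    for core in selected:
        if core < 0:
            raise ValueError("core indices must be non-negative")
        mask |= 1 << core
    return hex(mask)


def split_vfs(total_vfs, num_pfs):
    """Spread total_vfs as evenly as possible across num_pfs PFs."""
    if num_pfs < 1:
        raise ValueError("num_pfs must be at least 1")
    base, extra = divmod(total_vfs, num_pfs)
    per_pf = [base + (1 if i < extra else 0) for i in range(num_pfs)]
    # Reject any layout that leaves a PF at or above the hang threshold.
    if max(per_pf) >= VF_HANG_THRESHOLD:
        raise ValueError("too many VFs per PF; spread them across more PFs")
    return per_pf


def min_pfs(total_vfs):
    """Smallest PF count that keeps every PF below VF_HANG_THRESHOLD VFs."""
    return max(1, math.ceil(total_vfs / (VF_HANG_THRESHOLD - 1)))
```

For example, `core_mask(range(8))` returns `'0xff'`, and `split_vfs(1000, min_pfs(1000))` spreads 1000 VFs as 500 per PF across 2 PFs.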
OS or Vendor Issues
Ref # | Issue |
- | Description: Some older Windows NVMe drivers use SGL support incorrectly. |
Workaround: Disable SGL support when using Windows OS by setting the | |
Keywords: Windows; NVMe | |
Reported in version: 4.4.0 | |
2879262 | Description: When the virtio-blk kernel driver cannot find enough MSI-X vectors to satisfy all its opened virtqueues, it falls back to assigning a single MSI-X vector to all virtqueues, which negatively impacts performance. In addition, when a large number (e.g., 64) of virtqueues are associated with a single MSI-X vector, the kernel may enter a soft lockup (a kernel bug) and the I/O will hang. |
Workaround: Always keep | |
Keywords: Virtio-blk; kernel driver; MSI-X | |
Reported in version: 4.3.0 | |
- | Description: If PCIe devices are inserted before the hot-plug driver is loaded on the host, the hot-plug driver in kernel versions earlier than 4.19 does not enable the slot, even if the slot is occupied (i.e., presence is detected in the slot status register). As a result, the firmware updates only the slot's presence state, and the kernel does not enable the PCIe slot after the host boots up. Consequently, the PCIe device will not be visible when using |
Workaround: Add | |
Keywords: Virtio-blk; kernel driver; hot-plug | |
Reported in version: 4.2.1 | |
- | Description: Red Hat/CentOS 7.x does not correctly handle "online" (post driver probe) namespace additions or removals. |
Workaround: Use | |
Keywords: NVMe; CentOS; RedHat; kernel | |
Reported in version: 4.1.0 | |
- | Description: Some Windows drivers have experimental support for "online" (post driver probe) namespace additions/removals, although this support is not communicated to the device. |
Workaround: Use | |
Keywords: NVMe; Windows | |
Reported in version: 4.1.0 | |
- | Description: VMware ESXi supports "online" (post driver probe) namespace additions/removals only if the controller supports "Namespace Management". |
Workaround: Use | |
Keywords: NVMe, ESXi | |
Reported in version: 4.1.0 | |
- | Description: Ubuntu 22.04 does not support 500 VFs. |
Workaround: N/A | |
Keywords: Virtio-blk; kernel driver; Ubuntu 22.04 | |
Reported in version: 4.1.0 | |
- | Description: Virtio-blk Linux kernel driver does not handle PCIe FLR events. |
Workaround: N/A | |
Keywords: Virtio-blk; kernel driver | |
Reported in version: 4.0.0 | |
3679373 | Description: The virtio-blk SPDK driver (vfio-pci based) does not handle PCIe FLR events. |
Workaround: N/A | |
Keywords: Virtio-blk; SPDK driver | |
Reported in version: 4.3.0 | |
- | Description: Newer virtio-blk Linux kernel drivers (starting with kernel 4.18) do not support hot-unplug during traffic. Because the kernel may generate spontaneous I/Os on its own, the issue may, on rare occasions, arise even when there is no traffic. |
Workaround: N/A | |
Keywords: Virtio-blk; kernel driver | |
Reported in version: 4.0.0 | |
- | Description: The SPDK NVMf/RDMA initiator fails to connect to a kernel NVMf/RDMA remote target. | |
Workaround: Use setting | |
Keywords: SPDK, NVMf, RDMA, kernel | |
Reported in version: 4.3.1 | |
- | Description: The Windows virtio-blk driver expects at least 64K of data to be available for a single I/O request. |
Workaround: Use | |
Keywords: Windows, virtio-blk | |
Reported in version: 4.3.1 | |
- | Description: Some older Windows versions have a malfunctioning inbox virtio-blk driver and require a third-party virtio-blk driver to be pre-installed to operate properly. |
Workaround: Use a verified third-party driver from Fedora. | |
Keywords: Windows, virtio-blk | |
Reported in version: 4.3.1 | |
- | Description: When using hotplugged PCIe devices, the host must be rebooted after all devices are plugged for Windows to detect all of them (some Windows versions may reboot automatically). This is required because Windows does not support online PCIe rescan, unlike Linux. |
Workaround: N/A | |
Keywords: Hotplug, Windows | |
Reported in version: 4.5.0 | |
3748674 | Description: On most modern Linux distributions, unplugging a PCIe function from the host while there are inflight I/Os can cause the virtio-blk driver to hang. |
Workaround: N/A | |
Keywords: Hotplug, Linux | |
Reported in version: 4.5.0 | |
4206444 | Description: The Linux kernel driver does not restrict |
Workaround: Ensure that all virtio-blk controllers are configured with a | |
Keywords: Virtio-blk; kernel driver; | |
Reported in version: 4.6.0 | |
- | Description: When using SRIOV with VFs sharing the same driver as their PF ( |
Workaround: N/A | |
Keywords: SRIOV; driver. | |
Discovered in version: 3.8.0-8 | |
- | Description: Windows assumes that NVMe devices support at least 2 I/O CQs (that is, that CQ ID 2 exists), even when the controller declares that it supports only 1 I/O queue. |
Workaround: Open NVMe controller with | |
Keywords: Windows, NVMe | |
Discovered in version: 4.7.0 | |
4418372 | Description: On Windows OS, hot-unplugging a virtio-blk PCIe function can cause unexpected behavior, and a host reboot might be necessary to recover. This is because the Windows OS does not support online PCIe rescan, unlike Linux. |
Workaround: Before unplugging a PCIe function, disable its storage controller in | |
Keywords: Windows, virtio-blk, hotplug | |
Discovered in version: 4.7.0 |