Appendix – Frequently Asked Questions
Please refer to chapter "mlnx_snap Installation".
Please refer to section "VirtIO-blk Configuration".
Assumptions:
The remote target is configured with NQN "Test" and a single namespace, which it exposes through 2 RDMA interfaces, 1.1.1.1/24 and 2.2.2.1/24
The local RDMA interfaces are 1.1.1.2/24 and 2.2.2.2/24
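If the local RDMA interfaces are not configured yet, the addresses above can be assigned on the DPU as follows (the interface names p0 and p1 are placeholders; use the actual uplink interface names on your system):
[dpu] ip addr add 1.1.1.2/24 dev p0
[dpu] ip addr add 2.2.2.2/24 dev p1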
Non-offload mode configuration:
Create the SPDK bdevs. Run:
spdk_rpc.py bdev_nvme_attach_controller -b Nvme0 -t rdma -a 1.1.1.1 -f ipv4 -s 4420 -n Test
spdk_rpc.py bdev_nvme_attach_controller -b Nvme1 -t rdma -a 2.2.2.1 -f ipv4 -s 4420 -n Test
Create the NVMe controller. Run:
snap_rpc.py controller_nvme_create mlx5_0 --subsys_id 0 -c /etc/mlnx_snap/mlnx_snap.json --rdma_device mlx5_2
Attach the namespace twice, once through each port. Run:
snap_rpc.py controller_nvme_namespace_attach -c NvmeEmu0pf0 spdk Nvme0n1 1
snap_rpc.py controller_nvme_namespace_attach -c NvmeEmu0pf0 spdk Nvme1n1 2
At this stage, you should see /dev/nvme0n1 and /dev/nvme0n2 in the host's "nvme list" output, both mapped to the same remote disk through 2 different ports.
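As a quick sanity check, the two SPDK bdevs can be listed on the DPU and the resulting block devices inspected on the host (bdev_get_bdevs is a standard SPDK RPC; nvme-cli is assumed to be installed on the host):
[dpu] spdk_rpc.py bdev_get_bdevs
[host] nvme list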
Full-offload mode configuration:
Full-offload mode currently allows users to connect to multiple remote targets in parallel (but not to the same remote target through different paths).
Create 2 separate JSON full-offload configuration files (see section "Full Offload Mode"). Each describes a connection to the remote target via a different RDMA interface.
Configure 2 separate NVMe device entries to be exposed to the host, either as hot-plugged PCIe functions (see section "Runtime Configuration") or as "static" ones (see section "Firmware Configuration").
Create 2 NVMe controllers, one per RDMA interface. Run:
snap_rpc.py subsystem_nvme_create Mellanox_NVMe_SNAP "Mellanox NVMe SNAP Controller"
snap_rpc.py controller_nvme_create mlx5_0 --subsys_id 0 --pf_id 0 -c /etc/mlnx_snap/mlnx_snap_p0.json --rdma_device mlx5_2
snap_rpc.py subsystem_nvme_create Mellanox_NVMe_SNAP "Mellanox NVMe SNAP Controller"
snap_rpc.py controller_nvme_create mlx5_0 --subsys_id 1 --pf_id 1 -c /etc/mlnx_snap/mlnx_snap_p1.json --rdma_device mlx5_3
Note: NVMe controllers may also share the same NVMe subsystem. In this case, users must make sure that all namespaces across all remote targets have distinct NSIDs (see the example after this procedure).
At this stage, you should see /dev/nvme0n1 and /dev/nvme1n1 in the host's "nvme list" output.
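As mentioned in the note above, the controllers may instead share a single NVMe subsystem. A minimal sketch, reusing the commands from this procedure with the same --subsys_id for both controllers (this assumes the namespaces on the two remote targets have distinct NSIDs):
snap_rpc.py subsystem_nvme_create Mellanox_NVMe_SNAP "Mellanox NVMe SNAP Controller"
snap_rpc.py controller_nvme_create mlx5_0 --subsys_id 0 --pf_id 0 -c /etc/mlnx_snap/mlnx_snap_p0.json --rdma_device mlx5_2
snap_rpc.py controller_nvme_create mlx5_0 --subsys_id 0 --pf_id 1 -c /etc/mlnx_snap/mlnx_snap_p1.json --rdma_device mlx5_3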
Please refer to section "Full Offload Configuration".
For more information on full offload, please refer to section "Full Offload Mode".
Please refer to section "Firmware Configuration".
MLNX SNAP is natively compiled against NVIDIA's internal branch of SPDK. It is possible to work with different SPDK versions, under the following conditions:
mlnx-snap sources must be recompiled against the new SPDK sources
The changes in the new SPDK version must not break any external SPDK APIs
Integration process:
Build SPDK (and DPDK) with shared libraries.
[spdk.git] ./configure --prefix=/opt/mellanox/spdk-custom --disable-tests --disable-unit-tests --without-crypto --without-fio --with-vhost --without-pmdk --without-rbd --with-rdma --with-shared --with-iscsi-initiator --without-vtune --without-isal
[spdk.git] make && sudo make install
[spdk.git] cp -r dpdk/build/lib/* /opt/mellanox/spdk-custom/lib/
[spdk.git] cp -r dpdk/build/include/* /opt/mellanox/spdk-custom/include/
Note: It is also possible to install DPDK into that directory, but copying the files suffices.
Note: Only the --with-shared flag is mandatory.
Build SNAP against the new SPDK.
[mlnx-snap.src] ./configure --with-snap --with-spdk=/opt/mellanox/spdk-custom --without-gtest --prefix=/usr
[mlnx-snap.src] make -j8 && sudo make install
Append additional custom libraries to the mlnx-snap application. Set LD_PRELOAD="/opt/mellanox/spdk/lib/libspdk_custom_library.so".
Note: Additional SPDK/DPDK libraries required by libspdk_custom_library.so might also need to be added to LD_PRELOAD.
Note: The LD_PRELOAD setting can be added to /etc/default/mlnx_snap so that it persists when mlnx_snap is run as a system service.
Run the application.
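A minimal sketch of these last two steps, assuming mlnx_snap is run through the system service mentioned in the note above (the library path is the illustrative one from the LD_PRELOAD step):
[dpu] echo 'LD_PRELOAD="/opt/mellanox/spdk/lib/libspdk_custom_library.so"' >> /etc/default/mlnx_snap
[dpu] systemctl restart mlnx_snap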
The NVMe protocol has embedded support for attaching and detaching backends (namespaces) at runtime.
To change backend storage during runtime for NVMe, run:
snap_rpc.py controller_nvme_namespace_detach -c NvmeEmu0pf0 1
snap_rpc.py controller_nvme_namespace_attach -c NvmeEmu0pf0 spdk nvme0n1 1
VirtIO-blk has no similar support in its protocol specification. Therefore, detaching the backend while I/O is running results in errors for any I/O received between the detach request and the subsequent attach.
To change backend storage at runtime for virtio-blk, run:
snap_rpc.py controller_virtio_blk_bdev_detach VblkEmu0pf0
snap_rpc.py controller_virtio_blk_bdev_attach VblkEmu0pf0 spdk nvme0n1
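In both the NVMe and virtio-blk examples above, the bdev named in the attach call must already exist in the SPDK bdev layer. A new remote namespace can be connected first with the same spdk_rpc.py call used in the non-offload example (the bdev name Nvme2 and the target parameters below are illustrative); this creates a bdev such as Nvme2n1 that can then be passed to the attach command:
spdk_rpc.py bdev_nvme_attach_controller -b Nvme2 -t rdma -a 1.1.1.1 -f ipv4 -s 4420 -n Test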
After adding the option to work with a large number of controllers, resource usage had to be taken into account. Special attention must be paid to the MSIX resource, which is limited to ~1K across the whole BlueField-2 card. Therefore, new PCIe functions are now opened with limited resources by default (specifically, MSIX is set to 2).
Users may choose to assign more resources to a specific function, as detailed in the following steps:
Increase the number of MSIX vectors allowed to be assigned to a function (a power cycle may be required for the change to take effect):
[dpu] mlxconfig -d /dev/mst/mt41686_pciconf0 s VIRTIO_BLK_EMULATION_NUM_MSIX=63
Hot-plug the virtio-blk PF with the increased MSIX value:
[dpu] snap_rpc.py emulation_device_attach mlx5_0 virtio_blk --num_msix=63
Open the controller with an increased number of queues (1 queue per MSIX vector, leaving one MSIX vector free for configuration interrupts):
[dpu] snap_rpc.py controller_virtio_blk_create mlx5_0 --pf_id 0 --bdev_type spdk --bdev Null0 --num_queues=62
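To verify the MSIX value configured in the first step (for example, after the power cycle), the firmware configuration can be queried with mlxconfig; the grep filter below is only for readability:
[dpu] mlxconfig -d /dev/mst/mt41686_pciconf0 q | grep VIRTIO_BLK_EMULATION_NUM_MSIX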
For more information, please refer to section "Performance Optimization".