SNAP-4 Service Appendixes
Before configuring SNAP, the user must ensure that all firmware configuration requirements are met. By default, SNAP is disabled and must be enabled by running both the common SNAP configuration and additional protocol-specific configurations, depending on the expected usage of the application (e.g., hot-plug, SR-IOV, UEFI boot, etc.).
After configuration is finished, the host must be power-cycled for the changes to take effect.
To verify that all configuration requirements are satisfied, users may query the current/next configuration by running the following:
mlxconfig -d /dev/mst/mt41692_pciconf0 -e query
System Configuration Parameters
Parameter | Description | Possible Values |
| INTERNAL_CPU_MODEL | Enable BlueField to work in internal CPU model. Note: Must be set to 1. | 0/1 |
| SRIOV_EN | Enable SR-IOV. | 0/1 |
| PCI_SWITCH_EMULATION_ENABLE | Enable PCIe switch for emulated PFs. | 0/1 |
| PCI_SWITCH_EMULATION_NUM_PORT | The maximum number of hotplug emulated PFs, which equals PCI_SWITCH_EMULATION_NUM_PORT - 1. Note: One switch port is reserved for all static PFs. | [0,2-32] |
Note: SRIOV_EN is valid only for static PFs.
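The following is a minimal sketch of applying the common system configuration, assuming the parameter names listed in the table above and the device path used throughout this document; the exact set of parameters and values depends on the expected usage of the application:
[dpu] mlxconfig -d /dev/mst/mt41692_pciconf0 s INTERNAL_CPU_MODEL=1 PCI_SWITCH_EMULATION_ENABLE=1 PCI_SWITCH_EMULATION_NUM_PORT=16
The host must then be power cycled for the new values to take effect.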
RDMA/RoCE Configuration
By default, BlueField's RDMA/RoCE communication is blocked for its primary OS interfaces (known as ECPFs, typically mlx5_0 and mlx5_1).
If RoCE traffic is required, you must create additional network functions (scalable functions) that support RDMA/RoCE.
This configuration is not required when working over TCP or RDMA/IB.
To enable RoCE interfaces, run the following from within the DPU:
[dpu] mlxconfig -d /dev/mst/mt41692_pciconf0 s PER_PF_NUM_SF=1
[dpu] mlxconfig -d /dev/mst/mt41692_pciconf0 s PF_SF_BAR_SIZE=8 PF_TOTAL_SF=2
[dpu] mlxconfig -d /dev/mst/mt41692_pciconf0.1 s PF_SF_BAR_SIZE=8 PF_TOTAL_SF=2
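To confirm that the new values are staged (they take effect only after the host is power cycled), the configuration may be queried and filtered, for example:
[dpu] mlxconfig -d /dev/mst/mt41692_pciconf0 -e query | grep -E "PER_PF_NUM_SF|PF_TOTAL_SF|PF_SF_BAR_SIZE"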
NVMe Configuration
Parameter | Description | Possible Values |
| NVME_EMULATION_ENABLE | Enable NVMe device emulation. | 0/1 |
| NVME_EMULATION_NUM_PF | Number of static emulated NVMe PFs. | [0-2] |
| NVME_EMULATION_NUM_MSIX | Number of MSIX assigned to emulated NVMe PFs. Note: The firmware treats this value as a best effort value. The effective number of MSI-X given to the function should be queried as part of the nvme_controller_list RPC command. | [0-63] |
| NVME_EMULATION_NUM_VF_MSIX | Number of MSIX per emulated NVMe VF. Note: The firmware treats this value as a best effort value. The effective number of MSI-X given to the function should be queried as part of the nvme_controller_list RPC command. Note: This value should match the maximum number of queues assigned to a VF's NVMe SNAP controller through the nvme_controller_create command. | [0-4095] |
| NVME_EMULATION_NUM_VF | Number of VFs per emulated NVMe PF. Note: If not 0, overrides NUM_OF_VFS. | [0-256] |
| EXP_ROM_NVME_UEFI_x86_ENABLE | Enable NVMe UEFI exprom driver. Note: Used for the UEFI boot process. | 0/1 |
| NVME_EMULATION_MAX_QUEUE_DEPTH | Defines the default maximum queue depth for NVMe I/O queues. The value should be set to the binary logarithm of the desired maximum queue size. | [0-12] |
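For illustration only, assuming the parameter names in the table above (the values shown are examples and must match your deployment), NVMe emulation could be enabled with:
[dpu] mlxconfig -d /dev/mst/mt41692_pciconf0 s NVME_EMULATION_ENABLE=1 NVME_EMULATION_NUM_PF=1 NVME_EMULATION_NUM_MSIX=63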
Virtio-blk Configuration
Due to virtio-blk protocol limitations, using a bad configuration while working with static virtio-blk PFs may cause the host server OS to fail on boot.
Before continuing, make sure you have configured:
A working channel to access the Arm cores even when the host is shut down. Setting up such a channel is outside the scope of this document. Refer to the NVIDIA BlueField DPU BSP documentation for more details.
The following line added to /etc/nvda_snap/snap_rpc_init.conf:
virtio_blk_controller_create --pf_id 0
For more information, please refer to section "Virtio-blk Emulation Management".
Parameter | Description | Possible Values |
| VIRTIO_BLK_EMULATION_ENABLE | Enable virtio-blk device emulation. | 0/1 |
| VIRTIO_BLK_EMULATION_NUM_PF | Number of static emulated virtio-blk PFs. Note: See warning above. | [0-4] |
| VIRTIO_BLK_EMULATION_NUM_MSIX | Number of MSIX assigned to emulated virtio-blk PFs. Note: The firmware treats this value as a best effort value. The effective number of MSI-X given to the function should be queried as part of the virtio_blk_controller_list RPC command. | [0-63] |
| VIRTIO_BLK_EMULATION_NUM_VF_MSIX | Number of MSIX per emulated virtio-blk VF. Note: The firmware treats this value as a best effort value. The effective number of MSI-X given to the function should be queried as part of the virtio_blk_controller_list RPC command. Note: This value should match the maximum number of queues assigned to a VF's virtio-blk SNAP controller through the virtio_blk_controller_create command. | [0-4095] |
| VIRTIO_BLK_EMULATION_NUM_VF | Number of VFs per emulated virtio-blk PF. Note: If not 0, overrides NUM_OF_VFS. | [0-2000] |
| EXP_ROM_VIRTIO_BLK_UEFI_x86_ENABLE | Enable virtio-blk UEFI exprom driver. Note: Used for the UEFI boot process. | 0/1 |
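As an illustrative sketch only, assuming the parameter names in the table above (and keeping in mind the boot-failure warning regarding static virtio-blk PFs), virtio-blk emulation could be enabled with:
[dpu] mlxconfig -d /dev/mst/mt41692_pciconf0 s VIRTIO_BLK_EMULATION_ENABLE=1 VIRTIO_BLK_EMULATION_NUM_PF=1 VIRTIO_BLK_EMULATION_NUM_MSIX=63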
To configure persistent network interfaces so they are not lost after reboot, modify the following four files under /etc/sysconfig/network-scripts (or create them if they do not exist), then perform a reboot:
# cd /etc/sysconfig/network-scripts/
# cat ifcfg-p0
NAME="p0"
DEVICE="p0"
NM_CONTROLLED="no"
DEVTIMEOUT=30
PEERDNS="no"
ONBOOT="yes"
BOOTPROTO="none"
TYPE=Ethernet
MTU=9000
# cat ifcfg-p1
NAME="p1"
DEVICE="p1"
NM_CONTROLLED="no"
DEVTIMEOUT=30
PEERDNS="no"
ONBOOT="yes"
BOOTPROTO="none"
TYPE=Ethernet
MTU=9000
# cat ifcfg-enp3s0f0s0
NAME="enp3s0f0s0"
DEVICE="enp3s0f0s0"
NM_CONTROLLED="no"
DEVTIMEOUT=30
PEERDNS="no"
ONBOOT="yes"
BOOTPROTO="static"
TYPE=Ethernet
IPADDR=1.1.1.1
PREFIX=24
MTU=9000
# cat ifcfg-enp3s0f1s0
NAME="enp3s0f1s0"
DEVICE="enp3s0f1s0"
NM_CONTROLLED="no"
DEVTIMEOUT=30
PEERDNS="no"
ONBOOT="yes"
BOOTPROTO="static"
TYPE=Ethernet
IPADDR=1.1.1.2
PREFIX=24
MTU=9000
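After the reboot, a quick check (illustrative only; interface names and addresses follow the example files above) confirms that the interfaces came up with the expected configuration:
# ip -br addr show | grep -E "p0|p1|enp3s0f0s0|enp3s0f1s0"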
The SNAP source package contains the files necessary for building a container with a custom SPDK.
To build the container:
Download and install the SNAP sources package:
[dpu] # dpkg -i /path/snap-sources_<version>_arm64.deb
Navigate to the src folder and use it as the development environment:
[dpu] # cd /opt/nvidia/nvda_snap/src
Copy the following to the container folder:
SNAP source package – required for installing SNAP inside the container
Custom SPDK – to container/spdk. For example:
[dpu] # cp /path/snap-sources_<version>_arm64.deb container/
[dpu] # git clone -b v23.01.1 --single-branch --depth 1 --recursive --shallow-submodules https://github.com/spdk/spdk.git container/spdk
Modify the spdk.sh file if necessary, as it is used to compile SPDK.
To build the container:
For Ubuntu, run:
[dpu] # ./container/build_public.sh --snap-pkg-file=snap-sources_<version>_arm64.deb
For CentOS, run:
[dpu] # rpm -i snap-sources-<version>.el8.aarch64.rpm
[dpu] # cd /opt/nvidia/nvda_snap/src/
[dpu] # cp /path/snap-sources_<version>_arm64.deb container/
[dpu] # git clone -b v23.01.1 --single-branch --depth 1 --recursive --shallow-submodules https://github.com/spdk/spdk.git container/spdk
[dpu] # yum install docker-ce docker-ce-cli
[dpu] # ./container/build_public.sh --snap-pkg-file=snap-sources_<version>_arm64.deb
Transfer the created image from the Docker tool to the crictl tool. Run:
[dpu] # docker save -o doca_snap.tar doca_snap:<version>
[dpu] # ctr -n=k8s.io images import doca_snap.tar
Note: To transfer the container image to other setups, refer to appendix "Deploying Container on Setups Without Internet Connectivity".
To verify the image, run:
[dpu] # crictl images
IMAGE                          TAG          IMAGE ID        SIZE
docker.io/library/doca_snap    <version>    79c503f0a2bd7   284MB
Edit the image field in the container/doca_snap.yaml file:
image: doca_snap:<version>
Use the YAML file to deploy the container. Run:
[dpu] # cp doca_snap.yaml /etc/kubelet.d/
Note: The container deployment preparation steps are required.
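Once Kubelet picks up the YAML file, the SNAP pod and container should start automatically. A minimal, illustrative check of the running pod and container is:
[dpu] # crictl pods
[dpu] # crictl ps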
When Internet connectivity is not available on a DPU, Kubelet scans for the container image locally upon detecting the SNAP YAML. Users can load the container image manually before the deployment.
To accomplish this, users must download the necessary resources using a DPU with Internet connectivity and subsequently transfer and load them onto DPUs that lack Internet connectivity.
To download the .yaml file:
[bf] # wget --content-disposition https://api.ngc.nvidia.com/v2/resources/nvidia/doca/doca_container_configs/versions/<path-to-yaml>/doca_snap.yaml
Note: Access the latest download command on NGC. The doca_snap:4.1.0-doca2.0.2 tag is used in this section as an example; the latest tag is also available on NGC.
To download the SNAP container image:
[bf] # crictl pull nvcr.io/nvidia/doca/doca_snap:4.1.0-doca2.0.2
To verify that the SNAP container image exists:
[bf] # crictl images
IMAGE                            TAG                IMAGE ID         SIZE
nvcr.io/nvidia/doca/doca_snap    4.1.0-doca2.0.2    9d941b5994057    267MB
k8s.gcr.io/pause                 3.2                2a060e2e7101d    251kB
Note: The k8s.gcr.io/pause image is required for the SNAP container.
To save the images as .tar files:
[bf] # mkdir images
[bf] # ctr -n=k8s.io image export images/snap_container_image.tar nvcr.io/nvidia/doca/doca_snap:4.1.0-doca2.0.2
[bf] # ctr -n=k8s.io image export images/pause_image.tar k8s.gcr.io/pause:3.2
Transfer the .tar files and run the following to load them into Kubelet:
[bf] # sudo ctr --namespace k8s.io image import images/snap_container_image.tar
[bf] # sudo ctr --namespace k8s.io image import images/pause_image.tar
Now, the image exists in the tool and is ready for deployment.
[bf] # crictl images
IMAGE                            TAG                IMAGE ID         SIZE
nvcr.io/nvidia/doca/doca_snap    4.1.0-doca2.0.2    9d941b5994057    267MB
k8s.gcr.io/pause                 3.2                2a060e2e7101d    251kB
To build SPDK-19.04 for SNAP integration:
Cherry-pick a critical fix for SPDK shared libraries installation (originally applied on upstream only since v19.07).
[spdk.git] git cherry-pick cb0c0509
Configure SPDK:
[spdk.git] git submodule update --init
[spdk.git] ./configure --prefix=/opt/mellanox/spdk --disable-tests --without-crypto --without-fio --with-vhost --without-pmdk --without-rbd --with-rdma --with-shared --with-iscsi-initiator --without-vtune
[spdk.git] sed -i -e 's/CONFIG_RTE_BUILD_SHARED_LIB=n/CONFIG_RTE_BUILD_SHARED_LIB=y/g' dpdk/build/.config
Note: The flags --prefix, --with-rdma, and --with-shared are mandatory.
Make SPDK (and DPDK libraries):
[spdk.git] make && make install
[spdk.git] cp dpdk/build/lib/* /opt/mellanox/spdk/lib/
[spdk.git] cp dpdk/build/include/* /opt/mellanox/spdk/include/
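As a quick, illustrative sanity check (paths follow the --prefix used above), verify that the SPDK and DPDK artifacts were installed:
[spdk.git] ls /opt/mellanox/spdk/lib /opt/mellanox/spdk/include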
PCIe BDF (Bus, Device, Function) is a unique identifier assigned to every PCIe device connected to a computer. By identifying each device with a unique BDF number, the computer's OS can manage the system's resources efficiently and effectively.
PCIe BDF values are determined by the host OS and are hence subject to change between different runs, or even within a single run. Therefore, the BDF identifier is not the best fit for permanent configuration.
To overcome this problem, NVIDIA devices add an extension to PCIe attributes, called VUIDs. As opposed to BDF, VUID is persistent across runs, which makes it useful as a PCIe function identifier.
PCIe BDF and VUID can each be extracted from the other using the lspci command:
To extract VUID out of BDF:
[host] lspci -s <BDF> -vvv | grep -i VU | awk '{print $4}'
To extract BDF out of VUID:
[host] ./get_bdf.py <VUID>
[host] cat ./get_bdf.py
#!/usr/bin/python3
import subprocess
import sys

# VUID to look for is passed as the first command-line argument
vuid = sys.argv[1]

# List all PCI functions on the host, one entry per line
lspci_output = subprocess.check_output(['lspci']).decode().strip().split('\n')

# Loop through each PCI function and check whether its verbose info contains the requested VUID
for line in lspci_output:
    bdf = line.split()[0]
    if vuid in subprocess.check_output(['lspci', '-s', bdf, '-vvv']).decode():
        print(bdf)
        exit(0)

print("Not Found")
This appendix explains how SNAP consumes memory and how to manage memory allocation.
The user must allocate the DPU hugepages memory according to the section "Step 1: Allocate Hugepages". It is possible to use a portion of the DPU memory allocation in the SNAP container, as described in section "Adjusting YAML Configuration". This configuration includes the following minimum and maximum values:
The minimum allocation which the SNAP container consumes:
resources:
  requests:
    memory: "4Gi"
The maximum allocation that the SNAP container is allowed to consume:
resources:
  limits:
    hugepages-2Mi: "4Gi"
Hugepage memory is used by the following (see the example after this list for checking the current hugepage allocation):
SPDK mem-size – global variable which controls the SPDK hugepages consumption (configurable in SPDK, 1GB by default)
SNAP SNAP_MEMPOOL_SIZE_MB – used with non-ZC mode as IO staging buffers on the Arm. By default, the SNAP mempool consumes 1G from the SPDK mem-size hugepages allocation. The SNAP mempool may be configured using the SNAP_MEMPOOL_SIZE_MB global variable (minimum is 64 MB). Note: If the value assigned is too low, a performance degradation could be seen with non-ZC mode.
SNAP and SPDK internal usage – 1G should be used by default. This may be reduced depending on the overall scale (i.e., VFs/number of queues/QD).
XLIO buffers – allocated only when NVMeTCP XLIO is enabled.
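The following is a minimal, illustrative check (standard Linux interfaces, not SNAP-specific) of the hugepage allocation currently available on the Arm side:
[dpu] grep -i huge /proc/meminfo
[dpu] cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages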
The following is the limit of the container memory allowed to be used by the SNAP container:
resources:
  limits:
    memory: "6Gi"
This includes the hugepages limit (in this example, additional 2G of non-hugepages memory).
The SNAP container also consumes DPU SHMEM memory when NVMe recovery is used (described in section "NVMe Recovery"). In addition, the following resources are used:
limits:
  memory:
With a Linux environment on the host OS, additional kernel boot parameters may be required to support SNAP-related features (a sketch of adding them via GRUB follows this list):
To use SR-IOV:
For Intel, intel_iommu=on iommu=pt must be added
For AMD, amd_iommu=on iommu=pt must be added
To use PCIe hotplug, pci=realloc must be added
modprobe.blacklist=virtio_blk,virtio_pci must be added for a non-built-in virtio-blk driver or virtio-pci driver
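The following is a minimal sketch for adding these parameters persistently, assuming a RHEL-like host where grubby manages the kernel command line (adapt the parameter list to your CPU vendor and to the GRUB tooling of your distribution):
grubby --update-kernel=ALL --args="intel_iommu=on iommu=pt pci=realloc modprobe.blacklist=virtio_blk,virtio_pci"
reboot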
To view boot parameter values, run:
cat /proc/cmdline
It is recommended to use the following with virtio-blk:
[dpu] cat /proc/cmdline BOOT_IMAGE … pci=realloc modprobe.blacklist=virtio_blk,virtio_pci
To enable VFs (virtio_blk/NVMe):
echo 125 > /sys/bus/pci/devices/0000\:27\:00.4/sriov_numvfs
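For illustration (the BDF 0000:27:00.4 follows the example above; the BDF of your emulated function may differ), the maximum supported VF count can be checked before enabling VFs:
cat /sys/bus/pci/devices/0000\:27\:00.4/sriov_totalvfs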
Intel Server Performance Optimizations
cat /proc/cmdline
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.15.0_mlnx root=UUID=91528e6a-b7d3-4e78-9d2e-9d5ad60e8273 ro crashkernel=auto resume=UUID=06ff0f35-0282-4812-894e-111ae8d76768 rhgb quiet iommu=pt intel_iommu=on pci=realloc modprobe.blacklist=virtio_blk,virtio_pci
AMD Server Performance Optimizations
cat /proc/cmdline
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.15.0_mlnx root=UUID=91528e6a-b7d3-4e78-9d2e-9d5ad60e8273 ro crashkernel=auto resume=UUID=06ff0f35-0282-4812-894e-111ae8d76768 rhgb quiet iommu=pt amd_iommu=on pci=realloc modprobe.blacklist=virtio_blk,virtio_pci