DOCA GPUNetIO components have a dependency on CUDA. These dependencies differ for the CPU-side shared library versus the GPU-side datapath components.

CPU Shared Library (libdoca_gpunetio.so): This library depends on libcuda.so (the CUDA Driver API). Because it does not use the CUDA Runtime API, it is not subject to the potential versioning issues associated with the runtime.

GPU Datapath Components: The datapath functions are delivered as both header files and a static library, which have different requirements:

- Header-only APIs (GPUNetIO Ethernet, GPUNetIO Verbs): These are inlined functions. Because they are compiled with your application, they can be used with any recent CUDA version (e.g., CUDA 12.x or 13.x).
- Static Library APIs (GPUNetIO DMA, CommCh, RDMA): This library is pre-built with CUDA 13.0. Therefore, any application using functions from this static library must be built with CUDA 13.0 or newer.
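The version requirements above can be checked before building. The following is a minimal sketch, assuming GNU `sort -V` is available and that `nvcc --version` prints its usual "release X.Y" line; the helper names `version_ge` and `cuda_release` are illustrative and not part of DOCA or CUDA.

```shell
#!/bin/sh
# Sketch: compare the local CUDA toolkit release against the 13.0 requirement
# of the static datapath library. Helper names are illustrative only.

# version_ge A B: succeed when dotted version A >= B (relies on GNU `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# cuda_release: read `nvcc --version` output on stdin and print "major.minor".
cuda_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Only query the toolkit when nvcc is on PATH.
if command -v nvcc >/dev/null 2>&1; then
    release="$(nvcc --version | cuda_release)"
    if [ -z "$release" ]; then
        echo "could not parse the CUDA release from nvcc output"
    elif version_ge "$release" 13.0; then
        echo "CUDA $release: static-library APIs (DMA, CommCh, RDMA) can be built"
    else
        echo "CUDA $release: static-library APIs (DMA, CommCh, RDMA) need CUDA 13.0 or newer"
    fi
fi
```

The check only covers the build-time toolkit; it says nothing about the installed driver.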



It is generally recommended to use CUDA 12.6 or newer wherever possible to take advantage of new features.

To decrease initial application startup latency, it is highly recommended to enable NVIDIA driver persistence mode:

nvidia-smi -pm 1





To enable direct CPU access to GPU memory without using CUDA APIs, DOCA requires the GDRCopy kernel module and library.

1. Install the necessary packages:
   sudo apt install -y check kmod

2. Clone the GDRCopy repository:
   git clone https:

3. Build GDRCopy:
   cd /opt/mellanox/gdrcopy && make

4. Load the GDRCopy kernel module:
   ./insmod.sh

5. Check if the gdrdrv and nvidia-peermem modules are loaded:
   lsmod | egrep gdrdrv

   Example output:
   gdrdrv                 24576  0
   nvidia              55726080  4 nvidia_uvm,nvidia_peermem,gdrdrv,nvidia_modeset

6. Export the GDRCopy library path:
   export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/opt/mellanox/gdrcopy/src

7. Ensure the CUDA library paths are in the environment variables:
   export PATH="/usr/local/cuda/bin:${PATH}"
   export LD_LIBRARY_PATH="/usr/local/cuda/lib:/usr/local/cuda/lib64:${LD_LIBRARY_PATH}"
   export CPATH="$(echo /usr/local/cuda/targets/{x86_64,sbsa}-linux/include | sed 's/ /:/'):${CPATH}"

Note: GDRCopy is optional. Without it, DOCA GPUNetIO cannot allocate memory using the DOCA_GPU_MEM_TYPE_GPU_CPU flag, and it logs warning messages when GDRCopy is not detected. If your application does not require GDRCopy, these warnings can be safely ignored. To use GDRCopy, ensure its installation path is included in the LD_LIBRARY_PATH environment variable or specified using the GDRCOPY_PATH_L environment variable.
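The module check from the installation steps can be scripted so that it matches whole module names rather than substrings. The following is a minimal sketch; `module_loaded` is an illustrative helper, not a DOCA or GDRCopy tool. Note that `lsmod` lists module names with underscores (nvidia_peermem), even though `modprobe` also accepts the hyphenated form.

```shell
#!/bin/sh
# module_loaded NAME: read `lsmod` output on stdin and succeed when NAME
# appears in the first column (the module name column).
module_loaded() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Report the state of the two modules discussed in this section.
if command -v lsmod >/dev/null 2>&1; then
    for module in gdrdrv nvidia_peermem; do
        if lsmod | module_loaded "$module"; then
            echo "$module: loaded"
        else
            echo "$module: not loaded"
        fi
    done
fi
```

Matching on the first column avoids a false positive when a module name only appears in the "used by" list of another module.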





To enable the NIC to send and receive packets using GPU memory, a memory mapping mechanism must be used. DOCA supports two methods:

- dmabuf (default method): The preferred, modern method for mapping GPU memory.
- nvidia-peermem (fallback method): A legacy method used if dmabuf is not available or fails.

dmabuf is the primary method for mapping GPU memory. Its prerequisites are:

Linux Kernel version 6.2 or later

libibverbs version 1.14.44 or later

CUDA Toolkit:

- Version 12.5 or older: Must be installed with the -m=kernel-open flag (implying open-source NVIDIA driver mode).
- Version 12.6 or newer: Open kernel mode is enabled by default.
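The kernel prerequisite can be verified from a script. The following is a minimal sketch, assuming GNU `sort -V` is available; `version_ge` and `kernel_major_minor` are illustrative helper names. It checks only the kernel version, not libibverbs or the driver flavor.

```shell
#!/bin/sh
# version_ge A B: succeed when dotted version A >= B (relies on GNU `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# kernel_major_minor RELEASE: reduce a `uname -r` string such as
# "6.5.0-41-generic" to its "major.minor" prefix ("6.5").
kernel_major_minor() {
    printf '%s\n' "$1" | sed 's/^\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/'
}

kernel="$(kernel_major_minor "$(uname -r)")"
if version_ge "$kernel" 6.2; then
    echo "kernel $kernel meets the 6.2 minimum for dmabuf"
else
    echo "kernel $kernel is older than 6.2; expect the nvidia-peermem fallback"
fi
```

A version-aware comparison matters here because a plain string comparison would rank 6.10 below 6.2.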



Note: Ensure that the nvidia-open drivers are installed on your system. If the package listing shows cuda-drivers instead, the NVIDIA driver is installed in its closed-source version and dmabuf cannot be used.

The nvidia-peermem method is used if dmabuf is unavailable. It requires the nvidia-peermem kernel module, which is installed with the CUDA Toolkit, to be loaded:

sudo modprobe nvidia-peermem





The recommended implementation is to attempt to get a dmabuf file descriptor first. If that fails, the application should fall back to the nvidia-peermem method.

The following code snippet demonstrates how to use dmabuf for GPU memory mapping with DOCA mmap, including the fallback logic:

/* Get the dmabuf file descriptor for the GPU memory buffer from CUDA */
result = doca_gpu_dmabuf_fd(gpu_dev, gpu_buffer_addr, gpu_buffer_size, &(dmabuf_fd));
if (result != DOCA_SUCCESS) {
    /* Fallback to nvidia-peermem legacy method if dmabuf fails */
    doca_mmap_set_memrange(gpu_buffer_mmap, gpu_buffer_addr, gpu_buffer_size);
} else {
    /* Create DOCA mmap using dmabuf */
    doca_mmap_set_dmabuf_memrange(gpu_buffer_mmap, dmabuf_fd, gpu_buffer_addr, 0, gpu_buffer_size);
}





A failure in doca_gpu_dmabuf_fd (the if block in the example) likely indicates that the NVIDIA driver is not in open-source mode.

When doca_mmap_start is subsequently called, DOCA will attempt to map the GPU memory. If dmabuf was not set, it will automatically fall back to the legacy nvidia-peermem method. In this case, the following warning message is logged:

[DOCA][WRN][linux_devx_adapter.cpp:374] devx adapter 0x5566a16018e0: Registration using dmabuf is not supported, falling back to legacy registration

Note: If your application can rely on nvidia-peermem and does not strictly require dmabuf, this warning message can be safely ignored.





GPUNetIO Ethernet samples use DOCA mmap with dmabuf and nvidia-peermem as the fallback (following the logic in the code example above).

GPUNetIO Verbs samples show an alternative verbs-based method, using ibv_reg_dmabuf_mr (for dmabuf) and ibv_reg_mr (as the fallback).

Every time a GPU buffer is mapped to the NIC (e.g., buffers associated with send or receive queues), a portion of the GPU BAR1 mapping space is used. It is therefore important to check that the BAR1 mapping space is large enough to hold all the bytes the DOCA GPUNetIO application is trying to map. To verify the BAR1 mapping space of a GPU, use nvidia-smi:

$ nvidia-smi -q

==============NVSMI LOG==============
.....
Attached GPUs                             : 1
GPU 00000000:CA:00.0
    Product Name                          : NVIDIA A100 80GB PCIe
    Product Architecture                  : Ampere
    Persistence Mode                      : Enabled
.....
    BAR1 Memory Usage
        Total                             : 131072 MiB
        Used                              : 1 MiB
        Free                              : 131071 MiB

By default, some GPUs (e.g. RTX models) may have a very small BAR1 size:

$ nvidia-smi -q | grep -i bar -A 3
    BAR1 Memory Usage
        Total                             : 256 MiB
        Used                              : 6 MiB
        Free                              : 250 MiB

If the BAR1 size is too small, DOCA GPUNetIO applications may exit with errors because DOCA mmap fails to map the GPU memory buffers to the NIC (e.g., "Failed to start mmap DOCA Driver call failure"). To overcome this issue, the GPU BAR1 size must be increased from the BIOS, and the system should have the "Resizable BAR" option enabled. For further information, refer to this NVIDIA forum post.
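The BAR1 check can be automated before launching an application. The following is a minimal sketch that parses the `nvidia-smi -q` layout shown above; `bar1_total_mib`, `bar1_fits`, and the `REQUIRED_MIB` variable are illustrative names, and the 512 MiB default is a hypothetical figure standing in for the sum of all GPU buffers your application maps to the NIC.

```shell
#!/bin/sh
# bar1_total_mib: read `nvidia-smi -q` output on stdin and print the BAR1
# "Total" value in MiB, one line per GPU.
bar1_total_mib() {
    awk '/BAR1 Memory Usage/ { in_bar1 = 1; next }
         in_bar1 && $1 == "Total" { print $3; in_bar1 = 0 }'
}

# bar1_fits TOTAL_MIB REQUIRED_MIB: succeed when the BAR1 space covers the request.
bar1_fits() {
    [ "$1" -ge "$2" ]
}

# Hypothetical requirement: total size of the GPU buffers mapped to the NIC.
REQUIRED_MIB="${REQUIRED_MIB:-512}"

# Only query the GPU when nvidia-smi is on PATH.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -q | bar1_total_mib | while read -r total; do
        if bar1_fits "$total" "$REQUIRED_MIB"; then
            echo "BAR1 ${total} MiB: enough for ${REQUIRED_MIB} MiB of mappings"
        else
            echo "BAR1 ${total} MiB: too small for ${REQUIRED_MIB} MiB of mappings"
        fi
    done
fi
```

The sketch compares against the total BAR1 size; since other mappings may already occupy part of BAR1, comparing against the "Free" value would be the more conservative choice.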

All DOCA GPUNetIO samples and applications using Ethernet rely on DOCA Flow. Therefore, they must be executed with sudo or root privileges.

However, Verbs, RDMA and DMA samples can be run without sudo privileges if a specific option is enabled in the NVIDIA driver: