Platform Software and Configuration#
This section provides information about the Linux kernel reference source code required for GPUDirect RDMA on the Grace platform.
Reference Code#
The 6.14-nvidia and newer kernels contain the patches needed to enable GPUDirect RDMA between the NVIDIA Blackwell GPU and ConnectX-8 (CX8) using Data Direct technology. Install the kernel from the standard Ubuntu 24.04 network repositories:
$ sudo apt update
$ sudo apt install linux-nvidia-64k-6.14
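After installing the package and rebooting, it can help to confirm that the running kernel is the expected one. The following is a minimal sketch, not an official check: the helper name `is_nvidia_614_or_newer` is hypothetical, and it assumes that the kernel release string reported by `uname -r` contains `nvidia` and starts with the `MAJOR.MINOR` version.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if the release string looks like an
# NVIDIA kernel at version 6.14 or newer, and 1 otherwise.
# Assumption: NVIDIA kernel builds carry "nvidia" in their release string.
is_nvidia_614_or_newer() {
    release="$1"
    case "$release" in
        *nvidia*) ;;
        *) return 1 ;;
    esac
    version="${release%%-*}"   # e.g. "6.14.0"
    major="${version%%.*}"     # e.g. "6"
    rest="${version#*.}"       # e.g. "14.0"
    minor="${rest%%.*}"        # e.g. "14"
    [ "$major" -gt 6 ] || { [ "$major" -eq 6 ] && [ "$minor" -ge 14 ]; }
}

if is_nvidia_614_or_newer "$(uname -r)"; then
    echo "kernel OK: $(uname -r)"
else
    echo "running kernel is not an NVIDIA 6.14 or newer build: $(uname -r)"
fi
```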
You can access the specific Linux kernel source code for the NVIDIA 6.14 kernel from the following location:
For the comprehensive list of patches, refer to:
Software Stack Requirements#
The NVIDIA Blackwell platform combines the Blackwell GPU and the CX8 network interface card (NIC), connected by a PCIe Gen6 x16 link. This configuration enables direct peer-to-peer (P2P) PCIe communication between the GPU and NIC.
Typically, this kind of communication requires platform support for PCIe Address Translation Services (ATS). However, the NVIDIA Grace™ CPU does not support PCIe ATS.
Instead, the CX8 NIC includes a dedicated DMA feature called Data Direct Interface, which enables GPUDirect data transfers. The feature is exposed as a separate PCIe function that sits under a different PCIe tree than the main NIC physical function (NET-PF).
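Because the Data Direct function appears as its own PCIe function, one way to see it next to the NET-PF is to list every function that carries the Mellanox/NVIDIA networking PCI vendor ID (`15b3`). The sketch below makes no claim about which device ID belongs to the Data Direct function, since this section does not state it. The helper name `list_mlx_functions` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `lspci -Dn` style lines on stdin
# ("<domain:bus:dev.fn> <class>: <vendor>:<device> ...") and prints the
# address and vendor:device pair of every function with vendor ID 15b3.
list_mlx_functions() {
    awk '$3 ~ /^15b3:/ { print $1, $3 }'
}

# Only query the bus when lspci (from pciutils) is available.
if command -v lspci >/dev/null 2>&1; then
    lspci -Dn | list_mlx_functions
fi
```

Functions that sit under different PCIe trees show different domain or bus numbers in the first column. Running `lspci -tv` prints the same devices as a tree, which makes the separation easier to see.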
Note
Refer to the Blackwell NVL72 with CX8 Software and Firmware Release Notes for the software and firmware versions.
NVIDIA GPU Driver#
GPUDirect RDMA with Data Direct requires the following GPU drivers:
GB200 - NVIDIA r570 GPU driver or newer
GB300 - NVIDIA r580 GPU driver or newer
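The driver requirement can be checked against the installed driver. The sketch below assumes that an rNNN driver branch corresponds to a driver version string that starts with `NNN.` (for example `570.x`), and that `nvidia-smi` is on the `PATH`. The helper name `driver_meets_minimum` and the `required` variable are illustrative only.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if the driver version string in $1
# (e.g. "570.124.06") has a branch number at or above the minimum in $2.
driver_meets_minimum() {
    major="${1%%.*}"
    [ "$major" -ge "$2" ]
}

required=570   # assumption: GB200 system; use 580 for GB300

if command -v nvidia-smi >/dev/null 2>&1; then
    # Query the driver version reported for the first GPU.
    installed=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
    if driver_meets_minimum "$installed" "$required"; then
        echo "driver OK: $installed"
    else
        echo "driver $installed does not meet the r$required minimum"
    fi
fi
```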
CUDA#
GPUDirect RDMA with Data Direct requires the following NVIDIA CUDA® Toolkit versions:
GB200 - CUDA® Toolkit 12.8 or newer
GB300 - CUDA® Toolkit 13.0 or newer
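To compare an installed toolkit against these minimums, the release number can be read from `nvcc --version`. This is a sketch under assumptions: it expects `nvcc` on the `PATH` and the usual `release MAJOR.MINOR` text in its output. The helper names `cuda_release` and `cuda_at_least` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: extracts "MAJOR.MINOR" from `nvcc --version`
# style text on stdin (the line containing "release 12.8, ...").
cuda_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: returns 0 if version $1 is at least version $2,
# where both are given as MAJOR.MINOR.
cuda_at_least() {
    have_major="${1%%.*}"; have_minor="${1#*.}"
    need_major="${2%%.*}"; need_minor="${2#*.}"
    [ "$have_major" -gt "$need_major" ] ||
        { [ "$have_major" -eq "$need_major" ] && [ "$have_minor" -ge "$need_minor" ]; }
}

required=12.8   # assumption: GB200 system; use 13.0 for GB300

if command -v nvcc >/dev/null 2>&1; then
    installed=$(nvcc --version | cuda_release)
    if cuda_at_least "$installed" "$required"; then
        echo "CUDA Toolkit OK: $installed"
    else
        echo "CUDA Toolkit $installed is older than $required"
    fi
fi
```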
DOCA#
Install DOCA Host version 3.2.0-125000 or later with the doca-ofed profile, including firmware version 40.47.1026 or later. This installation provides the host drivers and tools required for the Data Direct feature.
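Once the host drivers are loaded, the firmware version of each RDMA device can be compared against the 40.47.1026 minimum. The sketch below reads the `fw_ver` attribute under `/sys/class/infiniband`, and it relies on GNU `sort -V` for version ordering. It checks every RDMA device it finds, not only CX8 adapters, so results for other devices should be ignored. The helper name `version_ge` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if dotted version $1 is greater than or
# equal to dotted version $2. Uses GNU sort's version ordering: if the
# smaller of the two is the required version, the installed one is new enough.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required_fw=40.47.1026

for f in /sys/class/infiniband/*/fw_ver; do
    [ -r "$f" ] || continue   # glob did not match: no RDMA devices present
    fw=$(cat "$f")
    dev=$(basename "$(dirname "$f")")
    if version_ge "$fw" "$required_fw"; then
        echo "$dev: firmware $fw OK"
    else
        echo "$dev: firmware $fw is older than $required_fw"
    fi
done
```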