Platform Software and Configuration

This section provides information about the Linux kernel reference source code required for GPUDirect RDMA on the Grace platform.

Reference Code

The 6.14-nvidia and newer kernels contain the patches needed to enable GPUDirect RDMA between the NVIDIA Blackwell GPU and ConnectX-8 (CX8) using Data Direct technology. Install a kernel that includes these patches from the standard Ubuntu 24.04 network repositories:

$ apt update

$ apt install linux-nvidia-64k-6.14
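After rebooting into the new kernel, it can be useful to confirm that the running kernel is a 6.14-or-newer NVIDIA build. The following is a minimal sketch, not an official check: the function name `is_nvidia_kernel_ok` is hypothetical, and it assumes that the release string reported by `uname -r` starts with the numeric version and contains the substring `nvidia`.

```shell
#!/bin/sh
# Sketch: decide whether a kernel release string looks like a
# 6.14-or-newer NVIDIA kernel. Matching on "nvidia" is an assumption
# about how the package names its kernel releases.
is_nvidia_kernel_ok() {
    release="$1"
    version="${release%%-*}"   # "6.14.0-1007-nvidia-64k" -> "6.14.0"
    major="${version%%.*}"     # -> "6"
    rest="${version#*.}"       # -> "14.0"
    minor="${rest%%.*}"        # -> "14"

    # Reject kernels whose release string does not mention nvidia.
    case "$release" in
        *nvidia*) ;;
        *) return 1 ;;
    esac

    # Accept any 7.x or later, or 6.14 and later.
    [ "$major" -gt 6 ] || { [ "$major" -eq 6 ] && [ "$minor" -ge 14 ]; }
}

if is_nvidia_kernel_ok "$(uname -r)"; then
    echo "kernel OK: $(uname -r)"
else
    echo "kernel does not look like 6.14-nvidia or newer: $(uname -r)"
fi
```

The release strings used to exercise the function are illustrative only; the exact build suffix on a given system may differ.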

You can access the specific Linux kernel source code for the NVIDIA 6.14 kernel from the following location:

For the comprehensive list of patches, refer to:

Software Stack Requirements

The NVIDIA Blackwell platform combines the Blackwell GPU and the CX8 network interface card (NIC), connected by a PCIe Gen6 x16 link. This configuration enables direct peer-to-peer (P2P) PCIe communication between the GPU and NIC.

Typically, this kind of communication requires platform support for PCIe Address Translation Services (ATS). However, the NVIDIA Grace™ CPU does not support PCIe ATS.

Instead, the CX8 NIC includes a special DMA feature called Data Direct Interface, which enables GPUDirect data transfers. This feature is exposed as a separate PCIe function, located under a different PCIe tree from the main NIC physical function (NET-PF).
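Because the Data Direct function sits under a different PCIe tree from the NET-PF, one way to see both is to list every PCIe function that reports the NVIDIA networking (Mellanox) vendor ID `0x15b3` and compare the addresses. The sketch below walks the standard Linux sysfs PCI directory. The function name `list_pci_functions_by_vendor` is hypothetical, and the sketch does not filter on a device ID because this section does not state one for the Data Direct function.

```shell
#!/bin/sh
# Sketch: print "<PCI address> <device ID>" for every PCIe function with
# the given vendor ID. The optional second argument overrides the sysfs
# directory, which is handy for trying the function on a fake tree.
list_pci_functions_by_vendor() {
    vendor="$1"
    root="${2:-/sys/bus/pci/devices}"
    for dev in "$root"/*; do
        # Skip entries without a readable vendor file (or an unmatched glob).
        [ -r "$dev/vendor" ] || continue
        if [ "$(cat "$dev/vendor")" = "$vendor" ]; then
            printf '%s %s\n' "$(basename "$dev")" "$(cat "$dev/device")"
        fi
    done
}

# 0x15b3 is the PCI vendor ID used by ConnectX adapters.
list_pci_functions_by_vendor 0x15b3
```

Each PCI address has the form `domain:bus:device.function`, so functions that belong to different PCIe trees show up with different domain or bus numbers.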

Note

Refer to the Blackwell NVL72 with CX8 Software and Firmware Release Notes for the software and firmware versions.

NVIDIA GPU Driver

GPUDirect RDMA with Data Direct requires the following GPU drivers:

  • GB200 - NVIDIA r570 GPU driver or newer

  • GB300 - NVIDIA r580 GPU driver or newer
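The driver requirement above can be checked with `nvidia-smi`, which reports the installed driver version through its `--query-gpu=driver_version` option. The following is a sketch only: the helper `driver_branch_ok` is hypothetical, and it assumes that the release branch (r570, r580) corresponds to the first component of the reported driver version.

```shell
#!/bin/sh
# Sketch: compare the driver branch (first component of the version
# string) against a minimum branch number.
driver_branch_ok() {
    branch="${1%%.*}"   # "570.124.06" -> "570"
    [ "$branch" -ge "$2" ]
}

min_branch=570   # use 580 on GB300

if command -v nvidia-smi >/dev/null 2>&1; then
    # nvidia-smi prints one driver_version line per GPU.
    nvidia-smi --query-gpu=driver_version --format=csv,noheader |
    while read -r version; do
        if driver_branch_ok "$version" "$min_branch"; then
            echo "driver $version: OK"
        else
            echo "driver $version: older than r$min_branch"
        fi
    done
else
    echo "nvidia-smi not found; skipping driver check"
fi
```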

CUDA

GPUDirect RDMA with Data Direct requires the following NVIDIA CUDA® Toolkit versions:

  • GB200 - CUDA® Toolkit 12.8 or newer

  • GB300 - CUDA® Toolkit 13.0 or newer
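A similar check is possible for the toolkit, since `nvcc --version` prints a line containing `release <major>.<minor>`. The sketch below extracts that release number and compares it with the minimum for the platform. The helper names `cuda_release_from_nvcc` and `version_at_least` are hypothetical, and the comparison relies on the `-V` (version sort) option of GNU `sort`.

```shell
#!/bin/sh
# Sketch: read `nvcc --version` output on stdin and print the release
# number, for example "12.8".
cuda_release_from_nvcc() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# True when version $1 is greater than or equal to version $2.
# After a version sort, the smaller of the two comes first.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

min_cuda=12.8   # use 13.0 on GB300

if command -v nvcc >/dev/null 2>&1; then
    found="$(nvcc --version | cuda_release_from_nvcc)"
    if [ -n "$found" ] && version_at_least "$found" "$min_cuda"; then
        echo "CUDA Toolkit $found: OK"
    else
        echo "CUDA Toolkit '${found:-unknown}' is older than $min_cuda or could not be read"
    fi
else
    echo "nvcc not found; skipping CUDA Toolkit check"
fi
```

A version sort is used rather than a plain numeric comparison so that a release such as 12.10 is treated as newer than 12.8.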

DOCA

Install DOCA Host version 3.2.0-125000 or later with the doca-ofed profile, together with firmware version 40.47.1026 or later. This installation provides the host drivers and tools required for the Data Direct feature.
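Once DOCA Host is installed, the firmware version on the NIC can be compared with the minimum above. The sketch below parses the `firmware-version:` line that `ethtool -i` prints for an interface. The helper names are hypothetical, `IFACE` is a placeholder for the CX8 network interface name on your system, and the comparison again assumes a `sort` that supports `-V`.

```shell
#!/bin/sh
# Sketch: compare the firmware version reported by `ethtool -i` with the
# minimum stated in this section.
MIN_FW="40.47.1026"

# Reads `ethtool -i IFACE` output on stdin and prints the version token
# that follows the "firmware-version:" label.
firmware_version_from_ethtool() {
    awk '$1 == "firmware-version:" { print $2; exit }'
}

# True when version $1 is at least MIN_FW.
firmware_ok() {
    [ "$(printf '%s\n%s\n' "$1" "$MIN_FW" | sort -V | head -n 1)" = "$MIN_FW" ]
}

if [ -n "${IFACE:-}" ] && command -v ethtool >/dev/null 2>&1; then
    fw="$(ethtool -i "$IFACE" | firmware_version_from_ethtool)"
    if [ -n "$fw" ] && firmware_ok "$fw"; then
        echo "firmware $fw: OK"
    else
        echo "firmware '${fw:-unknown}' is older than $MIN_FW or could not be read"
    fi
else
    echo "set IFACE to the CX8 interface name and install ethtool to run this check"
fi
```

For example, running the script as `IFACE=<interface> sh check_fw.sh` (the script name is arbitrary) reports whether that interface meets the minimum firmware version.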