NVIDIA PeerDirect
NVIDIA PeerDirect™ enables direct memory access between the InfiniBand (IB) core and peer memory clients (e.g., GPU cards) through a dedicated API. As a result, RDMA-based applications (over InfiniBand or RoCE) can use the computing power of peer devices and the RDMA interconnect at the same time, without copying data between the peer devices.
A common use case is GPUDirect RDMA, which utilizes PeerDirect for efficient GPU communication.
For a detailed description of the API, refer to docs/readme_and_user_manual/PEER_MEMORY_API.txt in the MLNX_OFED installation package.
The PeerDirect Async subsystem allows hardware peer devices (such as GPUs or dedicated accelerator cards) to control the HCA (Host Channel Adapter) directly for critical-path operations, reducing CPU involvement. This is achieved through a set of verb calls and data structures that provide the application with an abstract representation of operation sequences meant to be executed by the peer device.
This feature is only supported on ConnectX-5 adapter cards and above.
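To make the idea of an "abstract representation of operation sequences" more concrete, the following is a minimal user-space sketch in C. It is a toy model only: every name in it (demo_peer_op, demo_peer_execute, demo_post_and_ring) is hypothetical and is not part of the MLNX_OFED verbs API, and the "peer device" is simulated by an ordinary function that walks a linked list of store, fence, and poll operations.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical operation kinds; NOT the MLNX_OFED definitions. */
enum demo_peer_op_type {
    DEMO_OP_STORE32,    /* write a 32-bit value to a target address */
    DEMO_OP_FENCE,      /* order earlier stores before later ones   */
    DEMO_OP_POLL_GEQ32  /* check that *target >= value              */
};

/* One node of an operation sequence handed to the peer device. */
struct demo_peer_op {
    enum demo_peer_op_type type;
    volatile uint32_t *target;   /* unused for DEMO_OP_FENCE */
    uint32_t value;              /* unused for DEMO_OP_FENCE */
    struct demo_peer_op *next;
};

/*
 * Stand-in for the peer device: executes each operation in order.
 * Returns the number of operations executed, or -1 if a poll condition
 * is not yet satisfied (a real peer would keep polling instead).
 */
static int demo_peer_execute(const struct demo_peer_op *op)
{
    int executed = 0;

    for (; op != NULL; op = op->next) {
        switch (op->type) {
        case DEMO_OP_STORE32:
            assert(op->target != NULL);
            *op->target = op->value;
            break;
        case DEMO_OP_FENCE:
            atomic_thread_fence(memory_order_seq_cst);
            break;
        case DEMO_OP_POLL_GEQ32:
            assert(op->target != NULL);
            if (*op->target < op->value)
                return -1;
            break;
        }
        executed++;
    }
    return executed;
}

/*
 * Builds and runs a three-step sequence: publish an index to a record,
 * fence, then write the same index to a doorbell location.
 */
static int demo_post_and_ring(volatile uint32_t *record,
                              volatile uint32_t *doorbell,
                              uint32_t index)
{
    struct demo_peer_op ring  = { DEMO_OP_STORE32, doorbell, index, NULL };
    struct demo_peer_op fence = { DEMO_OP_FENCE, NULL, 0, &ring };
    struct demo_peer_op store = { DEMO_OP_STORE32, record, index, &fence };

    return demo_peer_execute(&store);
}
```

In this model the CPU only prepares the descriptor list; the executor that stands in for the peer device performs the stores itself, which is the division of labor the paragraph above describes. The real verb calls and data structures may differ substantially from this sketch.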
In GPU systems that utilize relaxed memory ordering, an RSYNC callback is used to ensure memory consistency. This callback is registered and implemented via an external module provided by the system vendor. Once the module is loaded, it registers the callback with MLNX_OFED, which will then use it to maintain proper memory operation ordering.
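The register-then-invoke pattern described above can be modeled in a few lines of C. This is a user-space illustration under stated assumptions, not the kernel interface: the names demo_rsync_cb, demo_register_rsync, demo_unregister_rsync, and demo_core_sync are hypothetical, and the point at which the real driver invokes the callback is not specified here.

```c
#include <stddef.h>

/* Hypothetical callback type: returns 0 on success. */
typedef int (*demo_rsync_cb)(void *ctx);

/* Single registration slot held by the simulated core. */
static demo_rsync_cb demo_registered_cb = NULL;
static void *demo_registered_ctx = NULL;

/*
 * Called by the (simulated) vendor module when it is loaded.
 * Returns 0 on success, -1 if cb is NULL or a callback is already registered.
 */
static int demo_register_rsync(demo_rsync_cb cb, void *ctx)
{
    if (cb == NULL || demo_registered_cb != NULL)
        return -1;
    demo_registered_cb = cb;
    demo_registered_ctx = ctx;
    return 0;
}

/* Called by the (simulated) vendor module when it is unloaded. */
static void demo_unregister_rsync(void)
{
    demo_registered_cb = NULL;
    demo_registered_ctx = NULL;
}

/*
 * Called by the simulated core wherever ordering must be enforced.
 * With no callback registered, no extra synchronization is performed.
 */
static int demo_core_sync(void)
{
    if (demo_registered_cb == NULL)
        return 0;
    return demo_registered_cb(demo_registered_ctx);
}

/* Example vendor callback: only counts how often it was invoked. */
static int demo_vendor_rsync(void *ctx)
{
    int *calls = ctx;
    (*calls)++;
    return 0;
}
```

The design choice illustrated here is that the core carries no vendor-specific ordering logic: it only holds a function pointer, and the vendor module decides what "synchronize" means for its hardware.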