Key Concepts

A Send Request (SR) defines how much data will be sent, from where, how, and, with RDMA, to where. struct ibv_send_wr is used to implement SRs.

A Receive Request (RR) defines the buffers into which data is received for non-RDMA operations. If no RR has been posted and a transmitter attempts a Send operation or an RDMA Write with immediate, a Receiver Not Ready (RNR) error will be sent. struct ibv_recv_wr is used to implement RRs.
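The RR/RNR behavior above can be sketched with a toy receive queue in plain C. This is not libibverbs code: every name (toy_rq, toy_post_recv, toy_incoming_send) is hypothetical, and the sketch models only an incoming Send on a single receiver, not an RDMA Write with immediate or the RNR retry logic of a real RC connection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model of a receive queue, not the libibverbs API.
 * Each posted RR (cf. struct ibv_recv_wr) supplies one receive buffer. */
#define TOY_RQ_DEPTH 4

struct toy_recv_buf {
    char  *addr;
    size_t length;
};

struct toy_rq {
    struct toy_recv_buf bufs[TOY_RQ_DEPTH];
    size_t head;   /* index of the oldest posted RR */
    size_t count;  /* number of posted, not yet consumed RRs */
};

enum toy_rx_result { TOY_RX_OK, TOY_RX_RNR, TOY_RX_LEN_ERR };

/* Post an RR; returns false if the toy queue is full. */
bool toy_post_recv(struct toy_rq *rq, char *addr, size_t length) {
    assert(rq != NULL);
    if (rq->count == TOY_RQ_DEPTH)
        return false;
    size_t tail = (rq->head + rq->count) % TOY_RQ_DEPTH;
    rq->bufs[tail].addr = addr;
    rq->bufs[tail].length = length;
    rq->count++;
    return true;
}

/* An incoming Send consumes the oldest posted RR, in FIFO order.
 * With no RR posted, the receiver can only answer RNR. */
enum toy_rx_result toy_incoming_send(struct toy_rq *rq, const char *data, size_t len) {
    assert(rq != NULL);
    if (rq->count == 0)
        return TOY_RX_RNR;
    struct toy_recv_buf *rr = &rq->bufs[rq->head];
    rq->head = (rq->head + 1) % TOY_RQ_DEPTH;
    rq->count--;
    if (len > rr->length)
        return TOY_RX_LEN_ERR; /* message larger than the posted buffer */
    memcpy(rr->addr, data, len);
    return TOY_RX_OK;
}
```

A Send that arrives before any RR is posted gets TOY_RX_RNR; posting a buffer first turns the same Send into TOY_RX_OK, which is why applications typically pre-post RRs with ibv_post_recv.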

A Completion Queue (CQ) is an object that contains the completed work requests (WRs) that were posted to the Work Queues (WQ). Every completion indicates that a specific WR has completed, whether successfully or unsuccessfully.
A CQ is the mechanism that notifies the application about finished Work Requests (status, opcode, size, source). A CQ holds n Completion Queue Entries (CQEs); the number of CQEs is specified when the CQ is created. A CQ is a FIFO of CQEs: when a CQE is polled, it is removed from the CQ. A CQ can service send queues, receive queues, or both, and work queues from multiple Queue Pairs (QPs) can be associated with a single CQ. struct ibv_cq is used to implement a CQ.
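A minimal sketch of a CQ as a fixed-capacity FIFO of CQEs, assuming a single-threaded model in plain C. All names are hypothetical and the fields only loosely mirror struct ibv_wc; in libibverbs the capacity is the cqe argument of ibv_create_cq and entries are read with ibv_poll_cq.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical toy CQE; not struct ibv_wc. */
struct toy_cqe {
    uint64_t wr_id;    /* identifies the completed WR */
    int      status;   /* 0 = success, nonzero = error */
    uint32_t byte_len;
};

struct toy_cq {
    struct toy_cqe *ring;
    size_t cqe;            /* capacity, fixed at creation */
    size_t head, count;
};

/* Create a CQ with room for `cqe` entries. */
int toy_cq_init(struct toy_cq *cq, size_t cqe) {
    assert(cqe > 0);
    cq->ring = calloc(cqe, sizeof *cq->ring);
    cq->cqe = cqe;
    cq->head = cq->count = 0;
    return cq->ring ? 0 : -1;
}

/* Called by the "adapter" when a WR from any associated SQ or RQ completes.
 * Returns -1 when the CQ is full (an overrun on real hardware). */
int toy_cq_push(struct toy_cq *cq, struct toy_cqe e) {
    if (cq->count == cq->cqe)
        return -1;
    cq->ring[(cq->head + cq->count) % cq->cqe] = e;
    cq->count++;
    return 0;
}

/* Poll up to `n` CQEs in FIFO order; every polled CQE is removed.
 * Returns how many were copied out. */
int toy_cq_poll(struct toy_cq *cq, int n, struct toy_cqe *out) {
    int got = 0;
    while (got < n && cq->count > 0) {
        out[got++] = cq->ring[cq->head];
        cq->head = (cq->head + 1) % cq->cqe;
        cq->count--;
    }
    return got;
}

void toy_cq_destroy(struct toy_cq *cq) {
    free(cq->ring);
    cq->ring = NULL;
}
```

Completions come out in the order they were pushed, and a second poll of the same entries returns nothing because polling consumed them.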

Memory Registration is a mechanism that allows an application to describe a set of virtually contiguous memory locations, or a set of physically contiguous memory locations, to the network adapter as a virtually contiguous buffer using Virtual Addresses.
The registration process pins the memory pages (to prevent them from being swapped out and to keep the virtual-to-physical mapping fixed). During registration, the OS checks the permissions of the registered block, and the registration process writes the virtual-to-physical address table to the network adapter. Permissions are set for the region when it is registered; the available permissions are local write, remote read, remote write, atomic, and bind. Every memory region (MR) has a local key (l_key) and a remote key (r_key). Local keys are used by the local Host Channel Adapter (HCA) to access local memory, such as during a receive operation. Remote keys are given to the remote HCA to allow a remote process to access system memory during RDMA operations. The same memory buffer can be registered several times (even with different access permissions), and every registration results in a different set of keys.
struct ibv_mr is used to implement memory registration.
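The key and permission rules above can be illustrated with a toy registration table. This is a hypothetical model, not the libibverbs API (which uses ibv_reg_mr and IBV_ACCESS_* flags): it does no pinning and only shows that each registration gets its own keys and that a remote access is checked against the r_key, the registered range, and the granted permissions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical access flags; not the real IBV_ACCESS_* values. */
enum {
    TOY_LOCAL_WRITE  = 1 << 0,
    TOY_REMOTE_READ  = 1 << 1,
    TOY_REMOTE_WRITE = 1 << 2,
};

struct toy_mr {
    uintptr_t addr;
    size_t    length;
    unsigned  access;
    uint32_t  lkey, rkey;
};

#define TOY_MAX_MR 16
static struct toy_mr toy_mrs[TOY_MAX_MR];
static size_t toy_nmr;
static uint32_t toy_next_key = 0x100;

/* Register [addr, addr + length): every registration gets fresh keys,
 * even when the same buffer is already registered. */
const struct toy_mr *toy_reg_mr(void *addr, size_t length, unsigned access) {
    assert(addr != NULL);
    if (toy_nmr == TOY_MAX_MR)
        return NULL;
    struct toy_mr *mr = &toy_mrs[toy_nmr++];
    mr->addr = (uintptr_t)addr;
    mr->length = length;
    mr->access = access;
    mr->lkey = toy_next_key++;
    mr->rkey = toy_next_key++;
    return mr;
}

/* Check a remote request: the rkey must name an MR, the range must lie
 * inside it, and the MR must grant the requested right. */
bool toy_remote_access_ok(uint32_t rkey, uintptr_t addr, size_t len, unsigned right) {
    for (size_t i = 0; i < toy_nmr; i++) {
        const struct toy_mr *mr = &toy_mrs[i];
        if (mr->rkey != rkey)
            continue;
        return addr >= mr->addr && len <= mr->length &&
               addr - mr->addr <= mr->length - len &&
               (mr->access & right) == right;
    }
    return false;
}
```

Registering one buffer twice, once read-only and once read/write, yields two independent key pairs; a remote write presented with the read-only r_key is rejected even though the memory is the same.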

A Memory Window (MW) gives the application more flexible control over remote access to its memory. Memory Windows are intended for situations where the application:

  • wants to grant and revoke remote access rights to a registered Region in a dynamic fashion with less of a performance penalty than using deregistration/registration or reregistration.

  • wants to grant different remote access rights to different remote agents and/or grant those rights over different ranges within a registered Region.

The operation of associating an MW with an MR is called Binding. Different MWs can overlap the same MR (even with different access permissions).
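A toy model of binding, assuming hypothetical names (toy_bind_mw, toy_invalidate_mw) rather than the libibverbs calls: two windows overlap the same region with different rights, and one is revoked without deregistering the region.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical rights and types; not the libibverbs MW API. */
enum { TOY_MW_READ = 1, TOY_MW_WRITE = 2 };

struct toy_region { uintptr_t addr; size_t length; };

struct toy_window {
    const struct toy_region *mr; /* region the window is bound to */
    uintptr_t addr;
    size_t    length;
    unsigned  access;
    uint32_t  rkey;
    bool      bound;
};

static uint32_t toy_mw_next_rkey = 0x1000;

/* Bind: expose a sub-range of a region with its own rights and a new rkey.
 * Fails, leaving *mw untouched, if the range is not inside the region. */
bool toy_bind_mw(struct toy_window *mw, const struct toy_region *mr,
                 uintptr_t addr, size_t length, unsigned access) {
    assert(mw != NULL && mr != NULL);
    if (addr < mr->addr || length > mr->length ||
        addr - mr->addr > mr->length - length)
        return false;
    *mw = (struct toy_window){ mr, addr, length, access, toy_mw_next_rkey++, true };
    return true;
}

/* Revoke remote access; the underlying registration is not touched. */
void toy_invalidate_mw(struct toy_window *mw) { mw->bound = false; }

bool toy_mw_allows(const struct toy_window *mw, uint32_t rkey,
                   uintptr_t addr, size_t len, unsigned right) {
    return mw->bound && mw->rkey == rkey &&
           addr >= mw->addr && len <= mw->length &&
           addr - mw->addr <= mw->length - len &&
           (mw->access & right) == right;
}
```

Granting or revoking access here only flips window state, which is the kind of cheap operation the bullets above contrast with deregistration and reregistration.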

An Address Vector is an object that describes the route from the local node to the remote node. Every UC/RC QP has an address vector in its QP context. For a UD QP, the address vector must be specified in every posted SR. struct ibv_ah is used to implement address vectors.

The Global Routing Header (GRH) is used for routing between subnets. When using RoCE, the GRH is used for routing inside the subnet and is therefore mandatory. The use of the GRH is mandatory in order for an application to support both IB and RoCE.
When global routing is used on UD QPs, the first 40 bytes of the receive buffer will contain a GRH. This area stores the global routing information, so that an appropriate address vector can be generated to respond to the received packet. If the GRH is used with UD, the RR should always have an extra 40 bytes available for it. struct ibv_grh is used to implement GRHs.
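Assuming a single contiguous receive buffer, the 40-byte rule above reduces to simple arithmetic. The helper names are hypothetical; only the 40-byte size of the GRH area comes from the text.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes reserved at the start of a UD receive buffer for the GRH. */
#define TOY_GRH_BYTES 40

/* Hypothetical helper: RR buffer size needed for `max_payload` bytes on UD. */
size_t toy_ud_recv_len(size_t max_payload) {
    return TOY_GRH_BYTES + max_payload;
}

/* The payload starts after the 40-byte GRH area. If the packet carried
 * no GRH, those first 40 bytes are simply undefined. */
unsigned char *toy_ud_payload(unsigned char *recv_buf) {
    assert(recv_buf != NULL);
    return recv_buf + TOY_GRH_BYTES;
}
```

For a 1024-byte maximum message, the RR buffer must therefore be 1064 bytes, and the application reads the message starting at offset 40.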

A Protection Domain (PD) is an object whose components can interact only with each other. These components can be AHs, QPs, MRs, and shared receive queues (SRQs). A PD is used to associate Queue Pairs with Memory Regions and Memory Windows, as a means of enabling and controlling network adapter access to Host System memory. PDs are also used to associate Unreliable Datagram queue pairs with Address Handles, as a means of controlling access to UD destinations. struct ibv_pd is used to implement protection domains.
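A minimal sketch of the PD rule, assuming each object simply records the PD it was created in. The types are hypothetical stand-ins for ibv_pd, ibv_qp, ibv_mr, and ibv_ah; real hardware enforces the check itself and typically surfaces a violation as an error completion rather than a boolean.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins: each object remembers its PD. */
struct toy_pd     { int id; };
struct toy_qp     { const struct toy_pd *pd; };
struct toy_mr_obj { const struct toy_pd *pd; };
struct toy_ah     { const struct toy_pd *pd; };

/* A QP may use an MR only if both belong to the same PD. */
bool toy_qp_can_use_mr(const struct toy_qp *qp, const struct toy_mr_obj *mr) {
    return qp->pd == mr->pd;
}

/* Likewise, a UD QP may only send through an AH from its own PD. */
bool toy_qp_can_use_ah(const struct toy_qp *qp, const struct toy_ah *ah) {
    return qp->pd == ah->pd;
}
```

Comparing PD identities is enough here because, as the text says, components can interact only with other components of the same PD.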

The network adapter may send asynchronous events to inform the software about events that occurred in the system.

There are two types of async events:

  • Affiliated events: events that occur on process-specific objects (CQ, QP, SRQ). These events are sent to a specific process.

  • Unaffiliated events: events that occur on global objects (network adapter, port). These events are sent to all processes.
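The delivery rule for the two event types can be sketched as follows. The enum is a hypothetical subset that only echoes a few libibverbs event names (such as IBV_EVENT_QP_FATAL and IBV_EVENT_PORT_ERR); in practice events are read with ibv_get_async_event, which is not modeled here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset echoing a few libibverbs event types. */
enum toy_event_type {
    TOY_EVENT_CQ_ERR,       /* affiliated: a specific CQ */
    TOY_EVENT_QP_FATAL,     /* affiliated: a specific QP */
    TOY_EVENT_SRQ_ERR,      /* affiliated: a specific SRQ */
    TOY_EVENT_PORT_ERR,     /* unaffiliated: a port */
    TOY_EVENT_DEVICE_FATAL, /* unaffiliated: the whole adapter */
};

bool toy_event_is_affiliated(enum toy_event_type t) {
    switch (t) {
    case TOY_EVENT_CQ_ERR:
    case TOY_EVENT_QP_FATAL:
    case TOY_EVENT_SRQ_ERR:
        return true;
    default:
        return false;
    }
}

/* Should process `pid` see this event? Affiliated events go only to the
 * owner of the object; unaffiliated events go to every process. */
bool toy_event_delivered_to(enum toy_event_type t, int owner_pid, int pid) {
    return !toy_event_is_affiliated(t) || owner_pid == pid;
}
```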

Data is gathered and scattered using scatter/gather elements (SGEs), each of which includes:

  • Address: the address of the local data buffer that the data will be gathered from or scattered to.

  • Size: the size of the data that will be read from or written to this address.

  • L_key: the local key of the MR that was registered for this buffer.

struct ibv_sge implements scatter/gather elements.
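The gather and scatter behavior of SGEs can be illustrated by copying ordinary host memory. The struct mirrors the three fields of struct ibv_sge but is hypothetical, and the lkey is carried along without being checked, since no adapter is involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of struct ibv_sge's address/size/lkey fields. */
struct toy_sge {
    uintptr_t addr;
    uint32_t  length;
    uint32_t  lkey;   /* unused in this toy: no MR lookup is done */
};

/* Gather (send side): concatenate the SGEs into one outgoing message.
 * Returns the message length, or (size_t)-1 if it would exceed `cap`. */
size_t toy_gather(const struct toy_sge *sg, int num_sge, char *msg, size_t cap) {
    size_t off = 0;
    for (int i = 0; i < num_sge; i++) {
        if (sg[i].length > cap - off)
            return (size_t)-1;
        memcpy(msg + off, (const void *)sg[i].addr, sg[i].length);
        off += sg[i].length;
    }
    return off;
}

/* Scatter (receive side): fill the SGEs in order with an incoming message.
 * Returns -1 if the buffers together are too small for the message. */
int toy_scatter(const struct toy_sge *sg, int num_sge, const char *msg, size_t len) {
    size_t off = 0;
    for (int i = 0; i < num_sge && off < len; i++) {
        size_t n = len - off < sg[i].length ? len - off : sg[i].length;
        memcpy((void *)sg[i].addr, msg + off, n);
        off += n;
    }
    return off == len ? 0 : -1;
}
```

A header and a body in separate buffers leave as one 11-byte message, and the receiver can split that message across buffers of different sizes.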

Polling the CQ for completions retrieves the details of a posted WR (Send or Receive). If a WR completes with a bad status, all the remaining completions will also be bad (and the Work Queue will be moved to the error state). Every WR that does not yet have a polled completion is still outstanding. Only after a WR has a completion may its send or receive buffer be used, reused, or freed. The completion status should always be checked. When a CQE is polled, it is removed from the CQ. Polling is accomplished with the ibv_poll_cq operation.
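A toy sketch of the polling rules, assuming hypothetical status names that echo IBV_WC_SUCCESS and IBV_WC_WR_FLUSH_ERR. toy_complete stands in for the adapter: after the first failed WR, the queue is in the error state and every remaining WR completes as flushed. toy_check_completions shows the application checking every status.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical statuses; not the real enum ibv_wc_status values. */
enum toy_wc_status { TOY_WC_SUCCESS, TOY_WC_REM_ACCESS_ERR, TOY_WC_WR_FLUSH_ERR };

struct toy_wc {
    uint64_t wr_id;
    enum toy_wc_status status;
};

/* Simulate completions for `n` posted WRs where WR number `fail_at` hits
 * an error (fail_at >= n means none fail). Later WRs are flushed. */
int toy_complete(int n, int fail_at, struct toy_wc *wc) {
    for (int i = 0; i < n; i++) {
        wc[i].wr_id = (uint64_t)i;
        wc[i].status = i < fail_at  ? TOY_WC_SUCCESS
                     : i == fail_at ? TOY_WC_REM_ACCESS_ERR
                                    : TOY_WC_WR_FLUSH_ERR;
    }
    return n;
}

struct toy_poll_summary { int ok, failed, flushed; };

/* Application side: check every status. Once a WR has a completion,
 * good or bad, its buffer is no longer in use by the adapter. */
struct toy_poll_summary toy_check_completions(const struct toy_wc *wc, int n) {
    struct toy_poll_summary s = {0, 0, 0};
    for (int i = 0; i < n; i++) {
        switch (wc[i].status) {
        case TOY_WC_SUCCESS:      s.ok++;      break;
        case TOY_WC_WR_FLUSH_ERR: s.flushed++; break;
        default:                  s.failed++;  break;
        }
    }
    return s;
}
```

With five WRs and a failure at the third, two complete successfully, one reports the real error, and the last two are flushed, which matches the rule that one bad completion makes the rest bad.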

© Copyright 2023, NVIDIA. Last updated on May 23, 2023.