RDMA Aware Networks Programming User Manual

Active Queue Pair Operations

A QP can be queried from the point at which it is created. Once a queue pair is completely operational, you may query it, be notified of events, and conduct send and receive operations on it. This section describes the operations available to perform these actions.

Template: int ibv_query_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr, enum ibv_qp_attr_mask attr_mask, struct ibv_qp_init_attr *init_attr)

Input Parameters:

qp struct ibv_qp from ibv_create_qp

attr_mask bitmask of items to query (see ibv_modify_qp)

Output Parameters:

attr struct ibv_qp_attr to be filled in with requested attributes

init_attr struct ibv_qp_init_attr to be filled in with initial attributes

Return Value: 0 on success, -1 on error. If the call fails, errno will be set to indicate the reason for the failure.

Description: ibv_query_qp retrieves the various attributes of a queue pair (QP) as previously set through ibv_create_qp and ibv_modify_qp. The user should allocate a struct ibv_qp_attr and a struct ibv_qp_init_attr and pass them to the command. These structs will be filled in upon successful return. The user is responsible for freeing these structs.

struct ibv_qp_init_attr is described in ibv_create_qp and struct ibv_qp_attr is described in ibv_modify_qp.

Template: int ibv_query_srq(struct ibv_srq *srq, struct ibv_srq_attr *srq_attr)

Input Parameters:

srq The SRQ to query

srq_attr The attributes of the specified SRQ

Output Parameters:

srq_attr The struct ibv_srq_attr is returned with the attributes of the specified SRQ

Return Value: 0 on success, -1 on error. If the call fails, errno will be set to indicate the reason for the failure.

Description: ibv_query_srq returns the attributes list and current values of the specified SRQ. It returns the attributes through the pointer srq_attr which is an ibv_srq_attr struct described above under ibv_create_srq. If the value of srq_limit in srq_attr is 0, then the SRQ limit reached ('low watermark') event is not or is no longer armed. No asynchronous events will be generated until the event is re-armed.

Template: int ibv_query_xrc_rcv_qp(struct ibv_xrc_domain *xrc_domain, uint32_t xrc_qp_num, struct ibv_qp_attr *attr, int attr_mask, struct ibv_qp_init_attr *init_attr)

Input Parameters:

xrc_domain The XRC domain associated with this QP

xrc_qp_num The queue pair number to identify this QP

attr The ibv_qp_attr struct in which to return the attributes

attr_mask A mask specifying the minimum list of attributes to retrieve

init_attr The ibv_qp_init_attr struct to return the initial attributes

Output Parameters:

attr A pointer to the struct containing the QP attributes of interest

init_attr A pointer to the struct containing initial attributes

Return Value: 0 on success, -1 on error. If the call fails, errno will be set to indicate the reason for the failure.

Description: ibv_query_xrc_rcv_qp retrieves the attributes specified in attr_mask for the XRC receive QP with the number xrc_qp_num and domain xrc_domain. It returns them through the pointers attr and init_attr.
The attr_mask specifies a minimal list to retrieve. Some RDMA devices may return extra attributes not requested. Attributes are valid if they have been set using ibv_modify_xrc_rcv_qp. The exact list of valid attributes depends on the QP state. Multiple ibv_query_xrc_rcv_qp calls may yield different returned values for these attributes: qp_state, path_mig_state, sq_draining, and ah_attr (if automatic path migration (APM) is enabled).

Template: int ibv_post_recv(struct ibv_qp *qp, struct ibv_recv_wr *wr, struct ibv_recv_wr **bad_wr)

Input Parameters:

qp struct ibv_qp from ibv_create_qp

wr first work request (WR) containing receive buffers

Output Parameters:

bad_wr pointer to first rejected WR

Return Value: 0 on success, -1 on error. If the call fails, errno will be set to indicate the reason for the failure.

Description: ibv_post_recv posts a linked list of WRs to a queue pair's (QP) receive queue. At least one receive buffer should be posted to the receive queue to transition the QP to RTR. Receive buffers are consumed as the remote peer executes Send, Send with Immediate and RDMA Write with Immediate operations. Receive buffers are NOT used for other RDMA operations. Processing of the WR list is stopped on the first error and a pointer to the offending WR is returned in bad_wr.

struct ibv_recv_wr is defined as follows:

struct ibv_recv_wr {
    uint64_t wr_id;
    struct ibv_recv_wr *next;
    struct ibv_sge *sg_list;
    int num_sge;
};

wr_id user assigned work request ID
next pointer to next WR, NULL if last one.
sg_list scatter array for this WR
num_sge number of entries in sg_list

struct ibv_sge is defined as follows:

struct ibv_sge {
    uint64_t addr;
    uint32_t length;
    uint32_t lkey;
};

addr address of buffer
length length of buffer
lkey local key (lkey) of buffer from ibv_reg_mr
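The linked-list layout described above can be sketched as follows. This is only an illustration and is not taken from the manual: build_recv_chain is a hypothetical helper, and the two structs are local copies of the definitions above so that the block compiles without libibverbs (real code would include <infiniband/verbs.h> instead). The parameters base, slot_len and lkey are assumed to describe a single buffer that was already registered with ibv_reg_mr.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies of the layouts shown above, so the sketch compiles without
 * libibverbs. Real code would #include <infiniband/verbs.h> instead. */
struct ibv_sge {
    uint64_t addr;
    uint32_t length;
    uint32_t lkey;
};

struct ibv_recv_wr {
    uint64_t wr_id;
    struct ibv_recv_wr *next;
    struct ibv_sge *sg_list;
    int num_sge;
};

/* Hypothetical helper: describe `count` equally sized slots of one
 * registered buffer as a linked list of single-SGE receive WRs.
 * `wrs` and `sges` are caller-provided arrays with `count` entries each.
 * Returns the head of the list, or NULL if count is 0. */
struct ibv_recv_wr *build_recv_chain(struct ibv_recv_wr *wrs,
                                     struct ibv_sge *sges, size_t count,
                                     void *base, uint32_t slot_len,
                                     uint32_t lkey)
{
    assert(wrs != NULL && sges != NULL);
    if (count == 0)
        return NULL;

    for (size_t i = 0; i < count; i++) {
        /* One scatter entry per WR, pointing at slot i of the buffer. */
        sges[i].addr = (uint64_t)(uintptr_t)base + (uint64_t)i * slot_len;
        sges[i].length = slot_len;
        sges[i].lkey = lkey;

        wrs[i].wr_id = i;  /* slot index, reported back in the completion */
        wrs[i].next = (i + 1 < count) ? &wrs[i + 1] : NULL;  /* NULL ends the list */
        wrs[i].sg_list = &sges[i];
        wrs[i].num_sge = 1;
    }
    return &wrs[0];
}
```

The returned head would then be passed as wr to ibv_post_recv. If the call fails, bad_wr identifies the first WR of the chain that was rejected.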

Template: int ibv_post_send(struct ibv_qp *qp, struct ibv_send_wr *wr, struct ibv_send_wr **bad_wr)

Input Parameters:

qp struct ibv_qp from ibv_create_qp

wr first work request (WR)

Output Parameters:

bad_wr pointer to first rejected WR

Return Value: 0 on success, -1 on error. If the call fails, errno will be set to indicate the reason for the failure.

Description: ibv_post_send posts a linked list of WRs to a queue pair's (QP) send queue. This operation is used to initiate all communication, including RDMA operations. Processing of the WR list is stopped on the first error and a pointer to the offending WR is returned in bad_wr.
The user should not alter or destroy AHs associated with WRs until the request has been fully executed and a completion queue entry (CQE) has been retrieved from the corresponding completion queue (CQ) to avoid unexpected behaviour.
The buffers used by a WR can only be safely reused after the WR has been fully executed and a CQE has been retrieved from the corresponding CQ. However, if the IBV_SEND_INLINE flag was set, the buffer can be reused immediately after the call returns.

struct ibv_send_wr is defined as follows:

struct ibv_send_wr {
    uint64_t wr_id;
    struct ibv_send_wr *next;
    struct ibv_sge *sg_list;
    int num_sge;
    enum ibv_wr_opcode opcode;
    enum ibv_send_flags send_flags;
    uint32_t imm_data; /* network byte order */
    union {
        struct {
            uint64_t remote_addr;
            uint32_t rkey;
        } rdma;
        struct {
            uint64_t remote_addr;
            uint64_t compare_add;
            uint64_t swap;
            uint32_t rkey;
        } atomic;
        struct {
            struct ibv_ah *ah;
            uint32_t remote_qpn;
            uint32_t remote_qkey;
        } ud;
    } wr;
    uint32_t xrc_remote_srq_num;
};

wr_id user assigned work request ID
next pointer to next WR, NULL if last one.
sg_list scatter/gather array for this WR
num_sge number of entries in sg_list
opcode
    IBV_WR_RDMA_WRITE
    IBV_WR_RDMA_WRITE_WITH_IMM
    IBV_WR_SEND
    IBV_WR_SEND_WITH_IMM
    IBV_WR_RDMA_READ
    IBV_WR_ATOMIC_CMP_AND_SWP
    IBV_WR_ATOMIC_FETCH_AND_ADD
send_flags (optional) - this is a bitwise OR of the flags. See the details below.
imm_data immediate data to send in network byte order
remote_addr remote virtual address for RDMA/atomic operations
rkey remote key (from ibv_reg_mr on remote) for RDMA/atomic operations
compare_add compare value for compare and swap operation
swap swap value
ah address handle (AH) for datagram operations
remote_qpn remote QP number for datagram operations
remote_qkey Qkey for datagram operations
xrc_remote_srq_num shared receive queue (SRQ) number for the destination extended reliable connection (XRC). Only used for XRC operations.

send flags:

IBV_SEND_FENCE set fence indicator
IBV_SEND_SIGNALED send completion event for this WR. Only meaningful for QPs that had the sq_sig_all set to 0
IBV_SEND_SOLICITED set solicited event indicator
IBV_SEND_INLINE send data in sg_list as inline data.

struct ibv_sge is defined in ibv_post_recv.
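Because imm_data travels in network byte order, an application that treats it as a 32-bit integer would typically convert it on both sides. The following is a minimal sketch under that assumption: imm_to_wire and imm_from_wire are hypothetical helper names, and imm_valid stands for the result of testing the IBV_WC_WITH_IMM flag in the received completion (see ibv_poll_cq).

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sender side: convert a host-order value into the network-order value
 * that would be stored in the imm_data field of the send WR. */
uint32_t imm_to_wire(uint32_t host_value)
{
    return htonl(host_value);
}

/* Receiver side: convert the imm_data field of a completion back to host
 * order. `imm_valid` is assumed to be nonzero only when the completion
 * carries valid immediate data. Returns 1 and stores the value in
 * *host_value if valid; returns 0 and leaves *host_value untouched
 * otherwise. */
int imm_from_wire(uint32_t wire_value, int imm_valid, uint32_t *host_value)
{
    assert(host_value != NULL);
    if (!imm_valid)
        return 0;
    *host_value = ntohl(wire_value);
    return 1;
}
```

Keeping the conversion in one pair of helpers makes it harder to forget the byte swap on one side, which would go unnoticed between hosts of the same endianness.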

Template: int ibv_post_srq_recv(struct ibv_srq *srq, struct ibv_recv_wr *recv_wr, struct ibv_recv_wr **bad_recv_wr)

Input Parameters:

srq The SRQ to post the work request to

recv_wr A list of work requests to post on the receive queue

Output Parameters:

bad_recv_wr pointer to first rejected WR

Return Value: 0 on success, -1 on error. If the call fails, errno will be set to indicate the reason for the failure.

Description: ibv_post_srq_recv posts a list of work requests to the specified SRQ. It stops processing the WRs from this list at the first failure (which can be detected immediately while requests are being posted), and returns this failing WR through the bad_recv_wr parameter.
The buffers used by a WR can only be safely reused after the request is fully executed and a work completion has been retrieved from the corresponding completion queue (CQ).
If a WR is being posted to a UD QP, the Global Routing Header (GRH) of the incoming message will be placed in the first 40 bytes of the buffer(s) in the scatter list. If no GRH is present in the incoming message, then the first 40 bytes will be undefined. This means that in all cases for UD QPs, the actual data of the incoming message will start at an offset of 40 bytes into the buffer(s) in the scatter list.
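The 40-byte offset described above can be captured in a small helper. This is a sketch, not part of the library: ud_payload and UD_GRH_BYTES are hypothetical names, the receive buffer is assumed to be a single contiguous scatter entry, and byte_len is assumed to be the length reported in the work completion, including the 40 bytes reserved for the GRH.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes reserved for the GRH at the start of every UD receive buffer,
 * as stated in the text above. */
#define UD_GRH_BYTES 40u

/* Hypothetical helper: given the start of a UD receive buffer and the
 * byte_len of its completion (assumed to include the GRH area), return a
 * pointer to the message payload and store its length in *payload_len.
 * Returns NULL if buf is NULL or byte_len is smaller than the GRH area. */
const uint8_t *ud_payload(const uint8_t *buf, uint32_t byte_len,
                          uint32_t *payload_len)
{
    assert(payload_len != NULL);
    if (buf == NULL || byte_len < UD_GRH_BYTES)
        return NULL;
    *payload_len = byte_len - UD_GRH_BYTES;
    return buf + UD_GRH_BYTES;
}
```

Receive buffers posted for UD traffic therefore need to be at least 40 bytes larger than the largest expected message.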

Template: int ibv_req_notify_cq(struct ibv_cq *cq, int solicited_only)

Input Parameters:

cq struct ibv_cq from ibv_create_cq

solicited_only only notify if WR is flagged as solicited

Output Parameters: none

Return Value: 0 on success, -1 on error. If the call fails, errno will be set to indicate the reason for the failure.

Description: ibv_req_notify_cq arms the notification mechanism for the indicated completion queue (CQ). When a completion queue entry (CQE) is placed on the CQ, a completion event will be sent to the completion channel (CC) associated with the CQ. If a CQE is already present in the CQ when the mechanism is armed, no event will be generated for that CQE. If the solicited_only flag is set, then only CQEs for WRs that had the solicited flag set will trigger the notification.
The user should use the ibv_get_cq_event operation to receive the notification.
The notification mechanism will only be armed for one notification. Once a notification is sent, the mechanism must be re-armed with a new call to ibv_req_notify_cq.

Template: int ibv_get_cq_event(struct ibv_comp_channel *channel, struct ibv_cq **cq, void **cq_context)

Input Parameters:

channel struct ibv_comp_channel from ibv_create_comp_channel

Output Parameters:

cq pointer to completion queue (CQ) associated with event

cq_context user supplied context set in ibv_create_cq

Return Value: 0 on success, -1 on error. If the call fails, errno will be set to indicate the reason for the failure.

Description: ibv_get_cq_event waits for a notification to be sent on the indicated completion channel (CC). Note that this is a blocking operation. The user should allocate a pointer to a struct ibv_cq and a void pointer to be passed into the function. They will be filled in with the appropriate values upon return. It is the user's responsibility to free these pointers.
Each notification sent MUST be acknowledged with the ibv_ack_cq_events operation. Since the ibv_destroy_cq operation waits for all events to be acknowledged, it will hang if any events are not properly acknowledged.
Once a notification for a completion queue (CQ) is sent on a CC, that CQ is now "disarmed" and will not send any more notifications to the CC until it is rearmed again with a new call to the ibv_req_notify_cq operation.
This operation only informs the user that a CQ has completion queue entries (CQE) to be processed; it does not actually process the CQEs. The user should use the ibv_poll_cq operation to process the CQEs.

Template: void ibv_ack_cq_events(struct ibv_cq *cq, unsigned int nevents)

Input Parameters:

cq struct ibv_cq from ibv_create_cq

nevents number of events to acknowledge (1...n)

Output Parameters: None

Return Value: None

Description: ibv_ack_cq_events acknowledges events received from ibv_get_cq_event. Although each notification received from ibv_get_cq_event counts as only one event, the user may acknowledge multiple events through a single call to ibv_ack_cq_events. The number of events to acknowledge is passed in nevents and should be at least 1. Since this operation takes a mutex, it is somewhat expensive and acknowledging multiple events in one call may provide better performance.
See ibv_get_cq_event for additional details.
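One way to acknowledge events in batches is to count them and acknowledge only when a threshold is reached. The sketch below shows just the bookkeeping: struct ack_batch, ack_batch_add and ack_batch_flush are hypothetical names, not part of libibverbs, and all counted events are assumed to belong to the same CQ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for batched acknowledgement on one CQ. */
struct ack_batch {
    unsigned int pending;    /* events received but not yet acknowledged */
    unsigned int threshold;  /* acknowledge once this many are pending */
};

/* Record one event returned by ibv_get_cq_event. Returns the number of
 * events the caller should acknowledge now, or 0 if it should keep
 * accumulating. */
unsigned int ack_batch_add(struct ack_batch *b)
{
    assert(b != NULL && b->threshold >= 1);
    b->pending++;
    if (b->pending < b->threshold)
        return 0;
    unsigned int n = b->pending;
    b->pending = 0;
    return n;
}

/* Return the number of events still pending and reset the counter.
 * Intended to be called before the CQ is destroyed. */
unsigned int ack_batch_flush(struct ack_batch *b)
{
    assert(b != NULL);
    unsigned int n = b->pending;
    b->pending = 0;
    return n;
}
```

A nonzero return value n would be passed as nevents to ibv_ack_cq_events(cq, n). Because ibv_destroy_cq waits for all events to be acknowledged, the result of ack_batch_flush, if nonzero, should be acknowledged before destroying the CQ.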

Template: int ibv_poll_cq(struct ibv_cq *cq, int num_entries, struct ibv_wc *wc)

Input Parameters:

cq struct ibv_cq from ibv_create_cq

num_entries maximum number of completion queue entries (CQE) to return

Output Parameters:

wc CQE array

Return Value: Number of CQEs in array wc or -1 on error

Description: ibv_poll_cq retrieves CQEs from a completion queue (CQ). The user should allocate an array of struct ibv_wc and pass it to the call in wc. The number of entries available in wc should be passed in num_entries. It is the user's responsibility to free this memory.
The number of CQEs actually retrieved is given as the return value. CQs must be polled regularly to prevent an overrun. In the event of an overrun, the CQ will be shut down and an async event IBV_EVENT_CQ_ERR will be sent.

struct ibv_wc is defined as follows:

struct ibv_wc {
    uint64_t wr_id;
    enum ibv_wc_status status;
    enum ibv_wc_opcode opcode;
    uint32_t vendor_err;
    uint32_t byte_len;
    uint32_t imm_data; /* network byte order */
    uint32_t qp_num;
    uint32_t src_qp;
    enum ibv_wc_flags wc_flags;
    uint16_t pkey_index;
    uint16_t slid;
    uint8_t sl;
    uint8_t dlid_path_bits;
};

wr_id user specified work request id as given in ibv_post_send or ibv_post_recv
status
    IBV_WC_SUCCESS
    IBV_WC_LOC_LEN_ERR
    IBV_WC_LOC_QP_OP_ERR
    IBV_WC_LOC_EEC_OP_ERR
    IBV_WC_LOC_PROT_ERR
    IBV_WC_WR_FLUSH_ERR
    IBV_WC_MW_BIND_ERR
    IBV_WC_BAD_RESP_ERR
    IBV_WC_LOC_ACCESS_ERR
    IBV_WC_REM_INV_REQ_ERR
    IBV_WC_REM_ACCESS_ERR
    IBV_WC_REM_OP_ERR
    IBV_WC_RETRY_EXC_ERR
    IBV_WC_RNR_RETRY_EXC_ERR
    IBV_WC_LOC_RDD_VIOL_ERR
    IBV_WC_REM_INV_RD_REQ_ERR
    IBV_WC_REM_ABORT_ERR
    IBV_WC_INV_EECN_ERR
    IBV_WC_INV_EEC_STATE_ERR
    IBV_WC_FATAL_ERR
    IBV_WC_RESP_TIMEOUT_ERR
    IBV_WC_GENERAL_ERR
opcode
    IBV_WC_SEND,
    IBV_WC_RDMA_WRITE,
    IBV_WC_RDMA_READ,
    IBV_WC_COMP_SWAP,
    IBV_WC_FETCH_ADD,
    IBV_WC_BIND_MW,
    IBV_WC_RECV = 1 << 7,
    IBV_WC_RECV_RDMA_WITH_IMM
vendor_err vendor specific error
byte_len number of bytes transferred
imm_data immediate data
qp_num local queue pair (QP) number
src_qp remote QP number
wc_flags see below
pkey_index index of pkey (valid only for GSI QPs)
slid source local identifier (LID)
sl service level (SL)
dlid_path_bits destination LID path bits

flags:

IBV_WC_GRH global route header (GRH) is present in UD packet
IBV_WC_WITH_IMM immediate data value is valid

Template: int ibv_init_ah_from_wc(struct ibv_context *context, uint8_t port_num, struct ibv_wc *wc, struct ibv_grh *grh, struct ibv_ah_attr *ah_attr)

Input Parameters:

context struct ibv_context from ibv_open_device. This should be the device the completion queue entry (CQE) was received on.

port_num physical port number (1..n) that CQE was received on

wc received CQE from ibv_poll_cq

grh global route header (GRH) from packet (see description)

Output Parameters:

ah_attr address handle (AH) attributes

Return Value: 0 on success, -1 on error. If the call fails, errno will be set to indicate the reason for the failure.

Description: ibv_init_ah_from_wc initializes an AH with the necessary attributes to generate a response to a received datagram. The user should allocate a struct ibv_ah_attr and pass this in. If appropriate, the GRH from the received packet should be passed in as well. On UD connections the first 40 bytes of the received packet may contain a GRH. Whether or not this header is present is indicated by the IBV_WC_GRH flag of the CQE. If the GRH is not present on a packet on a UD connection, the first 40 bytes of a packet are undefined.
When the function ibv_init_ah_from_wc completes, the ah_attr will be filled in and the ah_attr may then be used in the ibv_create_ah function. The user is responsible for freeing ah_attr.
Alternatively, ibv_create_ah_from_wc may be used instead of this operation.

Template: struct ibv_ah *ibv_create_ah_from_wc(struct ibv_pd *pd, struct ibv_wc *wc, struct ibv_grh *grh, uint8_t port_num)

Input Parameters:

pd protection domain (PD) from ibv_alloc_pd

wc completion queue entry (CQE) from ibv_poll_cq

grh global route header (GRH) from packet

port_num physical port number (1..n) that CQE was received on

Output Parameters: none

Return Value: Created address handle (AH) on success or -1 on error

Description: ibv_create_ah_from_wc combines the operations ibv_init_ah_from_wc and ibv_create_ah. See the description of those operations for details.

Template: int ibv_attach_mcast(struct ibv_qp *qp, const union ibv_gid *gid, uint16_t lid)

Input Parameters:

qp QP to attach to the multicast group

gid The multicast group GID

lid The multicast group LID in host byte order

Output Parameters: none

Return Value: 0 on success, -1 on error. If the call fails, errno will be set to indicate the reason for the failure.

Description: ibv_attach_mcast attaches the specified QP, qp, to the multicast group whose multicast group GID is gid, and multicast LID is lid.
Only QPs of Transport Service Type IBV_QPT_UD may be attached to multicast groups.
In order to receive multicast messages, a join request for the multicast group must be sent to the subnet administrator (SA), so that the fabric's multicast routing is configured to deliver messages to the local port.
If a QP is attached to the same multicast group multiple times, the QP will still receive a single copy of a multicast message.

Template: int ibv_detach_mcast(struct ibv_qp *qp, const union ibv_gid *gid, uint16_t lid)

Input Parameters:

qp QP to detach from the multicast group

gid The multicast group GID

lid The multicast group LID in host byte order

Output Parameters: none

Return Value: 0 on success, -1 on error. If the call fails, errno will be set to indicate the reason for the failure.

Description: ibv_detach_mcast detaches the specified QP, qp, from the multicast group whose multicast group GID is gid, and multicast LID is lid.

© Copyright 2023, NVIDIA. Last updated on May 23, 2023.