Date: 2026-03-02

This section lists the DOCA GPUNetIO functions that can be used on the CPU only.

This enum lists all the possible memory types that can be allocated with GPUNetIO.

enum doca_gpu_mem_type {
    DOCA_GPU_MEM_TYPE_GPU = 0,
    DOCA_GPU_MEM_TYPE_GPU_CPU = 1,
    DOCA_GPU_MEM_TYPE_CPU_GPU = 2,
};

Note: With regard to the syntax, the text string after the DOCA_GPU_MEM_TYPE_ prefix signifies <where-memory-resides>_<who-has-access>.

DOCA_GPU_MEM_TYPE_GPU – memory resides on the GPU and is accessible from the GPU only

DOCA_GPU_MEM_TYPE_GPU_CPU – memory resides on the GPU and is accessible also by the CPU

DOCA_GPU_MEM_TYPE_CPU_GPU – memory resides on the CPU and is accessible also by the GPU

Typical usage of the DOCA_GPU_MEM_TYPE_GPU_CPU memory type is to send a notification from the CPU to the GPU (e.g., a CUDA kernel periodically checking to see if the exit condition set by the CPU is met).
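The naming rule above can be captured in a small lookup, which may help when deciding which pointer an application is allowed to touch. The following is a minimal sketch: the enum is copied from the listing above so that the code compiles without the DOCA headers, while struct mem_type_info and describe_mem_type are hypothetical names introduced only for illustration and are not part of DOCA.

```c
#include <stdbool.h>
#include <stddef.h>

/* Copied from the listing above so the sketch builds without DOCA headers. */
enum doca_gpu_mem_type {
    DOCA_GPU_MEM_TYPE_GPU = 0,
    DOCA_GPU_MEM_TYPE_GPU_CPU = 1,
    DOCA_GPU_MEM_TYPE_CPU_GPU = 2,
};

/* Hypothetical descriptor (not a DOCA type). */
struct mem_type_info {
    const char *resides_on; /* "GPU" or "CPU" */
    bool gpu_access;        /* usable from CUDA kernels */
    bool cpu_access;        /* usable from CPU threads */
};

/* Hypothetical helper: fills *out and returns true for a known memory type. */
static bool describe_mem_type(enum doca_gpu_mem_type mtype,
                              struct mem_type_info *out)
{
    switch (mtype) {
    case DOCA_GPU_MEM_TYPE_GPU:
        *out = (struct mem_type_info){ "GPU", true, false };
        return true;
    case DOCA_GPU_MEM_TYPE_GPU_CPU:
        *out = (struct mem_type_info){ "GPU", true, true };
        return true;
    case DOCA_GPU_MEM_TYPE_CPU_GPU:
        *out = (struct mem_type_info){ "CPU", true, true };
        return true;
    default:
        return false; /* unknown value: leave *out untouched */
    }
}
```

An application could consult cpu_access before dereferencing anything on the CPU side.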

This is the first function a GPUNetIO application must invoke to create a handler on a GPU device. The function initializes a pointer to a structure in memory with type struct doca_gpu *.

doca_error_t doca_gpu_create(
    const char *gpu_bus_id,
    struct doca_gpu **gpu_dev);

gpu_bus_id – <PCIe-bus>:<device>.<function> of the GPU device you want to use in your application

gpu_dev [out] – GPUNetIO handler to that GPU device

To get the PCIe address of the GPU, use the lspci or nvidia-smi command.
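Because gpu_bus_id is a plain string, it can be worth checking its shape before handing it to doca_gpu_create. The sketch below validates the <PCIe-bus>:<device>.<function> form described above (for example, ca:00.0). parse_pcie_bdf is a hypothetical helper, not a DOCA call. It rejects strings with a leading PCIe domain (such as 0000:ca:00.0) only because the format above does not mention one; whether doca_gpu_create accepts a domain prefix is not stated here.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parses "bb:dd.f" (hexadecimal digits) into its parts. */
static bool parse_pcie_bdf(const char *s, unsigned *bus, unsigned *dev,
                           unsigned *fn)
{
    /* Exactly 7 characters with ':' and '.' in fixed positions. */
    if (s == NULL || strlen(s) != 7 || s[2] != ':' || s[5] != '.')
        return false;
    if (!isxdigit((unsigned char)s[0]) || !isxdigit((unsigned char)s[1]) ||
        !isxdigit((unsigned char)s[3]) || !isxdigit((unsigned char)s[4]) ||
        !isxdigit((unsigned char)s[6]))
        return false;
    if (sscanf(s, "%2x:%2x.%1x", bus, dev, fn) != 3)
        return false;
    /* PCIe allows at most 32 devices per bus and 8 functions per device. */
    return *dev <= 0x1f && *fn <= 0x7;
}
```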

This CPU function allocates different flavors of memory.

doca_error_t doca_gpu_mem_alloc(
    struct doca_gpu *gpu_dev,
    size_t size,
    size_t alignment,
    enum doca_gpu_mem_type mtype,
    void **memptr_gpu,
    void **memptr_cpu);

gpu_dev – GPUNetIO device handler

size – Size, in bytes, of the memory area to allocate

alignment – Memory address alignment to use. If 0, the default alignment is used

mtype – Type of memory to allocate

memptr_gpu [out] – GPU pointer to use to modify that memory from the GPU if memory is allocated on or is visible by the GPU

memptr_cpu [out] – CPU pointer to use to modify that memory from the CPU if memory is allocated on or is visible by the CPU. Can be NULL if memory is GPU-only

Warning: Make sure to use the right pointer on the right device. If an application tries to access the memory from the CPU using the memptr_gpu address, a segmentation fault results.
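One way to make the pointer rule hard to violate is to keep both pointers together and route every CPU access through a helper that refuses to fall back to the GPU address. The sketch below illustrates the idea for a 32-bit flag, such as an exit condition set by the CPU. struct gpu_buffer and gpu_buffer_cpu_store_u32 are hypothetical names, not DOCA types or calls, and the sketch assumes the application initializes its CPU pointer to NULL before calling doca_gpu_mem_alloc so that it stays NULL for GPU-only memory.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wrapper for the two pointers filled by doca_gpu_mem_alloc. */
struct gpu_buffer {
    void *gpu_ptr; /* memptr_gpu: only ever passed to CUDA kernels */
    void *cpu_ptr; /* memptr_cpu: assumed NULL when memory is GPU-only */
};

/*
 * Stores a 32-bit value from the CPU side.
 * Returns 0 on success, -1 if the buffer has no CPU-visible address.
 */
static int gpu_buffer_cpu_store_u32(const struct gpu_buffer *buf,
                                    uint32_t value)
{
    if (buf == NULL || buf->cpu_ptr == NULL)
        return -1; /* never dereference gpu_ptr from the CPU */
    /* volatile: the reader is another device polling this location. */
    *(volatile uint32_t *)buf->cpu_ptr = value;
    return 0;
}
```

A CUDA kernel would receive gpu_ptr and poll it, while the CPU thread only ever goes through the helper.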





Creates a new instance of a DOCA GPUNetIO semaphore. A semaphore is composed of a list of items, each having, by default, a status flag, a number of packets, and the index of a doca_gpu_buf in a doca_gpu_buf_arr.

For example, a GPUNetIO semaphore can be used in applications where a CUDA kernel is responsible for receiving packets in a doca_gpu_buf_arr array associated with an Ethernet receive queue object, doca_gpu_eth_rxq (see section "doca_gpu_dev_eth_rxq_receive_*"), and dispatching packet info to a second CUDA kernel which processes them.

Another way to use a GPUNetIO semaphore is to exchange data across different entities like two CUDA kernels or a CUDA kernel and a CPU thread. The reason for this scenario may be that the CUDA kernel needs to provide the outcome of the packet processing to the CPU which would in turn compile a statistics report. Therefore, it is possible to associate a custom application-defined structure with each item in the semaphore. This way, the semaphore can be used as a message passing object.

Entities communicating through a semaphore must adopt a poll/update mechanism according to the following logic:

Update: Populate the next item of the semaphore (packets' info and/or custom application-defined info). Set status flag to READY.

Poll: Wait for the next item to have a status flag equal to READY. Read and process info. Set status flag to DONE.
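The poll/update logic above can be sketched as a plain, single-threaded C simulation. It is not the GPUNetIO implementation: every name below (item_status, sem_item, sem_ring, sem_update, sem_poll) is hypothetical, the ring lives in ordinary host memory, and the memory fences a real CPU/GPU exchange would need are omitted.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_ITEMS 4u

/* Hypothetical status values; the real enum lives in the DOCA headers. */
enum item_status { ITEM_FREE = 0, ITEM_READY, ITEM_DONE };

/* Mirrors the default item contents described above. */
struct sem_item {
    enum item_status status;
    uint32_t num_packets;
    uint64_t buf_idx; /* index of the first buffer in the buffer array */
};

struct sem_ring {
    struct sem_item items[NUM_ITEMS];
    uint32_t next_update; /* producer cursor */
    uint32_t next_poll;   /* consumer cursor */
};

/* Update: populate the next item, then publish it by setting READY last. */
static bool sem_update(struct sem_ring *r, uint32_t num_packets,
                       uint64_t buf_idx)
{
    struct sem_item *it = &r->items[r->next_update % NUM_ITEMS];
    if (it->status == ITEM_READY)
        return false; /* consumer has not processed this slot yet */
    it->num_packets = num_packets;
    it->buf_idx = buf_idx;
    it->status = ITEM_READY;
    r->next_update++;
    return true;
}

/* Poll: one attempt; on READY, read the info and mark the item DONE. */
static bool sem_poll(struct sem_ring *r, uint32_t *num_packets,
                     uint64_t *buf_idx)
{
    struct sem_item *it = &r->items[r->next_poll % NUM_ITEMS];
    if (it->status != ITEM_READY)
        return false; /* nothing published yet; caller polls again */
    *num_packets = it->num_packets;
    *buf_idx = it->buf_idx;
    it->status = ITEM_DONE;
    r->next_poll++;
    return true;
}
```

Setting the status flag as the last step of the update is the design point: the poller must never observe READY before the item's payload has been written.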



doca_error_t doca_gpu_semaphore_create(
    struct doca_gpu *gpu_dev,
    struct doca_gpu_semaphore **semaphore);

gpu_dev – GPUNetIO handler

semaphore [out] – GPUNetIO semaphore handler associated with the GPU device

This function defines the type of memory for the semaphore allocation.

doca_error_t doca_gpu_semaphore_set_memory_type(
    struct doca_gpu_semaphore *semaphore,
    enum doca_gpu_mem_type mtype);

semaphore – GPUNetIO semaphore handler

mtype – Type of memory in which to allocate the semaphore. If the application must share packet info only across CUDA kernels, DOCA_GPU_MEM_TYPE_GPU is the suggested memory type. If the application must share info from a CUDA kernel to the CPU (e.g., to report statistics or the output of the pipeline computation), DOCA_GPU_MEM_TYPE_CPU_GPU is the suggested memory type



This function defines the number of items in a semaphore.

doca_error_t doca_gpu_semaphore_set_items_num(
    struct doca_gpu_semaphore *semaphore,
    uint32_t num_items);

semaphore – GPUNetIO semaphore handler

num_items – Number of items to allocate

This function associates an application-specific structure with the semaphore items, as explained under "doca_gpu_semaphore_create".

doca_error_t doca_gpu_semaphore_set_custom_info(
    struct doca_gpu_semaphore *semaphore,
    uint32_t nbytes,
    enum doca_gpu_mem_type mtype);

semaphore – GPUNetIO semaphore handler

nbytes – Size of the custom info structure to associate

mtype – Type of memory in which to allocate the custom info structure. If the application must share packet info only across CUDA kernels, DOCA_GPU_MEM_TYPE_GPU is the suggested memory type. If the application must share info from a CUDA kernel to the CPU (e.g., to report statistics or the output of the pipeline computation), DOCA_GPU_MEM_TYPE_CPU_GPU is the suggested memory type
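To make the nbytes and mtype arguments concrete, the sketch below defines a statistics structure and derives both values from it. The enum is copied from the memory type listing earlier in this section so that the code compiles without the DOCA headers. struct pkt_stats, custom_info_nbytes, and pick_custom_info_mem_type are hypothetical, application-side names and are not part of DOCA.

```c
#include <stdbool.h>
#include <stdint.h>

/* Copied from the memory type listing so the sketch builds standalone. */
enum doca_gpu_mem_type {
    DOCA_GPU_MEM_TYPE_GPU = 0,
    DOCA_GPU_MEM_TYPE_GPU_CPU = 1,
    DOCA_GPU_MEM_TYPE_CPU_GPU = 2,
};

/* Hypothetical application-defined payload attached to each item. */
struct pkt_stats {
    uint32_t tcp_packets;
    uint32_t udp_packets;
    uint64_t total_bytes;
};

/* Value to pass as nbytes: the size of one custom info structure. */
static uint32_t custom_info_nbytes(void)
{
    return (uint32_t)sizeof(struct pkt_stats);
}

/*
 * Value to pass as mtype, following the suggestion above:
 * CPU-visible memory when a CPU thread reads the info, GPU-only otherwise.
 */
static enum doca_gpu_mem_type pick_custom_info_mem_type(bool cpu_reads_info)
{
    return cpu_reads_info ? DOCA_GPU_MEM_TYPE_CPU_GPU
                          : DOCA_GPU_MEM_TYPE_GPU;
}
```

With these helpers, a call would look like doca_gpu_semaphore_set_custom_info(semaphore, custom_info_nbytes(), pick_custom_info_mem_type(true)) for a pipeline that reports statistics to the CPU.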



From the CPU, query the status of a semaphore item. If the semaphore is allocated with DOCA_GPU_MEM_TYPE_GPU, this function results in a segmentation fault.

doca_error_t doca_gpu_semaphore_get_status(
    struct doca_gpu_semaphore *semaphore_cpu,
    uint32_t idx,
    enum doca_gpu_semaphore_status *status);

semaphore_cpu – GPUNetIO semaphore CPU handler

idx – Semaphore item index

status [out] – Output semaphore status

From the CPU, retrieve the address of the custom info structure associated with a semaphore item. If the semaphore or the custom info is allocated with DOCA_GPU_MEM_TYPE_GPU, this function results in a segmentation fault.

doca_error_t doca_gpu_semaphore_get_custom_info_addr(
    struct doca_gpu_semaphore *semaphore_cpu,
    uint32_t idx,
    void **custom_info);

semaphore_cpu – GPUNetIO semaphore CPU handler

idx – Semaphore item index

custom_info [out] – Output semaphore custom info address

The doca_gpu_verbs_export_qp function creates a GPUNetIO handler from a DOCA RDMA Verbs QP object. It takes a DOCA RDMA Verbs QP as input and returns a DOCA GPUNetIO Verbs QP object ( struct doca_gpu_verbs_qp ) allocated on the CPU. To use this object in a CUDA kernel, the application must extract a GPU device handler ( struct doca_gpu_dev_verbs_qp ) using the doca_gpu_verbs_get_qp_dev function.

doca_error_t doca_gpu_verbs_export_qp(
    struct doca_gpu *gpu_dev,
    struct doca_dev *dev,
    struct doca_verbs_qp *qp,
    enum doca_gpu_dev_verbs_nic_handler nic_handler,
    void *gpu_qp_umem_dev_ptr,
    struct doca_verbs_cq *cq_sq,
    struct doca_verbs_cq *cq_rq,
    struct doca_gpu_verbs_qp **qp_out);

gpu_dev : GPUNetIO device handler.

dev : DOCA device handler.

qp : DOCA RDMA Verbs QP handler.

nic_handler : Type of NIC handler.

gpu_qp_umem_dev_ptr : GPU memory pointer to UMEM.

cq_sq and cq_rq : CQs associated with the Send and Receive Queues in the QP. Note: Either cq_sq or cq_rq can be NULL, but not both.

qp_out : DOCA GPUNetIO Verbs QP handler in CPU memory.
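The constraint on cq_sq and cq_rq is easy to check before calling doca_gpu_verbs_export_qp. The sketch below shows such a pre-check. export_qp_cqs_valid is a hypothetical helper, not a DOCA call, and struct doca_verbs_cq is only forward-declared so that the code compiles without the DOCA headers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Opaque forward declaration; the real type comes from the DOCA headers. */
struct doca_verbs_cq;

/* Hypothetical pre-check: at least one of the two CQs must be provided. */
static bool export_qp_cqs_valid(const struct doca_verbs_cq *cq_sq,
                                const struct doca_verbs_cq *cq_rq)
{
    return cq_sq != NULL || cq_rq != NULL;
}
```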

Extracts a GPU device handler ( struct doca_gpu_dev_verbs_qp ) from a DOCA GPUNetIO Verbs QP object.

doca_error_t doca_gpu_verbs_get_qp_dev(
    struct doca_gpu_verbs_qp *qp,
    struct doca_gpu_dev_verbs_qp **qp_gpu);

qp : DOCA GPUNetIO Verbs QP handler.

qp_gpu : DOCA GPUNetIO Verbs QP GPU device handler in GPU memory.

Unexports a previously exported DOCA GPUNetIO Verbs QP object ( struct doca_gpu_verbs_qp ).

doca_error_t doca_gpu_verbs_unexport_qp(
    struct doca_gpu *gpu_dev,
    struct doca_gpu_verbs_qp *qp);

gpu_dev : GPUNetIO device handler.

qp : DOCA GPUNetIO Verbs QP handler in CPU memory.

The doca_gpu_verbs_bridge_export_qp function creates a DOCA GPUNetIO Verbs QP object from application-defined parameters, acting as a bridge between IBVerbs/mlx5 objects and DOCA GPUNetIO. This allows applications to create required objects like QP, CQ, and UAR using IBVerbs and mlx5 commands and then pass the relevant information to this DOCA GPUNetIO function.

The function returns a DOCA GPUNetIO Verbs QP object ( struct doca_gpu_verbs_qp ) allocated on the CPU. To use this object in a CUDA kernel, the application must extract a GPU device handler ( struct doca_gpu_dev_verbs_qp ) using the doca_gpu_verbs_get_qp_dev function.

It is the application's responsibility to ensure that all passed parameters are correctly created and set.

doca_error_t doca_gpu_verbs_bridge_export_qp(
    struct doca_gpu *gpu_dev,
    uint32_t sq_qpn,
    void *sq_wqe_addr,
    uint16_t sq_wqe_num,
    uint32_t *sq_dbrec,
    uint64_t *sq_db,
    size_t uar_size,
    uint32_t sq_cqn,
    void *sq_cqe_addr,
    uint32_t sq_cqe_num,
    uint32_t *sq_cq_dbrec,
    uint32_t rq_qpn,
    void *rq_wqe_addr,
    uint16_t rq_wqe_num,
    uint32_t *rq_dbrec,
    uint32_t rcv_wqe_size,
    uint32_t rq_cqn,
    void *rq_cqe_addr,
    uint32_t rq_cqe_num,
    uint32_t *rq_cq_dbrec,
    enum doca_gpu_dev_verbs_nic_handler nic_handler,
    struct doca_gpu_verbs_qp **qp_out);
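Since the function takes many raw values and the application is responsible for their correctness, grouping related arguments and checking them before the call can catch obvious mistakes early. The sketch below does this for the send-queue arguments only. struct bridge_sq_args and bridge_sq_args_filled are hypothetical names, not DOCA types or calls, and the check (non-NULL addresses, non-zero WQE count) is an illustrative assumption about a send queue the application intends to use, not a rule stated by the API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical grouping of the send-queue arguments; field names match
 * the parameter names in the signature above. */
struct bridge_sq_args {
    uint32_t sq_qpn;
    void *sq_wqe_addr;
    uint16_t sq_wqe_num;
    uint32_t *sq_dbrec;
    uint64_t *sq_db;
};

/* Hypothetical sanity check: true when every address is set and the
 * WQE count is non-zero. It cannot tell whether the values are correct. */
static bool bridge_sq_args_filled(const struct bridge_sq_args *a)
{
    return a != NULL && a->sq_wqe_addr != NULL && a->sq_wqe_num != 0 &&
           a->sq_dbrec != NULL && a->sq_db != NULL;
}
```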