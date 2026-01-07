Note: Starting from VMA v8.5.x, the VMA_POLL parameter has been renamed to SocketXtreme.

The API introduced for this capability allows an application to remove the overhead of the socket API from the receive data path, while keeping the well-known socket API for the control interface. With this functionality, the application has almost direct access to VMA’s HW ring object, which makes it possible to implement a design that does not call socket APIs such as select(), poll(), epoll_wait(), recv(), recvfrom(), recvmsg(), read(), or readv().

The structures and constants are defined as shown below.

VMA Specific Events

typedef enum {
    VMA_SOCKETXTREME_PACKET = (1ULL << 32),
    VMA_SOCKETXTREME_NEW_CONNECTION_ACCEPTED = (1ULL << 33)
} vma_socketxtreme_events_t;

VMA_SOCKETXTREME_PACKET – A new packet is available

VMA_SOCKETXTREME_NEW_CONNECTION_ACCEPTED – A new connection has been auto-accepted by the server
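Because the VMA-specific event bits start at bit 32 and epoll event masks are 32 bits wide, the `events` field of a completion can be decoded with plain bit masks. The following C sketch shows one way to do this. It redefines the two constants locally as macros, so it builds without `<mellanox/vma_extra.h>`, and the helper names `has_packet`, `is_new_connection`, and `epoll_part` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the enum values above; a real application should
 * include <mellanox/vma_extra.h> instead of redefining them. */
#define VMA_SOCKETXTREME_PACKET                  (1ULL << 32)
#define VMA_SOCKETXTREME_NEW_CONNECTION_ACCEPTED (1ULL << 33)

/* epoll event masks are 32-bit values, so the VMA bits cannot overlap them. */
static_assert(VMA_SOCKETXTREME_PACKET > UINT32_MAX,
              "VMA event bits must lie above the epoll bits");

/* Nonzero if the completion carries a new packet. */
static int has_packet(uint64_t events)
{
    return (events & VMA_SOCKETXTREME_PACKET) != 0;
}

/* Nonzero if the completion reports an auto-accepted connection. */
static int is_new_connection(uint64_t events)
{
    return (events & VMA_SOCKETXTREME_NEW_CONNECTION_ACCEPTED) != 0;
}

/* Keeps only the standard epoll bits (EPOLLIN, EPOLLOUT, ...). */
static uint32_t epoll_part(uint64_t events)
{
    return (uint32_t)(events & UINT32_MAX);
}
```

A single completion may set several of these bits at once, so each bit is tested independently rather than through a switch on the whole value.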

struct vma_buff_t {
    struct vma_buff_t* next;
    void* payload;
    uint16_t len;
};

next – Next buffer (for the last buffer, next == NULL)

payload – Pointer to data

len – Data length

struct vma_packet_desc_t {
    size_t num_bufs;
    uint16_t total_len;
    struct vma_buff_t* buff_lst;
};

VMA Packet

num_bufs – Number of packet's buffers

total_len – Total data length

buff_lst – List of packet's buffers
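The buffer list is a singly linked list terminated by `next == NULL`, so walking it visits the scattered pieces of a packet in order. The C sketch below copies them into one contiguous buffer. It redeclares the two structures locally so that it builds without the VMA header, and `copy_packet` is a hypothetical helper. Copying gives up the zero-copy benefit, so it is only worthwhile when the application really needs contiguous data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local copies of the structures documented above (normally provided by
 * <mellanox/vma_extra.h>). */
struct vma_buff_t {
    struct vma_buff_t *next;
    void *payload;
    uint16_t len;
};

struct vma_packet_desc_t {
    size_t num_bufs;
    uint16_t total_len;
    struct vma_buff_t *buff_lst;
};

/* Copies the packet's scattered buffers into dst, in list order.
 * Returns the number of bytes copied, or -1 if dst is too small. */
static long copy_packet(const struct vma_packet_desc_t *pkt,
                        void *dst, size_t dst_len)
{
    assert(pkt != NULL && dst != NULL);
    if (pkt->total_len > dst_len)
        return -1;

    size_t off = 0;
    for (const struct vma_buff_t *b = pkt->buff_lst; b != NULL; b = b->next) {
        if (off + b->len > dst_len)   /* guard against an inconsistent list */
            return -1;
        memcpy((char *)dst + off, b->payload, b->len);
        off += b->len;
    }
    return (long)off;
}
```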

struct vma_completion_t {
    struct vma_packet_desc_t packet;
    uint64_t events;
    uint64_t user_data;
    struct sockaddr_in src;
    int listen_fd;
};

packet – Packet descriptor. Valid when the VMA_SOCKETXTREME_PACKET event is set.

events – Set of events

user_data – User-provided data. By default this field holds the FD of the socket. The user can change its content using setsockopt() with the level argument SOL_SOCKET and the option name SO_VMA_USER_DATA.

src – Source address (in network byte order), set for the VMA_SOCKETXTREME_PACKET and VMA_SOCKETXTREME_NEW_CONNECTION_ACCEPTED events

listen_fd – FD of the connected socket's parent (listen) socket. Valid when the VMA_SOCKETXTREME_NEW_CONNECTION_ACCEPTED event is set.
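For a VMA_SOCKETXTREME_NEW_CONNECTION_ACCEPTED completion, the fields above identify the new child socket, its listen socket, and the peer address. The C sketch below extracts them, assuming `user_data` still holds its default value (the socket FD) and has not been changed with SO_VMA_USER_DATA. The structures are redeclared locally, and `struct new_conn` and `get_new_connection` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* Local copies of the definitions documented above; include
 * <mellanox/vma_extra.h> instead in a real application. */
#define VMA_SOCKETXTREME_PACKET                  (1ULL << 32)
#define VMA_SOCKETXTREME_NEW_CONNECTION_ACCEPTED (1ULL << 33)

struct vma_buff_t;   /* contents not needed here */
struct vma_packet_desc_t {
    size_t num_bufs;
    uint16_t total_len;
    struct vma_buff_t *buff_lst;
};
struct vma_completion_t {
    struct vma_packet_desc_t packet;
    uint64_t events;
    uint64_t user_data;
    struct sockaddr_in src;
    int listen_fd;
};

/* Hypothetical summary of an auto-accepted connection. */
struct new_conn {
    int child_fd;             /* from user_data (default: the socket FD) */
    int listen_fd;            /* parent listen socket */
    struct sockaddr_in peer;  /* network byte order, as reported by VMA */
};

/* Fills *out and returns 1 if comp reports an auto-accepted connection;
 * returns 0 for any other completion. */
static int get_new_connection(const struct vma_completion_t *comp,
                              struct new_conn *out)
{
    assert(comp != NULL && out != NULL);
    if (!(comp->events & VMA_SOCKETXTREME_NEW_CONNECTION_ACCEPTED))
        return 0;
    out->child_fd = (int)comp->user_data;
    out->listen_fd = comp->listen_fd;
    out->peer = comp->src;
    return 1;
}
```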

Syntax:

int (*socketxtreme_poll)(int fd, struct vma_completion_t* completions, unsigned int ncompletions, int flags);

Where:

fd – file descriptor

completions – array of completion elements

ncompletions – number of elements in passed array

flags – flags to control behavior (set to zero)

Return values: Returns the number of ready completions on success. A negative value is returned in case of failure.

Description: This function polls `fd` for VMA completions and returns up to `ncompletions` ready completions via the `completions` array. `fd` represents a ring file descriptor. VMA completions are indicated for incoming packets and/or for other events.

If the VMA_SOCKETXTREME_PACKET flag is set in the vma_completion_t.events field, the completion points to an incoming packet descriptor that can be accessed via the vma_completion_t.packet field. The packet descriptor points to VMA buffers that contain data scattered by the HW, so the data is delivered to the application with zero copy. Notice: after the application has finished with the returned packets and their buffers, it must free them using the socketxtreme_free_vma_packets()/socketxtreme_free_vma_buff() functions. If the VMA_SOCKETXTREME_PACKET flag is not set, the vma_completion_t.packet field is reserved.

In addition to the packet arrival event (indicated by the VMA_SOCKETXTREME_PACKET flag), VMA also reports the VMA_SOCKETXTREME_NEW_CONNECTION_ACCEPTED event and standard epoll events via the vma_completion_t.events field. The VMA_SOCKETXTREME_NEW_CONNECTION_ACCEPTED event is reported when a new connection is accepted by the server. When working with socketxtreme_poll(), new connections are accepted automatically, and accept() must not be called on the listen socket. The VMA_SOCKETXTREME_NEW_CONNECTION_ACCEPTED event is reported for the newly connected (child) socket, so vma_completion_t.user_data refers to the child socket, and no EPOLLIN event is generated for the listen socket. For events other than packet arrival and new connection acceptance, the vma_completion_t.events bitmask is composed of standard epoll event types. Notice: the same completion can report multiple events; for example, the VMA_SOCKETXTREME_PACKET flag can be set together with the EPOLLOUT event.
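The event rules above translate into a dispatch loop over the array filled by socketxtreme_poll(). The C sketch below classifies one batch of completions and collects the packet descriptors that must be released afterwards. It redeclares the VMA definitions locally, and `struct batch_stats` and `classify` are hypothetical names. Each completion is tested bit by bit because one completion can carry several events.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <netinet/in.h>

/* Local copies of the definitions documented above; include
 * <mellanox/vma_extra.h> instead in a real application. */
#define VMA_SOCKETXTREME_PACKET                  (1ULL << 32)
#define VMA_SOCKETXTREME_NEW_CONNECTION_ACCEPTED (1ULL << 33)

struct vma_buff_t;
struct vma_packet_desc_t {
    size_t num_bufs;
    uint16_t total_len;
    struct vma_buff_t *buff_lst;
};
struct vma_completion_t {
    struct vma_packet_desc_t packet;
    uint64_t events;
    uint64_t user_data;
    struct sockaddr_in src;
    int listen_fd;
};

/* Per-batch summary of what socketxtreme_poll() reported. */
struct batch_stats {
    int new_conns;   /* auto-accepted connections */
    int packets;     /* completions carrying a packet */
    uint64_t bytes;  /* sum of packet total_len */
    int other;       /* completions with epoll events only */
};

/* Classifies n completions. Pointers to packet descriptors are stored in
 * to_free (capacity n) so the caller can release them once the data has
 * been consumed. Returns the number of entries stored in to_free. */
static int classify(struct vma_completion_t *comps, int n,
                    struct batch_stats *st,
                    struct vma_packet_desc_t **to_free)
{
    assert(comps != NULL && st != NULL && to_free != NULL);
    int nfree = 0;
    memset(st, 0, sizeof(*st));
    for (int i = 0; i < n; i++) {
        uint64_t ev = comps[i].events;
        int handled = 0;
        if (ev & VMA_SOCKETXTREME_NEW_CONNECTION_ACCEPTED) {
            st->new_conns++;   /* user_data refers to the child socket */
            handled = 1;
        }
        if (ev & VMA_SOCKETXTREME_PACKET) {
            st->packets++;
            st->bytes += comps[i].packet.total_len;
            to_free[nfree++] = &comps[i].packet;
            handled = 1;
        }
        if (!handled)
            st->other++;       /* e.g. EPOLLOUT, EPOLLHUP, EPOLLERR */
    }
    return nfree;
}
```

In a real loop, `comps` would be filled by `n = _vma_api->socketxtreme_poll(ring_fd, comps, 16, 0)` (checking `n < 0` first), and each `to_free[i]` would be passed to `_vma_api->socketxtreme_free_vma_packets(to_free[i], 1)` after its data has been processed.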

Syntax:

int (*get_socket_rings_num)(int fd);

Where:

fd – file descriptor

Return values: Returns the number of rings on success. A negative value is returned in case of failure.

Description: Returns the number of rings that are associated with the socket.

Syntax:

int (*get_socket_rings_fds)(int fd, int *ring_fds, int ring_fds_sz);

Where:

fd – file descriptor

ring_fds – int array of ring fds

ring_fds_sz – size of the array

Return values: Returns the number of populated array entries on success. A negative value is returned in case of failure.

Description: Returns FDs of the rings that are associated with the socket.
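get_socket_rings_num() and get_socket_rings_fds() are naturally used together: first size the array, then fetch the ring FDs. The C sketch below does this through function pointers with the same shapes as the vma_api_t members, so `_vma_api->get_socket_rings_num` and `_vma_api->get_socket_rings_fds` can be passed in directly. It also merges the result into a set of unique ring FDs, on the assumption that an application watching several sockets may find some of them sharing a ring and wants to poll each ring only once. The typedefs and helper names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Same shapes as the get_socket_rings_num and get_socket_rings_fds members
 * of struct vma_api_t. */
typedef int (*rings_num_fn)(int fd);
typedef int (*rings_fds_fn)(int fd, int *ring_fds, int ring_fds_sz);

/* Adds src[0..nsrc) to set[0..nset) unless already present. Returns the new
 * set size, or -1 if cap would be exceeded (set may be partially updated). */
static int merge_unique(int *set, int nset, int cap, const int *src, int nsrc)
{
    assert(nset <= cap);
    for (int i = 0; i < nsrc; i++) {
        int seen = 0;
        for (int j = 0; j < nset && !seen; j++)
            seen = (set[j] == src[i]);
        if (seen)
            continue;
        if (nset == cap)
            return -1;
        set[nset++] = src[i];
    }
    return nset;
}

/* Queries the rings of one socket and merges their FDs into the set.
 * Returns the new set size, or -1 on failure. */
static int add_socket_rings(int sock, rings_num_fn num_fn, rings_fds_fn fds_fn,
                            int *set, int nset, int cap)
{
    int n = num_fn(sock);
    if (n < 0)
        return -1;
    if (n == 0)
        return nset;               /* no rings yet: nothing to add */

    int *tmp = malloc((size_t)n * sizeof(*tmp));
    if (tmp == NULL)
        return -1;
    int got = fds_fn(sock, tmp, n);
    int rc = (got < 0) ? -1 : merge_unique(set, nset, cap, tmp, got);
    free(tmp);
    return rc;
}
```

Each FD in the resulting set can then be passed to socketxtreme_poll() as the `fd` argument.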

Syntax:

int (*socketxtreme_free_vma_packets)(struct vma_packet_desc_t *packets, int num);

Where:

packets – packets to be freed

num – number of packets in passed array

Return values: Returns zero on success. A negative value is returned in case of failure.

Description: Frees packets received by socketxtreme_poll().

For each packet in the `packets` array, this function updates the receive queue size and the advertised TCP window size, if needed, for the socket that received the packet, and frees the VMA buffer list associated with the packet. Notice: for each buffer in the buffer list, VMA decreases the buffer's reference count, and only buffers whose reference count reaches zero are deallocated. An application can call socketxtreme_ref_vma_buff() to increase a buffer's reference count in order to hold the buffer even after socketxtreme_free_vma_packets() has been called. The application is then responsible for freeing the buffers that could not be deallocated by socketxtreme_free_vma_packets() because of a non-zero reference count. This is done by calling the socketxtreme_free_vma_buff() function.

Syntax:

int (*socketxtreme_free_vma_buff)(struct vma_buff_t *buff);

Where:

buff – buffer to be managed

Return values: Returns the buffer's reference count after the change (zero value means that the buffer has been deallocated). A negative value is returned in case of failure.

Description: Decrements the reference count of a buffer received by socketxtreme_poll(). When the buffer's reference count reaches zero, the buffer is deallocated.

Syntax:

int (*socketxtreme_ref_vma_buff)(struct vma_buff_t *buff);

Where:

buff – buffer to be managed

Return values: Returns buffer's reference count after the change. A negative value is returned in case of failure.

Description: Increments the reference count of a buffer received by socketxtreme_poll(). Use this function to hold the buffer even after a call to socketxtreme_free_vma_packets(). When the buffer is no longer required, it should be freed via socketxtreme_free_vma_buff().
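To keep a packet's data beyond socketxtreme_free_vma_packets(), an application can take a reference on each buffer first and release it later with socketxtreme_free_vma_buff(). The C sketch below records the buffer pointers in an application-owned array before doing so, so the later release does not depend on the `next` links after the packet has been freed (a defensive assumption, not something the text above requires). The structures are redeclared locally, `buff_op_fn` matches the shape of both functions, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies of the structures documented above (normally provided by
 * <mellanox/vma_extra.h>). */
struct vma_buff_t { struct vma_buff_t *next; void *payload; uint16_t len; };
struct vma_packet_desc_t {
    size_t num_bufs;
    uint16_t total_len;
    struct vma_buff_t *buff_lst;
};

/* Same shape as socketxtreme_ref_vma_buff and socketxtreme_free_vma_buff. */
typedef int (*buff_op_fn)(struct vma_buff_t *buff);

/* Records every buffer of pkt in held[] (capacity cap). Returns the number
 * of buffers recorded, or -1 if they do not fit. */
static int snapshot_buffers(const struct vma_packet_desc_t *pkt,
                            struct vma_buff_t **held, int cap)
{
    assert(pkt != NULL && held != NULL);
    int n = 0;
    for (struct vma_buff_t *b = pkt->buff_lst; b != NULL; b = b->next) {
        if (n == cap)
            return -1;
        held[n++] = b;
    }
    return n;
}

/* Takes a reference on each recorded buffer so that it survives
 * socketxtreme_free_vma_packets(). Returns how many calls succeeded;
 * a value below n means the remaining buffers were not referenced. */
static int hold_buffers(struct vma_buff_t **held, int n, buff_op_fn ref)
{
    int i;
    for (i = 0; i < n; i++)
        if (ref(held[i]) < 0)
            break;
    return i;
}

/* Drops the extra reference on each held buffer when it is no longer
 * needed; buffers whose count reaches zero are deallocated by VMA. */
static void release_buffers(struct vma_buff_t **held, int n, buff_op_fn unref)
{
    for (int i = 0; i < n; i++)
        unref(held[i]);
}
```

A typical sequence is `n = snapshot_buffers(&comp.packet, held, cap)`, then `hold_buffers(held, n, _vma_api->socketxtreme_ref_vma_buff)`, then `_vma_api->socketxtreme_free_vma_packets(&comp.packet, 1)`, and, once the data has been processed, `release_buffers(held, n, _vma_api->socketxtreme_free_vma_buff)`.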

The sockperf benchmark supports SocketXtreme mode, and its source code can be used as a reference for SocketXtreme API usage.

The following sample implements server side logic based on the API described above.

In this example, the application just waits for connection requests and accepts new connections.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <inttypes.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <mellanox/vma_extra.h>

int main(int argc, char **argv)
{
    int rc = 0;
    int fd = -1;
    struct sockaddr_in addr;
    struct vma_api_t *_vma_api = NULL;
    int _vma_ring_fd = -1;
    char *strdev = (argc > 1 ? argv[1] : NULL);
    char *straddr = (argc > 2 ? argv[2] : NULL);
    char *strport = (argc > 3 ? argv[3] : NULL);

    if (!strdev || !straddr || !strport) {
        printf("Wrong options\n");
        exit(1);
    }
    printf("Dev: %s\nAddress: %s\nPort: %s\n", strdev, straddr, strport);

    /* The VMA Extra API is available only when running under VMA */
    _vma_api = vma_get_api();
    if (_vma_api == NULL) {
        printf("VMA Extra API not found\n");
        exit(1);
    }

    fd = socket(AF_INET, SOCK_STREAM, IPPROTO_IP);
    if (fd < 0) {
        fprintf(stderr, "socket() failed %d : %s\n", errno, strerror(errno));
        exit(1);
    }
    /* Bind the socket to the offloaded device */
    rc = setsockopt(fd, SOL_SOCKET, SO_BINDTODEVICE, (void *)strdev, strlen(strdev));
    if (rc < 0) {
        printf("setsockopt() failed %d : %s\n", errno, strerror(errno));
        exit(1);
    }

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = inet_addr(straddr);
    addr.sin_port = htons(atoi(strport));
    rc = bind(fd, (struct sockaddr *)&addr, sizeof(addr));
    if (rc < 0) {
        fprintf(stderr, "bind() failed %d : %s\n", errno, strerror(errno));
        exit(1);
    }

    /* Get the ring fd that socketxtreme_poll() will poll */
    rc = _vma_api->get_socket_rings_fds(fd, &_vma_ring_fd, 1);
    if (rc < 1 || _vma_ring_fd == -1) {
        printf("Failed to return the ring fd\n");
        exit(1);
    }

    rc = listen(fd, 5);
    if (rc < 0) {
        fprintf(stderr, "listen() failed %d : %s\n", errno, strerror(errno));
        exit(1);
    }
    printf("Waiting on: fd=%d\n", fd);

    /* New connections are accepted automatically: accept() is never called */
    while (rc >= 0) {
        struct vma_completion_t vma_comps;
        rc = _vma_api->socketxtreme_poll(_vma_ring_fd, &vma_comps, 1, 0);
        if (rc > 0) {
            printf("socketxtreme_poll: rc=%d event=0x%" PRIx64 " user_data=%" PRIu64 "\n",
                   rc, vma_comps.events, vma_comps.user_data);
            if (vma_comps.events & VMA_SOCKETXTREME_NEW_CONNECTION_ACCEPTED)
                printf("Accepted connection: fd=%d\n", (int)vma_comps.user_data);
            /* This sample does not consume data; return packets to VMA */
            if (vma_comps.events & VMA_SOCKETXTREME_PACKET)
                _vma_api->socketxtreme_free_vma_packets(&vma_comps.packet, 1);
        }
    }
    close(fd);
    fprintf(stderr, "socket closed\n");
    return 0;
}





No support for:

Multi-threading

Users should keep in mind the differences in flow between the standard socket API and the API based on the polling-completions model.