Introduction to XLIO
The NVIDIA® Accelerated IO (XLIO) software library accelerates TCP and UDP network applications by delivering high bandwidth, low latency, and reduced CPU utilization. Built as a user-space library with a kernel-bypass architecture, XLIO enables direct data transfer between application memory and network adapters for optimal performance.
XLIO provides two APIs to meet different application requirements:
POSIX Socket API – integrates with existing applications without code changes. Applications gain substantial networking performance improvements simply by preloading the XLIO library.
Ultra Socket API – designed for applications demanding maximum performance. While it requires additional integration effort, it enables advanced capabilities such as true zero-copy data transfers and a simplified, highly optimized data path.
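As a rough illustration of what "no code changes" means for the POSIX Socket API, the sketch below is an ordinary TCP echo exchange written only against the standard socket calls. Nothing in it refers to XLIO. The function name echo_once is hypothetical, and loopback is used only so the sketch runs anywhere; traffic would need to cross a supported NVIDIA adapter for acceleration to apply.

```python
import socket
import threading


def echo_once(payload: bytes) -> bytes:
    """Send payload to a one-shot loopback echo server and return the reply."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    def serve() -> None:
        # Accept a single connection and echo bytes until the peer stops sending.
        conn, _ = server.accept()
        with conn:
            while True:
                chunk = conn.recv(4096)
                if not chunk:
                    break
                conn.sendall(chunk)

    worker = threading.Thread(target=serve, daemon=True)
    worker.start()

    reply = bytearray()
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(payload)
        client.shutdown(socket.SHUT_WR)  # signal end of request
        while True:
            chunk = client.recv(4096)
            if not chunk:
                break
            reply.extend(chunk)

    worker.join(timeout=5)
    server.close()
    return bytes(reply)


if __name__ == "__main__":
    print(echo_once(b"hello"))
```

Because the program uses only standard socket, bind, listen, accept, send, and recv operations, it is the kind of application that the preloaded library is described as accelerating transparently.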
XLIO seamlessly integrates with cryptography-enabled NVIDIA® ConnectX® network adapters and NVIDIA® BlueField® DPUs, providing hardware-offloaded Transport Layer Security (TLS) symmetric encryption and decryption. It also leverages hardware acceleration features such as Large Receive Offload (LRO), TCP Segmentation Offload (TSO), and Striding-RQ to further enhance performance for both POSIX and Ultra API users.
XLIO leverages direct hardware access and advanced polling techniques of NVIDIA network adapters to bypass the kernel network stack for socket API operations. This kernel-bypass architecture enables applications to achieve exceptional network performance, including:
Fewer context switches and interrupts
Higher throughput and requests per second
Increased number of new connections per second
Reduced CPU utilization
Lower latency
Efficient Data Transfer
POSIX Socket API – Single-copy architecture. XLIO requires only one copy to transfer accelerated packets (unicast or multicast) between hardware and application buffers.
Ultra Socket API – True zero-copy for both transmit and receive. Data flows directly between application buffers and the network adapter without intermediate copies.
Hardware Offloads
XLIO takes advantage of hardware offload capabilities to maximize performance:
TLS Offload: Hardware-offloaded encryption and decryption for both transmit and receive paths
TSO (TCP Segmentation Offload): Hardware segmentation of large transmit packets
LRO (Large Receive Offload): Hardware aggregation of received packets
TCP/UDP/IP Checksum Offload: Hardware checksum calculation and verification on the transmit and receive paths
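To make the segmentation and aggregation offloads concrete, here is a purely conceptual software model of the work that TSO and LRO move into hardware. It is not XLIO code: the function names segment and coalesce are hypothetical, and the MSS of 1460 bytes assumes a 1500-byte Ethernet MTU with 20-byte IPv4 and 20-byte TCP headers and no options.

```python
from typing import List


def segment(payload: bytes, mss: int) -> List[bytes]:
    """Split one large send buffer into MSS-sized pieces (the job TSO offloads)."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    return [payload[i:i + mss] for i in range(0, len(payload), mss)]


def coalesce(segments: List[bytes]) -> bytes:
    """Merge in-order segments back into one buffer (the job LRO offloads)."""
    return b"".join(segments)


if __name__ == "__main__":
    data = bytes(65536)            # one 64 KiB application write
    pieces = segment(data, 1460)   # assumed MSS, see the note above
    print(len(pieces), len(pieces[-1]))  # 45 segments, the last one 1296 bytes
    assert coalesce(pieces) == data
```

With these assumptions a single 64 KiB write becomes 45 wire segments. Without the offload, the host CPU would build each of those segments itself; with it, the stack hands the adapter one large buffer.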
Standard Protocol Interoperability
XLIO uses standard TCP and UDP over IPv4/IPv6, ensuring full interoperability with any TCP/UDP/IP networking stack. Applications accelerated with XLIO can communicate with any machine, regardless of operating system or location on the Ethernet network. This enables asymmetric acceleration, where only one side uses XLIO (ideal for TCP servers, multicast publishers, or multicast consumers), while maintaining compatibility with all Ethernet peers.
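The asymmetric case can be pictured with a plain multicast publisher and consumer, sketched below with standard UDP sockets only. The group address, port, and function names are hypothetical placeholders. Since both sides speak ordinary UDP over IPv4, either one could presumably run with or without XLIO and the other would not need to know.

```python
import socket

GROUP = "239.1.1.1"  # hypothetical administratively scoped multicast group
PORT = 5007          # hypothetical port


def membership_request(group: str, iface: str = "0.0.0.0") -> bytes:
    """Build the 8-byte ip_mreq structure: group address + local interface."""
    return socket.inet_aton(group) + socket.inet_aton(iface)


def make_publisher(ttl: int = 1) -> socket.socket:
    """Create a UDP socket whose multicast datagrams live for `ttl` hops."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    return sock


def make_consumer(group: str = GROUP, port: int = PORT,
                  iface: str = "0.0.0.0") -> socket.socket:
    """Create a UDP socket bound to `port` and joined to `group`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group, iface))
    return sock


if __name__ == "__main__":
    publisher = make_publisher()
    publisher.sendto(b"tick", (GROUP, PORT))  # any joined consumer receives this
    publisher.close()
```

A consumer on another host would call make_consumer() and then recvfrom() in a loop; joining the group requires a multicast-capable interface on that host.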
Kernel-Bypass Architecture Benefits
Reduced System Overhead:
Eliminates context switches between user space and kernel space
Bypasses TCP/IP stack processing for unicast and multicast operations
Removes in-kernel buffer copies: data moves between application buffers and hardware with a single copy (POSIX Socket API) or no copy (Ultra Socket API)
Reduces hardware interrupts for packet transmission and reception
Significantly lowers overall CPU usage required to handle network traffic
Enhanced Throughput:
Processes significantly higher packet rates than kernel-based networking
Maximizes messages per second (MPS) rates
Avoids queue congestion problems common in standard TCP/IP applications
Improved Latency:
Minimizes message latency through direct hardware access
Reduces latency spikes and outliers by eliminating unpredictable kernel delays
Zero Application Changes:
Supports legacy socket applications without code modification
Works transparently with existing applications through LD_PRELOAD
Maintains standard socket API compatibility
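One way to apply the preload without touching the application is to start it from a small launcher that sets LD_PRELOAD in the child's environment. The sketch below assumes the library is installed as libxlio.so on the dynamic linker's search path; adjust the name or use a full path for your installation. The helper names preload_env and run_accelerated are hypothetical.

```python
import os
import subprocess
from typing import Dict, List, Optional

XLIO_LIBRARY = "libxlio.so"  # assumption: library name as installed on this system


def preload_env(library: str = XLIO_LIBRARY,
                base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with `library` first in LD_PRELOAD."""
    env = dict(os.environ if base is None else base)
    existing = env.get("LD_PRELOAD", "")
    # The dynamic linker accepts a colon-separated list; keep earlier entries.
    env["LD_PRELOAD"] = f"{library}:{existing}" if existing else library
    return env


def run_accelerated(command: List[str], library: str = XLIO_LIBRARY) -> int:
    """Run an unmodified program with the library preloaded; return its exit code."""
    return subprocess.run(command, env=preload_env(library), check=False).returncode


if __name__ == "__main__":
    # Show the environment variable that would be handed to the child process.
    print(preload_env()["LD_PRELOAD"])
```

For example, run_accelerated(["./my_server", "--port", "8080"]) would start a hypothetical my_server binary with the library preloaded, which has the same effect as prefixing the command with LD_PRELOAD=libxlio.so in a shell.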