NVIDIA Accelerated IO (XLIO) Documentation Rev 3.60

Introduction to XLIO

The NVIDIA® Accelerated IO (XLIO) software library accelerates TCP and UDP network applications by delivering high bandwidth, low latency, and reduced CPU utilization. Built as a user-space library with a kernel-bypass architecture, XLIO enables direct data transfer between application memory and network adapters for optimal performance.

XLIO provides two APIs to meet different application requirements:

  • POSIX Socket API – offers zero-cost integration with existing applications. No code changes are required to achieve substantial networking performance improvements; applications can be accelerated simply by preloading the XLIO library (via LD_PRELOAD).

  • Ultra Socket API – designed for applications demanding maximum performance. While it requires additional integration effort, it enables advanced capabilities such as true zero-copy data transfers and a simplified, highly optimized data path.
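To illustrate the POSIX path, the following is an ordinary POSIX UDP routine with nothing XLIO-specific in it; under the preloading model described above, the same compiled binary could be run with or without XLIO. The function name `udp_self_round_trip` is hypothetical, and the loopback address is used only to keep the sketch self-contained; in a real deployment the peer would be reached through an interface backed by a supported NVIDIA adapter.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Plain POSIX UDP round trip (hypothetical helper): no XLIO-specific calls.
 * Sends `msg` to its own socket and reads it back into `out`.
 * Returns the number of bytes received, or -1 on error. */
ssize_t udp_self_round_trip(const char *msg, char *out, size_t out_len)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK); /* keeps the sketch self-contained */
    addr.sin_port = 0;                              /* let the OS choose a free port */

    /* Bind, then ask which port was assigned so we can send to ourselves. */
    socklen_t addr_len = sizeof(addr);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &addr_len) < 0) {
        close(fd);
        return -1;
    }

    if (sendto(fd, msg, strlen(msg), 0,
               (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }

    /* The datagram is already queued, so this blocking call returns at once. */
    ssize_t n = recvfrom(fd, out, out_len, 0, NULL, NULL);
    close(fd);
    return n;
}
```

Compiled into an ordinary executable (say `./udp_demo`, a hypothetical name), such code would typically be launched as `LD_PRELOAD=libxlio.so ./udp_demo`; the exact library file name and path are assumptions that depend on the installation.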

XLIO seamlessly integrates with cryptography-enabled NVIDIA® ConnectX® network adapters and NVIDIA® BlueField® DPUs, providing hardware-offloaded Transport Layer Security (TLS) symmetric encryption and decryption. It also leverages hardware acceleration features such as Large Receive Offload (LRO), TCP Segmentation Offload (TSO), and Striding-RQ to further enhance performance for both POSIX and Ultra API users.
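As a rough illustration of why TSO helps, the sketch below reproduces only the splitting arithmetic a TSO-capable adapter performs: the stack hands over one large payload, and the hardware cuts it into MSS-sized segments with advancing sequence numbers. The names `struct tso_segment`, `tso_segment_count`, and `tso_split` are hypothetical, and the header replication and checksum fix-ups the adapter also performs are not modelled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one on-the-wire segment cut from a large send. */
struct tso_segment {
    uint32_t seq;    /* TCP sequence number of the segment's first byte */
    size_t   offset; /* offset of the segment within the original payload */
    size_t   len;    /* payload bytes in this segment (at most one MSS) */
};

/* Number of MSS-sized segments needed for payload_len bytes (ceiling division). */
size_t tso_segment_count(size_t payload_len, size_t mss)
{
    return (payload_len + mss - 1) / mss;
}

/* Split one large payload into MSS-sized segments, as a TSO-capable adapter
 * would on the wire. Returns the number of segments written, or 0 if mss is 0
 * or the caller's array (max_segs entries) is too small. */
size_t tso_split(uint32_t start_seq, size_t payload_len, size_t mss,
                 struct tso_segment *segs, size_t max_segs)
{
    if (mss == 0)
        return 0;

    size_t count = tso_segment_count(payload_len, mss);
    if (count > max_segs)
        return 0;

    for (size_t i = 0; i < count; i++) {
        size_t off = i * mss;
        size_t remaining = payload_len - off;
        segs[i].offset = off;
        segs[i].len = remaining < mss ? remaining : mss;
        /* Sequence numbers wrap modulo 2^32, matching uint32_t arithmetic. */
        segs[i].seq = start_seq + (uint32_t)off;
    }
    return count;
}
```

For example, assuming a 1460-byte MSS (a 1500-byte MTU minus 20-byte IPv4 and TCP headers without options), a 64000-byte send becomes 44 segments: 43 full ones and a final 1220-byte segment. With TSO, the software stack submits one large send instead of building those 44 segments itself.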

XLIO uses direct hardware access and advanced polling techniques on NVIDIA network adapters to bypass the kernel network stack for socket API operations. This kernel-bypass architecture gives applications the following benefits:

  • Minimized context switches and interrupts, resulting in:

    • Higher throughput and requests per second

    • Increased number of new connections per second

    • Improved CPU utilization

    • Lower latency

  • Efficient data transfer:

    • POSIX Socket API – single-copy architecture: XLIO requires only one copy to move accelerated packets (unicast or multicast) between hardware and application buffers.

    • Ultra Socket API – true zero-copy for both transmit and receive: data flows directly between application buffers and the network adapter without intermediate copies.
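The difference between the two data paths comes down to who owns the buffer the adapter wrote into. The following toy model is a hypothetical sketch, not the Ultra API: a POSIX-style receive copies the bytes out once, while a zero-copy receive hands the application a pointer to the same memory and requires it to release the buffer afterwards. `struct rx_buffer`, `model_recv_copy`, `model_recv_zero_copy`, and `model_release` are illustrative names only.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a receive buffer the adapter has already filled.
 * Illustrative only: these names are not part of the XLIO API. */
struct rx_buffer {
    const unsigned char *data; /* memory owned by the library/adapter */
    size_t len;                /* bytes of packet data in the buffer */
    int in_use_by_app;         /* nonzero while the application holds it */
};

/* POSIX-style receive: exactly one copy from the library-owned buffer into
 * the caller's buffer. Ownership of rx stays with the library. */
size_t model_recv_copy(const struct rx_buffer *rx, void *dst, size_t dst_len)
{
    size_t n = rx->len < dst_len ? rx->len : dst_len;
    memcpy(dst, rx->data, n); /* the single copy */
    return n;
}

/* Zero-copy receive: return a pointer to the same memory, moving no bytes.
 * The caller now holds the buffer and must release it when done. */
const unsigned char *model_recv_zero_copy(struct rx_buffer *rx, size_t *len)
{
    rx->in_use_by_app = 1;
    *len = rx->len;
    return rx->data;
}

/* Give a zero-copy buffer back so it can be reused for new packets. */
void model_release(struct rx_buffer *rx)
{
    rx->in_use_by_app = 0;
}
```

The design trade-off is visible in the signatures: the copying variant frees the application from buffer-lifetime concerns, whereas the zero-copy variant saves the `memcpy` but makes the application responsible for returning buffers promptly so they can be reused.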

Hardware Offloads

XLIO takes advantage of hardware offload capabilities to maximize performance:

  • TLS Offload: Hardware-offloaded encryption and decryption for both transmit and receive paths

  • TSO (TCP Segmentation Offload): Hardware segmentation of large transmit packets

  • LRO (Large Receive Offload): Hardware aggregation of received packets

  • Checksum Offload: Hardware TCP/UDP/IP checksum processing for both transmit and receive paths
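For reference, the computation that TCP/UDP/IP checksum offload takes off the CPU is the RFC 1071 Internet checksum. The sketch below is a straightforward software version for illustration; `inet_checksum` is a hypothetical helper name, and no claim is made about how the adapter implements it internally.

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum (hypothetical helper name): the one's-complement
 * sum of 16-bit big-endian words, folded to 16 bits and inverted.
 * The 32-bit accumulator cannot overflow for buffers up to 128 KiB,
 * which covers any single packet. */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    /* Add each 16-bit word in network byte order. */
    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }

    /* An odd trailing byte is treated as a word with a zero low byte. */
    if (len == 1)
        sum += (uint32_t)data[0] << 8;

    /* Fold carries back into the low 16 bits (end-around carry). */
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);

    return (uint16_t)~sum;
}
```

For example, the 20-byte IPv4 header `45 00 00 73 00 00 40 00 40 11 00 00 c0 a8 00 01 c0 a8 00 c7` (checksum field zeroed) yields `0xb861`. With checksum offload enabled, this per-packet summation is performed by the adapter on transmit and verified by it on receive.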

Standard Protocol Interoperability

XLIO uses standard TCP and UDP over IPv4/IPv6, ensuring full interoperability with any TCP/UDP/IP networking stack. Applications accelerated with XLIO can communicate with any machine, regardless of its operating system or its location on the Ethernet network. This enables asymmetric deployments in which only one side uses XLIO (for example, a TCP server, a multicast publisher, or a multicast consumer) while remaining compatible with every Ethernet peer.

Kernel-Bypass Architecture Benefits

  • Reduced System Overhead:

    • Eliminates context switches between user space and kernel space

    • Bypasses kernel TCP/IP stack processing for unicast and multicast operations

    • Removes in-kernel buffer copies: data moves between the application and the hardware without passing through kernel buffers

    • Reduces hardware interrupts for packet transmission and reception

    • Significantly lowers overall CPU usage required to handle network traffic

  • Enhanced Throughput:

    • Processes significantly higher packet rates than kernel-based networking

    • Maximizes messages per second (MPS) rates

    • Avoids queue congestion problems common in standard TCP/IP applications

  • Improved Latency:

    • Minimizes message latency through direct hardware access

    • Reduces latency spikes and outliers by eliminating unpredictable kernel delays
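Polling is one of the techniques behind the reduced interrupts and lower latency listed above. XLIO polls the adapter from user space; the sketch below shows only the general busy-polling pattern using a standard non-blocking `recv()` on a kernel socket, so it still enters the kernel on every attempt and is not how XLIO is implemented. `busy_poll_recv` and its `max_spins` bound are hypothetical.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Busy-poll receive (hypothetical helper): spin on a non-blocking recv()
 * instead of sleeping until an interrupt wakes the thread.
 * Returns the bytes received, or -1 if no data arrived within max_spins
 * attempts (errno == EAGAIN) or a real error occurred. */
ssize_t busy_poll_recv(int fd, void *buf, size_t len, long max_spins)
{
    for (long i = 0; i < max_spins; i++) {
        ssize_t n = recv(fd, buf, len, MSG_DONTWAIT);
        if (n >= 0)
            return n; /* data arrived (or the peer shut down) */
        if (errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR)
            return -1; /* real error: stop polling */
        /* Nothing yet: try again immediately, keeping this core busy. */
    }
    errno = EAGAIN;
    return -1;
}
```

The trade-off is CPU time for latency: the polling thread keeps its core busy while waiting, but it reacts as soon as data is available instead of waiting for an interrupt and a scheduler wake-up.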

Zero Application Changes

  • Supports legacy socket applications without code modification

  • Works transparently with existing applications through LD_PRELOAD

  • Maintains standard socket API compatibility

© Copyright 2025, NVIDIA. Last updated on Nov 26, 2025