NVIDIA Accelerated IO (XLIO) Documentation Rev 3.60

XLIO Library Architecture

XLIO is a user-space Linux library that accelerates network applications. For applications using the POSIX socket API, XLIO requires no code changes or recompilation: the library is simply loaded via the LD_PRELOAD environment variable. Alternatively, applications can load XLIO dynamically without LD_PRELOAD, which requires minimal application modifications.

When applications transmit or receive TCP/UDP traffic (unicast or multicast, IPv4/IPv6), XLIO intercepts socket API calls and implements the underlying operations in user space. This enables packets to flow directly between the application and NVIDIA Ethernet adapters, bypassing the kernel and its TCP/UDP network stack.

POSIX Socket API

  • Zero-cost integration via LD_PRELOAD

  • Intercepts send and receive calls on stream and datagram sockets

  • Processes network operations in user space instead of passing them to the kernel

  • Enables direct data path between application and hardware
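To illustrate the "no code changes" point, here is a minimal sketch of an ordinary datagram-socket program. Nothing in it refers to XLIO; under the assumptions that the library is installed as libxlio.so and that the script is saved as app.py (both names are assumptions, not taken from this page), it would be started with something like `LD_PRELOAD=libxlio.so python3 app.py`. The loopback address is used only so the sketch runs anywhere; it is not meant to represent traffic through an NVIDIA adapter.

```python
import socket


def udp_round_trip(payload: bytes) -> bytes:
    """Send one datagram to a local receiver and return what it received."""
    # Plain POSIX-style datagram (UDP) sockets; no XLIO-specific calls.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as rx, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as tx:
        rx.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        rx.settimeout(2.0)          # do not hang forever if the datagram is lost
        tx.sendto(payload, rx.getsockname())
        data, _addr = rx.recvfrom(65535)
        return data


if __name__ == "__main__":
    print(udp_round_trip(b"hello"))
```

The same source runs with or without the preload, which is the property the POSIX Socket API path relies on.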

Ultra Socket API

  • Maximum performance with advanced features

  • True zero-copy data transfers

  • Requires application integration

  • See XLIO Ultra API for more information.

XLIO behaves like a standard networking stack, serving multiple network interfaces. The library determines routing based on how applications call bind(), connect(), and setsockopt(), combined with administrator-configured route lookups. When data flows through a supported NVIDIA Ethernet adapter, XLIO intercepts and accelerates the traffic.

For unsupported adapters, XLIO seamlessly passes calls to the standard kernel network stack, allowing hybrid environments where applications can use multiple adapters without configuration changes.

  • IPv4/IPv6

  • TCP

  • UDP

XLIO offers two execution modes to accommodate different application requirements. See XLIO Configuration Parameters for configuration details.

Run to Completion (R2C) Mode

How it works:

  • Network operations execute directly in application thread context

  • XLIO performs all necessary work within socket API calls

  • Can execute hardware polling and progress other sockets as needed

  • Internal thread provides additional context for network operations (configurable)

  • Available for both POSIX and Ultra APIs

Performance characteristics:

  • Highest performance

  • Highest CPU efficiency (Cycles and Caching)

  • Lowest latency

Application requirements:

  • Strict threading model - Avoid sharing sockets between threads

  • Predictable socket distribution between application threads

  • Separate listen socket per thread is required for efficiency

  • Dedicated epoll per thread is required for efficiency

  • Frequent socket API calls, to give XLIO an execution context
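The threading requirements above can be sketched as follows: each thread owns its own listen socket and its own epoll instance, and no socket is shared between threads. This is only an illustration under stated assumptions. In particular, SO_REUSEPORT is an assumption of this sketch, used as one common Linux way to let several threads listen on the same port; this page does not say how per-thread listeners should be created. The function names and the loopback address are hypothetical as well.

```python
import select
import socket
import threading
import time


def worker(port, ready, stop, accepted):
    """One thread = one listen socket + one epoll instance; nothing is shared."""
    lsock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Assumption: SO_REUSEPORT lets every thread bind its own listener to the port.
    lsock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    lsock.bind(("127.0.0.1", port))
    lsock.listen()
    lsock.setblocking(False)
    ep = select.epoll()                      # dedicated epoll for this thread
    ep.register(lsock.fileno(), select.EPOLLIN)
    ready.set()
    try:
        while not stop.is_set():
            # Calling into the socket API often also matches the
            # "frequent API calls" requirement.
            for _fd, _events in ep.poll(0.05):
                try:
                    conn, _addr = lsock.accept()
                except BlockingIOError:
                    continue
                accepted.append(threading.current_thread().name)
                conn.close()
    finally:
        ep.close()
        lsock.close()


def run_demo(n_threads=2, n_clients=8):
    """Start the workers, connect some clients, return how many were accepted."""
    # A bound (not listening) probe socket reserves a free port for the workers.
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    probe.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    probe.bind(("127.0.0.1", 0))
    port = probe.getsockname()[1]

    stop, accepted, threads, readies = threading.Event(), [], [], []
    for i in range(n_threads):
        ready = threading.Event()
        t = threading.Thread(target=worker, args=(port, ready, stop, accepted),
                             name=f"worker-{i}")
        t.start()
        threads.append(t)
        readies.append(ready)
    for ready in readies:
        ready.wait(2.0)
    probe.close()

    for _ in range(n_clients):
        socket.create_connection(("127.0.0.1", port), timeout=2.0).close()

    deadline = time.monotonic() + 3.0
    while len(accepted) < n_clients and time.monotonic() < deadline:
        time.sleep(0.01)
    stop.set()
    for t in threads:
        t.join()
    return len(accepted)


if __name__ == "__main__":
    print(run_demo())
```

Each connection is accepted and closed by exactly one thread, so no socket ever crosses a thread boundary.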

Best for:

  • Performance is critical

  • Application behavior is predictable

  • Single-threaded or well-controlled multi-threading without socket sharing

  • Network-optimized application

Worker Threads Mode

Worker Threads mode provides greater flexibility and ease of use than the traditional Run to Completion (R2C) mode, making XLIO more accessible to applications that were not specifically designed with high-performance networking in mind.

See section Worker Threads Mode for more information.

How it works:

  • XLIO spawns dedicated worker threads that handle all network operations

  • Application threads communicate with worker threads via a job queue system

  • Network operations are decoupled from the application execution context

  • Sockets are assigned to specific worker threads based on policy

  • Memory copies are performed in application thread context

  • Available for POSIX API only
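The job-queue idea described above can be modeled with a toy sketch. This is a conceptual illustration only, not XLIO's implementation: the class name, the per-worker queues, and the modulo assignment policy are all hypothetical, chosen to show how application threads can hand operations to the worker that owns a given socket.

```python
import queue
import threading


class WorkerPool:
    """Toy model: application threads enqueue jobs, worker threads execute them."""

    def __init__(self, n_workers):
        self.queues = [queue.Queue() for _ in range(n_workers)]
        self.handled = []  # (worker_index, socket_id, operation) records
        self.threads = [threading.Thread(target=self._run, args=(i,))
                        for i in range(n_workers)]
        for t in self.threads:
            t.start()

    def worker_for(self, socket_id):
        # Hypothetical policy: pin each socket to one worker by modulo.
        return socket_id % len(self.queues)

    def submit(self, socket_id, operation):
        # Called from application threads: enqueue instead of doing the work here.
        self.queues[self.worker_for(socket_id)].put((socket_id, operation))

    def _run(self, index):
        jobs = self.queues[index]
        while True:
            job = jobs.get()
            if job is None:          # sentinel: shut this worker down
                break
            socket_id, operation = job
            self.handled.append((index, socket_id, operation))

    def shutdown(self):
        for jobs in self.queues:
            jobs.put(None)
        for t in self.threads:
            t.join()


if __name__ == "__main__":
    pool = WorkerPool(2)
    for sid in range(4):
        pool.submit(sid, "send")
    pool.shutdown()
    print(sorted(pool.handled))
```

Because every job for a socket lands on the same worker, application threads can share a socket without coordinating among themselves; the cost is the queue hop, which matches the moderate latency noted for this mode.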

Performance characteristics:

  • Out-of-the-box performance acceleration

  • Moderate CPU efficiency due to synchronization overhead (application and worker threads)

  • Moderate latency due to job queuing

Application benefits:

  • Thread flexibility - sockets can be safely shared between application threads

  • Out-of-the-box performance acceleration without fine-tuning or a strict application threading model

  • Single epoll context is efficient

  • Single listen socket support

  • Minimal network awareness

  • No requirement for frequent socket API calls

  • Reduced buffer consumption compared to R2C mode

Best for:

  • Easy integration is a priority

  • Application shares sockets between multiple threads

  • Application behavior doesn't fit the XLIO concurrency model

  • Heavy networking is a bottleneck in a single-threaded application

  • Application uses a single listen socket to distribute accepted sockets to other application threads

Limitations:

  • Busy polling only

  • TCP only

  • Non-blocking sockets only

  • epoll is the only supported I/O multiplexing (IOMUX) mechanism
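An application shape that stays within these limitations (TCP, non-blocking sockets, epoll) might look like the sketch below: one listen socket and one epoll instance serving an echo loop. It is a minimal illustration, not a recommended XLIO server design; the function names are hypothetical, and it assumes payloads small enough to fit in the socket send buffer in a single call.

```python
import select
import socket
import threading
import time


def serve_echo(lsock, max_conns, timeout=5.0):
    """Echo data on non-blocking TCP sockets driven by a single epoll instance."""
    lsock.setblocking(False)
    ep = select.epoll()
    ep.register(lsock.fileno(), select.EPOLLIN)
    conns, closed = {}, 0
    deadline = time.monotonic() + timeout   # safety net so the sketch always ends
    try:
        while closed < max_conns and time.monotonic() < deadline:
            for fd, _events in ep.poll(0.1):
                if fd == lsock.fileno():
                    try:
                        conn, _addr = lsock.accept()
                    except BlockingIOError:
                        continue
                    conn.setblocking(False)          # non-blocking sockets only
                    conns[conn.fileno()] = conn
                    ep.register(conn.fileno(), select.EPOLLIN)
                    continue
                conn = conns[fd]
                try:
                    data = conn.recv(4096)
                except BlockingIOError:
                    continue
                if data:
                    conn.sendall(data)   # assumes the payload fits the send buffer
                else:                    # peer closed the connection
                    ep.unregister(fd)
                    conn.close()
                    del conns[fd]
                    closed += 1
    finally:
        for conn in conns.values():
            conn.close()
        ep.close()


def echo_once(payload):
    """Run the server in a thread and send one payload through it."""
    lsock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    lsock.bind(("127.0.0.1", 0))
    lsock.listen()
    server = threading.Thread(target=serve_echo, args=(lsock, 1))
    server.start()
    # The client below is only a test driver, so it uses a blocking socket.
    with socket.create_connection(lsock.getsockname(), timeout=2.0) as client:
        client.sendall(payload)
        reply = client.recv(len(payload))
    server.join()
    lsock.close()
    return reply


if __name__ == "__main__":
    print(echo_once(b"ping"))
```

A single listen socket and a single epoll context are used deliberately here, since the benefits listed for this mode say both remain efficient.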

XLIO maintains an internal thread for general operations and tasks. The thread's role varies by execution mode. Configuration options for the internal thread are described in XLIO Tuning.

Run to Completion (R2C) Mode

  • Handle neighbor discovery (ARP/ICMPv6)

  • Poll completion queues to prevent packet drops

  • Manage TCP housekeeping (acknowledgments, retransmissions)

  • Handle final TCP socket closing

  • Adapt completion queue moderation (if enabled)

  • Synchronize network adapter clock with system clock

  • Handle bond management

Worker Threads Mode

  • Handle neighbor discovery progress (ARP/ICMPv6)

  • Adapt completion queue moderation (if enabled)

  • Synchronize network adapter clock with system clock

  • Handle bond management

  • Datagram sockets, also known as connectionless sockets, use User Datagram Protocol (UDP)

  • Stream sockets, also known as connection-oriented sockets, use Transmission Control Protocol (TCP)

© Copyright 2025, NVIDIA. Last updated on Nov 26, 2025