XLIO Library Architecture
XLIO is a user-space Linux library that accelerates network applications. Applications that use the POSIX socket API need no code changes or recompilation: the library is loaded through the LD_PRELOAD environment variable. Alternatively, an application can load XLIO dynamically without LD_PRELOAD, which requires minimal application modifications.
When applications transmit or receive TCP/UDP traffic (unicast or multicast, IPv4/IPv6), XLIO intercepts socket API calls and implements the underlying operations in user space. This enables packets to flow directly between the application and NVIDIA Ethernet adapters, bypassing the kernel and its TCP/UDP network stack.
POSIX Socket API
Zero-cost integration via LD_PRELOAD
Intercepts send and receive calls on stream and datagram sockets
Processes network operations in user space instead of passing to the kernel
Enables direct data path between application and hardware
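As a minimal sketch of the LD_PRELOAD integration described above, the helper below builds an environment that preloads a library and launches an unmodified program with it. The helper names (preload_env, run_with_preload) are hypothetical, and the default library name libxlio.so assumes XLIO is installed under that name on the loader search path.

```python
import os
import subprocess


def preload_env(library="libxlio.so", base_env=None):
    """Return a copy of the environment with `library` first in LD_PRELOAD.

    `library` is an assumed name or path; adjust it to the local installation.
    """
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "")
    # LD_PRELOAD accepts a colon-separated list; keep any entries already set.
    env["LD_PRELOAD"] = f"{library}:{existing}" if existing else library
    return env


def run_with_preload(argv, library="libxlio.so"):
    """Run an unmodified program (argv list) with the library preloaded."""
    return subprocess.run(argv, env=preload_env(library), check=False)
```

A call such as run_with_preload(["./my_server", "--port", "8080"]) would start the hypothetical my_server binary with the library preloaded, which is equivalent to prefixing the command with LD_PRELOAD=libxlio.so in a shell.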
Ultra Socket API
Maximum performance with advanced features
True zero-copy data transfers
Requires application integration
See XLIO Ultra API for more information.
XLIO behaves like a standard networking stack, serving multiple network interfaces. The library determines routing based on how applications call bind(), connect(), and setsockopt(), combined with administrator-configured route lookups. When data flows through a supported NVIDIA Ethernet adapter, XLIO intercepts and accelerates the traffic.
For unsupported adapters, XLIO seamlessly passes calls to the standard kernel network stack, allowing hybrid environments where applications can use multiple adapters without configuration changes.
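The calls named above are ordinary POSIX socket calls, so the application side looks the same with or without acceleration. The sketch below uses only the standard Python socket module over loopback; the function names are illustrative, and the choice of TCP_NODELAY as the socket option is an assumption made for the example. On a real deployment the local address passed to bind() would belong to the interface of interest.

```python
import socket


def open_client(local_ip, remote_addr):
    """Create a TCP client using the three calls a stack inspects for routing."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # setsockopt(): per-socket options, here disabling Nagle's algorithm.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    # bind(): pins the local (source) address; port 0 lets the stack pick one.
    sock.bind((local_ip, 0))
    # connect(): the destination is resolved through the route lookup.
    sock.connect(remote_addr)
    return sock


def loopback_demo():
    """Exchange one message over loopback and report what the sockets saw."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    client = open_client("127.0.0.1", listener.getsockname())
    server_side, _peer = listener.accept()

    client.sendall(b"hello")
    data = server_side.recv(16)
    local_ip = client.getsockname()[0]
    nodelay = client.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)

    for s in (client, server_side, listener):
        s.close()
    return data, local_ip, nodelay
```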
IPv4/6
TCP
UDP
XLIO offers two execution modes to accommodate different application requirements. See XLIO Configuration Parameters for configuration details.
Run to Completion (R2C) Mode
How it works:
Network operations execute directly in application thread context
XLIO performs all necessary work within socket API calls
Can execute hardware polling and progress other sockets as needed
Internal thread provides additional context for network operations (configurable)
Available for both POSIX and Ultra APIs
Performance characteristics:
Highest performance
Highest CPU efficiency (cycles and caching)
Lowest latency
Application requirements:
Strict threading model - Avoid sharing sockets between threads
Predictable socket distribution between application threads
Separate listen socket per thread is required for efficiency
Dedicated epoll per thread is required for efficiency
Frequent API calls - To provide execution context to XLIO
Best for:
Performance is critical
Application behavior is predictable
Single-threaded or well-controlled multi-threading without socket sharing
Network-optimized application
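The R2C application requirements can be sketched with plain sockets: each thread creates its own listen socket and its own epoll instance, and never touches a socket owned by another thread. This is a sketch under assumptions, not an XLIO-specific recipe: SO_REUSEPORT is used here as one common way to give every thread a listener on the same port, and the names worker, run_demo, and the loopback address are illustrative. It requires Linux, since select.epoll is Linux-only.

```python
import select
import socket
import threading


def worker(idx, port, ready, stop, echoed):
    """Own one listener, one epoll instance, and every socket accepted on it."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    listener.bind(("127.0.0.1", port))
    listener.listen(16)
    listener.setblocking(False)

    ep = select.epoll()
    ep.register(listener.fileno(), select.EPOLLIN)
    conns = {}
    ready.wait(timeout=5)  # every worker is listening before clients connect

    while not stop.is_set():
        # Calling poll() frequently is what hands the thread to the stack.
        for fd, _events in ep.poll(0.05):
            if fd == listener.fileno():
                try:
                    conn, _peer = listener.accept()
                except BlockingIOError:
                    continue
                conn.setblocking(False)
                conns[conn.fileno()] = conn
                ep.register(conn.fileno(), select.EPOLLIN)
                continue
            conn = conns[fd]
            try:
                data = conn.recv(4096)
            except BlockingIOError:
                continue
            except OSError:
                data = b""
            if data:
                conn.send(data)  # tiny payload; real code must handle partial writes
                echoed[idx] += 1
            else:
                ep.unregister(fd)
                conns.pop(fd).close()

    for conn in conns.values():
        conn.close()
    ep.close()
    listener.close()


def run_demo(num_workers=2, num_clients=6):
    """Start the workers, send one message per client, return counts and replies."""
    # Reserve a free port; this socket never listens, so it receives nothing.
    reserve = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    reserve.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    reserve.bind(("127.0.0.1", 0))
    port = reserve.getsockname()[1]

    ready = threading.Barrier(num_workers + 1)
    stop = threading.Event()
    echoed = [0] * num_workers
    threads = [threading.Thread(target=worker, args=(i, port, ready, stop, echoed))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    ready.wait(timeout=5)
    reserve.close()

    replies = []
    for i in range(num_clients):
        with socket.create_connection(("127.0.0.1", port), timeout=5) as c:
            c.sendall(f"ping {i}".encode())
            replies.append(c.recv(4096))

    stop.set()
    for t in threads:
        t.join()
    return echoed, replies
```

Because each connection is accepted, polled, and closed by a single thread, no socket state is shared and no locking is needed in the application. How connections are spread across the listeners is decided by the stack, so the per-thread counts returned by run_demo can differ from run to run.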
Worker Threads Mode
Worker Threads mode provides greater flexibility and ease of use than the traditional Run to Completion (R2C) mode, making XLIO more accessible to applications that were not designed with high-performance networking in mind.
See section Worker Threads Mode for more information.
How it works:
XLIO spawns dedicated worker threads that handle all network operations
Application threads communicate with worker threads via a job queue system
Network operations decoupled from application execution context
Sockets assigned to specific worker threads based on policy
Memory copies performed in application thread context
Available for POSIX API only
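The job-queue arrangement above can be pictured with a toy model. The sketch below is purely conceptual and does not reflect XLIO internals: the WorkerPool class, the job tuple layout, and the modulo-based assignment policy are all assumptions made for illustration. It shows the two properties the list describes: a socket is always served by the same worker, and application threads only enqueue jobs and wait for the result.

```python
import queue
import threading


class WorkerPool:
    """Toy model: one job queue per worker, sockets pinned to workers."""

    def __init__(self, num_workers):
        self.queues = [queue.Queue() for _ in range(num_workers)]
        self.threads = [
            threading.Thread(target=self._run, args=(q,), name=f"worker-{i}")
            for i, q in enumerate(self.queues)
        ]
        for t in self.threads:
            t.start()

    def worker_for(self, sock_id):
        # Hypothetical assignment policy: socket id modulo worker count.
        return sock_id % len(self.queues)

    def submit(self, sock_id, op, payload):
        """Called from any application thread; blocks until the job is done."""
        done = queue.Queue(maxsize=1)
        self.queues[self.worker_for(sock_id)].put((op, sock_id, payload, done))
        return done.get()

    def _run(self, jobs):
        while True:
            job = jobs.get()
            if job is None:  # shutdown marker
                break
            op, sock_id, payload, done = job
            # A real worker would drive the network here; the model only
            # reports which worker handled the job and how many bytes it saw.
            done.put((threading.current_thread().name, op, sock_id, len(payload)))

    def close(self):
        for q in self.queues:
            q.put(None)
        for t in self.threads:
            t.join()
```

In this model, two application threads calling submit() for the same socket id never race on socket state, because both jobs are serialized through one worker's queue. The queue hop is also where the synchronization and queuing overhead in such a design comes from.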
Performance characteristics:
Out-of-the-box performance acceleration
Moderate CPU efficiency due to synchronization overhead between application and worker threads
Moderate latency due to job queuing
Application benefits:
Thread flexibility - sockets can be safely shared between application threads
Out-of-the-box performance acceleration without fine-tuning or a strict application threading model
Single epoll context is efficient
Single listen socket support
Minimal network awareness
No requirement for frequent socket API calls
Reduced buffer consumption compared to R2C mode
Best for:
Easy integration is a priority
Application shares sockets between multiple threads
Application behavior does not fit the XLIO concurrency model
Heavy networking is a bottleneck in a single-threaded application
Application uses single listen socket to distribute incoming sockets to other application threads
Limitations:
Busy polling only
TCP only
Non-blocking sockets only
epoll I/O multiplexing (IOMUX) only
XLIO maintains an internal thread for general operations and tasks. The thread's role varies by execution mode. Configuration options for the internal thread are described in XLIO Tuning.
Run to Completion (R2C) Mode
Handle neighbor discovery (ARP/ICMPv6)
Poll completion queues to prevent packet drops
Manage TCP housekeeping (acknowledgments, retransmissions)
Handle final TCP socket closing
Adapt completion queue moderation (if enabled)
Synchronize network adapter clock with system clock
Handle bond management
Worker Threads Mode
Handle neighbor discovery progress (ARP/ICMPv6)
Adapt completion queue moderation (if enabled)
Synchronize network adapter clock with system clock
Handle bond management
Datagram sockets, also known as connectionless sockets, use User Datagram Protocol (UDP)
Stream sockets, also known as connection-oriented sockets, use Transmission Control Protocol (TCP)
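The practical difference between the two socket types can be seen over loopback with the standard Python socket module. This is a small illustrative sketch (the function names are made up for the example): a datagram socket hands back each message as it was sent, while a stream socket delivers one continuous byte stream with no message boundaries.

```python
import socket


def udp_roundtrip(messages):
    """Send each message as one datagram; each receive returns one datagram."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(("127.0.0.1", 0))
    rx.settimeout(2)
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for msg in messages:
        tx.sendto(msg, rx.getsockname())
    received = [rx.recvfrom(2048)[0] for _ in messages]
    tx.close()
    rx.close()
    return received


def tcp_roundtrip(messages):
    """Send each message on one connection; the receiver sees a byte stream."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname(), timeout=2)
    conn, _peer = listener.accept()
    conn.settimeout(2)
    for msg in messages:
        client.sendall(msg)
    client.shutdown(socket.SHUT_WR)  # signal end of stream to the receiver
    chunks = []
    while True:
        data = conn.recv(2048)
        if not data:
            break
        chunks.append(data)
    for s in (client, conn, listener):
        s.close()
    return b"".join(chunks)
```

With the input [b"ab", b"cde"], the datagram path returns the two messages separately, while the stream path returns their concatenation; how the stream is split across individual recv() calls is not guaranteed, which is why the sketch joins the chunks.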