NVIDIA Accelerated IO (XLIO) Documentation Rev 3.60

XLIO Environment Variables

XLIO configuration is performed using environment variables. For the full list of XLIO parameters, see the libxlio README file.

XLIO parameters must be set before the application is loaded with XLIO. You can set the parameters in a system file that is run manually or automatically.
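The following is a minimal sketch of setting the parameters before the application starts. It assumes that XLIO is loaded into the target application through LD_PRELOAD with a library named libxlio.so, so adjust the path for your installation. The helper names build_xlio_env and run_with_xlio are hypothetical.

```python
import os
import subprocess

def build_xlio_env(params, base=None):
    """Return a copy of the environment with XLIO preloaded and params set.

    Hypothetical helper: the library name "libxlio.so" is an assumption;
    point it at wherever libxlio is installed on your system.
    """
    env = dict(os.environ if base is None else base)
    env["LD_PRELOAD"] = "libxlio.so"
    for name, value in params.items():
        # Environment variables are strings, so convert numeric values.
        env[name] = str(value)
    return env

def run_with_xlio(argv, params):
    # The variables are placed in the child's environment before it starts,
    # because XLIO parameters must be set before the application is loaded.
    return subprocess.run(argv, env=build_xlio_env(params), check=False)
```

For example, run_with_xlio(["./server"], {"XLIO_SPEC": "latency"}) would start a hypothetical ./server binary with the latency profile. This sketch overwrites any existing LD_PRELOAD value.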

All the parameters have defaults that can be modified.

By default, at startup the XLIO library prints its version information, together with the configuration parameters in use and their values, to stderr.

XLIO always logs the value of the XLIO_TRACELEVEL parameter, even when it matches the default setting.

For all other parameters, XLIO logs the parameter values only when they are not equal to the default value.
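The logging rule above can be summarized as a filter over the active parameters. This is an illustrative sketch of the documented behavior, not XLIO's code. The function name and the dictionary representation of parameters are assumptions.

```python
def params_to_log(current, defaults):
    """Select which parameters would appear in the startup log.

    Sketch of the documented rule: XLIO_TRACELEVEL is always shown; any
    other parameter is shown only when its value differs from the default.
    """
    logged = {}
    for name, value in current.items():
        if name == "XLIO_TRACELEVEL" or value != defaults.get(name):
            logged[name] = value
    return logged
```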

The following table lists configuration parameters which can be used to tune XLIO performance for more specific use cases.

XLIO_WORKER_THREADS

Configures the execution model of XLIO. See XLIO Library Architecture for more information about execution models.

0 - Run to completion execution model (Default)

Positive number - Worker threads execution model. The value determines the number of XLIO worker threads.
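This sketch shows how an XLIO_WORKER_THREADS value maps onto the two execution models. The function name is hypothetical. Treating negative values as an error is also an assumption, because only 0 and positive numbers are defined above.

```python
def execution_model(worker_threads):
    """Interpret an XLIO_WORKER_THREADS value.

    Returns (model_name, number_of_worker_threads).
    """
    if worker_threads == 0:
        # Default: each application thread runs XLIO processing to completion.
        return ("run-to-completion", 0)
    if worker_threads > 0:
        return ("worker-threads", worker_threads)
    # Assumption of this sketch: negative values are rejected.
    raise ValueError("XLIO_WORKER_THREADS must be 0 or a positive number")
```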

XLIO_TSO

With TCP Segmentation Offload, also known as TCP Large Send, TCP can pass the adapter a buffer to be transmitted that is larger than the maximum transmission unit (MTU) supported by the medium. Intelligent adapters implement large sends by using the prototype TCP and IP headers of the incoming send buffer to carve out segments of the required size. The TCP segment headers are created by copying the prototype header and options, then calculating the sequence number and checksum fields.

Expected benefits: higher throughput and reduced CPU load.

Default value: auto (depends on the ethtool setting and adapter capability; see ethtool -k <eth0> | grep tcp-segmentation-offload)

Set XLIO_TSO=1 to ensure that Segmentation Offload is on for TX-throughput-oriented applications.
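This sketch illustrates what the adapter does with a large send. It carves a payload into MSS-sized segments and assigns each segment the sequence number that a copied prototype header would carry. It is a conceptual model only: the real work happens in adapter hardware, checksums are omitted, and the function name and MSS value are assumptions.

```python
def carve_segments(payload, start_seq, mss):
    """Split one large send buffer into (seq, chunk) pairs of at most mss bytes.

    Each segment reuses the same prototype header; only the sequence
    number (and, in real hardware, the checksum) changes per segment.
    """
    if mss <= 0:
        raise ValueError("mss must be positive")
    segments = []
    for offset in range(0, len(payload), mss):
        chunk = payload[offset:offset + mss]
        # TCP sequence numbers wrap around at 2**32.
        seq = (start_seq + offset) % 2**32
        segments.append((seq, chunk))
    return segments
```

For example, with an MSS of 1460 bytes, a single 4000-byte send becomes three segments of 1460, 1460 and 1080 bytes.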

XLIO_LRO

Large receive offload (LRO) is a technique for increasing the inbound throughput of high-bandwidth network connections by reducing central processing unit (CPU) overhead. It works by aggregating multiple incoming packets from a single stream into a larger buffer before they are passed higher up the networking stack, thus reducing the number of packets that must be processed.

Default value: auto (depends on the ethtool setting and adapter capability; see ethtool -k <eth0> | grep large-receive-offload)

Set XLIO_LRO=1 to ensure that LRO is turned on for RX-throughput-oriented applications.
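When XLIO_TSO or XLIO_LRO is left at auto, the corresponding ethtool feature state matters. The sketch below reads those feature lines from ethtool -k output. It assumes the typical output format ("feature: on" or "feature: off [fixed]"). It only parses text and does not reproduce XLIO's own decision, which also depends on adapter capability. The function name is hypothetical.

```python
def ethtool_feature_on(ethtool_k_output, feature):
    """Return True/False for a feature in `ethtool -k <iface>` output,
    or None if the feature is not listed.

    Lines look like "tcp-segmentation-offload: on" or
    "large-receive-offload: off [fixed]".
    """
    for line in ethtool_k_output.splitlines():
        name, sep, rest = line.strip().partition(":")
        if sep and name == feature:
            words = rest.split()
            return bool(words) and words[0] == "on"
    return None
```

In practice, the text could come from subprocess.run(["ethtool", "-k", "eth0"], capture_output=True, text=True).stdout, with eth0 replaced by the actual interface name.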

XLIO_CQ_POLL_BATCH_MAX

Maximum size of the array used when polling the RX CQs (completion queues) in XLIO.

Default value is 16

XLIO_GRO_STREAMS_MAX

Controls the number of TCP streams on which GRO (generic receive offload) is performed simultaneously.

Disable GRO with a value of 0. A GRO session is flushed after each RX poll cycle; see XLIO_CQ_POLL_BATCH_MAX.

Default value is 32
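The interaction between XLIO_GRO_STREAMS_MAX and the per-poll-cycle flush can be modeled as follows. The packets of one RX poll cycle (at most XLIO_CQ_POLL_BATCH_MAX completions) are merged into per-stream sessions for up to streams_max streams, and every session is flushed when the cycle ends. This is a conceptual sketch, not XLIO's implementation. The stream key and the choice to pass packets of excess streams through unmerged are assumptions.

```python
def gro_poll_cycle(packets, streams_max):
    """Model one RX poll cycle.

    packets: list of (stream_id, payload_bytes) in arrival order.
    Returns the list of (stream_id, payload) units delivered up the stack.
    """
    delivered = []
    sessions = {}  # stream_id -> bytearray being aggregated
    for stream_id, payload in packets:
        if stream_id in sessions:
            sessions[stream_id] += payload
        elif len(sessions) < streams_max:
            sessions[stream_id] = bytearray(payload)
        else:
            # No free GRO session (or GRO disabled with 0): deliver as-is.
            delivered.append((stream_id, bytes(payload)))
    # Flush all GRO sessions at the end of the poll cycle.
    for stream_id, data in sessions.items():
        delivered.append((stream_id, bytes(data)))
    return delivered
```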

XLIO_SELECT_POLL

The duration in microseconds (usec) for which to poll the hardware on the RX path before going to sleep (pending an interrupt, blocking on OS select(), poll() or epoll_wait()). The maximum polling duration is limited by the timeout the user passes when calling select(), poll() or epoll_wait().

When the select(), poll() or epoll_wait() path has successful receive poll hits (see performance monitoring), latency improves dramatically. This comes at the expense of CPU utilization.

Value range is -1, or 0 to 100,000,000:
-1 - Infinite polling
0 - No polling (interrupt driven)

Default value is 100000
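The bound described above amounts to taking the smaller of the XLIO polling budget and the caller's own timeout. This sketch assumes the caller's timeout has already been converted to microseconds, with None meaning "wait forever". The function name is hypothetical.

```python
import math

MAX_SELECT_POLL_USEC = 100_000_000

def effective_poll_usec(select_poll, user_timeout_usec):
    """Upper bound on how long XLIO busy-polls before blocking.

    select_poll: XLIO_SELECT_POLL value (-1 = infinite, 0 = no polling).
    user_timeout_usec: timeout passed to select()/poll()/epoll_wait(),
    in microseconds, or None for "wait forever".
    math.inf in the result means unbounded polling.
    """
    if select_poll != -1 and not 0 <= select_poll <= MAX_SELECT_POLL_USEC:
        raise ValueError("XLIO_SELECT_POLL must be -1 or 0..100,000,000")
    poll_budget = math.inf if select_poll == -1 else select_poll
    user_budget = math.inf if user_timeout_usec is None else user_timeout_usec
    # Polling never outlasts the caller's own timeout.
    return min(poll_budget, user_budget)
```

For example, with the default of 100000 usec and a 5 ms (5000 usec) user timeout, polling stops after at most 5000 usec.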

XLIO_SELECT_POLL_OS_RATIO

Enables polling of the OS file descriptors while the user thread calls select() or poll() and XLIO is busy in the offloaded sockets polling loop. This results in a single poll of the non-offloaded sockets every N offloaded-socket (CQ) polls, where N is the value of XLIO_SELECT_POLL_OS_RATIO. When disabled, only offloaded sockets are polled (see XLIO_SELECT_POLL for more information).

Disable with 0.

Default value is 10
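This sketch shows the resulting interleaving. With the default ratio of 10, there is one OS poll for every ten CQ polls. The description above does not say whether XLIO polls the OS descriptors before or after the Nth CQ poll, so this sketch arbitrarily polls after. The function name is hypothetical.

```python
def polling_schedule(total_cq_polls, os_ratio):
    """Return the sequence of poll actions in a busy-poll loop.

    Every iteration polls the offloaded sockets' CQs ("cq"); when
    os_ratio > 0, the non-offloaded (OS) file descriptors are also
    polled ("os") once every os_ratio CQ polls. os_ratio == 0 disables
    OS polling inside the loop.
    """
    actions = []
    for i in range(1, total_cq_polls + 1):
        actions.append("cq")
        if os_ratio > 0 and i % os_ratio == 0:
            actions.append("os")
    return actions
```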

XLIO_TCP_QUICKACK

If set, disables delayed acknowledgment, which means that TCP sends an ACK after every packet. For more information on the TCP_QUICKACK flag, refer to the TCP manual page (tcp(7)).

Valid values are:
0 - Disable
1 - Enable

Default: 0 (Disabled)

XLIO_TCP_SEND_BUFFER_SIZE

TCP send buffer size.

Default value is 1MB.

XLIO_TX_BUF_SIZE

Size of each allocated TX data buffer element.

Cannot be less than the MTU (Maximum Transmission Unit) or greater than 0xFF00. The default value is calculated based on XLIO_MTU and XLIO_MSS.
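This sketch checks a candidate XLIO_TX_BUF_SIZE against the two documented bounds. The document does not say what XLIO does with an out-of-range value (reject, clamp, or fall back to the default), so raising an error is this sketch's own choice. The default calculation from XLIO_MTU and XLIO_MSS is not reproduced here.

```python
MAX_TX_BUF_SIZE = 0xFF00  # 65280 bytes

def validate_tx_buf_size(size, mtu):
    """Check an XLIO_TX_BUF_SIZE value against the documented bounds."""
    if size < mtu:
        raise ValueError(f"XLIO_TX_BUF_SIZE ({size}) is smaller than the MTU ({mtu})")
    if size > MAX_TX_BUF_SIZE:
        raise ValueError(f"XLIO_TX_BUF_SIZE ({size}) exceeds 0xFF00 ({MAX_TX_BUF_SIZE})")
    return size
```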

XLIO_RX_POLL_ON_TX_TCP

This parameter enables/disables TCP RX polling during TCP TX operations, for faster TCP ACK reception.

Default: 0 (Disabled)

XLIO_SKIP_POLL_IN_RX

Allows a TCP socket to skip CQ polling in RX socket calls.

0 - Disabled
1 - Skip always
2 - Skip only if this socket was added to epoll before

Default: 0 (Disabled)
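This sketch expresses the three XLIO_SKIP_POLL_IN_RX modes as a decision function. The function name and the added_to_epoll flag are hypothetical stand-ins for XLIO's internal socket state.

```python
def skip_cq_poll_in_rx(mode, added_to_epoll):
    """Decide whether an RX call on a TCP socket skips CQ polling.

    mode is the XLIO_SKIP_POLL_IN_RX value:
      0 - never skip
      1 - always skip
      2 - skip only if the socket was previously added to epoll
    """
    if mode == 0:
        return False
    if mode == 1:
        return True
    if mode == 2:
        return added_to_epoll
    raise ValueError("XLIO_SKIP_POLL_IN_RX must be 0, 1 or 2")
```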

XLIO_SPEC

XLIO predefined specification profiles.

latency - Optimized for latency-sensitive use cases.

Example: XLIO_SPEC=latency

XLIO_MULTILOCK

Controls the locking mechanism used for some specific flows.

Note that using a mutex might increase latency.
0 - Spin
1 - Mutex

Default: 0 (Spin)

XLIO_TCP_2T_RULES

Use only 2-tuple rules for TCP connections, instead of 5-tuple rules.

This can help overcome steering limitations for outgoing TCP connections. However, this option requires a unique local IP address per XLIO ring. In the default ring-per-thread configuration, this means that each thread must bind its sockets to a thread-local IP address.

Default: 0 (Disabled)

XLIO_RX_CQ_WAIT_CTRL

In scenarios with a large number of non-blocking sockets used in an event-driven manner (epoll/poll/select), turning on this parameter (XLIO_RX_CQ_WAIT_CTRL=1) avoids high CPU usage inside the kernel while processing thread wakeups.

XLIO_CQ_AIM_INTERRUPTS_RATE_PER_SEC

Desired interrupt rate per second for each ring (CQ).

The count and period parameters for CQ moderation change automatically to achieve the desired interrupt rate for the current traffic rate.

Default value is 10000
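The arithmetic behind adaptive moderation can be sketched as follows. It derives a completion count from the observed packet rate and a time period from the target interrupt rate. This is a hypothetical model to illustrate the idea, not XLIO's actual moderation algorithm. The function name is also hypothetical.

```python
import math

def moderation_for_rate(packets_per_sec, target_interrupts_per_sec):
    """Hypothetical CQ moderation settings for a target interrupt rate.

    count: completions to batch per interrupt so that, at the observed
    packet rate, about target_interrupts_per_sec interrupts fire.
    period_usec: upper bound on how long a completion may wait before
    an interrupt is raised.
    """
    if target_interrupts_per_sec <= 0:
        raise ValueError("target rate must be positive")
    count = max(1, math.ceil(packets_per_sec / target_interrupts_per_sec))
    period_usec = 1_000_000 // target_interrupts_per_sec
    return count, period_usec
```

Under this model, at the default of 10000 interrupts per second and 1,000,000 packets per second, each interrupt would cover about 100 completions, with a 100 usec period.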

Note: Adjusting the settings of this parameter can address different CPU utilization issues - see CPU Utilization Tuning.

XLIO_TIMER_RESOLUTION_MSEC

Controls the wakeup timer resolution of the XLIO internal thread (in milliseconds).

Default value is 10

Note: Adjusting the settings of this parameter can address different CPU utilization issues - see CPU Utilization Tuning.

XLIO_TCP_TIMER_RESOLUTION_MSEC

Controls the internal TCP timer resolution (fast timer), in milliseconds.

The minimum value is the thread wakeup timer resolution configured in performance.threading.internal_handler.timer_msec (see XLIO_TIMER_RESOLUTION_MSEC).

Default value is 100
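The lower bound on the TCP timer can be expressed as a clamp. This sketch assumes that a smaller requested value is raised to the thread wakeup timer resolution rather than rejected; the document only states the minimum. The function name is hypothetical.

```python
def effective_tcp_timer_msec(tcp_timer_msec, thread_timer_msec):
    """Apply the documented lower bound: the TCP fast timer cannot be
    finer than the internal thread wakeup timer resolution.

    Assumption: out-of-range values are clamped, not rejected.
    """
    return max(tcp_timer_msec, thread_timer_msec)
```

With the defaults (100 ms for the TCP timer, 10 ms for the thread timer), the TCP timer stays at 100 ms. A requested 5 ms TCP timer would be raised to 10 ms.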

Note: Adjusting the settings of this parameter can address different CPU utilization issues - see CPU Utilization Tuning.

The following table lists configuration parameters and their possible values for new XLIO beta-level features. The parameters below are disabled by default.

These XLIO features are still experimental and subject to changes. They can help improve performance of multithread applications.

We recommend adjusting these parameters in a controlled environment until the best performance tuning is reached.

XLIO_RING_MIGRATION_RATIO_TX

XLIO_RING_MIGRATION_RATIO_RX

Ring migration ratio is used with the "ring per thread" logic in order to decide when it is beneficial to replace the socket's ring with the ring allocated for the current thread.

Every XLIO_RING_MIGRATION_RATIO iterations (of accessing the ring), the current thread ID is checked to see whether the ring matches the current thread. If not, ring migration is considered. If the ring continues to be accessed from the same thread for a certain number of iterations, the socket is migrated to that thread's ring.

Use a value of -1 to disable migration.

Default: -1
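This sketch models the ratio-based migration check used by XLIO_RING_MIGRATION_RATIO_TX and XLIO_RING_MIGRATION_RATIO_RX. The document only says that migration happens if the ring "continues to be accessed from the same thread". This sketch assumes that "continues" means two consecutive checks that see the same foreign thread. The class and method names are hypothetical.

```python
class RingMigrationTracker:
    """Hypothetical model of ratio-based ring migration.

    Every `ratio` ring accesses, the accessing thread is compared with the
    thread that owns the socket's ring. A mismatch makes that thread a
    migration candidate; if the next check sees the same candidate thread,
    the socket migrates to that thread's ring. ratio == -1 disables migration.
    """

    def __init__(self, ratio, owner_thread):
        if ratio != -1 and ratio <= 0:
            raise ValueError("ratio must be -1 or a positive number")
        self.ratio = ratio
        self.owner = owner_thread
        self.accesses = 0
        self.candidate = None

    def on_access(self, thread_id):
        """Record one ring access; return True if the socket migrated."""
        if self.ratio == -1:
            return False
        self.accesses += 1
        if self.accesses % self.ratio != 0:
            return False  # Not a check iteration.
        if thread_id == self.owner:
            self.candidate = None  # Ring matches the current thread.
            return False
        if self.candidate == thread_id:
            # Same foreign thread at two consecutive checks: migrate.
            self.owner = thread_id
            self.candidate = None
            return True
        self.candidate = thread_id  # First mismatch: consider migration.
        return False
```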

XLIO_RING_DEV_MEM_TX

XLIO can use the on-device memory to store an egress packet if it does not fit into the BF (BlueFlame) inline buffer. This improves application egress latency by reducing PCI transactions.

XLIO_RING_DEV_MEM_TX sets the amount of on-device memory buffer allocated for each TX ring. The total size of the on-device memory is limited to 256k for a single-port HCA and to 128k for a dual-port HCA.

Default value is 0

XLIO_TCP_CC_ALGO

TCP congestion control algorithm.

The default algorithm that comes with LWIP is a variation of Reno/New-Reno. The Cubic algorithm was adapted from the FreeBSD implementation.
0 - LWIP algorithm
1 - Cubic algorithm
2 - Disable the congestion control algorithm

Default: 0 (LWIP)

© Copyright 2025, NVIDIA. Last updated on Nov 26, 2025