NVIDIA Messaging Accelerator (VMA) Documentation Rev 9.8.60

XLIO Parameters

XLIO is configured using environment variables. For the full list of XLIO parameters, see the libxlio README file.

Note

XLIO parameters must be set prior to loading the application with XLIO. You can set the parameters in a system file, which can be run manually or automatically.

All the parameters have defaults that can be modified.
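As an illustration of setting parameters before the application is loaded, the following minimal Python sketch builds an environment for a child process. It assumes XLIO is loaded through LD_PRELOAD=libxlio.so; the helper name build_xlio_env and the application path ./my_app are hypothetical and are not part of XLIO.

```python
import os


def build_xlio_env(params, base_env=None, preload="libxlio.so"):
    """Return a copy of the environment with XLIO parameters and LD_PRELOAD set.

    Assumption: the application is started with the XLIO library preloaded,
    so every XLIO_* variable must already be present in its environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    for name, value in params.items():
        if not name.startswith("XLIO_"):
            raise ValueError(f"not an XLIO parameter: {name}")
        env[name] = str(value)
    # Put the library first so it is loaded before anything else in LD_PRELOAD.
    existing = env.get("LD_PRELOAD", "")
    env["LD_PRELOAD"] = f"{preload} {existing}".strip()
    return env


env = build_xlio_env({"XLIO_TSO": 1, "XLIO_SPEC": "latency"},
                     base_env={"PATH": "/usr/bin"})
# To launch the application with these settings (./my_app is a placeholder):
#   import subprocess
#   subprocess.run(["./my_app"], env=env)
```

Because the variables are read when the library is loaded, changing them from inside an already running process would not be expected to take effect.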

On default startup, the XLIO library prints the XLIO version information, as well as the configuration parameters being used and their values to stderr.

XLIO always logs the values of the following parameters, even when they are equal to the default value:

  • XLIO_TRACELEVEL

  • XLIO_LOG_FILE

For all other parameters, XLIO logs the parameter values only when they are not equal to the default value.

The following table lists configuration parameters which can be used to tune XLIO performance for more specific use cases.

XLIO Configuration Parameter

Description and Examples

XLIO_TSO

With Segmentation Offload, or TCP Large Send, TCP can pass a buffer to be transmitted that is bigger than the maximum transmission unit (MTU) supported by the medium. Intelligent adapters implement large sends by using the prototype TCP and IP headers of the incoming send buffer to carve out segments of the required size. The TCP segment headers are created by copying the prototype header and options, then calculating the sequence number and checksum fields.

Expected benefits: increased throughput and reduced CPU load.

Default value: auto (depends on the ethtool setting and adapter capability; see ethtool -k <eth0> | grep tcp-segmentation-offload).

Set XLIO_TSO=1 to ensure Segmentation Offload is on for TX throughput oriented applications.
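The carving step described above can be sketched in software. This is only a conceptual model of what the adapter does in hardware, not XLIO code: the function name segment and its parameters are hypothetical, and header copying and checksum calculation are left out.

```python
def segment(payload: bytes, mss: int, initial_seq: int):
    """Split a large send buffer into (sequence_number, chunk) pairs.

    Each chunk carries at most `mss` bytes. The sequence number of a segment
    is the initial sequence number plus the byte offset of the chunk,
    wrapped to 32 bits as in TCP.
    """
    if mss <= 0:
        raise ValueError("mss must be positive")
    segments = []
    for offset in range(0, len(payload), mss):
        chunk = payload[offset:offset + mss]
        seq = (initial_seq + offset) % 2**32
        segments.append((seq, chunk))
    return segments


# A 4000-byte buffer with a 1460-byte segment size yields three segments.
segs = segment(b"x" * 4000, mss=1460, initial_seq=1000)
```

With offload enabled, the host hands the adapter the single large buffer and this loop runs on the adapter instead of the CPU.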

XLIO_LRO

Large receive offload (LRO) is a technique for increasing inbound throughput of high-bandwidth network connections by reducing central processing unit (CPU) overhead. It works by aggregating multiple incoming packets from a single stream into a larger buffer before they are passed higher up the networking stack, thus reducing the number of packets that must be processed.

Default value: auto (depends on the ethtool setting and adapter capability; see ethtool -k <eth0> | grep large-receive-offload).

Set XLIO_LRO=1 to ensure LRO is turned on for RX throughput oriented applications.
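Since the auto default for XLIO_TSO and XLIO_LRO follows the ethtool setting, it can be useful to check that setting programmatically. The sketch below parses text in the "feature: on/off" form printed by ethtool -k; the helper name offload_state and the sample text are illustrative only.

```python
def offload_state(ethtool_output: str, feature: str):
    """Return True/False for a feature in `ethtool -k` output, None if absent."""
    for line in ethtool_output.splitlines():
        name, sep, value = line.strip().partition(":")
        if sep and name == feature:
            words = value.split()  # e.g. ["on"] or ["off", "[fixed]"]
            return words[0] == "on" if words else None
    return None


# Sample text shaped like `ethtool -k <eth0>` output (illustrative values).
sample = """Features for eth0:
tcp-segmentation-offload: on
large-receive-offload: off [fixed]
"""

tso_on = offload_state(sample, "tcp-segmentation-offload")
lro_on = offload_state(sample, "large-receive-offload")
```

On a real system the input could come from subprocess.run(["ethtool", "-k", "<eth0>"], capture_output=True, text=True).stdout, with <eth0> replaced by the interface name.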

XLIO_CQ_POLL_BATCH_MAX

Maximum size of the array used when polling the RX CQs in XLIO.

Default value is 16

XLIO_GRO_STREAMS_MAX

Controls the number of TCP streams on which GRO (generic receive offload) is performed simultaneously.

Disable GRO with a value of 0.

A GRO session is flushed after each RX poll cycle. See XLIO_CQ_POLL_BATCH_MAX.

Default value is 32

XLIO_SELECT_POLL

The duration, in microseconds (usec), for which to poll the hardware on the RX path before going to sleep (pending an interrupt, blocking on OS select(), poll() or epoll_wait()).

The maximum polling duration is limited by the timeout the user passes when calling select(), poll() or epoll_wait().

When the select(), poll() or epoll_wait() path has successful receive poll hits (see performance monitoring), latency improves dramatically. This comes at the expense of CPU utilization.

Value range: -1, or 0 to 100,000,000.

A value of -1 means infinite polling.

A value of 0 means no polling (interrupt driven).

Default value is 100000
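The interaction between XLIO_SELECT_POLL and the caller's timeout can be summarized with a small model. This is a sketch of the rules stated above, not XLIO's implementation; the function name effective_poll_usec is hypothetical, and None is used here to represent "no limit".

```python
def effective_poll_usec(select_poll, timeout_usec):
    """Model how long the RX path is polled before going to sleep.

    select_poll  -- XLIO_SELECT_POLL value (-1, or 0 to 100,000,000 usec)
    timeout_usec -- timeout passed to select()/poll()/epoll_wait() in usec,
                    or None when the caller blocks indefinitely
    Returns the polling budget in usec, or None for unbounded polling.
    """
    if select_poll != -1 and not 0 <= select_poll <= 100_000_000:
        raise ValueError("XLIO_SELECT_POLL must be -1 or 0..100,000,000")
    if select_poll == -1:
        # Infinite polling is still capped by the caller's timeout, if any.
        return timeout_usec
    if timeout_usec is None:
        return select_poll
    return min(select_poll, timeout_usec)


budget_default = effective_poll_usec(100000, 5000)  # caller timeout is shorter
budget_irq = effective_poll_usec(0, 5000)           # interrupt driven, no polling
```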

XLIO_SELECT_POLL_OS_RATIO

Enables polling of the OS file descriptors while the user thread calls select() or poll() and XLIO is busy in the offloaded sockets polling loop. This results in a single poll of the non-offloaded sockets every XLIO_SELECT_POLL_OS_RATIO offloaded sockets (CQ) polls.

When disabled, only offloaded sockets are polled.

(See XLIO_SELECT_POLL for more info)

Disable with 0

Default value is 10
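The following sketch models the ratio, under the assumption (suggested by the parameter name and its default of 10) that the value is the number of offloaded CQ polls performed between two polls of the OS file descriptors. The function poll_schedule is hypothetical and only lists the order of polls.

```python
def poll_schedule(cq_polls, os_ratio):
    """Return the order of polls as a list of "cq" and "os" entries.

    One OS poll is inserted after every `os_ratio` offloaded CQ polls.
    A ratio of 0 disables OS polling, so only "cq" entries are produced.
    """
    if os_ratio < 0:
        raise ValueError("ratio must be 0 (disabled) or positive")
    schedule = []
    for i in range(1, cq_polls + 1):
        schedule.append("cq")
        if os_ratio > 0 and i % os_ratio == 0:
            schedule.append("os")
    return schedule


default_schedule = poll_schedule(20, 10)  # two OS polls within 20 CQ polls
cq_only = poll_schedule(20, 0)            # OS polling disabled
```

A lower ratio would make non-offloaded sockets more responsive at the cost of extra system calls inside the polling loop.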

XLIO_TCP_QUICKACK

If set, disables delayed acknowledgment.

This means that TCP sends an acknowledgment after every packet.

For more information on the TCP_QUICKACK flag, refer to the TCP manual page.

Valid values:

0 - Disabled

1 - Enabled

Default: 0 (Disabled)

XLIO_TCP_SEND_BUFFER_SIZE

TCP send buffer size.

Default value is 1MB.

XLIO_TX_BUF_SIZE

Size of Tx data buffer elements allocation.

Cannot be less than the MTU (Maximum Transmission Unit) or greater than 0xFF00.

Default value is calculated based on XLIO_MTU and XLIO_MSS.
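The bounds on XLIO_TX_BUF_SIZE can be checked before launching the application. The helper below is a hypothetical pre-flight check written from the limits stated above; it does not reproduce how XLIO derives the default from XLIO_MTU and XLIO_MSS.

```python
MAX_TX_BUF_SIZE = 0xFF00  # upper bound stated for XLIO_TX_BUF_SIZE (65280 bytes)


def validate_tx_buf_size(size, mtu):
    """Return `size` if it lies within [mtu, 0xFF00], otherwise raise ValueError."""
    if size < mtu:
        raise ValueError(f"XLIO_TX_BUF_SIZE {size} is less than the MTU {mtu}")
    if size > MAX_TX_BUF_SIZE:
        raise ValueError(f"XLIO_TX_BUF_SIZE {size} exceeds {MAX_TX_BUF_SIZE:#x}")
    return size


checked = validate_tx_buf_size(2048, mtu=1500)
```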

XLIO_RX_POLL_ON_TX_TCP

This parameter enables or disables TCP RX polling during TCP TX operations for faster TCP ACK reception.

Default: 0 (Disabled)

XLIO_SKIP_POLL_IN_RX

Allows a TCP socket to skip CQ polling in RX socket calls.

0 - Disabled

1 - Skip always

2 - Skip only if this socket was added to epoll before

Default: 0 (Disabled)
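The three XLIO_SKIP_POLL_IN_RX modes can be read as a simple decision rule. The sketch below encodes that reading; the function should_poll_cq and the socket_in_epoll flag are hypothetical names, and the real decision is made inside the library.

```python
def should_poll_cq(mode, socket_in_epoll):
    """Decide whether an RX socket call polls the CQ for a given mode.

    mode 0 -- skipping disabled: always poll
    mode 1 -- always skip polling
    mode 2 -- skip only if the socket was added to epoll before
    """
    if mode == 0:
        return True
    if mode == 1:
        return False
    if mode == 2:
        return not socket_in_epoll
    raise ValueError("XLIO_SKIP_POLL_IN_RX must be 0, 1 or 2")


polls_by_default = should_poll_cq(0, socket_in_epoll=True)
```

Mode 2 presumably targets applications where epoll_wait() already drives CQ polling, so polling again in the receive call would be redundant.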

XLIO_SPEC

XLIO predefined specification profiles.

latency - Optimized for latency-sensitive use cases.

Example: XLIO_SPEC=latency

XLIO_MULTILOCK

Controls the type of locking mechanism used for some specific flows.

Note that usage of Mutex might increase latency.

0 - Spin

1 - Mutex

Default: 0 (Spin)

XLIO_TCP_2T_RULES

Use only 2-tuple rules for TCP connections instead of 5-tuple rules. This can help to overcome steering limitations for outgoing TCP connections. However, this option requires a unique local IP address per XLIO ring. In the default ring-per-thread configuration, this means that each thread must bind its sockets to a thread-local IP address.

Default: 0 (Disabled)

XLIO_RX_CQ_WAIT_CTRL

In high-scale scenarios where many non-blocking sockets are used in an event-driven manner (epoll/poll/select), turning on this parameter (XLIO_RX_CQ_WAIT_CTRL=1) avoids high CPU usage inside the kernel while processing thread wakeups.

The following table lists configuration parameters and their possible values for new XLIO beta-level features. The parameters below are disabled by default.

These XLIO features are still experimental and subject to change. They can help improve the performance of multithreaded applications.

We recommend altering these parameters in a controlled environment until you reach the best performance tuning.

XLIO Configuration Parameter

Description and Examples

XLIO_RING_MIGRATION_RATIO_TX

XLIO_RING_MIGRATION_RATIO_RX

The ring migration ratio is used with the "ring per thread" logic to decide when it is beneficial to replace the socket's ring with the ring allocated for the current thread.

Every XLIO_RING_MIGRATION_RATIO iterations (of accessing the ring), the current thread ID is checked to see whether the ring matches the current thread.

If not, ring migration is considered. If the ring continues to be accessed from the same thread for a certain number of iterations, the socket is migrated to this thread's ring.

Use a value of -1 in order to disable migration.

Default: -1
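The migration logic can be modeled as follows. This is a simplified sketch of the description above, not XLIO's implementation: the class SocketRing and the stable_checks threshold are hypothetical, since the number of consecutive checks required before migrating is not stated here.

```python
class SocketRing:
    """Toy model of ring migration for a single socket."""

    def __init__(self, owner_thread, ratio, stable_checks=2):
        if ratio != -1 and ratio < 1:
            raise ValueError("ratio must be -1 (disabled) or positive")
        self.ring_thread = owner_thread      # thread whose ring the socket uses
        self.ratio = ratio                   # check the thread every `ratio` accesses
        self.stable_checks = stable_checks   # hypothetical migration threshold
        self.accesses = 0
        self.candidate = None
        self.candidate_hits = 0

    def access(self, thread_id):
        """Record one ring access and return the thread that owns the ring."""
        if self.ratio == -1:
            return self.ring_thread          # migration disabled
        self.accesses += 1
        if self.accesses % self.ratio != 0:
            return self.ring_thread          # not a check iteration
        if thread_id == self.ring_thread:
            self.candidate, self.candidate_hits = None, 0
        elif thread_id == self.candidate:
            self.candidate_hits += 1
        else:
            self.candidate, self.candidate_hits = thread_id, 1
        if self.candidate is not None and self.candidate_hits >= self.stable_checks:
            # The same foreign thread was seen on enough checks: migrate.
            self.ring_thread = self.candidate
            self.candidate, self.candidate_hits = None, 0
        return self.ring_thread


ring = SocketRing(owner_thread=1, ratio=10, stable_checks=2)
for _ in range(20):
    ring.access(thread_id=2)
# After two consecutive checks from thread 2, the socket uses thread 2's ring.
```

A larger ratio makes checks rarer and migration slower to trigger, which limits the overhead of checking but delays adapting to a socket that has moved to another thread.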

XLIO_RING_DEV_MEM_TX

XLIO can use the on-device memory to store the egress packet if it does not fit into the BF inline buffer. This improves application egress latency by reducing PCI transactions.

XLIO_RING_DEV_MEM_TX enables the user to set the amount of on-device memory buffer allocated for each TX ring.

The total size of the on-device memory is limited to 256k for a single-port HCA and to 128k for a dual-port HCA.

Default value is 0

XLIO_TCP_CC_ALGO

TCP congestion control algorithm.

The default algorithm that comes with LWIP is a variation of Reno/New-Reno.

The new Cubic algorithm was adapted from the FreeBSD implementation.

Use value of 0 for LWIP algorithm.

Use value of 1 for the Cubic algorithm.

Use value of 2 in order to disable the congestion algorithm.

Default: 0 (LWIP).

© Copyright 2025, NVIDIA. Last updated on Feb 13, 2025.