NVIDIA Accelerated IO (XLIO) Documentation Rev 3.60

XLIO - Configuration Reference

This document provides a comprehensive reference for all XLIO configuration parameters organized by functional categories.

Acceleration Control

Parameter

Description

Deprecated Environment Variable

Default

Values/Examples/Notes

acceleration_control.app_id

Specify a group of rules from libxlio.conf for XLIO to apply

XLIO_APPLICATION_ID

XLIO_DEFAULT_APPLICATION_ID (matches only the * group rule)

Example: acceleration_control.app_id=iperf_server

acceleration_control.default_acceleration

Create all sockets as offloaded or not offloaded by default

XLIO_OFFLOADED_SOCKETS

true (Enabled)

Values:

  • true = offloaded

  • false = not offloaded

acceleration_control.rules

Defines transport protocol and offload settings for specific applications or processes. Maps to configuration in libxlio.conf.

-

[]

Note: rules is an array of objects with id, name, and actions

Example: { "acceleration_control": { "rules": [{ "id": "A1", "name": "nginx", "actions": ["use xlio tcp_server *:8080"] }] } }

acceleration_control.rules[].id

Unique identifier for this transport control rule

-

-

-

acceleration_control.rules[].name

Name of the application this rule applies to

-

-

-

acceleration_control.rules[].actions

Action directives that modify transport behavior

-

-

Format: use <transport> <role> <address|*>:<port range|*>

See Descriptions table.
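Putting the three rule fields together, the rules entry from the earlier example expands into the following block (the id A1, the application name nginx, and port 8080 are illustrative values, not defaults):

```json
{
  "acceleration_control": {
    "app_id": "nginx",
    "default_acceleration": true,
    "rules": [
      {
        "id": "A1",
        "name": "nginx",
        "actions": ["use xlio tcp_server *:8080"]
      }
    ]
  }
}
```

Each action string follows the format from the table above; in this example, * matches any local address on port 8080.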

Applications

Parameter

Description

Deprecated Environment Variable

Default

Values/Examples/Notes

applications.nginx.distribute_cq

Distributes completion queue (CQ) processing across NGINX worker processes to improve performance

XLIO_DISTRIBUTE_CQ

false (Disabled)

Helps balance CQ handling among worker threads for higher throughput

applications.nginx.src_port_stride

Controls how source ports are distributed across NGINX worker processes

XLIO_NGINX_SRC_PORT_STRIDE

2

Determines port stepping between workers; useful for load balancing

applications.nginx.udp_pool_size

Defines the size of the UDP socket pool for NGINX. When set >0, a closed UDP socket is returned to the pool instead of being destroyed

XLIO_NGINX_UDP_POOL_SIZE

0 (Disabled)

Enables reuse of UDP sockets to reduce allocation overhead

applications.nginx.udp_socket_pool_reuse

Controls reuse of UDP socket pools for NGINX deployments

XLIO_NGINX_UDP_POOL_RX_NUM_BUFFS_REUSE

0 (Disabled)

Improves efficiency in UDP-heavy traffic patterns

applications.nginx.workers_num

Number of NGINX worker processes to optimize for. Must be set to offload NGINX successfully

XLIO_NGINX_WORKERS_NUM

0

Required for enabling NGINX offload support
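Taken together, offloading NGINX requires at least workers_num; a sketch that also enables CQ distribution (the worker count of 4 is illustrative and is assumed to match the worker_processes directive in nginx.conf):

```json
{
  "applications": {
    "nginx": {
      "workers_num": 4,
      "distribute_cq": true
    }
  }
}
```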

Core

Parameter

Description

Deprecated Environment Variable

Default

Values/Examples/Notes

core.daemon.dir

Directory path for XLIO to write files used by xliod

XLIO_SERVICE_NOTIFY_DIR

/tmp/xlio

When used, xliod must be run with --notify-dir pointing to the same folder

core.daemon.enable

Enable the XLIO daemon service for additional monitoring capabilities

XLIO_SERVICE_ENABLE

false (Disabled)

-

core.exception_handling.mode

Mode for handling missing support or error cases in the Socket API or other XLIO functionality

XLIO_EXCEPTION_HANDLING

-1 (default; future default 0)

  • -2/exit – exit on startup failure

  • -1/handle_debug – handle at DEBUG level

  • 0/log_debug_undo_offload – log DEBUG and recover via kernel stack

  • 1/log_error_undo_offload – log ERROR and recover via kernel stack

  • 2/log_error_return_error – log ERROR and return error code

  • 3/log_error_abort – log ERROR and abort (throw xlio_error)

core.quick_init

Avoid extra checks to reduce initialization time (may fail under system misconfiguration)

XLIO_QUICK_START

false (Disabled)

Note: If enabled and hugepages are requested beyond the cgroup limit, XLIO may crash

core.resources.external_memory_limit

Memory limit for external user allocator (0 uses core.resources.memory_limit value)

XLIO_MEMORY_LIMIT_USER

0

Supports suffixes: B, KB, MB, GB

core.resources.heap_metadata_block_size

Size of metadata block added to every heap allocation

XLIO_HEAP_METADATA_BLOCK

32 MB

Supports suffixes: B, KB, MB, GB

core.resources.hugepages.enable

Use huge pages for data buffers to improve performance by reducing TLB misses; overrides rdma-core parameters MLX_QP_ALLOC_TYPE and MLX_CQ_ALLOC_TYPE

XLIO_MEM_ALLOC_TYPE

true (Enabled)

  • false = malloc

  • true = huge pages

core.resources.hugepages.size

Force specific hugepage size for internal allocations; 0 allows any supported

XLIO_HUGEPAGE_SIZE

0

Must be a power of 2, or 0. Suffixes allowed: KB, MB, GB

core.resources.memory_limit

Pre-allocated memory limit for buffers. Dynamic allocations may exceed this. 0 = unlimited

XLIO_MEMORY_LIMIT

2048 MB (2 GB)

Supports suffixes: B, KB, MB, GB
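As an illustration of the size-suffix syntax, the resource limits above can be combined as follows (a sketch assuming the nested-JSON layout shown in the acceleration_control example, and that suffixed sizes are written as strings):

```json
{
  "core": {
    "resources": {
      "memory_limit": "4GB",
      "hugepages": {
        "enable": true,
        "size": "2MB"
      }
    }
  }
}
```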

core.signals.sigint.exit

Call XLIO handler on SIGINT and then application's handler (if exists)

XLIO_HANDLE_SIGINTR

true (Enabled)

-

core.signals.sigsegv.backtrace

Print backtrace when a segmentation fault occurs

XLIO_HANDLE_SIGSEGV

false (Disabled)

-

core.syscall.allow_privileged_sockopt

Permit use of privileged socket options that may require special permissions

XLIO_ALLOW_PRIVILEGED_SOCK_OPT

true (Enabled)

-

core.syscall.avoid_ctl_syscalls

For TCP FDs, avoid system calls for supported options (ioctl, fcntl, getsockopt, setsockopt)

XLIO_AVOID_SYS_CALLS_ON_TCP_FD

false (Disabled)

Unsupported options fall back to the OS

core.syscall.deferred_close

Defer closing file descriptors until the socket is actually closed (useful in multithreaded apps)

XLIO_DEFERRED_CLOSE

false (Disabled)

-

core.syscall.dup2_close_fd

Handle dup2() by treating the old FD as closed before forwarding call to OS

XLIO_CLOSE_ON_DUP2

true (Enabled)

Rudimentary dup2 support (for FD replacement only)

core.syscall.fork_support

Enable ibv_fork_init() to correctly handle fork()

XLIO_FORK

true (Enabled)

-

core.syscall.getsockname_dummy_send

Trigger dummy packet send from getsockname() to warm caches

XLIO_TRIGGER_DUMMY_SEND_GETSOCKNAME

false (Disabled)

-

core.syscall.sendfile_cache_limit

Memory limit for mapping cache used by sendfile()

XLIO_ZC_CACHE_THRESHOLD

10 GB

Supports suffixes: B, KB, MB, GB

Hardware Features

Parameter

Description

Deprecated Environment Variable

Default

Values/Examples/Notes

hardware_features.striding_rq.enable

Enable or disable Striding Receive Queues (each WQE in a Striding RQ can receive multiple packets)

XLIO_STRQ

true (Enabled)

The WQE buffer size is determined by hardware_features.striding_rq.strides_num × hardware_features.striding_rq.stride_size

hardware_features.striding_rq.stride_size

Size in bytes of each stride in a receive WQE; must be a power of two and within [64–8192]

XLIO_STRQ_STRIDE_SIZE_BYTES

64

Range: 64–8192 (power of 2)

hardware_features.striding_rq.strides_num

Number of strides in each receive WQE; must be a power of two and within [512–65536]

XLIO_STRQ_NUM_STRIDES

2048

Range: 512–65536 (power of 2)
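With the defaults, each receive WQE buffer is strides_num × stride_size = 2048 × 64 B = 131072 B (128 KB). A sketch that doubles the stride size while keeping both values powers of two within their documented ranges:

```json
{
  "hardware_features": {
    "striding_rq": {
      "enable": true,
      "stride_size": 128,
      "strides_num": 2048
    }
  }
}
```

This raises the per-WQE buffer to 2048 × 128 B = 256 KB.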

hardware_features.tcp.lro

Large Receive Offload (LRO): increases inbound throughput by reducing CPU overhead via packet aggregation

XLIO_LRO

auto (-1)

  • auto/-1 – depends on ethtool & adapter

  • enable/1 – enabled if adapter supports it

  • disable/0 – disabled

hardware_features.tcp.tls_offload.dek_cache_max_size

Maximum Data Encryption Key (DEK) cache size for TLS offload

XLIO_HIGH_WMARK_DEK_CACHE_SIZE

1024

-

hardware_features.tcp.tls_offload.dek_cache_min_size

Minimum DEK cache size for TLS offload

XLIO_LOW_WMARK_DEK_CACHE_SIZE

512

-

hardware_features.tcp.tls_offload.rx_enable

Offload TLS RX path through kTLS API if possible (uses UTLS for acceleration)

XLIO_UTLS_RX

false (Disabled)

-

hardware_features.tcp.tls_offload.tx_enable

Offload TLS TX path through kTLS API if possible (uses UTLS for acceleration)

XLIO_UTLS_TX

true (Enabled)

-

hardware_features.tcp.tso.enable

TCP Segmentation Offload (TSO): allows TCP to transmit buffers larger than the MTU using adapter segmentation

XLIO_TSO

auto (-1)

  • auto/-1 – depends on ethtool & adapter

  • enable/1 – enabled if supported

  • disable/0 – disabled

hardware_features.tcp.tso.max_size

Maximum TCP segment size (in bytes) allowed with TSO

XLIO_TSO_MAX_SIZE

262144 (256 KB)

Supports suffixes: B, KB, MB, GB
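The TCP offload features above can be combined in one block; a sketch that pins LRO and TSO on instead of auto (assuming the named values enable/disable/auto are accepted as strings in the JSON form):

```json
{
  "hardware_features": {
    "tcp": {
      "lro": "enable",
      "tso": {
        "enable": "enable",
        "max_size": "256KB"
      }
    }
  }
}
```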

Monitor

Parameter

Description

Deprecated Environment Variable

Default

Values/Examples/Notes

monitor.exit_report

Print a human-readable report of resource usage at process exit. The report is printed during termination and may be missed if the process is terminated with SIGKILL.

XLIO_PRINT_REPORT

auto (-1)

  • auto/-1 – print report only if anomaly detected

  • on/1 – always print

  • off/0 – never print

monitor.log.colors

Use color scheme when logging: red for errors, purple for warnings, dim for low-level debug. Automatically disabled when logging to non-terminal devices.

XLIO_LOG_COLORS

true (Enabled)

-

monitor.log.details

Add details to each log line.

XLIO_LOG_DETAILS

0

  • 0 – Basic

  • 1 – ThreadId

  • 2 – ProcessId + ThreadId

  • 3 – Time + ProcessId + ThreadId (time in ms from process start)

monitor.log.file_path

Redirect all logging to a user-defined file. The library replaces a single %d in the path with the process PID, allowing multiple instances to log to separate files.

XLIO_LOG_FILE

"" (empty)

Example: /tmp/xlio_log.txt

monitor.log.level

Logging verbosity level used by the library.

XLIO_TRACELEVEL

info (3)

  • init/-2 or none/-2 – no logs

  • panic/-1 – fatal errors

  • error/0 – runtime errors

  • warn/1 – warnings

  • info/3 – general information

  • details/4 – configuration info

  • debug/5 – high-level debug (logs all socket API calls)

  • fine/6 – low-level runtime logging

  • finer/7 or all/8 – very detailed logging (significant performance cost)
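For troubleshooting, the logging parameters above are typically combined; a sketch (the path is illustrative, and %d is replaced with the process PID as described for monitor.log.file_path):

```json
{
  "monitor": {
    "log": {
      "level": "debug",
      "details": 3,
      "file_path": "/tmp/xlio_log_%d.txt"
    }
  }
}
```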

monitor.stats.cpu_usage

Calculate XLIO CPU usage during polling hardware loops. Results accessible via the XLIO stats utility.

XLIO_CPU_USAGE_STATS

false (Disabled)

-

monitor.stats.fd_num

Maximum number of sockets monitored by XLIO statistics mechanism. Affects how many sockets xlio_stats and XLIO_STATS_FILE can report.

XLIO_STATS_FD_NUM

0

Range: 0–1024 (the stats tool is limited to 1024 sockets)

monitor.stats.file_path

Redirect socket statistics to a specific file. Each socket’s stats are dumped on close.

XLIO_STATS_FILE

"" (empty)

Example: /tmp/stats

monitor.stats.shmem_dir

Directory path for creating shared-memory files for xlio_stats. No files created if empty string.

XLIO_STATS_SHMEM_DIR

/tmp/xlio

-

Buffers

Parameter

Description

Deprecated Environment Variable

Default

Values/Examples/Notes

performance.buffers.batching_mode

Controls batching of returning Rx buffers and pulling Tx buffers per socket

XLIO_BUFFER_BATCHING_MODE

enable_and_reuse (1)

  • disable/0 – no batching

  • enable_and_reuse/1 – batching with periodic reclaim of unused buffers

  • enable/2 – batching without reclaim

performance.buffers.rx.buf_size

Size of Rx buffer allocation; must be ≥ MTU and ≤ 0xFF00. Default based on max MTU.

XLIO_RX_BUF_SIZE

0

Range: 0–65280

Supports suffixes: B, KB, MB, GB

performance.buffers.rx.prefetch_before_poll

Prefetch data before polling for packets; improves latency for low-PPS traffic

XLIO_RX_PREFETCH_BYTES_BEFORE_POLL

0

-

performance.buffers.rx.prefetch_size

Bytes prefetched into cache during ingress packet processing

XLIO_RX_PREFETCH_BYTES

256

Range: 32–MTU

performance.buffers.tcp_segments.pool_batch_size

TCP segments batched when fetched from the segment pool

XLIO_TX_SEGS_POOL_BATCH_TCP

16384

Minimum: 1

performance.buffers.tcp_segments.ring_batch_size

TCP segments fetched per ring from the segment pool

XLIO_TX_SEGS_RING_BATCH_TCP

1024

Minimum: 1

performance.buffers.tcp_segments.socket_batch_size

TCP segments fetched per socket from the segment pool

XLIO_TX_SEGS_BATCH_TCP

64

Minimum: 1

performance.buffers.tx.buf_size

Size of Tx buffer allocation; must be ≥ MTU and ≤ 0xFF00. Default based on MTU/MSS.

XLIO_TX_BUF_SIZE

0

Range: 0–262144

Supports suffixes: B, KB, MB, GB

performance.buffers.tx.global_array_size

Number of global zero-copy Tx buffers preallocated

XLIO_TX_BUFS

200000

-

performance.buffers.tx.prefetch_size

Cache prefetch size for Tx path to optimize send rate

XLIO_TX_PREFETCH_BYTES

256

Range: 0–MTU


Completion Queue

Parameter

Description

Deprecated Environment Variable

Default

Values/Examples/Notes

performance.completion_queue.interrupt_moderation.adaptive_change_frequency_msec

Frequency of interrupt moderation adaptation. Interval in milliseconds between adaptation attempts. Use 0 to disable adaptive interrupt moderation

XLIO_CQ_AIM_INTERVAL_MSEC

1000

-

performance.completion_queue.interrupt_moderation.adaptive_count

Maximum count value to use in the adaptive interrupt moderation algorithm

XLIO_CQ_AIM_MAX_COUNT

500

-

performance.completion_queue.interrupt_moderation.adaptive_interrupt_per_sec

Desired interrupts rate per second for each ring (CQ). Count and period parameters will change automatically to achieve the desired rate

XLIO_CQ_AIM_INTERRUPTS_RATE_PER_SEC

10000

-

performance.completion_queue.interrupt_moderation.adaptive_period_usec

Maximum period value to use in the adaptive interrupt moderation algorithm

XLIO_CQ_AIM_MAX_PERIOD_USEC

1000

-

performance.completion_queue.interrupt_moderation.enable

Enable CQ interrupt moderation. When enabled, hardware only generates an interrupt after some packets are received or after a packet was held for some time

XLIO_CQ_MODERATION_ENABLE

true (Enabled)

-

performance.completion_queue.interrupt_moderation.packet_count

Number of packets to hold before generating interrupt

XLIO_CQ_MODERATION_COUNT

48

-

performance.completion_queue.interrupt_moderation.period_usec

Period in microseconds for holding the packet before generating interrupt

XLIO_CQ_MODERATION_PERIOD_USEC

50

-

performance.completion_queue.keep_full

If enabled, the CQ tries to compensate the QP for each polled receive completion. If disabled, the CQ does not compensate on every receive-path poll and instead uses a "debt" counter to remember missing WREs

XLIO_CQ_KEEP_QP_FULL

true (Enabled)

-

performance.completion_queue.periodic_drain_max_cqes

Each time XLIO's internal thread starts CQ draining, it will stop when it reaches this max value. Applications are not limited by this value

XLIO_PROGRESS_ENGINE_WCE_MAX

10000

-

performance.completion_queue.periodic_drain_msec

XLIO internal thread safe check that the CQ is drained at least once every N milliseconds. Allows library to progress TCP stack when application doesn't access socket

XLIO_PROGRESS_ENGINE_INTERVAL

10

-

performance.completion_queue.rx_drain_rate_nsec

Socket's receive path CQ drain logic rate control. When enabled, socket will check CQ for ready completions even if receive ready packet queue is not empty

XLIO_RX_CQ_DRAIN_RATE_NSEC

0

Recommended: 100–5000 (nsec)


Polling

Parameter

Description

Deprecated Environment Variable

Default

Values/Examples/Notes

performance.polling.blocking_rx_poll_usec

Number of times to poll the Rx path for ready packets before going to sleep (blocking mode) or returning -1 (non-blocking mode). Applies when the application uses direct blocking calls to read(), recv(), etc.

XLIO_RX_POLL

100000

Range: -1 (infinite), 0 (interrupt-driven), 1–100000000

performance.polling.iomux.poll_os_ratio

Enables polling of OS file descriptors while the user thread calls select() or poll(). Results in one poll of non-offloaded sockets for every N polls of offloaded sockets

XLIO_SELECT_POLL_OS_RATIO

10

-

performance.polling.iomux.poll_usec

Duration in microseconds to poll the hardware on Rx path before going to sleep. Max polling duration limited by timeout used in select(), poll() or epoll_wait()

XLIO_SELECT_POLL

100000

Range: -1 (infinite), 0 (interrupt-driven), 1–100000000

performance.polling.iomux.skip_os

For select() or poll(), this forces XLIO to check non-offloaded FDs even when an offloaded socket has ready packets found while polling

XLIO_SELECT_SKIP_OS

4

-

performance.polling.kernel_fd_attention_level

Controls threshold for checking kernel file descriptors during polling. 0 means never check. Affects how often XLIO checks for activity on non-offloaded kernel file descriptors

XLIO_RING_KERNEL_FD_ATTENTION_LEVEL

10

-

performance.polling.max_rx_poll_batch

Maximum number of receive buffers processed in a single poll operation. Max size of array while polling the CQs

XLIO_CQ_POLL_BATCH_MAX

16

-

performance.polling.nonblocking_eagain

When disabled (default), send operations on non-blocking UDP sockets always return 'OK' (the OS default behavior). When enabled, the library returns the error EAGAIN if it is unable to accomplish the send operation

XLIO_TX_NONBLOCKED_EAGAINS

false (Disabled)

-

performance.polling.offload_transition_poll_count

Controls polling count during transition phase where socket is UDP unicast and no multicast addresses were added. Once first ADD_MEMBERSHIP is called, RX poll duration setting takes effect

XLIO_RX_POLL_INIT

0

Range: -1 (infinite), 0 (disabled), 1–100000000

performance.polling.rx_cq_wait_ctrl

Ensures FDs are added only to sleeping sockets' epoll descriptors, reducing kernel scan overhead

XLIO_RX_CQ_WAIT_CTRL

false (Disabled)

-

performance.polling.rx_kernel_fd_attention_level

Ratio between XLIO CQ polls and OS FD polls. 0 means only offloaded sockets are polled. Results in one poll of non-offloaded sockets for every N polls of offloaded sockets

XLIO_RX_UDP_POLL_OS_RATIO

100

-

performance.polling.rx_poll_on_tx_tcp

Enables/disables TCP RX polling during TCP TX operation for faster TCP ACK reception

XLIO_RX_POLL_ON_TX_TCP

false (Disabled)

-

performance.polling.skip_cq_on_rx

Allow TCP socket to skip CQ polling in rx socket call

XLIO_SKIP_POLL_IN_RX

0

  • 0 – Disabled

  • 1 – Skip always

  • 2 – Skip only if socket was added to epoll before

performance.polling.yield_on_poll

When application runs with multiple threads on limited cores, each thread polling inside XLIO needs to yield CPU to other polling threads to prevent starvation. The value is the number of iterations before yielding the CPU

XLIO_RX_POLL_YIELD

0 (Disabled)

-


Rings

Parameter

Description

Deprecated Environment Variable

Default

Values/Examples/Notes

performance.rings.max_per_interface

Limit on rings per interface. If number of sockets using same interface is larger than limit, several sockets will share the same ring. Use 0 for unlimited

XLIO_RING_LIMIT_PER_INTERFACE

0

-

performance.rings.rx.allocation_logic

Controls how reception rings are allocated and separated. By default all sockets use the same ring for both RX and TX over the same interface

XLIO_RING_ALLOCATION_LOGIC_RX

per_thread (20)

  • per_interface/0 – Ring per interface

  • per_ip_address/1 – Ring per IP address

  • per_socket/10 – Ring per socket

  • per_thread/20 – Ring per thread

  • per_cpuid/30 – Ring per core (using cpu id)

  • per_core/31 – Ring per core – attach threads
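For example, a multi-threaded receiver that needs per-socket isolation could select the per_socket logic (a sketch; whether the symbolic name or the numeric code 10 is expected in the JSON form is an assumption):

```json
{
  "performance": {
    "rings": {
      "rx": {
        "allocation_logic": "per_socket"
      }
    }
  }
}
```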

performance.rings.rx.migration_ratio

Controls when to replace a socket's ring with the current thread's ring. Used with "ring per thread" logic to decide when ring migration is beneficial

XLIO_RING_MIGRATION_RATIO_RX

-1 (disabled)

-

performance.rings.rx.post_batch_size

Number of Work Request Elements and RX buffers to batch before recycling. Batching decreases mean latency but might increase latency standard deviation (jitter)

XLIO_RX_WRE_BATCHING

1024

Range: 1–1024

performance.rings.rx.ring_elements_count

Number of Work Request Elements allocated in all RQs. Default value is 128 for hardware_features.striding_rq.enable=true (default) or 32768 for hardware_features.striding_rq.enable=false

XLIO_RX_WRE

32768

-

performance.rings.rx.spare_buffers

Number of spare receive buffers a ring holds to allow for filling up QP while full receive buffers are being processed. Default value is 128 for hardware_features.striding_rq.enable=true (default) or 32768 for hardware_features.striding_rq.enable=false

XLIO_QP_COMPENSATION_LEVEL

32768

-

performance.rings.rx.spare_strides

Number of spare stride objects a ring holds to allow faster allocation of a stride object when a packet arrives

XLIO_STRQ_STRIDES_COMPENSATION_LEVEL

32768

-

performance.rings.tx.allocation_logic

Ring allocation logic is used to separate traffic to different rings. By default all sockets use the same ring for both RX and TX over the same interface

XLIO_RING_ALLOCATION_LOGIC_TX

per_thread (20)

  • per_interface/0 – Ring per interface

  • per_ip_address/1 – Ring per IP address

  • per_socket/10 – Ring per socket

  • per_thread/20 – Ring per thread

  • per_cpuid/30 – Ring per core (using cpu id)

  • per_core/31 – Ring per core – attach threads

performance.rings.tx.completion_batch_size

Number of TX WREs used until a completion signal is requested. Allows better control of jitter from Tx CQE handling

XLIO_TX_WRE_BATCHING

64

Range: 1–64

performance.rings.tx.max_inline_size

Maximum data size sent inline. Setting to 0 disables inlining. Data copied into INLINE space is at least 32 bytes of headers plus user datagram payload

XLIO_TX_MAX_INLINE

204

Range: 0–884

performance.rings.tx.max_on_device_memory

Maximum On Device Memory buffer size for each TX ring. 0 means unlimited. XLIO can use the On Device Memory to store the egress packet if it does not fit into the BF inline buffer

XLIO_RING_DEV_MEM_TX

0

Range: 0–262144 KB

Note: Total On Device Memory limited to 256k for single-port HCA and 128k for dual-port HCA

performance.rings.tx.migration_ratio

Controls when to replace a socket's ring with the current thread's ring. Used with "ring per thread" logic to decide when ring migration is beneficial

XLIO_RING_MIGRATION_RATIO_TX

-1 (disabled)

-

performance.rings.tx.ring_elements_count

Number of Work Request Elements allocated in all transmit QPs. Number of QPs can change according to number of network offloaded interfaces

XLIO_TX_WRE

32768

-

performance.rings.tx.tcp_buffer_batch

Number of TX buffers fetched by a TCP socket at once. A higher number means fewer ring accesses to fetch buffers; a lower number means less memory consumption

XLIO_TX_BUFS_BATCH_TCP

16

Minimum: 1

performance.rings.tx.udp_buffer_batch

Number of TX buffers fetched by a UDP socket at once

XLIO_TX_BUFS_BATCH_UDP

8

Minimum: 1


Steering Rules

Parameter

Description

Deprecated Environment Variable

Default

Values/Examples/Notes

performance.steering_rules.disable_flowtag

Disables flow tag functionality

XLIO_DISABLE_FLOW_TAG

false (Disabled)

-

performance.steering_rules.tcp.2t_rules

Use only 2-tuple rules for TCP connections instead of 5-tuple rules. Can help overcome steering limitations for outgoing TCP connections but requires unique local IP address per XLIO ring

XLIO_TCP_2T_RULES

false (Disabled)

-

performance.steering_rules.tcp.3t_rules

Use only 3-tuple rules for incoming TCP connections instead of 5-tuple rules. Can improve performance for servers with listen sockets accepting many connections

XLIO_TCP_3T_RULES

false (Disabled)

-

performance.steering_rules.udp.3t_rules

Relevant for connected UDP sockets. 3-tuple rules are used in hardware flow steering when enabled; 5-tuple when disabled. Enabling can reduce hardware flow steering resources

XLIO_UDP_3T_RULES

true (Enabled)

-

performance.steering_rules.udp.only_mc_l2_rules

Use only L2 rules for Ethernet Multicast. All loopback traffic will be handled by XLIO instead of OS

XLIO_ETH_MC_L2_ONLY_RULES

false (Disabled)

-


Threading

Parameter

Description

Deprecated Environment Variable

Default

Values/Examples/Notes

performance.threading.cpu_affinity

Control which CPU core(s) the XLIO internal thread is serviced on. Can be provided as hexadecimal bitmask or comma-delimited values/ranges

XLIO_INTERNAL_THREAD_AFFINITY

"-1" (disabled)

Examples:

  • 0x00000001 – Run on processor 0

  • 0x00000007 – Run on processors 0, 1, and 2

  • 0,4,8 – Run on processors 0, 4, and 8

  • 0,1,7-10 – Run on processors 0, 1, 7, 8, 9, and 10

Note: Only hexadecimal values are supported for this parameter in XLIO_INLINE_CONFIG
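Because XLIO_INLINE_CONFIG accepts only the hexadecimal bitmask form, pinning the internal thread to processor 0 would be written inline as performance.threading.cpu_affinity=0x00000001, or in the JSON form (a sketch):

```json
{
  "performance": {
    "threading": {
      "cpu_affinity": "0x00000001"
    }
  }
}
```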

performance.threading.cpuset

Select a cpuset for XLIO internal thread. Value is path to cpuset or empty string to run on same cpuset as process

XLIO_INTERNAL_THREAD_CPUSET

"" (empty string)

Example: /dev/cpuset/my_set

performance.threading.internal_handler.behavior

Select which TCP control flows are done in the internal thread. Should be kept disabled if using blocking poll/select (epoll is OK)

XLIO_TCP_CTL_THREAD

disable (0)

  • disable/0 – Disable

  • delegate/1 – Handle TCP timers in application context threads

performance.threading.internal_handler.timer_msec

Control XLIO internal thread wakeup timer resolution (in milliseconds)

XLIO_TIMER_RESOLUTION_MSEC

10

-

performance.threading.internal_handler.wakeup_per_packet

Wake up the internal thread for each packet that the CQ receives. Can minimize latency for busy applications but might decrease performance for high PPS applications

XLIO_INTERNAL_THREAD_ARM_CQ

0 (Disabled)

-

performance.threading.mutex_over_spinlock

Control locking type mechanism for some specific flows. Note that usage of Mutex might increase latency

XLIO_MULTILOCK

false (Spin)

-

performance.threading.worker_threads

Controls which execution model and number of worker threads are used to handle networking and progress sockets. Two modes: Run to Completion (0) and Worker Threads (>0)

XLIO_WORKER_THREADS

0 (Run to Completion execution model)

Range: 0–512


Profiles

Parameter

Description

Deprecated Environment Variable

Default

Values/Examples/Notes

profiles.spec

XLIO predefined specification profiles

XLIO_SPEC

none (0)

  • none/0 – No profile applied

  • latency/1 – Optimized for latency-sensitive use cases

  • ultra_latency/2 – Optimized for ultra-low latency using a single-threaded model; avoids OS polling and the progress engine

  • nginx/3 – Optimized for nginx (must be used to offload nginx). This profile is enabled indirectly by setting applications.nginx.workers_num=n

  • nginx_dpu/4 – Optimized for nginx running inside an NVIDIA DPU

  • nvme_bf3/5 – Optimized for SPDK solutions over the NVIDIA BlueField-3 DPU

  • all/6 – Reserved

Examples:

  • profiles.spec=latency

  • profiles.spec=ultra_latency

  • profiles.spec=nginx_dpu applications.nginx.workers_num=<N>

  • profiles.spec=nvme_bf3


© Copyright 2025, NVIDIA. Last updated on Nov 26, 2025