NVIDIA Accelerated IO (XLIO) Documentation Rev 3.60

Advanced Features

The TLS hardware offload feature accelerates TLS encryption and decryption by leveraging NIC-level crypto capabilities.

Prerequisites

  • See System Requirements.

  • A crypto-enabled NIC (see supported adapter list).

  • Linux distribution with kTLS support.

    • TLS 1.3 HW offload requires OS support for kTLS 1.3.

  • A TLS library or application supporting kTLS.

    • Typically OpenSSL.

    • OpenSSL is also used for symmetric-encryption SW fallback.

      OpenSSL minimum versions used by XLIO:

      Feature                     Minimum OpenSSL Version
      TLS 1.2 / 1.3 TX Offload    ≥ 3.0.0
      TLS 1.2 RX Offload          ≥ 3.0.2
      TLS 1.3 RX Offload          ≥ 3.2.0

      When building OpenSSL, ensure enable-ktls is set.
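The minimum versions above can be encoded as a small check. This is only an illustrative sketch: the enum, the function names, and the version-packing scheme are hypothetical and not part of XLIO or OpenSSL.

```c
#include <assert.h>
#include <stdbool.h>

/* Offload features listed in the minimum-version table (names are hypothetical). */
enum offload_feature {
    FEATURE_TLS_TX,     /* TLS 1.2 / 1.3 TX offload */
    FEATURE_TLS12_RX,   /* TLS 1.2 RX offload */
    FEATURE_TLS13_RX    /* TLS 1.3 RX offload */
};

/* Pack major.minor.patch into one comparable integer (each part < 256). */
static unsigned long pack_version(unsigned major, unsigned minor, unsigned patch)
{
    assert(major < 256 && minor < 256 && patch < 256);
    return ((unsigned long)major << 16) | ((unsigned long)minor << 8) | patch;
}

/* Return true if the given OpenSSL version meets the documented minimum. */
bool openssl_supports_offload(enum offload_feature f,
                              unsigned major, unsigned minor, unsigned patch)
{
    unsigned long have = pack_version(major, minor, patch);

    switch (f) {
    case FEATURE_TLS_TX:   return have >= pack_version(3, 0, 0);
    case FEATURE_TLS12_RX: return have >= pack_version(3, 0, 2);
    case FEATURE_TLS13_RX: return have >= pack_version(3, 2, 0);
    }
    return false;
}
```

For example, openssl_supports_offload(FEATURE_TLS13_RX, 3, 1, 4) is false, because TLS 1.3 RX offload needs at least 3.2.0.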

Usage

Enable kTLS in your application as you normally would when using kernel TLS.

XLIO transparently offloads the Linux kTLS API. For kTLS API details, refer to:

https://www.kernel.org/doc/html/latest/networking/tls.html

TLS HW offload may also be provided implicitly through TLS libraries with kTLS support (e.g., OpenSSL).

XLIO exposes configuration parameters:

  • XLIO_UTLS_TX – TX offload (enabled by default)

  • XLIO_UTLS_RX – RX offload (disabled by default)

Enable XLIO_UTLS_RX if receive-side kTLS offload is required.

Note: If TLS HW offload cannot be applied, setsockopt() returns ENOPROTOOPT.
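For applications that call the kTLS API directly, the sequence described in the kernel documentation linked above looks roughly like the sketch below for TLS 1.2 with AES128-GCM. The helper names fill_tls12_aes128() and enable_ktls_tx() are hypothetical. The key material is assumed to come from a completed TLS handshake, and the TCP_ULP and SOL_TLS fallback defines are only there for older libc headers.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <linux/tls.h>

#ifndef TCP_ULP
#define TCP_ULP 31      /* fallback for older libc headers */
#endif
#ifndef SOL_TLS
#define SOL_TLS 282     /* fallback for older libc headers */
#endif

/* Fill the kTLS descriptor for TLS 1.2 AES128-GCM from handshake output. */
void fill_tls12_aes128(struct tls12_crypto_info_aes_gcm_128 *ci,
                       const unsigned char *key, const unsigned char *salt,
                       const unsigned char *iv, const unsigned char *rec_seq)
{
    assert(ci && key && salt && iv && rec_seq);
    memset(ci, 0, sizeof(*ci));
    ci->info.version = TLS_1_2_VERSION;             /* 0x0303 */
    ci->info.cipher_type = TLS_CIPHER_AES_GCM_128;  /* 51 */
    memcpy(ci->key, key, TLS_CIPHER_AES_GCM_128_KEY_SIZE);
    memcpy(ci->salt, salt, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
    memcpy(ci->iv, iv, TLS_CIPHER_AES_GCM_128_IV_SIZE);
    memcpy(ci->rec_seq, rec_seq, TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE);
}

/* Attach the "tls" ULP and install the TX key on a connected TCP socket.
 * Returns 0 on success, -1 with errno set on failure. Per the note above,
 * errno is ENOPROTOOPT when TLS HW offload cannot be applied. */
int enable_ktls_tx(int fd, const struct tls12_crypto_info_aes_gcm_128 *ci)
{
    assert(ci != NULL);
    if (setsockopt(fd, IPPROTO_TCP, TCP_ULP, "tls", sizeof("tls")) < 0)
        return -1;
    return setsockopt(fd, SOL_TLS, TLS_TX, ci, sizeof(*ci));
}
```

A caller would normally treat a -1 return as a signal to keep encrypting in user space (for example through the TLS library) rather than as a fatal error.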

Monitoring

TLS HW offload introduces new statistics counters. Their presence indicates that offload is configured and active.

Use xlio_stats -v 3 to view TLS counters for sockets and rings.

Example output:


======================================================
    Fd=[59]
    - TCP, Non-blocked
    - Local Address = [14.212.1.34:443]
    - Foreign Address = [14.212.1.57:49072]
    Tx Offload: 18511 / 39409 / 0 / 0 [kilobytes/packets/eagains/errors]
    Rx Offload: 1045354 / 2210387 / 0 / 1 [kilobytes/packets/eagains/errors]
    Rx byte: cur 0 / max 313 / dropped 0 / limit 0
    Rx pkt : cur 0 / max 1 / dropped 0
    TLS Offload: version 0303 / cipher 51 / TX On / RX On
    TLS Tx Offload: 17394 / 39407 [kilobytes/records]
    TLS Rx Offload: 982755 / 2210381 / 28 / 0 [kilobytes/records/encrypted/mixed]
    TLS Rx Resyncs: 1 [total]
======================================================
    RING_ETH=[0]
    Tx Offload: 18519 / 39559 [kilobytes/packets]
    Rx Offload: 5080 / 39419 [kilobytes/packets]
    TLS TX Context Setups: 1
    TLS RX Context Setups: 1
    Interrupts: 39324 / 38656 [requests/received]
    Moderation: 1024 / 1024 [frames/usec period]
======================================================

Counter Definitions

  • TLS Offload (version) - 0303 = TLS 1.2, 0304 = TLS 1.3.

  • TLS Offload (cipher) - 51 = AES128-GCM, 52 = AES256-GCM.

  • TLS Offload (TX/RX) - Enabled/disabled state.

  • TLS Tx Offload (kilobytes) - Offloaded payload size (no headers/overhead).

  • TLS Tx Offload (records) - Number of TLS records created/queued.

  • TLS Tx Resyncs - Hardware resynchronizations due to out-of-order sends.

  • TLS Rx Offload (kilobytes) - TLS payload bytes received.

  • TLS Rx Offload (records) - Total received TLS records.

  • TLS Rx Offload (encrypted) - Encrypted records decrypted in software.

  • TLS Rx Offload (mixed) - Partially decrypted records handled by XLIO.

  • TLS Rx Resyncs - Hardware resynchronization events.

  • TLS TX Context Setups - Cumulative count of TX offload contexts (i.e., sockets).

  • TLS RX Context Setups - Same as above for RX.

Note: Kernel TLS counters do not increment when XLIO provides offload.
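The version and cipher codes in the "TLS Offload" line can be decoded mechanically. The sketch below parses that line and maps the codes to the names given in the counter definitions. The function names are hypothetical, and the line layout is assumed to match the example output above, so it may differ between XLIO versions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Map the "version" field to a protocol name (codes from the definitions above). */
const char *tls_version_name(unsigned version)
{
    switch (version) {
    case 0x0303: return "TLS 1.2";
    case 0x0304: return "TLS 1.3";
    default:     return "unknown";
    }
}

/* Map the "cipher" field to a cipher name (codes from the definitions above). */
const char *tls_cipher_name(unsigned cipher)
{
    switch (cipher) {
    case 51: return "AES128-GCM";
    case 52: return "AES256-GCM";
    default: return "unknown";
    }
}

/* Parse "TLS Offload: version 0303 / cipher 51 / TX On / RX On".
 * Any state other than "On" is treated as disabled.
 * Returns 0 on success, -1 if the line does not match. */
int parse_tls_offload_line(const char *line, unsigned *version,
                           unsigned *cipher, int *tx_on, int *rx_on)
{
    char tx[4], rx[4];

    assert(line && version && cipher && tx_on && rx_on);
    if (sscanf(line, " TLS Offload: version %x / cipher %u / TX %3s / RX %3s",
               version, cipher, tx, rx) != 4)
        return -1;
    *tx_on = strcmp(tx, "On") == 0;
    *rx_on = strcmp(rx, "On") == 0;
    return 0;
}
```

Applied to the example output, the socket with Fd=[59] decodes to TLS 1.2 with AES128-GCM and both directions offloaded.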

Supported Ciphers

The table below lists all supported offloaded ciphers.

TLS Version  Bits  Hardware Offload    OpenSSL Name                    XLIO TX  XLIO RX
1.2          128   TLS1.2-AES128-GCM   AES128-GCM-SHA256               YES      YES
1.2          128   TLS1.2-AES128-GCM   ECDHE-ECDSA-AES128-GCM-SHA256   YES      YES
1.2          128   TLS1.2-AES128-GCM   ECDHE-RSA-AES128-GCM-SHA256     YES      YES
1.2          256   TLS1.2-AES256-GCM   AES256-GCM-SHA384               YES      YES
1.2          256   TLS1.2-AES256-GCM   ECDHE-ECDSA-AES256-GCM-SHA384   YES      YES
1.2          256   TLS1.2-AES256-GCM   ECDHE-RSA-AES256-GCM-SHA384     YES      YES
1.3          128   TLS1.3-AES128-GCM   TLS_AES_128_GCM_SHA256          YES      YES
1.3          256   TLS1.3-AES256-GCM   TLS_AES_256_GCM_SHA384          YES      YES


XLIO supports hardware timestamping for the UDP receive (RX) flow only, in conjunction with the Precision Time Protocol (PTP).

When running a PTP daemon, XLIO periodically retrieves kernel time-conversion parameters and combines them with NIC hardware timestamps to deliver synchronized time.

Prerequisites

  • A NIC supporting hardware clock.

  • Set:

    XLIO_HW_TS_CONVERSION=4

Usage

Enable hardware RX timestamps:


/* SO_TIMESTAMPING takes an int bit mask of SOF_TIMESTAMPING_* flags */
int val = SOF_TIMESTAMPING_RX_HARDWARE;
setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val));

Then run the application with:


XLIO_HW_TS_CONVERSION=4

Example:


Server
$ sudo LD_PRELOAD=libxlio.so XLIO_HW_TS_CONVERSION=4 ./timestamping <iface> SOF_TIMESTAMPING_RAW_HARDWARE SOF_TIMESTAMPING_RX_HARDWARE

Client
$ LD_PRELOAD=libxlio.so sockperf tp -i <server-ip> -t 3600 -p 6666 --mps 10

timestamping output:
SOL_SOCKET SO_TIMESTAMPING SW 0.000000000 HW raw 1497823023.070846953
IP_PKTINFO interface index 8
SOL_SOCKET SO_TIMESTAMPING SW 0.000000000 HW raw 1497823023.170847260
IP_PKTINFO interface index 8
SOL_SOCKET SO_TIMESTAMPING SW 0.000000000 HW raw 1497823023.270847093
IP_PKTINFO interface index 8
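Once SO_TIMESTAMPING is enabled, the "HW raw" values shown above reach the application as ancillary data on recvmsg(). The sketch below shows one way to pull the raw hardware timestamp out of a received message, following the standard Linux convention that an SCM_TIMESTAMPING control message carries three timespec values and the third one is the raw hardware time. The function name extract_hw_timestamp() is hypothetical. The example application above passes both SOF_TIMESTAMPING_RAW_HARDWARE and SOF_TIMESTAMPING_RX_HARDWARE.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>

#ifndef SCM_TIMESTAMPING
#define SCM_TIMESTAMPING SO_TIMESTAMPING   /* fallback for older headers */
#endif

/* Copy the raw hardware timestamp from the control data of a message
 * returned by recvmsg(). Returns 0 if found, -1 otherwise. */
int extract_hw_timestamp(struct msghdr *msg, struct timespec *out)
{
    struct cmsghdr *c;

    assert(msg != NULL && out != NULL);
    for (c = CMSG_FIRSTHDR(msg); c != NULL; c = CMSG_NXTHDR(msg, c)) {
        struct timespec ts[3];  /* [0] software, [1] deprecated, [2] raw hardware */

        if (c->cmsg_level != SOL_SOCKET || c->cmsg_type != SCM_TIMESTAMPING)
            continue;
        if (c->cmsg_len < CMSG_LEN(sizeof(ts)))
            continue;           /* truncated control message */
        memcpy(ts, CMSG_DATA(c), sizeof(ts));
        *out = ts[2];
        return 0;
    }
    return -1;
}
```

The caller is expected to supply a control buffer of at least CMSG_SPACE(3 * sizeof(struct timespec)) bytes in msg_control before calling recvmsg().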


PCI transactions between system RAM and the NIC introduce ~300 ns of latency per transfer, increasing with buffer size. Reducing the number of PCI fetches on the send path improves application egress latency.

XLIO optimizes this by:

  • Copying the WQE directly into the doorbell.

  • Inlining packets smaller than 190 bytes into the WQE, eliminating an additional PCI gather.

  • Storing larger packets in on-device memory instead of system RAM, reducing PCI traffic.

On-device memory is fully managed by XLIO and transparent to the application.
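The three optimizations can be pictured as a per-packet decision. The sketch below is a simplified model of that decision and not XLIO's actual implementation. The names are hypothetical, and the fallback to system RAM when the ring has no free device memory is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define INLINE_THRESHOLD 190   /* bytes; packets smaller than this are inlined */

/* Where the packet payload is placed for transmission (hypothetical model). */
enum tx_path {
    TX_INLINE,          /* payload copied into the WQE itself */
    TX_DEVICE_MEMORY,   /* payload staged in on-device memory */
    TX_SYSTEM_RAM       /* payload fetched from host RAM over PCI */
};

/* Pick a TX path for one packet, given the free device memory of its ring. */
enum tx_path choose_tx_path(size_t pkt_len, size_t dev_mem_free)
{
    assert(pkt_len > 0);
    if (pkt_len < INLINE_THRESHOLD)
        return TX_INLINE;
    if (pkt_len <= dev_mem_free)
        return TX_DEVICE_MEMORY;
    return TX_SYSTEM_RAM;   /* assumed fallback when device memory is exhausted */
}
```

In this model a 64-byte packet is always inlined, while a 1500-byte packet uses on-device memory only if the ring still has at least 1500 bytes free.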

Total available device memory:

  • 256 KB on single-port NICs

  • 128 KB on dual-port NICs

You can configure the per-TX-ring allocation using:


XLIO_RING_DEV_MEM_TX=<bytes>

Prerequisites

  • Supported NIC:

    NVIDIA ConnectX-6 Dx, ConnectX-7, BlueField-2, BlueField-3, or newer

  • Protocol: Ethernet

  • Environment variable:

    XLIO_RING_DEV_MEM_TX set according to application needs

Usage

Configure XLIO with an appropriate XLIO_RING_DEV_MEM_TX value to define how much on-device memory each TX ring can use.

For example:


LD_PRELOAD=libxlio.so XLIO_RING_DEV_MEM_TX=16384 <application>

No application code changes are required - XLIO manages the memory internally.

Verifying Hardware Capability

Run XLIO with DEBUG trace level:


XLIO_TRACELEVEL=DEBUG LD_PRELOAD=<path to libxlio.so> <command line>

Look for a positive on_device_memory value in the device capabilities printout.

For example:


on_device_memory: 131072

This confirms that the NIC supports on-device memory and reports the size in bytes.

Monitoring

Use xlio_stats to display runtime on-device memory usage:


xlio_stats -p <pid> -v 3

Example:


======================================================
    RING_ETH=[0]
    Tx Offload: 858931 / 3402875 [kilobytes/packets]
    Rx Offload: 865251 / 3402874 [kilobytes/packets]
    Dev Mem Alloc: 16384
    Dev Mem Stats: 739074 / 1784935 / 0 [kilobytes/packets/oob]
======================================================

  • Dev Mem Alloc - allocated device memory for this ring

  • Dev Mem Stats - usage counters: kilobytes / packets / out-of-buffer (oob)

To enable TCP_QUICKACK thresholding:

  1. Modify TCP_QUICKACK_THRESHOLD in lwip/opt.h.

  2. Recompile XLIO.

When TCP_QUICKACK is enabled, TCP acknowledgments are sent immediately instead of being delayed. Sending the ACK first can postpone processing of the incoming packet.

The threshold disables QUICKACK for payloads larger than the configured size.

It takes effect only when QUICKACK is active, either via setsockopt() or via the XLIO_TCP_QUICKACK parameter.

The threshold is disabled by default.
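The sketch below shows the standard setsockopt() call that enables TCP_QUICKACK on a socket, together with a small model of how the threshold interacts with it. The struct and the quickack_applies() helper are hypothetical and only illustrate the rule described above. Treating a threshold of 0 as "threshold disabled" is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Request immediate ACKs on a TCP socket. Returns 0, or -1 with errno set. */
int enable_quickack(int fd)
{
    int one = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &one, sizeof(one));
}

/* Hypothetical model of the QUICKACK configuration. */
struct quickack_cfg {
    int    enabled;     /* QUICKACK active via setsockopt() or XLIO_TCP_QUICKACK */
    size_t threshold;   /* bytes; 0 is assumed to mean "threshold disabled" */
};

/* Return 1 if a payload of this size would be acknowledged immediately. */
int quickack_applies(const struct quickack_cfg *cfg, size_t payload_len)
{
    assert(cfg != NULL);
    if (!cfg->enabled)
        return 0;               /* the threshold has no effect without QUICKACK */
    if (cfg->threshold == 0)
        return 1;               /* no threshold: every payload is quick-ACKed */
    return payload_len <= cfg->threshold;
}
```

With a threshold of 512 bytes, for instance, a 512-byte payload is still acknowledged immediately while a 513-byte payload falls back to regular delayed ACK behavior.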

The XLIO daemon monitors TCP socket state for XLIO processes that exit ungracefully.

Usage instructions are in Installing XLIO via Dedicated Packages (Binary).

XLIO can accelerate NGINX by using hardware-offloaded IO paths.

Prerequisites

For kTLS usage, see the Supported Ciphers section.

Limitations

XLIO does not support NGINX running in daemon mode.

The "daemon off;" directive must remain set in nginx.conf.

NGINX Best Practices

Configuration

In nginx.conf:


worker_processes <NUM-WORKERS>;
daemon off;

XLIO configuration:


XLIO_NGINX_WORKERS_NUM=<NUM-WORKERS> XLIO_SPEC=<SPEC>

  • x86: XLIO_SPEC=nginx

  • BlueField DPU: XLIO_SPEC=nginx_dpu

© Copyright 2025, NVIDIA. Last updated on Nov 26, 2025