Advanced Features
The TLS hardware offload feature accelerates TLS encryption and decryption by leveraging NIC-level crypto capabilities.
Prerequisites
See System Requirements.
A crypto-enabled NIC (see supported adapter list).
Linux distribution with kTLS support.
TLS 1.3 HW offload requires OS support for kTLS 1.3.
A TLS library or application supporting kTLS.
Typically OpenSSL.
OpenSSL is also used for symmetric-encryption SW fallback.
OpenSSL minimum versions used by XLIO:
Feature | Minimum OpenSSL Version
TLS 1.2 / 1.3 TX Offload | ≥ 3.0.0
TLS 1.2 RX Offload | ≥ 3.0.2
TLS 1.3 RX Offload | ≥ 3.2.0
When building OpenSSL, ensure enable-ktls is set.
Usage
Enable kTLS in your application as you normally would when using kernel TLS.
XLIO transparently offloads the Linux kTLS API. For kTLS API details, refer to:
https://www.kernel.org/doc/html/latest/networking/tls.html
TLS HW offload may also be provided implicitly through TLS libraries with kTLS support (e.g., OpenSSL).
XLIO exposes configuration parameters:
XLIO_UTLS_TX – TX offload (enabled by default)
XLIO_UTLS_RX – RX offload (disabled by default)
Enable XLIO_UTLS_RX if receive-side kTLS offload is required.
Note: If TLS HW offload cannot be applied, setsockopt() fails with errno set to ENOPROTOOPT.
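The kTLS setup that XLIO intercepts can be sketched as follows. This is a minimal illustration of the standard Linux kTLS calls described in the kernel documentation linked above, assuming TLS 1.2 with AES128-GCM and key material already produced by a completed handshake. The helper names fill_tls12_aes128() and enable_ktls_tx() are hypothetical and are not part of XLIO.

```c
#include <errno.h>
#include <linux/tls.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>

/* Fallbacks for older libc headers; values match the Linux UAPI. */
#ifndef SOL_TLS
#define SOL_TLS 282
#endif
#ifndef TCP_ULP
#define TCP_ULP 31
#endif

/* Hypothetical helper: fill a kTLS descriptor for TLS 1.2 AES128-GCM
 * from key material negotiated during the handshake. */
static void fill_tls12_aes128(struct tls12_crypto_info_aes_gcm_128 *ci,
                              const unsigned char *key,
                              const unsigned char *salt,
                              const unsigned char *iv,
                              const unsigned char *rec_seq)
{
    memset(ci, 0, sizeof(*ci));
    ci->info.version = TLS_1_2_VERSION;            /* 0x0303 */
    ci->info.cipher_type = TLS_CIPHER_AES_GCM_128; /* 51 */
    memcpy(ci->key, key, TLS_CIPHER_AES_GCM_128_KEY_SIZE);
    memcpy(ci->salt, salt, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
    memcpy(ci->iv, iv, TLS_CIPHER_AES_GCM_128_IV_SIZE);
    memcpy(ci->rec_seq, rec_seq, TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE);
}

/* Hypothetical helper: attach the "tls" ULP and install the TX keys.
 * Returns 0 on success, -1 with errno set on failure. */
static int enable_ktls_tx(int fd, const struct tls12_crypto_info_aes_gcm_128 *ci)
{
    if (setsockopt(fd, IPPROTO_TCP, TCP_ULP, "tls", sizeof("tls")) < 0)
        return -1;
    if (setsockopt(fd, SOL_TLS, TLS_TX, ci, sizeof(*ci)) < 0)
        return -1; /* errno == ENOPROTOOPT: offload could not be applied */
    return 0;
}
```

A caller that sees -1 with errno equal to ENOPROTOOPT would typically keep encrypting in user space through its TLS library.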
Monitoring
TLS HW offload introduces new statistics counters. Their presence indicates that offload is configured and active.
Use xlio_stats -v 3 to view TLS counters for sockets and rings.
Example output:
======================================================
Fd=[59]
- TCP, Non-blocked
- Local Address = [14.212.1.34:443]
- Foreign Address = [14.212.1.57:49072]
Tx Offload: 18511 / 39409 / 0 / 0 [kilobytes/packets/eagains/errors]
Rx Offload: 1045354 / 2210387 / 0 / 1 [kilobytes/packets/eagains/errors]
Rx byte: cur 0 / max 313 / dropped 0 / limit 0
Rx pkt : cur 0 / max 1 / dropped 0
TLS Offload: version 0303 / cipher 51 / TX On / RX On
TLS Tx Offload: 17394 / 39407 [kilobytes/records]
TLS Rx Offload: 982755 / 2210381 / 28 / 0 [kilobytes/records/encrypted/mixed]
TLS Rx Resyncs: 1 [total]
======================================================
RING_ETH=[0]
Tx Offload: 18519 / 39559 [kilobytes/packets]
Rx Offload: 5080 / 39419 [kilobytes/packets]
TLS TX Context Setups: 1
TLS RX Context Setups: 1
Interrupts: 39324 / 38656 [requests/received]
Moderation: 1024 / 1024 [frames/usec period]
======================================================
Counter Definitions
TLS Offload (version) - 0303 = TLS 1.2, 0304 = TLS 1.3.
TLS Offload (cipher) - 51 = AES128-GCM, 52 = AES256-GCM.
TLS Offload (TX/RX) - Enabled/disabled state.
TLS Tx Offload (kilobytes) - Offloaded payload size (no headers/overhead).
TLS Tx Offload (records) - Number of TLS records created/queued.
TLS Tx Resyncs - Hardware resynchronizations due to out-of-order sends.
TLS Rx Offload (kilobytes) - TLS payload bytes received.
TLS Rx Offload (records) - Total received TLS records.
TLS Rx Offload (encrypted) - Encrypted records decrypted in software.
TLS Rx Offload (mixed) - Partially decrypted records handled by XLIO.
TLS Rx Resyncs - Hardware resynchronization events.
TLS TX Context Setups - Cumulative count of TX offload contexts (i.e., sockets).
TLS RX Context Setups - Same as above for RX.
Note: Kernel TLS counters do not increment when XLIO provides offload.
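When post-processing xlio_stats output, the version and cipher codes from the "TLS Offload" line can be mapped back to readable names. The sketch below assumes the line format shown in the example output above and uses only the codes listed in the counter definitions. The function names are hypothetical.

```c
#include <stdio.h>

/* Map the version code from the "TLS Offload" line to a protocol name. */
static const char *tls_version_name(unsigned int version)
{
    switch (version) {
    case 0x0303: return "TLS 1.2";
    case 0x0304: return "TLS 1.3";
    default:     return "unknown";
    }
}

/* Map the cipher code from the "TLS Offload" line to a cipher name. */
static const char *tls_cipher_name(unsigned int cipher)
{
    switch (cipher) {
    case 51: return "AES128-GCM";
    case 52: return "AES256-GCM";
    default: return "unknown";
    }
}

/* Parse a line such as
 *   "TLS Offload: version 0303 / cipher 51 / TX On / RX On".
 * The version is read as hexadecimal, the cipher as decimal.
 * Returns 1 on success, 0 if the line does not match. */
static int parse_tls_offload_line(const char *line,
                                  unsigned int *version,
                                  unsigned int *cipher)
{
    return sscanf(line, " TLS Offload: version %x / cipher %u",
                  version, cipher) == 2;
}
```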
Supported Ciphers
The below table lists all the supported offloaded ciphers.
TLS Version | Bits | Hardware Offload | OpenSSL Name | XLIO Support (TX) | XLIO Support (RX)
1.2 | 128 | TLS1.2-AES128-GCM | AES128-GCM-SHA256 | YES | YES
1.2 | 128 | TLS1.2-AES128-GCM | ECDHE-ECDSA-AES128-GCM-SHA256 | YES | YES
1.2 | 128 | TLS1.2-AES128-GCM | ECDHE-RSA-AES128-GCM-SHA256 | YES | YES
1.2 | 256 | TLS1.2-AES256-GCM | AES256-GCM-SHA384 | YES | YES
1.2 | 256 | TLS1.2-AES256-GCM | ECDHE-ECDSA-AES256-GCM-SHA384 | YES | YES
1.2 | 256 | TLS1.2-AES256-GCM | ECDHE-RSA-AES256-GCM-SHA384 | YES | YES
1.3 | 128 | TLS1.3-AES128-GCM | TLS_AES_128_GCM_SHA256 | YES | YES
1.3 | 256 | TLS1.3-AES256-GCM | TLS_AES_256_GCM_SHA384 | YES | YES
XLIO supports hardware timestamping for UDP-RX only when using PTP.
When running a PTP daemon, XLIO periodically retrieves kernel time-conversion parameters and combines them with NIC hardware timestamps to deliver synchronized time.
Prerequisites
A NIC supporting hardware clock.
Set:
XLIO_HW_TS_CONVERSION=4
Usage
Enable hardware RX timestamps:
uint8_t val = SOF_TIMESTAMPING_RX_HARDWARE;
setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val));
Then run the application with:
XLIO_HW_TS_CONVERSION=4
Example:
Server
$ sudo LD_PRELOAD=libxlio.so XLIO_HW_TS_CONVERSION=4 ./timestamping <iface> SOF_TIMESTAMPING_RAW_HARDWARE SOF_TIMESTAMPING_RX_HARDWARE
Client
$ LD_PRELOAD=libxlio.so sockperf tp -i <server-ip> -t 3600 -p 6666 --mps 10
timestamping output:
SOL_SOCKET SO_TIMESTAMPING SW 0.000000000 HW raw 1497823023.070846953 IP_PKTINFO interface index 8
SOL_SOCKET SO_TIMESTAMPING SW 0.000000000 HW raw 1497823023.170847260 IP_PKTINFO interface index 8
SOL_SOCKET SO_TIMESTAMPING SW 0.000000000 HW raw 1497823023.270847093 IP_PKTINFO interface index 8
PCI transactions between system RAM and the NIC introduce ~300 ns of latency per transfer, increasing with buffer size. Reducing the number of PCI fetches on the send path improves application egress latency.
XLIO optimizes this by:
Copying the WQE directly into the doorbell.
Inlining packets smaller than 190 bytes into the WQE, eliminating an additional PCI gather.
Storing larger packets in on-device memory instead of system RAM, reducing PCI traffic.
On-device memory is fully managed by XLIO and transparent to the application.
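The send-path choice described above can be pictured with a small decision function. This is only an illustrative sketch and not XLIO's implementation: the enum, the function name, and the fallback to system RAM when the ring's device memory is exhausted are assumptions made for the example. Only the 190-byte inline threshold comes from the text.

```c
#include <stddef.h>

/* Inline threshold quoted in the text above. */
#define INLINE_MAX_BYTES 190

/* Hypothetical labels for where the packet payload is placed. */
enum tx_path {
    TX_INLINE_WQE,    /* payload copied into the WQE itself */
    TX_DEVICE_MEMORY, /* payload staged in on-device memory */
    TX_SYSTEM_RAM     /* payload gathered from host memory over PCI */
};

/* Pick a placement for a packet of pkt_len bytes, given the number of
 * free bytes left in this ring's on-device memory allocation. */
static enum tx_path choose_tx_path(size_t pkt_len, size_t dev_mem_free)
{
    if (pkt_len < INLINE_MAX_BYTES)
        return TX_INLINE_WQE;
    if (pkt_len <= dev_mem_free)
        return TX_DEVICE_MEMORY;
    return TX_SYSTEM_RAM; /* assumed fallback when device memory is full */
}
```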
Total available device memory:
256 KB on single-port NICs
128 KB on dual-port NICs
You can configure the per-TX-ring allocation using:
XLIO_RING_DEV_MEM_TX=<bytes>
Prerequisites
Supported NIC:
NVIDIA ConnectX-6 Dx, ConnectX-7, BlueField-2, BlueField-3, or newer
Protocol: Ethernet
Environment variable:
XLIO_RING_DEV_MEM_TX set according to application needs
Usage
Configure XLIO with an appropriate XLIO_RING_DEV_MEM_TX value to define how much on-device memory each TX ring can use.
For example:
LD_PRELOAD=libxlio.so XLIO_RING_DEV_MEM_TX=16384 <application>
No application code changes are required - XLIO manages the memory internally.
Verifying Hardware Capability
Run XLIO with DEBUG trace level:
XLIO_TRACELEVEL=DEBUG LD_PRELOAD=<path to libxlio.so> <command line>
Look for a positive on_device_memory value in the device capabilities printout:
For example:
on_device_memory: 131072
This confirms that the NIC supports on-device memory and reports the size in bytes.
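If the capability check is scripted, the value can be pulled out of a captured log line. The sketch below assumes the line contains the text "on_device_memory: <bytes>" as in the example above. Any surrounding log prefix is not specified here, so the helper searches for the key anywhere in the line. The function name is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Return the on_device_memory size in bytes found in a log line,
 * or -1 if the line does not carry that field. */
static long parse_on_device_memory(const char *line)
{
    const char *p = strstr(line, "on_device_memory:");
    long bytes;

    if (p == NULL || sscanf(p, "on_device_memory: %ld", &bytes) != 1)
        return -1;
    return bytes;
}
```

A result greater than zero indicates that the NIC reports on-device memory.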
Monitoring
Use xlio_stats to display runtime on-device memory usage:
xlio_stats -p <pid> -v 3
Example:
======================================================
RING_ETH=[0]
Tx Offload: 858931 / 3402875 [kilobytes/packets]
Rx Offload: 865251 / 3402874 [kilobytes/packets]
Dev Mem Alloc: 16384
Dev Mem Stats: 739074 / 1784935 / 0 [kilobytes/packets/oob]
======================================================
Dev Mem Alloc - allocated device memory for this ring
Dev Mem Stats - usage counters: kilobytes / packets / out-of-buffer (oob)
To enable TCP_QUICKACK thresholding:
Modify TCP_QUICKACK_THRESHOLD in lwip/opt.h.
Recompile XLIO.
When TCP_QUICKACK is enabled, ACKs are sent immediately. This can delay processing of incoming packets.
The threshold disables QUICKACK for packets larger than the defined size.
It is effective only when QUICKACK is active via setsockopt() or XLIO_TCP_QUICKACK.
Disabled by default.
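The effect of the threshold can be sketched as a simple predicate. This is an illustration of the rule described above and not XLIO's code. It assumes that a threshold of 0 means the threshold is disabled, which is a guess based on the feature being disabled by default. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Decide whether an incoming segment should be ACKed immediately.
 * quickack_enabled: QUICKACK was turned on via setsockopt() or
 *                   XLIO_TCP_QUICKACK.
 * threshold:        compile-time TCP_QUICKACK_THRESHOLD value
 *                   (0 is assumed to mean "no threshold").
 * payload_len:      size of the received packet in bytes. */
static bool should_ack_immediately(bool quickack_enabled,
                                   size_t threshold,
                                   size_t payload_len)
{
    if (!quickack_enabled)
        return false; /* the threshold has no effect without QUICKACK */
    if (threshold == 0)
        return true;  /* assumed: threshold disabled, always quick-ACK */
    return payload_len <= threshold; /* larger packets skip QUICKACK */
}
```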
The XLIO daemon monitors TCP socket state for XLIO processes that exit ungracefully.
Usage instructions are in Installing XLIO via Dedicated Packages (Binary).
XLIO can accelerate NGINX by using hardware-offloaded IO paths.
Prerequisites
For kTLS usage, see: Supported Ciphers section.
Limitations
XLIO does not support NGINX running in daemon mode.
The daemon off; directive must remain set in nginx.conf.
NGINX Best Practices
See NGINX Appendix.
Configuration
In nginx.conf:
worker_processes <NUM-WORKERS>;
daemon off;
XLIO configuration:
XLIO_NGINX_WORKERS_NUM=<NUM-WORKERS>
XLIO_SPEC=<SPEC>
x86:
XLIO_SPEC=nginx
BlueField DPU:
XLIO_SPEC=nginx_dpu