NVIDIA Accelerated IO (XLIO) Documentation Rev 3.50.3

Advanced Features

DPCP

DPCP provides a unified, flexible interface for programming NVIDIA NICs. DPCP is a prerequisite for enabling XLIO features such as LRO offload, Striding-RQ, and TLS HW offload.

XLIO, which comes as part of DOCA-Host, uses DPCP. Check the Important Notes under Changes and New Features for the minimum required DPCP version.

DPCP is an open-source project; its source repository is publicly available.

TLS HW Offload

The TLS HW offload feature accelerates TLS encryption and decryption in hardware.

Prerequisites

  • Please refer to System Requirements

  • The NIC must be crypto-enabled (refer to the list of supported cards)

  • Linux distribution with kTLS support

    • To use LIBXLIO with TLS 1.3 hardware offload, your OS must support kTLS 1.3.

  • Application or TLS library with kTLS support

  • OpenSSL

    • OpenSSL library for symmetric encryption SW fallback.

    • XLIO TLS1.2/1.3 Tx offload requires OpenSSL 3.0.0

    • XLIO TLS1.2 Rx offload requires OpenSSL 3.0.2

    • XLIO TLS1.3 Rx offload requires OpenSSL 3.2.0

    • Set enable-ktls in the configuration when building OpenSSL (i.e., pass enable-ktls to OpenSSL's Configure script).

Usage

Enable kTLS in your application the same way you would when relying on kernel TLS support.

XLIO provides its own configuration parameters to control kTLS offload: XLIO_UTLS_TX (enabled by default) and XLIO_UTLS_RX (disabled by default).

  • Enable XLIO_UTLS_RX if you need kTLS offload on RX as well.

Note: If TLS HW offload cannot be provided, the setsockopt() syscall returns an error with errno set to ENOPROTOOPT.
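For reference, below is a minimal sketch of enabling kTLS TX on a connected TCP socket using the standard kernel kTLS API from <linux/tls.h> (the RX direction uses TLS_RX the same way). The key material is a placeholder; real values come from the TLS handshake:

#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/tcp.h>   /* TCP_ULP */
#include <linux/tls.h>     /* TLS_TX, struct tls12_crypto_info_aes_gcm_128 */

#ifndef SOL_TLS
#define SOL_TLS 282        /* kernel TLS socket level */
#endif

/* Enable kTLS TX on a connected TCP socket. Under XLIO, a failure with
 * errno == ENOPROTOOPT means TLS HW offload cannot be provided. */
static int enable_ktls_tx(int fd)
{
    struct tls12_crypto_info_aes_gcm_128 ci;

    if (setsockopt(fd, IPPROTO_TCP, TCP_ULP, "tls", sizeof("tls")) < 0)
        return -errno;

    memset(&ci, 0, sizeof(ci));
    ci.info.version = TLS_1_2_VERSION;
    ci.info.cipher_type = TLS_CIPHER_AES_GCM_128;
    /* Fill ci.key, ci.iv, ci.salt, ci.rec_seq from the TLS handshake state. */

    if (setsockopt(fd, SOL_TLS, TLS_TX, &ci, sizeof(ci)) < 0)
        return -errno;
    return 0;
}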

Monitoring

The TLS HW offload feature adds new statistics counters. Their presence indicates that offload is configured and working. The xlio_stats tool with option -v3 shows TLS statistics for TCP sockets and rings:


======================================================
Fd=[59] - TCP, Non-blocked - Local Address = [14.212.1.34:443] - Foreign Address = [14.212.1.57:49072]
Tx Offload: 18511 / 39409 / 0 / 0 [kilobytes/packets/eagains/errors]
Rx Offload: 1045354 / 2210387 / 0 / 1 [kilobytes/packets/eagains/errors]
Rx byte: cur 0 / max 313 / dropped 0 / limit 0
Rx pkt : cur 0 / max 1 / dropped 0
TLS Offload: version 0303 / cipher 51 / TX On / RX On
TLS Tx Offload: 17394 / 39407 [kilobytes/records]
TLS Rx Offload: 982755 / 2210381 / 28 / 0 [kilobytes/records/encrypted/mixed]
TLS Rx Resyncs: 1 [total]
======================================================
RING_ETH=[0]
Tx Offload: 18519 / 39559 [kilobytes/packets]
Rx Offload: 5080 / 39419 [kilobytes/packets]
TLS TX Context Setups: 1
TLS RX Context Setups: 1
Interrupts: 39324 / 38656 [requests/received]
Moderation: 1024 / 1024 [frames/usec period]
======================================================

Description of the statistics counters:

  • TLS Offload (version) - 0303 for TLS1.2 and 0304 for TLS1.3.

  • TLS Offload (cipher) - 51 for AES128-GCM and 52 for AES256-GCM.

  • TLS Offload (TX|RX) - whether TLS offload is On or Off for the transmit (TX) and receive (RX) directions.

  • TLS Tx Offload (kilobytes) – number of offloaded kilobytes excluding headers and other TLS record overhead.

  • TLS Tx Offload (records) – number of created and queued TLS records.

  • TLS Tx Resyncs – number of HW resynchronizations due to out of sequence send operations.

  • TLS Rx Offload (kilobytes) - number of bytes received as TLS payload.

  • TLS Rx Offload (records) - total number of TLS records received on the socket.

  • TLS Rx Offload (encrypted) - number of encrypted TLS records that were decrypted in SW by XLIO.

  • TLS Rx Offload (mixed) - number of partially decrypted TLS records handled by XLIO.

  • TLS Rx Resyncs – number of times HW loses synchronization.

  • TLS TX Context Setups – cumulative counter of created TLS TX contexts, which equals the total number of sockets configured with TLS TX offload.

  • TLS RX Context Setups – cumulative counter of created TLS RX contexts, which equals the total number of sockets configured with TLS RX offload.

Note: TLS kernel counters do not increment when the application is offloaded with LIBXLIO.

Supported Ciphers

The table below lists all supported offloaded ciphers.

TLS Version | Bits | Hardware Offload  | OpenSSL Name                  | XLIO TX Support | XLIO RX Support
------------|------|-------------------|-------------------------------|-----------------|-----------------
1.2         | 128  | TLS1.2-AES128-GCM | AES128-GCM-SHA256             | YES             | YES
            |      |                   | ECDHE-ECDSA-AES128-GCM-SHA256 | YES             | YES
            |      |                   | ECDHE-RSA-AES128-GCM-SHA256   | YES             | YES
            | 256  | TLS1.2-AES256-GCM | AES256-GCM-SHA384             | YES (1)         | YES (1)
            |      |                   | ECDHE-ECDSA-AES256-GCM-SHA384 | YES (1)         | YES (1)
            |      |                   | ECDHE-RSA-AES256-GCM-SHA384   | YES (1)         | YES (1)
1.3         | 128  | TLS1.3-AES128-GCM | TLS_AES_128_GCM_SHA256        | YES (1)         | YES (1)
            | 256  | TLS1.3-AES256-GCM | TLS_AES_256_GCM_SHA384        | YES (1)         | YES (1)

  (1) Not supported by RHEL v8.3 yet.

Precision Time Protocol (PTP)

XLIO supports hardware timestamping for the UDP-RX flow (only) with Precision Time Protocol (PTP).

When using XLIO on a server running a PTP daemon, XLIO can periodically query the kernel for updated time-conversion parameters, which it uses together with the hardware timestamp received from the NIC to provide synchronized time.

Prerequisites

  • Supported devices: NICs with a hardware NIC clock

  • Set XLIO_HW_TS_CONVERSION environment variable to 4

Usage

  1. Set the SO_TIMESTAMPING option for the socket with value SOF_TIMESTAMPING_RX_HARDWARE:


    uint8_t val = SOF_TIMESTAMPING_RX_HARDWARE;
    setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val));

  2. Set XLIO environment parameter XLIO_HW_TS_CONVERSION to 4.

    Example:

    Use the Linux kernel (v4.11) timestamping example found in the kernel source at: tools/testing/selftests/networking/timestamping/timestamping.c.


    Server:
    $ sudo LD_PRELOAD=libxlio.so XLIO_HW_TS_CONVERSION=4 ./timestamping <iface> SOF_TIMESTAMPING_RAW_HARDWARE SOF_TIMESTAMPING_RX_HARDWARE

    Client:
    $ LD_PRELOAD=libxlio.so sockperf tp -i <server-ip> -t 3600 -p 6666 --mps 10

    timestamping output:
    SOL_SOCKET SO_TIMESTAMPING SW 0.000000000 HW raw 1497823023.070846953 IP_PKTINFO interface index 8
    SOL_SOCKET SO_TIMESTAMPING SW 0.000000000 HW raw 1497823023.170847260 IP_PKTINFO interface index 8
    SOL_SOCKET SO_TIMESTAMPING SW 0.000000000 HW raw 1497823023.270847093 IP_PKTINFO interface index 8
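After SO_TIMESTAMPING is enabled as above, each received datagram carries an SCM_TIMESTAMPING control message whose third timespec (ts[2]) is the raw hardware timestamp, as printed in the output above. A minimal receive-side sketch using the standard kernel API (struct scm_timestamping is from <linux/errqueue.h>):

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/errqueue.h>   /* struct scm_timestamping */

/* Receive one UDP datagram and print the raw hardware timestamp delivered
 * in the SCM_TIMESTAMPING control message (ts[2] is the raw HW stamp). */
static void recv_with_hw_timestamp(int fd)
{
    char data[2048], ctrl[512];
    struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) };
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctrl, .msg_controllen = sizeof(ctrl),
    };

    if (recvmsg(fd, &msg, 0) < 0)
        return;

    for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_TIMESTAMPING) {
            struct scm_timestamping ts;
            memcpy(&ts, CMSG_DATA(c), sizeof(ts));
            printf("HW raw %lld.%09ld\n",
                   (long long)ts.ts[2].tv_sec, (long)ts.ts[2].tv_nsec);
        }
    }
}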

On-Device Memory

Each PCI transaction between the system's RAM and the NIC takes ~300 nsec or more, increasing with the buffer size. Application egress latency can be improved by reducing the number of PCI transactions on the send path as much as possible.

Today, XLIO achieves this by copying the WQE into the doorbell, and for small packets (<190 bytes of payload) XLIO can inline the packet into the WQE, eliminating the data-gather PCI transaction as well. For data sizes above 190 bytes, an additional PCI gather cycle by the NIC is required to pull the data buffer for egress.

XLIO uses the on-device memory to store the egress packet if it does not fit into the BlueFlame (BF) inline buffer. The on-device memory is a resource managed by XLIO and is transparent to the user. The total size of the on-device memory is limited to 256 KB for a single-port NIC and to 128 KB for a dual-port NIC. Using XLIO_RING_DEV_MEM_TX, the user can set the amount of on-device-memory buffer allocated for each TX ring.

Prerequisites

  • NIC: NVIDIA ConnectX®-6 Dx/ConnectX-7/BlueField-2/BlueField-3 and above

  • Protocol: Ethernet

  • Set the XLIO_RING_DEV_MEM_TX environment variable to best suit the application's requirements (see the example below)
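For example, the following launch line (the 16 KB value is illustrative) sets the on-device-memory buffer allocated per TX ring:

XLIO_RING_DEV_MEM_TX=16384 LD_PRELOAD=<path to libxlio.so> <command line>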

Verifying On-Device Memory Capability in the Hardware

To verify the "On-Device Memory" capability in the hardware, run XLIO with the DEBUG trace level:


XLIO_TRACELEVEL=DEBUG LD_PRELOAD=<path to libxlio.so> <command line>

Look in the printout for a positive value of on-device-memory bytes.

For example:


Pid: 1748924 Tid: 1748924 XLIO DEBUG : ibch[0x5633333c62f0]:229:print_val() mlx5_2: port(s): 1 vendor: 4125 fw: 22.31.1034 max_qp_wr: 32768 on_device_memory: 131072 packet_pacing_caps: min rate 1, max rate 100000000
Pid: 1748924 Tid: 1748924 XLIO DEBUG : ibch[0x56333340fa60]:229:print_val() mlx5_3: port(s): 1 vendor: 4125 fw: 22.31.1034 max_qp_wr: 32768 on_device_memory: 131072 packet_pacing_caps: min rate 1, max rate 100000000

To show and monitor On-Device Memory statistics, run the xlio_stats tool:


xlio_stats -p <pid> -v 3

For example:


======================================================
RING_ETH=[0]
Tx Offload: 858931 / 3402875 [kilobytes/packets]
Rx Offload: 865251 / 3402874 [kilobytes/packets]
Dev Mem Alloc: 16384
Dev Mem Stats: 739074 / 1784935 / 0 [kilobytes/packets/oob]
======================================================


TCP_QUICKACK Threshold

Note: To enable the TCP_QUICKACK threshold, modify the TCP_QUICKACK_THRESHOLD parameter in the lwip/opt.h file and recompile XLIO.

While the TCP_QUICKACK option is enabled, TCP acknowledgments are sent immediately rather than being delayed as in a normal TCP receive operation. However, sending the acknowledgment delays processing of the incoming packet until the acknowledgment has been completed, which can affect performance.

The TCP_QUICKACK threshold enables the user to disable quick acknowledgments for payloads that are larger than the threshold. The threshold is effective only when TCP_QUICKACK is enabled, either via setsockopt() or via the XLIO_TCP_QUICKACK parameter. The TCP_QUICKACK threshold is disabled by default.
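For reference, a minimal sketch of enabling quick acknowledgments on a socket with the standard TCP_QUICKACK option from <netinet/tcp.h> (the threshold itself is the compile-time setting described above):

#include <stdio.h>
#include <sys/socket.h>
#include <netinet/tcp.h>   /* TCP_QUICKACK */

/* Enable quick ACKs on a TCP socket. With the compile-time
 * TCP_QUICKACK_THRESHOLD configured, payloads larger than the threshold
 * are still acknowledged with the normal delayed-ACK behavior. */
static void enable_quickack(int fd)
{
    int on = 1;

    if (setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &on, sizeof(on)) < 0)
        perror("setsockopt(TCP_QUICKACK)");
}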

XLIO Daemon

The XLIO daemon is responsible for managing the traffic control logic of all XLIO processes, including qdisc, u32 table hashing, adding filters, removing filters, and removing filters when an application crashes.

For XLIO daemon usage instructions, refer to the Installing the XLIO Binary Package section in the Installation Guide.

To show and monitor TAP statistics, run the xlio_stats tool:


xlio_stats -p <pid> -v 3

Example:


======================================================
RING_TAP=[0]
Master: 0x29e4260
Tx Offload: 4463 / 67209 [kilobytes/packets]
Rx Offload: 5977 / 90013 [kilobytes/packets]
Rx Buffers: 256
VF Plugouts: 1
Tap fd: 21
Tap Device: td34f15
======================================================
RING_ETH=[1]
Master: 0x29e4260
Tx Offload: 7527 / 113349 [kilobytes/packets]
Rx Offload: 7527 / 113349 [kilobytes/packets]
Retransmissions: 1
======================================================

Output analysis:

  • RING_TAP[0] and RING_ETH[1] share the same master ring (0x29e4260)

  • 4463 Kbytes/67209 packets were sent from the TAP device

  • 5977 Kbytes/90013 packets were received from the TAP device

  • Plugout event occurred once

  • TAP device fd number was 21, TAP name was td34f15

Getting Ring Protection Domain

Pass a special structure as an argument to getsockopt() with SO_XLIO_PD to get protection-domain information for the ring used by the current socket. This information becomes available after the connection is established for the TX ring, or after binding to a device for the RX ring. By default, the PD for the TX ring is returned. This can be used with sendmsg(SCM_XLIO_PD), where the data portion contains an array of elements of type struct xlio_pd_key. The number of elements in this array should equal the msg_iovlen value, and every data pointer in msg_iov has a corresponding memory key.

Copy
Copied!
            

struct xlio_pd_attr {
    uint32_t flags;
    void    *ib_pd;
};
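A hedged usage sketch follows; SO_XLIO_PD and struct xlio_pd_attr come from the XLIO extra-API header, and the SOL_SOCKET level used here is an assumption mirroring SO_XLIO_ISOLATE:

#include <sys/socket.h>
/* Assumes the XLIO extra-API header is included for SO_XLIO_PD and
 * struct xlio_pd_attr. */

/* Query the ring protection domain for a socket; valid after the connection
 * is set up (TX ring) or after binding to a device (RX ring). By default the
 * TX ring PD is returned. */
static void *get_tx_ring_pd(int fd)
{
    struct xlio_pd_attr pd_attr;
    socklen_t len = sizeof(pd_attr);

    if (getsockopt(fd, SOL_SOCKET, SO_XLIO_PD, &pd_attr, &len) < 0)
        return NULL;

    /* pd_attr.ib_pd is the ring's protection domain: register application
     * buffers against it and pass the resulting memory keys as
     * struct xlio_pd_key elements via sendmsg(SCM_XLIO_PD). */
    return pd_attr.ib_pd;
}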

NVME over TCP (NVMEoTCP) HW Offload

The NVME over TCP (NVMEoTCP) hardware offload feature accelerates NVMEoTCP DIGEST calculation for transmitted NVME PDUs.

Prerequisites

  • Please refer to System Requirements

  • The card must support NVMEoTCP offload (ConnectX-7, BlueField-3)

Usage

  1. Use the application to call setsockopt(fd, IPPROTO_TCP, TCP_ULP, "nvme", 4)

  2. Call: setsockopt(fd, NVDA_NVME, NVME_TX, &configure, sizeof(configure)), where: uint32_t configure = XLIO_NVME_HDGST_ENABLE | XLIO_NVME_DDGST_ENABLE | XLIO_NVME_DDGST_OFFLOAD

    Note: If any of the setsockopt calls fail, offload is not supported.

  3. Call: setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &opt_val, sizeof(opt_val))

    where: int opt_val = 1

  4. Use the application to register TX buffers with the PD obtained from XLIO - see the Getting Ring Protection Domain section above

  5. Call sendmsg(fd, msg, MSG_ZEROCOPY) with the extended zero-copy API

    where msg is of type msghdr

    1. With cmsghdr *cmsg = CMSG_FIRSTHDR(msg);

    2. cmsg->cmsg_level = SOL_SOCKET;

    3. cmsg->cmsg_type = SCM_XLIO_NVME_PD;

    4. cmsg->cmsg_len = msg->msg_controllen;

    5. With CMSG_DATA(cmsg) set to the array of xlio_pd_key elements

See Kernel TX Zero Copy documentation: https://www.kernel.org/doc/html/v4.15/networking/msg_zerocopy.html

Full examples can be found in the XLIO Git repository: https://github.com/Mellanox/libxlio
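Putting the steps together, a hedged sketch (NVDA_NVME, NVME_TX, the XLIO_NVME_* flags, SCM_XLIO_NVME_PD, and struct xlio_pd_key are XLIO extra-API names taken from the steps above; the caller is assumed to have set msg_iov/msg_iovlen to PD-registered buffers and msg_control to a suitable buffer):

#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/tcp.h>   /* TCP_ULP */
/* Assumes the XLIO extra-API header is included for NVDA_NVME, NVME_TX,
 * XLIO_NVME_*, SCM_XLIO_NVME_PD and struct xlio_pd_key. */

static ssize_t nvme_tx_offload_send(int fd, struct msghdr *msg,
                                    const struct xlio_pd_key *keys, size_t nkeys)
{
    uint32_t configure = XLIO_NVME_HDGST_ENABLE | XLIO_NVME_DDGST_ENABLE |
                         XLIO_NVME_DDGST_OFFLOAD;
    int opt_val = 1;

    /* Steps 1-3: if any of these calls fail, offload is not supported. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_ULP, "nvme", 4) < 0 ||
        setsockopt(fd, NVDA_NVME, NVME_TX, &configure, sizeof(configure)) < 0 ||
        setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &opt_val, sizeof(opt_val)) < 0)
        return -1;

    /* Step 5: the control message carries one xlio_pd_key per msg_iov element. */
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type  = SCM_XLIO_NVME_PD;
    cmsg->cmsg_len   = msg->msg_controllen;
    memcpy(CMSG_DATA(cmsg), keys, nkeys * sizeof(*keys));

    return sendmsg(fd, msg, MSG_ZEROCOPY);
}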

SO_XLIO_ISOLATE

Offloaded TCP sockets support the SO_XLIO_ISOLATE option at the SOL_SOCKET level. The option allows grouping sockets under a specific policy. The option value is of type int and contains the policy.

Supported policies:

  • SO_XLIO_ISOLATE_DEFAULT – default behavior according to XLIO configuration.

  • SO_XLIO_ISOLATE_SAFE – isolate sockets from the default sockets and guarantee thread safety regardless of XLIO configuration. This policy is effective in the XLIO_TCP_CTL_THREAD=delegate configuration. The socket API thread-safety model is not changed.

Limitations:

  • The SO_XLIO_ISOLATE option may be set after the socket() syscall and before either listen() or connect(), as in the sketch below.
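A minimal sketch of applying the policy within these constraints (SO_XLIO_ISOLATE and SO_XLIO_ISOLATE_SAFE come from the XLIO extra-API header):

#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>
/* Assumes the XLIO extra-API header is included for SO_XLIO_ISOLATE and
 * SO_XLIO_ISOLATE_SAFE. */

static int make_isolated_socket(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    int policy = SO_XLIO_ISOLATE_SAFE;

    /* Must be set after socket() and before connect()/listen(). */
    if (fd >= 0 &&
        setsockopt(fd, SOL_SOCKET, SO_XLIO_ISOLATE, &policy, sizeof(policy)) < 0)
        perror("setsockopt(SO_XLIO_ISOLATE)");
    return fd;
}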

NGINX

Accelerate and enhance the performance of your NGINX web server using NVIDIA Accelerated IO (XLIO).

XLIO optimizes data transfers and significantly reduces latency by leveraging advanced hardware acceleration capabilities.

Prerequisites

  • For kTLS usage, see Advanced Features → TLS HW Offload → Supported Ciphers.

Limitations

  • XLIO does not support NGINX running in daemon mode. Ensure that daemon off; remains set.

NGINX Best Practices

Usage

NGINX Configuration

Ensure the following settings in your global configuration block (nginx.conf):


worker_processes <NUM-WORKERS>;  # must be coherent with XLIO_NGINX_WORKERS_NUM
daemon off;                      # XLIO does not support daemon mode; keep it off


XLIO Configuration

  • XLIO_NGINX_WORKERS_NUM=<NUM-WORKERS> should be coherent with worker_processes in nginx.conf.

  • XLIO_SPEC=<SPEC>

    • For x86 platforms - XLIO_SPEC=nginx

    • For BlueField DPUs - XLIO_SPEC=nginx_dpu
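For example, a launch line for an x86 server (the worker count and configuration path are illustrative):

XLIO_SPEC=nginx XLIO_NGINX_WORKERS_NUM=8 LD_PRELOAD=libxlio.so nginx -c /etc/nginx/nginx.conf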

© Copyright 2025, NVIDIA. Last updated on Jun 9, 2025.