NVIDIA DOCA IP Fragmentation Application Guide
This document provides an IP Fragmentation implementation on top of the NVIDIA® BlueField® DPU.
This IP Fragmentation application is designed to handle IP fragmentation and reassembly efficiently, ensuring minimal processing overhead for non-fragmented packets while maintaining high performance for fragmented packets.
The application operates on a multi-core architecture, uses Receive Side Scaling (RSS) to distribute traffic, and supports configurable modes for flexible port configurations.
Key Features:
IP Reassembly:
Functionality: The application assembles fragmented packets received on input ports based on their fragmentation headers.
Workflow: Upon successful reassembly, the complete packets are forwarded to their destination port.
IP Fragmentation:
Functionality: Packets exceeding a configurable Maximum Transmission Unit (MTU) are fragmented into smaller packets.
Workflow: Fragments are generated with correct headers and forwarded while maintaining efficient resource utilization.
Transparent Forwarding: Packets that are neither fragmented nor require reassembly are forwarded directly without additional processing overhead.
Inner and Outer Fragmentation Handling: The application supports handling fragmentation at both inner (e.g., encapsulated traffic like GRE, VXLAN) and outer IP layers.
Performance Optimization:
Designed for high throughput using multi-core processing.
Utilizes RSS to distribute traffic across multiple cores, ensuring efficient CPU utilization and scalability.
Debuggability with Counters.
Dual Operating Modes:
Mode 1 (Two Ports): Forwarding between two ports (e.g., Port A ↔ Port B).
Mode 2 (Four Ports): Forwarding between Port A and Port B and between Port C and Port D (e.g., Port A ↔ Port B, Port C ↔ Port D), enabling simultaneous independent operations on two traffic streams.
The IP Fragmentation application can run either on the DPU, serving as an underlying service for host applications, or on the host.
Supported Modes:
Dual Port Mode (Bidirectional): Traffic flows bidirectionally between two ports.

Quad Port Mode (Multiport): Independent forwarding between Port A ↔ Port B and between Port C ↔ Port D.
In this mode, each traffic stream is confined to its own pair of ports.
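The port pairing described for the two modes can be sketched in a few lines of C. This is only an illustration: the enum and function names are hypothetical, and the assumption that ports are numbered so that 0/1 form the first pair and 2/3 the second is not taken from the application sources.

```c
#include <stdint.h>

/* Hypothetical mode names mirroring the "bidir" and "multiport" modes. */
enum ip_frag_mode_sketch { MODE_BIDIR, MODE_MULTIPORT };

/* Number of ports each mode uses, per the mode descriptions above. */
static unsigned int mode_num_ports(enum ip_frag_mode_sketch mode)
{
    return mode == MODE_BIDIR ? 2u : 4u;
}

/* Assumed numbering: ports 0/1 are the first pair, ports 2/3 the second.
 * Flipping the lowest bit yields the other port of the same pair, so
 * traffic never crosses from one pair to the other. */
static uint16_t peer_port(uint16_t port_id)
{
    return (uint16_t)(port_id ^ 1u);
}
```

With this numbering, a packet received on port 2 would be sent on port 3 and vice versa, independently of ports 0 and 1.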

Notes:
Both diagrams illustrate the flow for a single direction; however, the application operates bidirectionally.
In both modes, non-fragmented or valid-sized packets follow the same flow path without additional actions.
The IP Fragmentation application runs on top of the DOCA API to send and receive packets.
Operational Workflow
Packet Reception and Classification:
Traffic is received on the input ports, with RSS distributing flows to available cores.
Packets are classified into three categories:
Fragmented (Needs Reassembly)
Too Large (Needs Fragmentation)
Standard Packets (Direct Forwarding)
Reassembly:
Fragments are buffered and reassembled using a configurable timeout.
Once reassembled, the full packet is validated and forwarded.
Fragmentation:
Large packets exceeding the MTU are fragmented.
Fragments are prepared with correct headers, sequence numbers, and size.
Direct Forwarding:
Standard packets are forwarded with minimal processing.
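The three-way classification above can be sketched as follows. This is a minimal illustration, not the application's code: the enum, macro, and function names are hypothetical, the fields are assumed to be already converted to host byte order, and a real data path would also take the receiving port into account.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical categories mirroring the three classes listed above. */
enum pkt_class {
    PKT_NEEDS_REASSEMBLY,    /* the packet is an IP fragment */
    PKT_NEEDS_FRAGMENTATION, /* the packet is larger than the MTU */
    PKT_DIRECT_FORWARD       /* standard packet */
};

#define IPV4_FLAG_MF     0x2000u /* "more fragments" bit */
#define IPV4_OFFSET_MASK 0x1fffu /* fragment offset, in 8-byte units */

/* frag_field: IPv4 flags/fragment-offset field, host byte order.
 * total_len:  IPv4 total length, host byte order. */
static enum pkt_class classify_ipv4(uint16_t frag_field, uint16_t total_len,
                                    uint16_t mtu)
{
    /* A packet is a fragment if MF is set or its offset is non-zero. */
    bool is_fragment = (frag_field & (IPV4_FLAG_MF | IPV4_OFFSET_MASK)) != 0;

    if (is_fragment)
        return PKT_NEEDS_REASSEMBLY;
    if (total_len > mtu)
        return PKT_NEEDS_FRAGMENTATION;
    return PKT_DIRECT_FORWARD;
}
```

Checking the fragment condition first keeps the common case, a standard packet, down to two comparisons.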
Performance and Scalability
Multi-Core Processing:
The application scales horizontally with the number of CPU cores, with each core handling a subset of traffic flows.
RSS Traffic Distribution:
Receive Side Scaling ensures optimal load balancing across cores.
Minimal Overhead:
Processing logic is optimized for low-latency handling of standard packets while ensuring efficient fragmentation and reassembly operations.
Debugging and Monitoring
The application provides real-time counters that give insight into:
Packets processed.
Fragments reassembled or fragmented.
Errors such as timeout on incomplete fragments.
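One common way to keep such counters cheap on a multi-core data path is to give each worker its own counter block and sum the blocks only when reporting. The sketch below assumes that layout; the struct and field names are hypothetical and do not reflect the application's actual counters.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-worker counters; field names are illustrative only. */
struct worker_counters {
    uint64_t rx_pkts;             /* packets processed */
    uint64_t frags_reassembled;   /* fragments consumed by reassembly */
    uint64_t pkts_fragmented;     /* packets split into fragments */
    uint64_t reassembly_timeouts; /* incomplete packets that aged out */
};

/* Sum the per-worker blocks into one total. Each worker writes only its
 * own block, so the data path itself needs no locking. */
static struct worker_counters counters_sum(const struct worker_counters *w,
                                           size_t n)
{
    struct worker_counters total = {0};

    for (size_t i = 0; i < n; i++) {
        total.rx_pkts += w[i].rx_pkts;
        total.frags_reassembled += w[i].frags_reassembled;
        total.pkts_fragmented += w[i].pkts_fragmented;
        total.reassembly_timeouts += w[i].reassembly_timeouts;
    }
    return total;
}
```

Summing after the workers have stopped gives exact totals; reading while they run would only give an approximate snapshot.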
This application leverages the following DOCA libraries:
For additional information about the used DOCA libraries, please refer to the respective programming guides.
NVIDIA BlueField-3 DPU is required.
Ubuntu 18.04/20.04/22.04 hosts (x86)
Open MPI version 4.1.5rc2 or greater (included in DOCA's installation)
Please refer to the DOCA Installation Guide for Linux for details on how to install BlueField-related software.
The installation of DOCA's reference applications contains the sources of the applications, alongside the matching compilation instructions. This allows for compiling the applications "as-is" and provides the ability to modify the sources, then compile a new version of the application.
For more information about the applications as well as development and compilation tips, refer to the DOCA Reference Applications page.
The sources of the application can be found under the application's directory: /opt/mellanox/doca/applications/ip_frag/.
Compiling All Applications
All DOCA applications are defined under a single meson project. So, by default, the compilation includes all of them.
To build all the applications together, run:
cd /opt/mellanox/doca/applications/
meson /tmp/build
ninja -C /tmp/build
doca_ip_frag is created under /tmp/build/ip_frag/.
Compiling Only the Current Application
To directly build only the IP fragmentation application:
cd /opt/mellanox/doca/applications/
meson /tmp/build -Denable_all_applications=false -Denable_ip_frag=true
ninja -C /tmp/build
doca_ip_frag is created under /tmp/build/ip_frag/.
Alternatively, one can set the desired flags in the meson_options.txt file instead of providing them in the compilation command line:
Edit the following flags in /opt/mellanox/doca/applications/meson_options.txt:
Set enable_all_applications to false
Set enable_ip_frag to true
The same compilation commands should be used, as were shown in the previous section:
cd /opt/mellanox/doca/applications/
meson /tmp/build
ninja -C /tmp/build
doca_ip_frag is created under /tmp/build/ip_frag/.
Troubleshooting
Please refer to the DOCA Troubleshooting for any issue you may encounter with the compilation of the DOCA applications.
Prerequisites
The Fragmentation application is based on DOCA Flow. Therefore, the user is required to allocate huge pages.
$ echo '4096' | sudo tee -a /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
$ sudo mkdir /mnt/huge
$ sudo mount -t hugetlbfs -o pagesize=2M nodev /mnt/huge
The FLEX profile number should be manually set to 3 on the system to enable GTP matching:
$ sudo mlxconfig -d <pcie_address> s FLEX_PARSER_PROFILE_ENABLE=3
Application Execution
The Fragmentation application is provided in source form, hence a compilation is required before the application can be executed.
Application usage instructions:
Usage: doca_ip_frag [DPDK Flags] -- [DOCA Flags] [Program Flags]

DOCA Flags:
  -h, --help                    Print a help synopsis
  -v, --version                 Print program version information
  -l, --log-level               Set the (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
  --sdk-log-level               Set the SDK (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
  -j, --json <path>             Parse all command flags from an input json file

Program Flags:
  -m, --mode                    Ip_frag application mode. Bidirectional mode forwards packets between a single reassembly port and a single fragmentation port. Multiport mode forwards packets between two pairs of reassembly and fragmentation ports. For more information consult DOCA IP Fragmentation Application Guide. Format: bidir, multiport
  -u, --mtu                     MTU size
  -t, --frag-aging-timeout      Aging timeout of fragments pending packet reassembly in the fragmentation table (in ms)
  -s, --frag-tbl-size           Frag table size, i.e. maximum amount of concurrent defragmentation contexts per worker thread
  -c, --mbuf-chain              Enable mbuf chaining

For additional information, please refer to the Command Line Flags section below.
Note: The above usage printout can be printed to the command line using the -h (or --help) option:
/tmp/build/ip_frag/doca_ip_frag -- -h
CLI example for running the application on BlueField:
/tmp/build/ip_frag/doca_ip_frag -a auxiliary:mlx5_core.sf.2,dv_flow_en=2,sft_en=1 -a auxiliary:mlx5_core.sf.4,dv_flow_en=2,sft_en=1 -a auxiliary:mlx5_core.sf.3,dv_flow_en=2,sft_en=1 -a auxiliary:mlx5_core.sf.5,dv_flow_en=2,sft_en=1 -l 3-15 -- -l 50 -m multiport
CLI example for running the application on the host:
/tmp/build/ip_frag/doca_ip_frag -l 0-7 -a 0000:08:00.0,dv_flow_en=2 -a 0000:08:00.1,dv_flow_en=2 -- -l 60 -m bidir -t 1000
Note: The DOCA Comm Channel device PCI addresses (0000:08:00.0, 0000:08:00.1) should match the addresses of the desired PCI devices.
The application also supports a JSON-based deployment mode, in which all command-line arguments are provided through a JSON file:
/tmp/build/ip_frag/doca_ip_frag --json [json_file]
For example:
/tmp/build/ip_frag/doca_ip_frag --json /opt/mellanox/doca/applications/ip_frag/ip_frag_params.json
Note: Before execution, please ensure that the JSON file in use contains the correct configuration parameters, especially the PCI addresses needed for the deployment.
Command Line Flags
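Flag Type | Short Flag | Long Flag/JSON Key | Description | JSON Content |
DPDK flags | a | | Add devices to the allow list. Note: This is a mandatory flag. | |
DPDK flags | l | | List of cores to be used by the application data path. Note: This is a mandatory flag. | |
General flags | h | help | Prints a help synopsis | N/A |
General flags | v | version | Prints program version information | N/A |
General flags | l | log-level | Sets the log level for the application: 10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE | |
General flags | N/A | sdk-log-level | Sets the SDK log level for the program: 10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE | |
General flags | j | json | Parse all command flags from an input json file | N/A |
Program flags | m | mode | Execution mode: bidir, multiport. Note: This is a mandatory flag. | |
Program flags | u | mtu | MTU for fragmentation | |
Program flags | t | frag-aging-timeout | Fragmentation table aging timeout (in [ms]) | |
Program flags | s | frag-tbl-size | Fragmentation table size | |
Program flags | c | mbuf-chain | Enable mbuf chaining support on packet reassembly | |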
Refer to DOCA Arg Parser for more information regarding the supported flags and execution modes.
Troubleshooting
Please refer to the DOCA Troubleshooting for any issue you may encounter with the installation or execution of the DOCA applications.
Parse application arguments.
Initialize arg parser resources, register DOCA general and DPDK-specific parameters.
doca_argp_init(); doca_argp_set_dpdk_program(dpdk_init);
Register IP Fragmentation application parameters.
ip_frag_register_params();
Parse the arguments.
doca_argp_start();
Parse DPDK flags and invoke handler for calling the rte_eal_init() function.
Parse app parameters.
The application uses a different number of ports depending on the mode argument. Set config→nb_ports to the number of available DPDK ports, obtained by calling the rte_eth_dev_count_avail() function.
ip_frag_dpdk_config_num_ports();
The application uses a dedicated queue per core, and the number of data path cores is user-configurable through DPDK arguments. Initialize DPDK ports and queues with the DOCA helper function.
dpdk_queues_and_ports_init();
Initialize DPDK ports.
Create mbuf pool using rte_pktmbuf_pool_create.
Driver initialization – use rte_eth_dev_configure to configure the number of queues.
Rx/Tx queue initialization – use rte_eth_rx_queue_setup and rte_eth_tx_queue_setup to initialize the queues.
Start the port using rte_eth_dev_start.
To support graceful shutdown (including printing statistics and useful debug data), register a signal handler that sets the force_stop variable to terminate the main loop of the data path cores.
signal(SIGINT, signal_handler); signal(SIGTERM, signal_handler);
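The graceful-shutdown pattern described in this step can be sketched with standard C alone. The names signal_handler and force_stop come from the text; install_signal_handlers and worker_loop are hypothetical helpers added for illustration, and the loop body only stands in for real packet processing.

```c
#include <signal.h>
#include <stdbool.h>

/* Flag polled by the data path loops. sig_atomic_t is the type the C
 * standard guarantees can be written safely from a signal handler. */
static volatile sig_atomic_t force_stop;

static void signal_handler(int signum)
{
    if (signum == SIGINT || signum == SIGTERM)
        force_stop = 1;
}

/* Hypothetical helper: returns true when both handlers were installed. */
static bool install_signal_handlers(void)
{
    return signal(SIGINT, signal_handler) != SIG_ERR &&
           signal(SIGTERM, signal_handler) != SIG_ERR;
}

/* Stand-in for a worker main loop: iterates until force_stop is set or
 * the iteration cap is reached, and returns the iteration count. */
static unsigned long worker_loop(unsigned long max_iterations)
{
    unsigned long n = 0;

    while (!force_stop && n < max_iterations)
        n++; /* a real loop would receive, process and send a burst here */
    return n;
}
```

Because the handler only sets a flag, each worker finishes its current burst and exits its loop on its own, which leaves the application free to print statistics and release resources afterwards.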
Call the function that implements all app-specific initialization.
ip_frag();
Initialize DOCA Flow, which is necessary for RSS.
init_doca_flow();
Reserve an mbuf flag with rte_mbuf_dynflag_register() for saving fragmentation state.
ip_frag_mbuf_flags_init();
Create a per-core mempool for resulting packet fragment indirect mbufs using rte_pktmbuf_pool_create().
ip_frag_indirect_pool_init();
Create per-core data with rte_calloc(), and initialize the auxiliary data structures: rte_eth_dev_tx_buffer with rte_zmalloc_socket(), rte_eth_tx_buffer_init(), and rte_eth_tx_buffer_set_err_callback(), and rte_ip_frag_tbl with rte_ip_frag_table_create().
ip_frag_wt_data_init();
Initialize DOCA Flow ports.
ip_init_doca_flow_ports();
Create RSS pipes and entries using Toeplitz hash function over outer IPv4 header fields.
ip_frag_rss_pipes_create();
Create DOCA Flow pipe config.
doca_flow_pipe_cfg_create(); set_flow_pipe_cfg(); doca_flow_pipe_cfg_set_domain(); doca_flow_pipe_cfg_set_nr_entries(); doca_flow_pipe_cfg_set_match();
Create the RSS pipe.
doca_flow_pipe_create();
Add RSS pipe entry.
doca_flow_pipe_add_entry();
Process the entry completion.
doca_flow_entries_process();
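In the application the RSS hash is computed by the hardware through DOCA Flow, so no software hashing is needed. To show what a Toeplitz hash does, here is a small software sketch of the generic algorithm. The function name is hypothetical, and the key is a caller-supplied parameter; nothing here reflects the key or exact field layout the device uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Generic Toeplitz hash: for every set bit of the input (most significant
 * bit first), XOR in the 32-bit window of the key that starts at the same
 * bit position. The key must be at least len + 4 bytes long. */
static uint32_t toeplitz_hash(const uint8_t *key, const uint8_t *data,
                              size_t len)
{
    uint32_t hash = 0;
    /* Window over key bits [0, 32); it slides by one bit per input bit. */
    uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
                      ((uint32_t)key[2] << 8) | (uint32_t)key[3];

    for (size_t i = 0; i < len; i++) {
        uint8_t next = key[i + 4]; /* key byte sliding into the window */

        for (int bit = 7; bit >= 0; bit--) {
            if (data[i] & (1u << bit))
                hash ^= window;
            window = (window << 1) | ((uint32_t)(next >> bit) & 1u);
        }
    }
    return hash;
}
```

For an outer IPv4 match, the input would typically be the 8 bytes of source and destination address in network order. The hash is linear over XOR, which is why inputs that differ in a single bit can still land on different queues.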
Start the data path main function on each worker thread.
rte_eal_mp_remote_launch();
The worker thread main loop function forwards packets between sets of ports, fragmenting or reassembling them at the IP layer depending on the mode.
ip_frag_wt_thread_main();
Packet fragmentation algorithm entry-point function.
ip_frag_wt_fragment();
Receive packet burst from rx port.
rte_eth_rx_burst();
Iterate over burst of packets, fragment packets larger than MTU, push all resulting packets to tx buffer.
ip_frag_pkt{s}_fragment();
Parse the packet, store pointers to the parsed packet headers in frag_conn_parser_ctx instance.
ip_frag_wan_parse();
Save L2 header of a packet pending fragmentation into eth_hdr_copy and adjust mbuf data pointer to point to IP header.
memcpy(); rte_pktmbuf_adj();
Fragment the packet.
rte_ipv4_fragment_packet();
Release the original packet mbuf.
rte_pktmbuf_free();
Fix IP header checksum of resulting fragments.
ip_frag_ipv4_hdr_cksum();
Prepend previously saved L2 header to the resulting fragments.
rte_pktmbuf_prepend(); memcpy();
Push packet(s) to tx buffer.
rte_eth_tx_buffer();
Send resulting packet tx buffer to tx port.
rte_eth_tx_buffer_flush();
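The fragmentation steps above delegate the actual split to rte_ipv4_fragment_packet(). The arithmetic behind an IPv4 split can be sketched independently of DPDK, as below. The struct and function names are hypothetical, and the sketch assumes a 20-byte IPv4 header without options.

```c
#include <stddef.h>
#include <stdint.h>

#define IPV4_HDR_LEN 20u /* assumption: no IPv4 options */

struct frag_desc {
    uint16_t offset_units;  /* fragment-offset field value (8-byte units) */
    uint16_t payload_len;   /* payload bytes carried by this fragment */
    uint8_t more_fragments; /* 1 for every fragment except the last */
};

/* Plan the fragments for payload_len bytes of L3 payload under the given
 * MTU. Returns the number of descriptors written to out, or 0 when the
 * MTU is too small or out cannot hold all fragments. */
static size_t plan_fragments(uint16_t payload_len, uint16_t mtu,
                             struct frag_desc *out, size_t max_out)
{
    if (mtu < IPV4_HDR_LEN + 8u)
        return 0;

    /* Every fragment but the last must carry a multiple of 8 bytes. */
    uint16_t chunk = (uint16_t)((mtu - IPV4_HDR_LEN) & ~7u);
    uint32_t done = 0;
    size_t n = 0;

    while (done < payload_len) {
        if (n == max_out)
            return 0;
        uint32_t left = payload_len - done;
        uint16_t len = (uint16_t)(left > chunk ? chunk : left);

        out[n].offset_units = (uint16_t)(done / 8u);
        out[n].payload_len = len;
        out[n].more_fragments = (uint8_t)((done + len) < payload_len);
        done += len;
        n++;
    }
    return n;
}
```

For example, a 3980-byte payload with an MTU of 1500 yields three fragments of 1480, 1480 and 1020 payload bytes at offsets 0, 185 and 370 (in 8-byte units).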
Packet reassembly algorithm entry-point function.
ip_frag_wt_reassemble();
Receive packet burst from rx port.
rte_eth_rx_burst();
Iterate over burst of packets, save fragments into frag table for reassembly, push all resulting packets to tx buffer.
ip_frag_pkt{s}_reassemble();
Parse the packet, store pointers to the parsed packet headers in frag_conn_parser_ctx instance.
ip_frag_pkt_parse();
Parsing result code DOCA_ERROR_AGAIN indicates that the parser has encountered an IP fragment and that re-parsing is required after reassembling the packet. Push the fragment to the frag table for reassembly.
ip_frag_pkt_reassemble_push();
Call the function that prepares the fragment for reassembly by setting all necessary mbuf fields.
ip_frag_pkt_reassemble_prepare();
Push the packet.
rte_ipv4_frag_reassemble_packet();
If mbuf chaining is disabled, then flatten the resulting mbuf chain into a single mbuf.
ip_frag_pkt_fixup();
Push packet(s) to tx buffer.
rte_eth_tx_buffer();
Fix the reassembled packet by re-computing its IP checksums, setting UDP checksum to 0 and fixing all applicable 'length' fields.
ip_frag_pkt_flatten();
Put expired fragments from the fragmentation table into death row.
rte_ip_frag_table_del_expired_entries();
Free death row mbufs.
rte_ip_frag_free_death_row();
Send resulting packet tx buffer to tx port.
rte_eth_tx_buffer_flush();
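Both the fragmentation and the reassembly paths re-compute the IPv4 header checksum after changing header fields. The computation itself is the standard one's-complement sum from RFC 1071 and can be sketched without DPDK. The function name below is hypothetical and is not the helper used by the application.

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement checksum over an IPv4 header (RFC 1071).
 * hdr: header bytes in network order, checksum field (bytes 10-11) zeroed.
 * len: header length in bytes; always even for IPv4.
 * Returns the checksum in host order; store it big-endian in the header. */
static uint16_t ipv4_hdr_cksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;

    /* Add the header as a sequence of big-endian 16-bit words. */
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)hdr[i] << 8) | (uint32_t)hdr[i + 1];

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);

    return (uint16_t)~sum;
}
```

Running the same function over a header that already contains a correct checksum returns 0, which gives a simple validity check for received or reassembled packets.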
Wait for worker threads to finish.
rte_eal_mp_wait_lcore();
Print statistics and debug data.
ip_frag_debug_counters_print();
Stop DOCA Flow ports.
stop_doca_flow_ports();
Cleanup per-core data.
ip_frag_wt_data_cleanup();
Destroy DOCA Flow.
doca_flow_destroy();
DPDK ports and queues destruction.
dpdk_queues_and_ports_fini();
DPDK finish.
dpdk_fini();
Arg parser destroy.
doca_argp_destroy();
/opt/mellanox/doca/applications/ip_frag/
/opt/mellanox/doca/applications/ip_frag/ip_frag_params.json