DOCA Documentation v2.9.1 LTS

NVIDIA DOCA IP Fragmentation Application Guide

This document provides an IP Fragmentation implementation on top of the NVIDIA® BlueField® DPU.

This IP Fragmentation application is designed to handle IP fragmentation and reassembly efficiently, ensuring minimal processing overhead for non-fragmented packets while maintaining high performance for fragmented packets.

The application operates on a multi-core architecture, uses Receive Side Scaling (RSS) to distribute traffic, and supports configurable modes for flexible port configurations.

Key Features:

  • IP Reassembly:

    • Functionality: The application assembles fragmented packets received on input ports based on their fragmentation headers.

    • Workflow: Upon successful reassembly, the complete packets are forwarded to their destination port.

  • IP Fragmentation:

    • Functionality: Packets exceeding a configurable Maximum Transmission Unit (MTU) are fragmented into smaller packets.

    • Workflow: Fragments are generated with correct headers and forwarded while maintaining efficient resource utilization.

  • Transparent Forwarding: Packets that are neither fragmented nor require reassembly are forwarded directly without additional processing overhead.

  • Inner and Outer Fragmentation Handling: The application supports handling fragmentation at both inner (e.g., encapsulated traffic like GRE, VXLAN) and outer IP layers.

  • Performance Optimization:

    • Designed for high throughput using multi-core processing.

    • Utilizes RSS to distribute traffic across multiple cores, ensuring efficient CPU utilization and scalability.

  • Debuggability with Counters.

  • Dual Operating Modes:

    • Mode 1 (Two Ports): Forwarding between two ports (e.g., Port A ↔ Port B).

    • Mode 2 (Four Ports): Forwarding between Port A and Port B and between Port C and Port D (e.g., Port A ↔ Port B, Port C ↔ Port D), enabling simultaneous independent operations on two traffic streams.
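
To make the fragmentation feature more concrete, the sketch below computes how an IPv4 payload could be split for a given MTU. It is not taken from the application sources: plan_fragments and struct frag_plan are hypothetical names, a 20-byte IPv4 header without options is assumed, and the MTU is taken to mean the maximum IP packet size. The only protocol rule it relies on is that every fragment except the last carries a payload that is a multiple of 8 bytes, because the IPv4 fragment offset field counts in 8-byte units.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IPV4_HDR_LEN 20u /* assumption: IPv4 header without options */

/* Hypothetical description of one planned fragment */
struct frag_plan {
    uint16_t offset_units; /* fragment offset field value (8-byte units) */
    uint16_t payload_len;  /* payload bytes carried by this fragment */
    int more_fragments;    /* 1 if the MF flag must be set */
};

/*
 * Split payload_len bytes of IP payload into fragments that fit in mtu bytes
 * (IP header included). Returns the number of fragments written to out,
 * or 0 if out is too small to hold them all.
 */
static size_t plan_fragments(uint16_t payload_len, uint16_t mtu,
                             struct frag_plan *out, size_t out_cap)
{
    uint16_t max_payload;
    uint16_t done = 0;
    size_t n = 0;

    assert(mtu >= IPV4_HDR_LEN + 8); /* room for at least one 8-byte unit */

    /* Largest per-fragment payload, rounded down to a multiple of 8 */
    max_payload = (uint16_t)((mtu - IPV4_HDR_LEN) & ~7u);

    do {
        uint16_t left = (uint16_t)(payload_len - done);
        uint16_t len = left > max_payload ? max_payload : left;

        if (n == out_cap)
            return 0;
        out[n].offset_units = (uint16_t)(done / 8);
        out[n].payload_len = len;
        out[n].more_fragments = (done + len) < payload_len;
        done = (uint16_t)(done + len);
        n++;
    } while (done < payload_len);

    return n;
}
```

For example, a 4000-byte payload with a 1500-byte MTU yields three fragments carrying 1480, 1480, and 1040 bytes at offsets 0, 185, and 370 (in 8-byte units), with the MF flag set on the first two.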

The IP Fragmentation application can run on the DPU, serving as an underlying service for host applications, or directly on the host.

Supported Modes:

Dual Port Mode (Bidirectional): Traffic flows bidirectionally between two ports.

Figure: Dual Port Mode

Quad Port Mode (Multiport): Independent forwarding between two port pairs, Port A ↔ Port B and Port C ↔ Port D.

In this mode, each traffic stream is confined to its own pair of ports.

Figure: Quad Port Mode

Notes:

  1. Both diagrams illustrate the flow for a single direction; however, the application operates bidirectionally.

  2. In both modes, non-fragmented or valid-sized packets follow the same flow path without additional actions.

The IP Fragmentation application runs on top of the DOCA API to send and receive packets.

Operational Workflow

  • Packet Reception and Classification:

    • Traffic is received on the input ports, with RSS distributing flows to available cores.

    • Packets are classified into three categories:

      • Fragmented (Needs Reassembly)

      • Too Large (Needs Fragmentation)

      • Standard Packets (Direct Forwarding)

  • Reassembly:

    • Fragments are buffered and reassembled using a configurable timeout.

    • Once reassembled, the full packet is validated and forwarded.

  • Fragmentation:

    • Large packets exceeding the MTU are fragmented.

    • Fragments are prepared with correct headers, sequence numbers, and size.

  • Direct Forwarding:

    • Standard packets are forwarded with minimal processing.
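
The three-way classification above can be sketched as follows. This is a simplified illustration, not the application's parser: classify_ipv4 and the enum are hypothetical names, only the outer IPv4 header is inspected, and the MTU is compared against the IPv4 total length.

```c
#include <assert.h>
#include <stdint.h>

#define IPV4_MF_FLAG     0x2000u /* "more fragments" bit in the flags/offset field */
#define IPV4_OFFSET_MASK 0x1FFFu /* 13-bit fragment offset */

/* Hypothetical categories matching the three classes listed above */
enum pkt_class {
    PKT_NEEDS_REASSEMBLY,    /* packet is an IP fragment */
    PKT_NEEDS_FRAGMENTATION, /* packet is larger than the MTU */
    PKT_FORWARD,             /* standard packet, forward as is */
};

/*
 * frag_field: IPv4 flags/fragment-offset field in host byte order.
 * total_len:  IPv4 total length in host byte order.
 */
static enum pkt_class classify_ipv4(uint16_t frag_field, uint16_t total_len,
                                    uint16_t mtu)
{
    assert(mtu > 0);

    /* A fragment has MF set (first/middle) or a non-zero offset (middle/last) */
    if (frag_field & (IPV4_MF_FLAG | IPV4_OFFSET_MASK))
        return PKT_NEEDS_REASSEMBLY;
    if (total_len > mtu)
        return PKT_NEEDS_FRAGMENTATION;
    return PKT_FORWARD;
}
```

Checking the fragment bits first keeps the common case cheap: a standard packet costs one mask test and one length comparison before it is forwarded.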

Performance and Scalability

  • Multi-Core Processing:

    The application scales horizontally with the number of CPU cores, with each core handling a subset of traffic flows.

  • RSS Traffic Distribution:

    Receive Side Scaling ensures optimal load balancing across cores.

  • Minimal Overhead:

    Processing logic is optimized for low-latency handling of standard packets while ensuring efficient fragmentation and reassembly operations.
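
The code flow later in this guide says that the RSS pipes use a Toeplitz hash over outer IPv4 header fields. In the application this hashing is done by the hardware through DOCA Flow, so the software sketch below only illustrates the idea that every packet of a flow lands on the same core. The function names are hypothetical, the hash key is supplied by the caller, and mapping the hash to a queue with a modulo is a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toeplitz hash: for every set bit of the input (MSB first), XOR in the
 * 32-bit window of the key that starts at that bit position.
 * The key must be at least data_len + 4 bytes long.
 */
static uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
                              const uint8_t *data, size_t data_len)
{
    uint32_t window;
    uint32_t hash = 0;

    assert(key_len >= data_len + 4);

    /* Initial window: the first 32 bits of the key */
    window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
             ((uint32_t)key[2] << 8) | key[3];

    for (size_t i = 0; i < data_len; i++) {
        for (int bit = 7; bit >= 0; bit--) {
            if (data[i] & (1u << bit))
                hash ^= window;
            /* Slide the window left by one bit, pulling in the next key bit */
            window = (window << 1) | ((key[i + 4] >> bit) & 1u);
        }
    }
    return hash;
}

/* Simplification: choose a queue by modulo instead of an indirection table */
static uint16_t hash_to_queue(uint32_t hash, uint16_t nb_queues)
{
    assert(nb_queues > 0);
    return (uint16_t)(hash % nb_queues);
}
```

A typical input would be the source and destination IPv4 addresses concatenated in network byte order (8 bytes), which requires a key of at least 12 bytes. Because the hash depends only on those bytes, all packets of the same flow map to the same queue and therefore to the same core.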

Debugging and Monitoring

The application provides real-time counters that give insight into:

  • Packets processed.

  • Fragments reassembled or fragmented.

  • Errors such as timeout on incomplete fragments.
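
One common way to keep such counters cheap on a multi-core data path is to give each core its own counter block and sum the blocks only when statistics are printed. The sketch below illustrates that pattern. The structure, its field names, and counters_sum are hypothetical and do not reflect the application's actual counter layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-core counters; field names are illustrative only */
struct worker_counters {
    uint64_t pkts_processed;
    uint64_t pkts_fragmented;
    uint64_t pkts_reassembled;
    uint64_t frags_timed_out;
};

/* Sum the per-core blocks; each core only ever writes to its own slot */
static struct worker_counters counters_sum(const struct worker_counters *per_core,
                                           size_t nb_cores)
{
    struct worker_counters total = {0};

    assert(per_core != NULL || nb_cores == 0);

    for (size_t i = 0; i < nb_cores; i++) {
        total.pkts_processed += per_core[i].pkts_processed;
        total.pkts_fragmented += per_core[i].pkts_fragmented;
        total.pkts_reassembled += per_core[i].pkts_reassembled;
        total.frags_timed_out += per_core[i].frags_timed_out;
    }
    return total;
}
```

Since no two cores write to the same block, the data path needs no locks or atomic operations to update the counters.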

This application leverages the following DOCA libraries:

For additional information about the used DOCA libraries, please refer to the respective programming guides.

  • NVIDIA BlueField-3 DPU is required.

  • Ubuntu 18.04/20.04/22.04 hosts (x86)

  • Open MPI version 4.1.5rc2 or greater (included in DOCA's installation)

Info

Please refer to the DOCA Installation Guide for Linux for details on how to install BlueField-related software.

The installation of DOCA's reference applications contains the sources of the applications, alongside the matching compilation instructions. This allows for compiling the applications "as-is" and provides the ability to modify the sources, then compile a new version of the application.

Tip

For more information about the applications as well as development and compilation tips, refer to the DOCA Reference Applications page.

The sources of the application can be found under the application's directory: /opt/mellanox/doca/applications/ip_frag/.

Compiling All Applications

All DOCA applications are defined under a single meson project. So, by default, the compilation includes all of them.

To build all the applications together, run:

cd /opt/mellanox/doca/applications/
meson /tmp/build
ninja -C /tmp/build

Info

doca_ip_frag is created under /tmp/build/ip_frag/.


Compiling Only the Current Application

  1. To directly build only the IP fragmentation application:

    cd /opt/mellanox/doca/applications/
    meson /tmp/build -Denable_all_applications=false -Denable_ip_frag=true
    ninja -C /tmp/build

    Info

    doca_ip_frag is created under /tmp/build/ip_frag/.

  2. Alternatively, one can set the desired flags in the meson_options.txt file instead of providing them in the compilation command line:

    1. Edit the following flags in /opt/mellanox/doca/applications/meson_options.txt:

      • Set enable_all_applications to false

      • Set enable_ip_frag to true

    2. Use the same compilation commands as shown in the previous section:

      cd /opt/mellanox/doca/applications/
      meson /tmp/build
      ninja -C /tmp/build

      Info

      doca_ip_frag is created under /tmp/build/ip_frag/.

Troubleshooting

Please refer to the DOCA Troubleshooting for any issue you may encounter with the compilation of the DOCA applications.

Prerequisites

  • The Fragmentation application is based on DOCA Flow. Therefore, the user is required to allocate huge pages.

    $ echo '4096' | sudo tee -a /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
    $ sudo mkdir /mnt/huge
    $ sudo mount -t hugetlbfs -o pagesize=2M nodev /mnt/huge

  • FLEX profile number should be manually set to 3 on the system to enable GTP matching:

    $ sudo mlxconfig -d <pcie_address> s FLEX_PARSER_PROFILE_ENABLE=3


Application Execution

The Fragmentation application is provided in source form, so it must be compiled before it can be executed.

  1. Application usage instructions:

    Usage: doca_ip_frag [DPDK Flags] -- [DOCA Flags] [Program Flags]

    DOCA Flags:
      -h, --help                    Print a help synopsis
      -v, --version                 Print program version information
      -l, --log-level               Set the (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
      --sdk-log-level               Set the SDK (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
      -j, --json <path>             Parse all command flags from an input json file

    Program Flags:
      -m, --mode                    Ip_frag application mode. Bidirectional mode forwards packets between a single reassembly port and a single fragmentation port. Multiport mode forwards packets between two pairs of reassembly and fragmentation ports. For more information consult DOCA IP Fragmentation Application Guide. Format: bidir, multiport
      -u, --mtu                     MTU size
      -t, --frag-aging-timeout      Aging timeout of fragments pending packet reassembly in the fragmentation table (in ms)
      -s, --frag-tbl-size           Frag table size, i.e. maximum amount of concurrent defragmentation contexts per worker thread
      -c, --mbuf-chain              Enable mbuf chaining

    For additional information, please refer to the Command Line Flags section below.

    Note

    The above usage printout can be printed to the command line using the -h (or --help) options:

    /tmp/build/ip_frag/doca_ip_frag -- -h

  2. CLI example for running the application on BlueField:

    /tmp/build/ip_frag/doca_ip_frag -a auxiliary:mlx5_core.sf.2,dv_flow_en=2,sft_en=1 -a auxiliary:mlx5_core.sf.4,dv_flow_en=2,sft_en=1 -a auxiliary:mlx5_core.sf.3,dv_flow_en=2,sft_en=1 -a auxiliary:mlx5_core.sf.5,dv_flow_en=2,sft_en=1 -l 3-15 -- -l 50 -m multiport

  3. CLI example for running the application on the host:

    /tmp/build/ip_frag/doca_ip_frag -l 0-7 -a 0000:08:00.0,dv_flow_en=2 -a 0000:08:00.1,dv_flow_en=2 -- -l 60 -m bidir -t 1000

    Note

    The PCI addresses (0000:08:00.0, 0000:08:00.1) should match the addresses of the desired PCI devices.

  4. The application also supports a JSON-based deployment mode, in which all command-line arguments are provided through a JSON file:

    /tmp/build/ip_frag/doca_ip_frag --json [json_file]

    For example:

    /tmp/build/ip_frag/doca_ip_frag --json /opt/mellanox/doca/applications/ip_frag/ip_frag_params.json

    Note

    Before execution, please ensure that the used JSON file contains the correct configuration parameters, and especially the desired PCI addresses needed for the deployment.

Command Line Flags

| Flag Type | Short Flag | Long Flag/JSON Key | Description | JSON Content |
| DPDK flags | a | devices | Add devices to the allow list. Note: This is a mandatory flag. | "devices": [ { "device": "pf", "id": "0000:08:00.0", "hws": true }, { "device": "pf", "id": "0000:08:00.1", "hws": true } ] |
| DPDK flags | l | core-list | List of cores to be used by the application data path. Note: This is a mandatory flag. | "core-list": "0-1" |
| General flags | h | help | Prints a help synopsis | N/A |
| General flags | v | version | Prints program version information | N/A |
| General flags | l | log-level | Sets the log level for the application: DISABLE=10, CRITICAL=20, ERROR=30, WARNING=40, INFO=50, DEBUG=60, TRACE=70 (Requires compilation with Trace level support) | "log-level": 60 |
| General flags | N/A | sdk-log-level | Sets the SDK log level for the program: DISABLE=10, CRITICAL=20, ERROR=30, WARNING=40, INFO=50, DEBUG=60, TRACE=70 | "sdk-log-level": 40 |
| General flags | j | json | Parse all command flags from an input JSON file | N/A |
| Program flags | m | mode | Execution mode: bidir, multiport. Note: This is a mandatory flag. | "mode": "bidir" |
| Program flags | u | mtu | MTU for fragmentation | "mtu": 1518 |
| Program flags | t | frag-aging-timeout | Fragmentation table aging timeout (in ms) | "frag-aging-timeout": 2 |
| Program flags | s | frag-tbl-size | Fragmentation table size | "frag-tbl-size": 2048 |
| Program flags | c | mbuf-chain | Enable mbuf chaining support on packet reassembly | "mbuf-chain": false |

Refer to DOCA Arg Parser for more information regarding the supported flags and execution modes.

Troubleshooting

Please refer to the DOCA Troubleshooting for any issue you may encounter with the installation or execution of the DOCA applications.

Application Code Flow

  1. Parse application arguments.

    1. Initialize arg parser resources, register DOCA general and DPDK-specific parameters.

      doca_argp_init();
      doca_argp_set_dpdk_program(dpdk_init);

    2. Register IP Fragmentation application parameters.

      ip_frag_register_params();

    3. Parse the arguments.

      doca_argp_start();

      1. Parse DPDK flags and invoke handler for calling the rte_eal_init() function.

      2. Parse app parameters.

    4. The application uses a different number of ports depending on the mode argument. Set config->nb_ports to the number of available DPDK ports, obtained by calling rte_eth_dev_count_avail().

      ip_frag_dpdk_config_num_ports();

    5. The application uses a dedicated queue per core, and the number of data path cores is user-configurable through DPDK arguments. Initialize DPDK ports and queues with the DOCA helper function.

      dpdk_queues_and_ports_init();

      1. Initialize DPDK ports.

      2. Create mbuf pool using rte_pktmbuf_pool_create.

      3. Driver initialization – use rte_eth_dev_configure to configure the number of queues.

      4. Rx/Tx queue initialization – use rte_eth_rx_queue_setup and rte_eth_tx_queue_setup to initialize the queues.

      5. Start the port using rte_eth_dev_start.

  2. To support graceful shutdown (including printing statistics and useful debug data), register a signal handler that sets the force_stop variable, which terminates the main loop of the data path cores.

    signal(SIGINT, signal_handler);
    signal(SIGTERM, signal_handler);

  3. Call the function that implements all app-specific initialization.

    ip_frag();

    1. Initialize DOCA Flow that is necessary for RSS.

      init_doca_flow();

    2. Reserve an mbuf flag with rte_mbuf_dynflag_register() for saving fragmentation state.

      ip_frag_mbuf_flags_init();

    3. Create a per-core mempool for resulting packet fragment indirect mbufs using rte_pktmbuf_pool_create().

      ip_frag_indirect_pool_init();

    4. Create per-core data with rte_calloc(), then initialize the auxiliary data structures: rte_eth_dev_tx_buffer (using rte_zmalloc_socket(), rte_eth_tx_buffer_init(), and rte_eth_tx_buffer_set_err_callback()) and rte_ip_frag_tbl (using rte_ip_frag_table_create()).

      ip_frag_wt_data_init();

    5. Initialize DOCA Flow ports.

      ip_init_doca_flow_ports();

    6. Create RSS pipes and entries using Toeplitz hash function over outer IPv4 header fields.

      ip_frag_rss_pipes_create();

      1. Create DOCA Flow pipe config.

        doca_flow_pipe_cfg_create();
        set_flow_pipe_cfg();
        doca_flow_pipe_cfg_set_domain();
        doca_flow_pipe_cfg_set_nr_entries();
        doca_flow_pipe_cfg_set_match();

      2. Create the RSS pipe.

        doca_flow_pipe_create();

      3. Add RSS pipe entry.

        doca_flow_pipe_add_entry();

      4. Process the entry completion.

        doca_flow_entries_process();

    7. Start the data path main function on each worker thread.

      rte_eal_mp_remote_launch();

    8. Worker thread main loop function forwards packets between sets of ports, fragmenting or reassembling them on IP layer depending on the mode.

      ip_frag_wt_thread_main();

      1. Packet fragmentation algorithm entry-point function.

        ip_frag_wt_fragment();

        1. Receive packet burst from rx port.

          rte_eth_rx_burst();

        2. Iterate over burst of packets, fragment packets larger than MTU, push all resulting packets to tx buffer.

          ip_frag_pkt{s}_fragment();

          1. Parse the packet, store pointers to the parsed packet headers in frag_conn_parser_ctx instance.

            ip_frag_wan_parse();

          2. Save L2 header of a packet pending fragmentation into eth_hdr_copy and adjust mbuf data pointer to point to IP header.

            memcpy();
            rte_pktmbuf_adj();

          3. Fragment the packet.

            rte_ipv4_fragment_packet();

          4. Release the original packet mbuf.

            rte_pktmbuf_free();

          5. Fix IP header checksum of resulting fragments.

            ip_frag_ipv4_hdr_cksum();

          6. Prepend previously saved L2 header to the resulting fragments.

            rte_pktmbuf_prepend();
            memcpy();

          7. Push packet(s) to tx buffer.

            rte_eth_tx_buffer();

        3. Send resulting packet tx buffer to tx port.

          rte_eth_tx_buffer_flush();

      2. Packet reassembly algorithm entry-point function.

        ip_frag_wt_reassemble();

        1. Receive packet burst from rx port.

          rte_eth_rx_burst();

        2. Iterate over burst of packets, save fragments into frag table for reassembly, push all resulting packets to tx buffer.

          ip_frag_pkt{s}_reassemble();

          1. Parse the packet, store pointers to the parsed packet headers in frag_conn_parser_ctx instance.

            ip_frag_pkt_parse();

          2. Parsing result code DOCA_ERROR_AGAIN indicates that the parser has encountered an IP fragment and that re-parsing is required after reassembling the packet. Push the fragment to the frag table for reassembly.

            ip_frag_pkt_reassemble_push();

            1. Call the function that prepares the fragment for reassembly by setting all necessary mbuf fields.

              ip_frag_pkt_reassemble_prepare();

            2. Push the packet.

              rte_ipv4_frag_reassemble_packet();

            3. If mbuf chaining is disabled, then flatten the resulting mbuf chain into a single mbuf.

              ip_frag_pkt_flatten();

            4. Push packet(s) to tx buffer.

              rte_eth_tx_buffer();

          3. Fix the reassembled packet by recomputing its IP checksums, setting the UDP checksum to 0, and fixing all applicable 'length' fields.

            ip_frag_pkt_fixup();

        3. Put expired fragments from the fragmentation table into death row.

          rte_ip_frag_table_del_expired_entries();

        4. Free death row mbufs.

          rte_ip_frag_free_death_row();

        5. Send resulting packet tx buffer to tx port.

          rte_eth_tx_buffer_flush();

    9. Wait for worker threads to finish.

      rte_eal_mp_wait_lcore();

    10. Print statistics and debug data.

      ip_frag_debug_counters_print();

    11. Stop DOCA Flow ports.

      stop_doca_flow_ports();

    12. Cleanup per-core data.

      ip_frag_wt_data_cleanup();

    13. Destroy DOCA Flow.

      doca_flow_destroy();

  4. DPDK ports and queues destruction.

    dpdk_queues_and_ports_fini();

  5. DPDK finish.

    dpdk_fini();

  6. Arg parser destroy.

    doca_argp_destroy();
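
The fragmentation path in the flow above fixes the IP header checksum of every resulting fragment, because each fragment's total length, flags, and fragment offset differ from those of the original packet. The internals of ip_frag_ipv4_hdr_cksum() are not shown in this guide, so the sketch below only illustrates the standard Internet checksum (RFC 1071) that any such routine has to produce. The function name ipv4_hdr_checksum is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * RFC 1071 Internet checksum over an IPv4 header.
 * hdr points to the header bytes in network order with the checksum field
 * (bytes 10 and 11) set to zero; hdr_len is the header length in bytes.
 * The high byte of the returned value belongs in byte 10 of the header.
 */
static uint16_t ipv4_hdr_checksum(const uint8_t *hdr, size_t hdr_len)
{
    uint32_t sum = 0;

    assert(hdr_len >= 20 && hdr_len % 4 == 0);

    /* Add the header as big-endian 16-bit words */
    for (size_t i = 0; i < hdr_len; i += 2)
        sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];

    /* Fold the carries back into the low 16 bits */
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);

    /* The checksum is the one's complement of the folded sum */
    return (uint16_t)~sum;
}
```

Running the same function over a header that already contains a correct checksum returns 0, which is a convenient way to validate a received or reassembled packet. In DPDK-based code the library helper rte_ipv4_cksum() is normally used instead of a hand-written loop.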

References

  • /opt/mellanox/doca/applications/ip_frag/

  • /opt/mellanox/doca/applications/ip_frag/ip_frag_params.json

© Copyright 2025, NVIDIA. Last updated on Feb 10, 2025.