DOCA Documentation v3.2.0

DOCA Flow

This guide describes how to deploy the DOCA Flow library, the philosophy of the DOCA Flow API, and how to use it. The guide is intended for developers writing network function applications that focus on packet processing (such as gateways). It assumes familiarity with the network stack and DPDK.

Note

The quality status of DOCA libraries is listed here.

DOCA Flow is the most fundamental API for building generic packet processing pipes in hardware. The DOCA Flow library provides an API for building a set of pipes, where each pipe consists of match criteria, monitoring, and a set of actions. Pipes can be chained so that after a pipe-defined action is executed, the packet may proceed to another pipe.

Using the DOCA Flow API, it is easy to develop hardware-accelerated applications that match on up to two layers of packets (outer and inner, for tunneled traffic):

  • MAC/VLAN/ETHERTYPE

  • IPv4/IPv6

  • TCP/UDP/ICMP

  • GRE/VXLAN/GTP-U/ESP/PSP

  • Metadata

The execution pipe can include packet modification actions such as the following:

  • Modify MAC address

  • Modify IP address

  • Modify L4 (ports)

  • Strip tunnel

  • Add tunnel

  • Set metadata

  • Encrypt/Decrypt

The execution pipe can also have monitoring actions such as the following:

  • Count

  • Policers

The pipe also has a forwarding target which can be any of the following:

  • Software (RSS to subset of queues)

  • Port

  • Another pipe

  • Drop packets

A DOCA Flow-based application can run on either the host machine or an NVIDIA® BlueField® DPU target.

These applications require the allocation of huge pages, which must be configured prior to execution. Use the following commands to allocate and mount 2MB huge pages:

$ echo '1024' | sudo tee -a /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
$ sudo mkdir /mnt/huge
$ sudo mount -t hugetlbfs -o pagesize=2M nodev /mnt/huge

The following diagram shows how the DOCA Flow library defines a pipe template, receives a packet for processing, creates the pipe entry, and offloads the flow rule in NIC hardware.

[Figure: DOCA Flow architecture]

Features of DOCA Flow:

  • User-defined set of matches, parsers, and actions

  • DOCA Flow pipes can be created or destroyed dynamically

  • Packet processing is fully accelerated by hardware with a specific entry in a flow pipe

  • Packets that do not match any of the pipe entries in hardware can be sent to Arm cores for exception handling and then reinjected back to hardware

The DOCA Flow pipe consists of the following components:

  • Monitor (MON in the diagram) - counts, meters

  • Modify (MDF in the diagram) - modifies a field

  • Forward (FWD in the diagram) - forwards to the next stage in packet processing

DOCA Flow organizes pipes into high-level containers named domains to address the specific needs of the underlying architecture.

A key element in defining a domain is the packet direction and a set of allowed actions.

  • A domain is a pipe attribute (also relates to shared objects)

  • A domain restricts the set of allowed actions

  • Transition between domains is well-defined (packets cannot cross domains arbitrarily)

  • A domain may restrict the sharing of objects between packet directions

  • Packet direction can restrict the move between domains

List of Steering Domains

DOCA Flow provides the following set of predefined steering domains:


DOCA_FLOW_PIPE_DOMAIN_DEFAULT

  • Default domain for actions on ingress traffic

  • Secure actions are not allowed here

  • The next milestone is queue or pipe in the EGRESS domain

  • Miss action is: Drop

DOCA_FLOW_PIPE_DOMAIN_SECURE_INGRESS

  • For secure actions on ingress traffic

  • Encapsulation and encryption actions are not allowed here

  • The only allowed domain for decrypting secure actions

  • The next milestone is queue or pipe in the DEFAULT or EGRESS domain

  • Only meta register is preserved

  • Miss action is: Drop

DOCA_FLOW_PIPE_DOMAIN_EGRESS

  • Domain for actions on egress traffic

  • Secure actions are not allowed here

  • The next milestone is wire/representor or pipe in SECURE_EGRESS domain

  • Miss action is: Send to wire/representor

DOCA_FLOW_PIPE_DOMAIN_SECURE_EGRESS

  • Domain for secure actions on egress traffic

  • Decapsulation actions are not allowed here

  • The only allowed domain for encrypting secure actions

  • The next milestone is wire/representor

  • Miss action is: Send to wire/representor


Domains in VNF Mode

[Figure: Domains in VNF mode]

Domains in Switch Mode

[Figure: Domains in switch mode]

Note

In switch mode, forwarding from a pipe with the default domain to the egress domain root pipe is allowed, while forwarding from the egress domain to the default domain is not allowed.

Note

Traffic from software Tx forwards to the egress root pipe.

Note

A pipe with RSS forward follows the above rules.

Note

Encapsulation is suggested to be done from the egress domain.


DOCA API is available through the DOCA API Reference.

Info

The pkg-config (*.pc file) for the DOCA Flow library is doca-flow.

Initialization Flow

Before using any DOCA Flow function, it is mandatory to call doca_flow_init(). This function initializes all resources required by the DOCA Flow library.
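A minimal initialization sketch follows. Only doca_flow_init() is named in this section; the doca_flow_cfg creation and setter calls below (doca_flow_cfg_create(), doca_flow_cfg_set_mode_args(), doca_flow_cfg_destroy()) are assumptions based on the DOCA Flow API style and may differ between DOCA versions:

```c
#include <doca_flow.h>

/* Sketch: create a doca_flow_cfg, set global options, then call
 * doca_flow_init() once before any other DOCA Flow function.
 * Additional setters (e.g., queue counts, resource sizes) are omitted. */
static doca_error_t init_doca_flow(void)
{
    struct doca_flow_cfg *cfg;
    doca_error_t result;

    result = doca_flow_cfg_create(&cfg);
    if (result != DOCA_SUCCESS)
        return result;

    /* "vnf" (or "switch") selects the pipe mode described below. */
    result = doca_flow_cfg_set_mode_args(cfg, "vnf,hws");
    if (result == DOCA_SUCCESS)
        result = doca_flow_init(cfg);

    doca_flow_cfg_destroy(cfg);
    return result;
}
```

The configuration object is only needed during initialization and can be destroyed immediately after doca_flow_init() returns.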

Pipe Mode

The pipe mode (set via mode_args) defines the fundamental traffic topology and default "miss" behavior for a DOCA port. When a port is initialized, this mode creates the initial miss rules.

DOCA supports the following modes:

VNF Mode

This mode is designed for Virtual Network Function (VNF) use cases. A packet arriving from one of the device's ports is processed and can be sent to another port. By default, any packet that does not match a user-defined pipe entry (a "miss") is forwarded to the host's RSS (Receive Side Scaling) queues.

Example scenario:

[Figure: VNF mode traffic flow]

  • Packet1 arrives at P0, does not match any existing entry, and is forwarded to the host's RSS queues.

  • The application running on the host captures this packet, processes it, and creates a new pipe entry to handle this flow.

  • Packet2 (part of the same flow) arrives, hits the new pipe entry, and has an action applied (e.g., VXLAN decap, modify, encap) before being sent out from P1.

To forward packets between ports, you must use doca_flow_port_pair to establish a port pair. Port pairing is unidirectional.

  • Pairing Port X with Port Y allows forwarding from X to Y, but not from Y to X.

  • To enable bidirectional forwarding, you must call doca_flow_port_pair a second time to pair Y with X.

After pairing, packets can be forwarded to either the paired port directly or the paired port's egress root pipe.

Note

Forwarding from an ingress pipe to an egress pipe is only allowed to the egress_root pipe.


Switch Mode

This mode is used for internal switching and only allows representor ports (e.g., uplink representors and SF/VF representors). A packet is forwarded from one port to another. If a packet arrives from an uplink and does not match a user-defined pipe entry, it is received on all RSS queues of that uplink's representor.

[Figure: Switch mode traffic flow]

The diagram shows the basic traffic flow in switch mode: Packet1 initially misses and is received on the uplink representor's RSS queues. The application captures the packet, decides which representor it should go to, and installs a rule. Subsequent packets hit this rule and are forwarded to representor0.

  • Configuration requirements:

    • The doca_dev field must be set in the doca_flow_port_cfg (using doca_flow_port_cfg_set_dev()).

    • Isolated mode must be specified.

      Note

      The application must not initialize VF/SF representor ports using the DPDK API. Functions such as rte_eth_dev_configure(), rte_eth_rx_queue_setup(), and rte_eth_dev_start() must not be called for these ports.

  • Switch manager port: In switch mode, DOCA Flow unifies all ports under a single "switch manager port" for traffic management. This means all traffic is handled by this manager port. Applications only need to create an RSS pipe on the switch manager port to receive missed traffic and should manage all pipes on this port.

  • expert Argument (mode_args): Switch mode can be configured with or without the expert argument. This setting changes how the source port_id of missed traffic is identified:

    • Without expert (default): Missed traffic sent to the software is tagged with the source port_id in the mbuf CQE field. This allows the application to identify the packet's origin. To send a packet, the application can set the destination port_id in the mbuf meta, and the packet will be sent directly to that port.

    • With expert: The port_id is not added to the packet. This gives the user full control to configure pipes freely and implement a custom solution for port identification.

      [Figure: Miss traffic port_id tagging]

      Note

      Due to a firmware limitation, port_id information is missing for traffic that is cloned from a VF to an RSS pipe.


Start Point

DOCA Flow API serves as an abstraction layer for network acceleration. Packet processing in a network function is described from ingress to egress, so a pipe must be attached to the origin port. Once a packet arrives at the ingress port, it starts the hardware execution as defined by the DOCA API.

Port Identifiers

doca_flow_port is an opaque object, as the DOCA Flow API is designed to be independent of any specific packet I/O framework (e.g., DPDK).

To begin using a DOCA Flow port, the user must call doca_flow_port_start(). This function associates the application's underlying port with a DOCA Flow port.

During this setup, the user must assign a logical port ID using the doca_flow_port_cfg_set_port_id() API.

This port_id is later referenced in:

  • doca_flow_parser_meta.port_id - when matching packets received on a specific port

  • doca_flow_fwd.port_id - when forwarding packets to a specific port

For DPDK-based applications, the DPDK port_id can be used directly as the logical port identifier.

DOCA Device

A DOCA device must be specified during doca_flow_port_start() to provide the necessary hardware context for the port.

  • For physical, virtual, or subfunction devices (PF, VF, or SF), use the doca_flow_port_cfg_set_dev() API to associate a doca_dev object with the port.

  • For representor ports, use the doca_flow_port_cfg_set_dev_rep() API to associate a doca_dev_rep object instead.

Providing a doca_dev_rep is what explicitly indicates that the port is a representor.
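Putting the setters above together, a hedged sketch of starting a port for a PF/VF/SF might look as follows. doca_flow_port_cfg_set_port_id() and doca_flow_port_cfg_set_dev() are the setters described in this section; doca_flow_port_cfg_create(), doca_flow_port_cfg_destroy(), and the exact doca_flow_port_start() signature are assumptions that may vary between DOCA versions:

```c
#include <doca_flow.h>

/* Sketch: start a DOCA Flow port for a PF/VF/SF device, associating
 * a logical port_id and a doca_dev with the port. */
static doca_error_t start_port(uint16_t port_id, struct doca_dev *dev,
                               struct doca_flow_port **port)
{
    struct doca_flow_port_cfg *port_cfg;
    doca_error_t result;

    result = doca_flow_port_cfg_create(&port_cfg);
    if (result != DOCA_SUCCESS)
        return result;

    result = doca_flow_port_cfg_set_port_id(port_cfg, port_id);
    if (result == DOCA_SUCCESS)
        result = doca_flow_port_cfg_set_dev(port_cfg, dev);
    if (result == DOCA_SUCCESS)
        result = doca_flow_port_start(port_cfg, port);

    doca_flow_port_cfg_destroy(port_cfg);
    return result;
}
```

For a representor port, doca_flow_port_cfg_set_dev_rep() would be used instead of doca_flow_port_cfg_set_dev(), as described above.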

Actions Memory

The use of DOCA Flow actions requires that sufficient memory be allocated for them in advance, per port. This is done using the doca_flow_port_cfg_set_actions_mem_size() API function. The recommended way to get precise information about the expected actions memory consumption is the following:

  1. During development, allocate a fixed high amount of actions memory, regardless of your actual pipeline.

  2. Once the steering pipeline is ready, examine the program's actual consumption using the DOCA Flow Tune tool.

  3. Using DOCA Flow Tune for calibration, update the parameter passed to the setter function.

Note that in the hot upgrade use case, the total actions memory size set across all instances must not exceed the maximum resource limit of the hardware port.

Port Operation State

DOCA Flow ports can be initialized multiple times from different instances. Each instance prepares its pipeline, but only one actively receives port traffic at a time. The instance actively handling the port traffic depends on the operation state set by the doca_flow_port_cfg_set_operation_state() function:

  • DOCA_FLOW_PORT_OPERATION_STATE_ACTIVE – The instance actively handles incoming and outgoing traffic

  • DOCA_FLOW_PORT_OPERATION_STATE_ACTIVE_READY_TO_SWAP – The instance handles traffic actively when no other active instance is available

  • DOCA_FLOW_PORT_OPERATION_STATE_STANDBY – The instance handles traffic only when no active or active_ready_to_swap instance is available

  • DOCA_FLOW_PORT_OPERATION_STATE_UNCONNECTED – The instance does not handle traffic, regardless of the state of other instances

If the doca_flow_port_cfg_set_operation_state() function is not called, the default state DOCA_FLOW_PORT_OPERATION_STATE_ACTIVE is applied.

Note

When a port is configured with a state that expects to handle traffic, it takes effect only after root pipes are created for this port.

When the active port is closed, either gracefully or due to a crash, the standby instance automatically becomes active without any action required.

The port operation state can be modified after the port is started using the doca_flow_port_operation_state_modify() function.

Use Case Examples

Hot Upgrade

This operation state mechanism allows upgrading the DOCA Flow program without losing any traffic.

To upgrade an existing DOCA Flow program with ports started in DOCA_FLOW_PORT_OPERATION_STATE_ACTIVE state (Instance A):

  1. Open a new Instance B and start its ports in DOCA_FLOW_PORT_OPERATION_STATE_STANDBY state.

  2. Modify Instance A's ports from DOCA_FLOW_PORT_OPERATION_STATE_ACTIVE to DOCA_FLOW_PORT_OPERATION_STATE_UNCONNECTED state. At this point, Instance B starts receiving traffic.

  3. Close Instance A.

  4. Open a new Instance C with DOCA_FLOW_PORT_OPERATION_STATE_UNCONNECTED state. Instance C is the upgraded version of Instance A.

  5. Create the entire pipeline for Instance C.

  6. Change Instance C's state from DOCA_FLOW_PORT_OPERATION_STATE_UNCONNECTED to DOCA_FLOW_PORT_OPERATION_STATE_ACTIVE. At this point, Instance B stops receiving traffic and Instance C starts.

  7. Instance B can either be closed or kept as a backup should Instance C crash.

Swap Existing Instances

This mechanism also facilitates swapping two different DOCA Flow programs without losing any traffic.

To swap between two existing DOCA Flow programs with ports started in DOCA_FLOW_PORT_OPERATION_STATE_ACTIVE and DOCA_FLOW_PORT_OPERATION_STATE_STANDBY states (Instance A and Instance B, respectively):

  1. Modify Instance A's ports from DOCA_FLOW_PORT_OPERATION_STATE_ACTIVE to DOCA_FLOW_PORT_OPERATION_STATE_ACTIVE_READY_TO_SWAP.

  2. Modify Instance B's ports from DOCA_FLOW_PORT_OPERATION_STATE_STANDBY to DOCA_FLOW_PORT_OPERATION_STATE_ACTIVE. At this point, Instance B starts receiving traffic.

  3. Modify Instance A's ports from DOCA_FLOW_PORT_OPERATION_STATE_ACTIVE_READY_TO_SWAP to DOCA_FLOW_PORT_OPERATION_STATE_STANDBY.

Limitations

  • Supported only in switch mode – the mode_args string must include "switch".

  • Only the switch port supports states; its representors are affected by its state. Starting a representor port or calling the modify function with a non-active operation state should fail.

  • Two instances cannot be in the same operation state simultaneously, except for DOCA_FLOW_PORT_OPERATION_STATE_UNCONNECTED. If a new instance initializes a port while another instance with the same operation state exists, a DOCA_ERROR_ALREADY_EXIST error is returned either during port startup or root pipe creation.

  • Due to an underlying driver limitation, each ASO object type (e.g., METER, COUNTER) is restricted to a maximum of 64 million (64M) objects per port.

Create Pipe and Pipe Entry

A pipe is a template that defines packet processing without adding any specific hardware rule. The template includes the following elements:

  • Match

  • Monitor

  • Actions

  • Forward

The following diagram illustrates a pipe structure.

[Figure: Pipe structure]

The creation phase allows the hardware to efficiently build the execution pipe. After the pipe is created, specific entries can be added. A subset of the pipe may be used (e.g., skipping the monitor completely, using only the counter, etc.).

Pipe Matching or Action Applying

DOCA Flow allows defining criteria for matching on a packet or for taking actions on a matched packet by modifying it. The information defining these criteria is provided through the following pointers:

  • Match or action pointer – given at pipe or entry creation

  • Mask pointer – optionally given at pipe creation

Defining criteria for matching or actions on a packet can be done at the pipe level, where it applies to all packets of a pipe, or specified on a per entry basis, where each entry defines the operation on either the match, actions, or both.

In DOCA Flow terminology, when a field is identified as CHANGEABLE at pipe creation, this means that the actual criterion of the field is deferred to entry creation. Different entries can provide different criteria for a CHANGEABLE field.

A match or action field can be categorized, during pipe creation, as one of the following:

  • IGNORED – Ignored in either the match or action taking process

  • CHANGEABLE – When the actual behavior is deferred to the entry creation stage

  • SPECIFIC – Value is used as is in either match or action process

If a mask pointer is provided, this is called explicit matching (or explicit action applying). If the mask pointer is NULL, it is called implicit matching (or implicit action applying). The following subsections describe the logic governing matching and action applying.

When a field value is specified as 0xffff, it means that all the field's bits are set (e.g., for TTL this means 0xff and for an IPv4 address 0xffffffff).

Note

Creating multiple entries with the same match criteria is not supported. Instead, use the update entry API to modify the existing entry.

Matching

Matching is the process of selecting packets based on their fields' values and steering them for further processing. Processing can either be further matching or actions applying.

[Figure: Matching process]

The packet P enters the filter, which masks it with the value A. The output value, P&A, is then compared to the value B; if they are equal, it is a match.

The values of A and B are evaluated according to the values of the pipe configuration and entry configuration fields, according to the tables in sections "Implicit Matching" and "Explicit Matching".

Implicit Matching

(V = pipe match value, M = pipe match mask, E = entry match value, A = filter, B = rule)

  • Ignore – V = 0, M = NULL; filter A = 0, rule B = 0

  • Constant – 0 < V < 0xffff, M = NULL; filter A = 0xffff, rule B = V

  • Changeable (per entry) – V = 0xffff, M = NULL, 0 ≤ E ≤ 0xffff; filter A = 0xffff, rule B = E


Explicit Matching

(V = pipe match value, M = pipe match mask, E = entry match value, A = filter, B = rule)

  • Constant – V != 0xffff, 0 < M ≤ 0xffff, 0 ≤ E ≤ 0xffff; filter A = M, rule B = M&V

  • Changeable – V == 0xffff, 0 < M ≤ 0xffff, 0 ≤ E ≤ 0xffff; filter A = M, rule B = M&E

  • Ignored – 0 ≤ V < 0xffff, M == 0, 0 ≤ E ≤ 0xffff; filter A = 0, rule B = 0

Action Applying

Implicit Action Applying

(V = pipe action value, M = pipe action mask, E = entry action value)

  • Ignore – V = 0, M = NULL; no action on the field

  • Constant – 0 < V < 0xffff, M = NULL; the field is set to V

  • Changeable – V = 0xffff, M = NULL, entry provides E; the field is set to E

Implicit action applying example:

  • Destination IPv4 address is 255.255.255.255

  • No mask provided

  • Entry value is 192.168.0.1

  • Result – The action field is changeable. Therefore, the value is provided by the entry. If a match on the packet occurs, the packet destination IPv4 address is changed to 192.168.0.1.

Note

Meta only supports changeable action.


Explicit Action Applying

Info

Assume P is packet's field value.

(V = pipe action value, M = pipe action mask, E = entry action value)

  • Constant – V != 0xffff, 0 ≤ M ≤ 0xffff, 0 ≤ E ≤ 0xffff; the field is set to (~M & P) | (M & V). In words: only the bits set in the mask are modified to the corresponding bits of V.

  • Changeable – V == 0xffff, 0 < M ≤ 0xffff, 0 ≤ E ≤ 0xffff; the field is set to (~M & P) | (M & E)

  • Ignored – 0 ≤ V < 0xffff, M == 0, 0 ≤ E ≤ 0xffff; no action on the field

Explicit action applying example:

  • Destination IPv4 address is 192.168.10.1

  • Mask is provided and equals 255.255.0.0

  • Entry value is ignored

  • Result – If a match on the packet occurs, the packet destination IPv4 value changes to 192.168.0.0

Setting Pipe Match or Action

Match is a mandatory parameter when creating a pipe. Using the doca_flow_match struct, users must define the packet fields to be matched by the pipe.

For each doca_flow_match field, users select whether the field type is:

  • Ignore (match any) – the value of the field is ignored in a packet. In other words, match on any value of the field.

  • Constant (specific) – all entries in the pipe have the same value for this field. Users do not provide a value per entry.

  • Changeable – the value of the field is defined per entry. Users must provide it upon adding an entry.

    Note

    L4 type, L3 type, and tunnel type cannot be changeable.

    Note

    gtp_next_ext_hdr_type supports only psc type (0x85).

The match field type can be defined either implicitly or explicitly using the doca_flow_pipe_cfg_set_match(struct doca_flow_pipe_cfg *cfg, const doca_flow_match *match, const doca_flow_match *match_mask) function. If match_mask == NULL, then it is done implicitly. Otherwise, it is explicit.

In the tables in the following subsections, an example is used of a 16-bit field (such as layer-4 destination port) where:

Note

The same concept would apply to any other field (such as an IP address occupying 32 bits).

  • P stands for the packet field value

  • V stands for the pipe match field value

  • M stands for the pipe mask field value

  • E stands for the match entry field value

Implicit Match

  • Ignore – V = 0, M = NULL; filter A = 0, rule B = 0

  • Constant – 0 < V < 0xffff, M = NULL; filter A = 0xffff, rule B = V

  • Changeable (per entry) – V = 0xffff, M = NULL, 0 ≤ E ≤ 0xffff; filter A = 0xffff, rule B = E

To match implicitly, the following considerations should be taken into account.

  • Ignored fields:

    • Field is zeroed

    • Pipeline has no comparison on the field

  • Constant fields – These are fields that have a constant value among all entries. For example, as shown in the following, the tunnel type is VXLAN:

    match.tun.type = DOCA_FLOW_TUN_VXLAN;

    These fields must only be configured once at pipe build stage, not once per new pipeline entry.

  • Changeable fields – These are fields whose value may change per entry. For example, the following shows match on a destination IPv4 address of variable per-entry value (outer 5-tuple):

    match.outer.ip4.dst_ip = 0xffffffff;

  • The following is an example of a match, where:

    • Outer 5-tuple

      • L3 type is IPv4 – constant among entries by design

      • L4 type is UDP – constant among entries by design

      • Tunnel type is DOCA_FLOW_TUN_VXLAN – constant among entries by design

      • IPv4 destination address varies per entry

      • UDP destination port is always DOCA_VXLAN_DEFAULT_PORT

      • VXLAN tunnel ID varies per entry

      • The rest of the packet fields are ignored

    • Inner 5-tuple

      • L3 type is IPv4 – constant among entries by design

      • L4 type is TCP – constant among entries by design

      • IPv4 source and destination addresses vary per entry

      • TCP source and destination ports vary per entry

      • The rest of the packet fields are ignored

// filter creation
static void build_underlay_overlay_match(struct doca_flow_match *match)
{
    // outer
    match->outer.l3_type = DOCA_FLOW_L3_TYPE_IP4;
    match->outer.l4_type_ext = DOCA_FLOW_L4_TYPE_EXT_UDP;
    match->tun.type = DOCA_FLOW_TUN_VXLAN;
    match->outer.ip4.dst_ip = 0xffffffff;
    match->outer.udp.l4_port.dst_port = DOCA_VXLAN_DEFAULT_PORT;
    match->tun.vxlan_tun_id = 0xffffffff;
    // inner
    match->inner.l3_type = DOCA_FLOW_L3_TYPE_IP4;
    match->inner.l4_type_ext = DOCA_FLOW_L4_TYPE_EXT_TCP;
    match->inner.ip4.dst_ip = 0xffffffff;
    match->inner.ip4.src_ip = 0xffffffff;
    match->inner.tcp.l4_port.src_port = 0xffff;
    match->inner.tcp.l4_port.dst_port = 0xffff;
}

// create an entry specifying specific values to match upon
doca_error_t add_entry(struct doca_flow_pipe *pipe, struct doca_flow_port *port,
                       struct doca_flow_pipe_entry **entry)
{
    struct doca_flow_match match = {};
    struct entries_status status = {};
    doca_error_t result;

    match.outer.ip4.dst_ip = BE_IPV4_ADDR(7, 7, 7, 1);
    match.tun.vxlan_tun_id = RTE_BE32(9876);
    match.inner.ip4.src_ip = BE_IPV4_ADDR(8, 8, 8, 1);
    match.inner.ip4.dst_ip = BE_IPV4_ADDR(9, 9, 9, 1);
    match.inner.tcp.l4_port.src_port = rte_cpu_to_be_16(5678);
    match.inner.tcp.l4_port.dst_port = rte_cpu_to_be_16(1234);

    result = doca_flow_pipe_add_entry(0, pipe, &match, &actions, NULL, NULL, 0, &status, entry);
    return result;
}

Note

The fields of the doca_flow_meta struct inside the match are not subject to implicit match rules and must be paired with explicit mask values.


Explicit Match

  • Constant – V != 0xffff, 0 < M ≤ 0xffff, 0 ≤ E ≤ 0xffff; filter A = M, rule B = M&V

  • Changeable – V == 0xffff, 0 < M ≤ 0xffff, 0 ≤ E ≤ 0xffff; filter A = M, rule B = M&E

  • Ignored – 0 ≤ V < 0xffff, M == 0, 0 ≤ E ≤ 0xffff; filter A = 0, rule B = 0

In this case, there are two doca_flow_match items, and the following considerations apply:

  • Ignored fields

    • M equals zero. This can be seen in the table where the rule equals 0. Since the mask is also 0, the resulting value after the filter is 0, so the comparison always succeeds.

    match_mask.inner.ip4.dst_ip = 0;

  • Constant fields

    These are fields that have a constant value. For example, as shown in the following, the inner 5-tuple match on IPv4 destination addresses belonging to the 0.0.0.0/24 subnet, and this match is constant among all entries:

    // BE_IPV4_ADDR converts 4 numbers A,B,C,D to a big-endian representation of IP address A.B.C.D
    match.inner.ip4.dst_ip = 0;
    match_mask.inner.ip4.dst_ip = BE_IPV4_ADDR(255, 255, 255, 0);

    For example, as shown in the following, the inner 5-tuple match on IPv4 destination addresses belonging to the 1.2.0.0/16 subnet, and this match is constant among all entries. The last two octets of the match.inner.ip4.dst_ip are ignored because the match_mask of 255.255.0.0 is applied:

    // BE_IPV4_ADDR converts 4 numbers A,B,C,D to a big-endian representation of IP address A.B.C.D
    match.inner.ip4.dst_ip = BE_IPV4_ADDR(1, 2, 3, 4);
    match_mask.inner.ip4.dst_ip = BE_IPV4_ADDR(255, 255, 0, 0);

    Once a field is defined as constant, the field's value cannot be changed per entry.

    Tip

    Users should set constant fields to zero when adding entries for better code readability.

    A more complex example of constant matches may be achieved as follows:

    match_mask.outer.tcp.l4_port.dst_port = rte_cpu_to_be_16(0xf0f0);
    match.outer.tcp.l4_port.dst_port = rte_cpu_to_be_16(0x5020);

    The following ports would be matched:

    • 0x5020 - 0x502f

    • 0x5120 - 0x512f

    • ...

    • 0x5f20 - 0x5f2f

Changeable fields

The following example matches on either the FTP or TELNET well-known port numbers and forwards packets to a server after modifying the destination IP address and destination port. Both FTP and TELNET traffic are forwarded to the same server: FTP to port 8000 and TELNET to port 9000.

// at pipe creation
doca_flow_pipe_cfg_set_name(pipe_cfg, "PORT_MAPPER");
doca_flow_pipe_cfg_set_type(pipe_cfg, DOCA_FLOW_PIPE_BASIC);
match.outer.tcp.l4_port.dst_port = rte_cpu_to_be_16(0xffff);      // V
match_mask.outer.tcp.l4_port.dst_port = rte_cpu_to_be_16(0xffff); // M
doca_flow_pipe_cfg_set_match(pipe_cfg, &match, &match_mask);
actions_arr[0] = &actions;
doca_flow_pipe_cfg_set_actions(pipe_cfg, actions_arr, NULL, NULL, 1);
doca_flow_pipe_cfg_set_is_root(pipe_cfg, true);

// adding entries
// FTP
match.outer.tcp.l4_port.dst_port = rte_cpu_to_be_16(20); // E
actions.outer.ip4.dst_ip = server_addr;
actions.outer.tcp.l4_port.dst_port = rte_cpu_to_be_16(8000);
result = doca_flow_pipe_add_entry(0, pipe, &match, &actions, NULL, NULL, 0, &status, entry);

// TELNET
match.outer.tcp.l4_port.dst_port = rte_cpu_to_be_16(23); // E
actions.outer.ip4.dst_ip = server_addr;
actions.outer.tcp.l4_port.dst_port = rte_cpu_to_be_16(9000);
result = doca_flow_pipe_add_entry(0, pipe, &match, &actions, NULL, NULL, 0, &status, entry);

Relaxed Match

Relaxed matching mode grants users full control on matching fields and guarantees that no fields are implicitly added by DOCA Flow.

Relaxed Matching and Pipeline Design Considerations

Relaxed matching mode provides full control to the DOCA application developer over the match design, without adding implicit match logic by the DOCA Flow library. This approach increases user responsibility to prevent unintended side effects caused by packet layout similarities (e.g., between UDP and TCP source/destination ports) or by skipping header type validation before matching on header fields. For instance, matching solely on a UDP destination port does not verify the presence of an L4 header or confirm that the L4 header is of UDP type.

To ensure effective design, early-stage pipes should classify packet types to filter out undesired packets. As the pipeline progresses, later stages can focus on more specific packet types based on prior matches. Examples include:

  • Match on VXLAN VNI:

    • For early-stage pipes, ensure the packet contains a VXLAN header (the two matches below can also be combined in a single pipe):

      • The first pipe verifies the packet has a UDP header by matching the L4 packet type to UDP or the L3 next_proto field to UDP

      • The second pipe matches the UDP destination port to the commonly used VXLAN value (4789)

    • For later-stage pipes, match on the VXLAN VNI field

  • Match on UDP destination port:

    • For early-stage pipes, verify the packet contains a UDP header by matching the L4 packet type to UDP or the L3 next_proto field to UDP

    • For later-stage pipes, match on the UDP destination port field

Relaxed Matching Memory Footprint and Performance Impact

Consider the following strict matching mode example. There are three pipes:

  • Basic pipe A with match.outer.l4_type_ext = DOCA_FLOW_L4_TYPE_EXT_TCP and match.outer.tcp.flags = 1

  • Basic pipe B with match.outer.l4_type_ext = DOCA_FLOW_L4_TYPE_EXT_UDP and match.outer.udp.l4_port.src_port = 8080

  • Control pipe C with two entries to direct TCP traffic to pipe A and UDP traffic to pipe B. The first entry has match.outer.l4_type_ext = DOCA_FLOW_L4_TYPE_EXT_TCP while the second has match.outer.l4_type_ext = DOCA_FLOW_L4_TYPE_EXT_UDP.

As a result, the hardware matches on the L4 header type twice:

  • When the packet enters the filter in control pipe C to decide the next pipe

  • When the packet enters the filter of pipe A or B to match on the L4 header fields

With particularly large pipelines, such double matches decrease performance and increase the memory footprint in hardware. Relaxed matching mode gives the user greater control of the match logic to eliminate the implicitly added matches, consequently reducing hardware memory footprint and improving performance as well.

Parser Meta Usage with Relaxed Match

Parser meta matching is particularly useful when it comes to matching on a specific packet type. In relaxed mode, type selectors in the outer, inner, and tun parts of the doca_flow_match structs are used only for the type cast of the underlying unions. Header-type (packet type) matches are available using the parser_meta API.

For example, the scenario from the previous section may be overwritten by changing the match of control pipe C (with the same A and B pipes):

  • Basic pipe A with match.outer.l4_type_ext = DOCA_FLOW_L4_TYPE_EXT_TCP and match.outer.tcp.flags = 1

  • Basic pipe B with match.outer.l4_type_ext = DOCA_FLOW_L4_TYPE_EXT_UDP and match.outer.udp.l4_port.src_port = 8080

  • Control pipe C with two entries to direct TCP traffic to pipe A and UDP traffic to pipe B. The first entry has match.parser_meta.outer_l4_type = DOCA_FLOW_L4_META_TCP while the second has match.parser_meta.outer_l4_type = DOCA_FLOW_L4_META_UDP.

As a result, the hardware performs the L4 header-type match only once, when the packet enters the filter of control pipe. Basic pipes' match.outer.l4_type_ext are used only for the selection of the match.outer.tcp or match.outer.udp structures during the inspection of match struct.
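A minimal sketch of the control pipe C entries in this design might look as follows. Only the match structs are shown; the surrounding entry-addition calls follow the patterns used elsewhere in this guide, and pipe_a/pipe_b are placeholders:

```c
/* Entry directing TCP traffic to pipe A: match the outer L4 type once,
 * via parser_meta, instead of outer.l4_type_ext */
struct doca_flow_match match = {0};

match.parser_meta.outer_l4_type = DOCA_FLOW_L4_META_TCP;
/* ... add control pipe entry with fwd.type = DOCA_FLOW_FWD_PIPE, fwd.next_pipe = pipe_a ... */

/* Entry directing UDP traffic to pipe B */
memset(&match, 0, sizeof(match));
match.parser_meta.outer_l4_type = DOCA_FLOW_L4_META_UDP;
/* ... add control pipe entry with fwd.next_pipe = pipe_b ... */
```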

Examples

The following code snippets are used to demonstrate the redesign of a pipeline with relaxed matching for non-tunnel match cases.

The following is the code before the redesign:

static void pipe_match_build(struct doca_flow_match *match)
{
    match->outer.l3_type = DOCA_FLOW_L3_TYPE_IP4;
    match->outer.ip4.dst_ip = 0xffffffff;
    match->outer.l4_type_ext = DOCA_FLOW_L4_TYPE_EXT_UDP;
    match->outer.udp.l4_port.src_port = 22;
}

Without relaxed matching, the following hardware matches were performed for the code snippet above:

  • L3 header type is IPv4 – constant among entries by design

  • L4 header type is UDP – constant among entries by design

  • IPv4 destination address varies per entry

  • UDP source port is constant among entries

  • The rest of the packet fields are ignored

With relaxed matching the following pipeline stages, where pipe1 forwards packets to pipe2, should be considered to achieve a similar match as above:

static void pipe1_match_build(struct doca_flow_match *match)
{
    // Classifier logic: only IPv4, UDP packets are forwarded to pipe2
    match->parser_meta.outer_l3_type = DOCA_FLOW_L3_META_IPV4;
    match->parser_meta.outer_l4_type = DOCA_FLOW_L4_META_UDP;
}

static void pipe2_match_build(struct doca_flow_match *match)
{
    // Main logic: match on the specific packet fields
    match->outer.l3_type = DOCA_FLOW_L3_TYPE_IP4;
    match->outer.ip4.dst_ip = 0xffffffff;
    match->outer.l4_type_ext = DOCA_FLOW_L4_TYPE_EXT_UDP;
    match->outer.udp.l4_port.src_port = 22;
}

The following code snippet demonstrates the redesign of a pipeline with relaxed matching for tunnel match cases:

static void pipe1_match_build(struct doca_flow_match *match)
{
    match->outer.l3_type = DOCA_FLOW_L3_TYPE_IP4;
    match->outer.l4_type_ext = DOCA_FLOW_L4_TYPE_EXT_UDP;
    match->tun.type = DOCA_FLOW_TUN_VXLAN;
    match->outer.ip4.dst_ip = 0xffffffff;
    match->outer.udp.l4_port.src_port = 22;
    match->tun.vxlan_tun_id = 0xffffffff;
}

Without relaxed matching, the following hardware matches were performed for the code snippet above:

  • L3 type is IPv4 – constant among entries by design

  • L4 type is UDP – constant among entries by design

  • Tunnel type is DOCA_FLOW_TUN_VXLAN – constant among entries by design

  • IPv4 destination address varies per entry

  • UDP source port is always 22

  • VXLAN tunnel ID varies per entry

  • The rest of the packet fields are ignored

With relaxed matching, the following pipeline stages, where pipe1 forwards packets to pipe2, should be considered to achieve a similar match as above:

static void pipe1_match_build(struct doca_flow_match *match)
{
    // Classifier logic: only IPv4, UDP packets are forwarded to pipe2
    match->parser_meta.outer_l3_type = DOCA_FLOW_L3_META_IPV4;
    match->parser_meta.outer_l4_type = DOCA_FLOW_L4_META_UDP;
}

static void pipe2_match_build(struct doca_flow_match *match)
{
    // Main logic: match on the specific packet fields
    match->outer.l4_type_ext = DOCA_FLOW_L4_TYPE_EXT_UDP;
    match->outer.udp.l4_port.src_port = 22;
    match->outer.udp.l4_port.dst_port = DOCA_VXLAN_DEFAULT_PORT;
    match->outer.l3_type = DOCA_FLOW_L3_TYPE_IP4;
    match->outer.ip4.dst_ip = 0xffffffff;
    match->tun.type = DOCA_FLOW_TUN_VXLAN;
    match->tun.vxlan_tun_id = 0xffffffff;
}


Relaxed Matching Considerations

With relaxed matching, header type fields within outer, inner, or tun structs no longer indicate a match on the specific protocol. They are used solely as a selector for the relevant header fields. For example, to match on outer.ip4.dst_ip, users must set outer.l3_type = DOCA_FLOW_L3_TYPE_IP4. With this match, the L3 header is checked for the IPv4 destination address, however there is no check that the L3 header is of IPv4 type. It is user responsibility to make sure that packets arriving to such a filter indeed have an L3 header of type IPv4.

To match on a specific protocol/tunnel type, consider the following:

  • To match on an inner/outer L3/L4 protocol type, users can use relevant doca_flow_parser_meta fields as explained above. For example, for outer protocols, parser_meta.outer_l[3,4]_type fields can be used.

  • To match on a specific tunnel type, users should match on a tunnel according to its specification. For example, for a VXLAN tunnel, a match on UDP destination port 4789 can be used. Another option is to use the L3 next protocol field. For example, for IPv4 with next header GRE, one can match on the IPv4 headers' next protocol field value to match GRE IP protocol number 47.

More relaxed matching design best practices can be found in the samples Flow Drop, Flow VXLAN Encap, and LPM with Exact Match Logic.

Note

With relaxed matching, to achieve a match-all functionality, either one of the following methods can be used during pipe creation:

  • Set the match_mask structure to NULL and set the match structure to all zeroes

  • Set the match_mask structure to all zeroes; the match structure may then have any setting

Note

With relaxed matching, if any of the selectors is used without setting a relevant field, the pipe/entry creation would fail with the following error message:


failed building active opcode - active opcode <opcode number> is protocol only

Setting Pipe Actions

Pipe Actions Execution Order

When setting actions, they are executed in the following order:

  1. Crypto (decryption)

  2. Decapsulation

  3. Pop

  4. Meta

  5. Outer

  6. Tun

  7. Push

  8. Encapsulation

  9. Crypto (encryption)

The desc_array field in struct doca_flow_actions_descs is executed in array order, after the action execution order above.

Auto-modification

Similarly to setting pipe match, actions also have a template definition.

Similarly to doca_flow_match in the creation phase, only the subset of actions that should be executed per packet are defined. This is done in a similar way to match, namely by classifying a field of doca_flow_actions to one of the following:

  • Ignored field – field is zeroed, modify is not used.

  • Constant fields – when a field must be modified per packet, but the value is the same for all packets, a one-time value on action definitions can be used

  • Changeable fields – fields that may have more than one possible value, and the exact values are set by the user per entry

    actions.outer.ip4.dst_ip = 0xffffffff;

    Note

    To configure a value of 0 for this field, set the action_mask to 0xffffffff and the action value to 0.

    Note

    Changeable shared encap/decap actions must use shared resources that have been configured in advance.

    Note

    A single pipe cannot be created with both a changeable shared encap/decap action and a changeable FWD action at the same time. This configuration is not supported and will fail.

    Note

    To use both a changeable shared encap/decap action and a changeable FWD action, implement them in sequential pipes. Use one pipe for the encap/decap action, and then use a subsequent pipe for the FWD action.
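The first note above can be illustrated with a short sketch: to force a field to the constant value 0 for all entries, set the action value to 0 and cover it with a full mask. Field names follow the examples in this guide; actions_arr and actions_mask_arr are assumed to be the pointer arrays passed to doca_flow_pipe_cfg_set_actions():

```c
actions.outer.ip4.dst_ip = 0;               /* desired constant value: 0 */
actions_mask.outer.ip4.dst_ip = 0xffffffff; /* full mask marks the field as modified */
doca_flow_pipe_cfg_set_actions(pipe_cfg, actions_arr, actions_mask_arr, NULL, 1);
```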

Explicit Modification Type

It is possible to force constant modification or per-entry modification with action mask. For example:

static void create_constant_modify_actions(struct doca_flow_actions *actions,
                                           struct doca_flow_actions *actions_mask,
                                           struct doca_flow_action_descs *descs)
{
    actions->outer.l4_type_ext = DOCA_FLOW_L4_TYPE_EXT_UDP;
    actions->outer.udp.l4_port.src_port = 0x1234;
    actions_mask->outer.udp.l4_port.src_port = 0xffff;
}


Copy Field

The action descriptor can be used to copy between the packet field and metadata. For example:

#define META_U32_BIT_OFFSET(idx) (offsetof(struct doca_flow_meta, u32[(idx)]) << 3)

static void create_copy_packet_to_meta_actions(struct doca_flow_match *match,
                                               struct doca_flow_action_desc *desc)
{
    desc->type = DOCA_FLOW_ACTION_COPY;
    desc->field_op.src.field_string = "outer.ipv4.src_ip";
    desc->field_op.src.bit_offset = 0;
    desc->field_op.dst.field_string = "meta.data";
    desc->field_op.dst.bit_offset = META_U32_BIT_OFFSET(1); /* bit offset of meta.u32[1] */
}


Multiple Actions List

Creating a pipe is possible using a list of multiple actions. For example:

static void create_multi_actions_for_pipe_cfg()
{
    struct doca_flow_actions *actions_arr[2];
    struct doca_flow_actions actions_0 = {0}, actions_1 = {0};
    struct doca_flow_pipe_cfg *pipe_cfg;

    /* input configurations for actions_0 and actions_1 */
    actions_arr[0] = &actions_0;
    actions_arr[1] = &actions_1;
    doca_flow_pipe_cfg_set_actions(pipe_cfg, actions_arr, NULL, NULL, 2);
}


Summary of Action Types

In the table below, action_desc (the doca_flow_action_type and its configuration), Pipe Actions, and Pipe Actions Mask are set at pipe creation; Entry Actions are set at entry creation.

| doca_flow_action_type (action_desc) | Configuration | Pipe Actions | Pipe Actions Mask | Entry Actions | Behavior |
|---|---|---|---|---|---|
| DOCA_FLOW_ACTION_AUTO / action_desc = NULL | No specific config | 0 | 0 | N/A | Field ignored, no modification |
| | | 0 | mask != 0 | N/A | Apply 0 and mask to all entries |
| | | val != 0 && val != 0xFF | mask != 0 | N/A | Apply val and mask to all entries |
| | | val = 0xFF | mask = 0 | N/A | Apply 0xFF to all entries |
| | | val = 0xFF | mask != 0 | Define val per entry | Apply entry's val and mask |
| DOCA_FLOW_ACTION_ADD (add field value or from src) | Define only the dst field and width | val != 0 | N/A | N/A | Add this val to dst for all entries |
| | | val == 0 | N/A | Define val per entry | Add entry's val to dst |
| | Define the src and dst fields and width | Define the source and destination fields (meta field → header field, header field → meta field, or meta field → meta field) | N/A | N/A | Add data from src fields to dst for all entries |
| DOCA_FLOW_ACTION_COPY (copy field to another field) | N/A | Define the source and destination fields (meta field → header field, header field → meta field, or meta field → meta field) | N/A | N/A | Copy data between fields for all entries |


Queue Balancing for Non-shared Actions

For non-shared actions, each action resource handle is implicitly bound to a specific entry. To enable faster, lock-free access during entry creation, these action resource handles are cached on a per-queue basis, mirroring how entries are added.

Therefore, applications should distribute entry additions evenly across all available queues. This practice prevents cache fragmentation and ensures efficient use of action resources.

Warning

If entries are distributed unevenly, it can lead to a worst-case scenario where the number of successfully attached action handles is slightly lower than the total number of defined resources, even if those resources are technically available system-wide.

Setting Pipe Monitoring

If a meter policer should be used, it is possible to have the same configuration for all policers on the pipe or a specific configuration per entry. The meter policer is determined by the FWD action. If an entry has a NULL FWD action, the policer FWD action is taken from the pipe.

The monitor also includes the aging configuration. If an aging time is set, the entry ages out when that timeout passes without any packet matching the entry.

For example:

static void build_entry_monitor(struct doca_flow_monitor *monitor, void *user_ctx)
{
    monitor->aging_sec = 10;
}

Refer to Pipe Entry Aged Query for more information.

Setting Pipe Forwarding

The FWD (forwarding) action is the last action in a pipe, and it directs where the packet goes next. Users may configure one of the following destinations:

  • Send to software (representor)

  • Send to wire

  • Jump to next pipe

  • Drop packets

The FWD action may be set at pipe creation, or its definition can be deferred to entry addition:

  • If fwd.type is not DOCA_FLOW_FWD_CHANGEABLE, any match on the pipe, on any entry, is forwarded to the target specified at pipe creation

  • If fwd.type is DOCA_FLOW_FWD_CHANGEABLE, the target is taken from the fwd configuration of the specific entry

Putting this logic in a table:

| Pipe fwd.type | Entry fwd.type | Actual Forward | Comment |
|---|---|---|---|
| Equals DOCA_FLOW_FWD_CHANGEABLE | X | X | X must not equal DOCA_FLOW_FWD_CHANGEABLE |
| X != DOCA_FLOW_FWD_CHANGEABLE | Does not care | X | When a pipe includes a meter monitor <cir, cbs>, it must have fwd defined, as well as the policer |

The following is an RSS forwarding example:

fwd.type = DOCA_FLOW_FWD_RSS;
fwd.rss_type = DOCA_FLOW_RESOURCE_TYPE_NON_SHARED;
fwd.rss.queues_array = queues;
fwd.rss.outer_flags = DOCA_FLOW_RSS_IPV4 | DOCA_FLOW_RSS_UDP;
fwd.rss.nr_queues = 4;

The queues array points to uint16_t values containing the queue numbers. When a port is started, the number of queues is defined; queue IDs run from zero up to the number of queues minus 1. The RSS queue array may contain any subset of those predefined queue numbers. To direct a specific match to a single queue, use RSS forwarding with a single queue.

Changeable RSS forwarding is supported. When creating the pipe, the num_of_queues must be set to 0xffffffff, then different forwarding RSS information can be set when adding each entry.


fwd->num_of_queues = 0xffffffff;

With port forwarding, the packet is directed to the port. In many instances the complete pipe is executed in hardware, including forwarding the packet back to the wire, so the packet never arrives at the software.

Example code for forwarding to port:

struct doca_flow_fwd *fwd = malloc(sizeof(struct doca_flow_fwd));

memset(fwd, 0, sizeof(struct doca_flow_fwd));
fwd->type = DOCA_FLOW_FWD_PORT;
fwd->port_id = port_id; // must be the same port_id set in doca_flow_port_cfg_set_port_id()

The type of forwarding is DOCA_FLOW_FWD_PORT and the only data required is the port_id as defined in DOCA_FLOW_PORT.

Note

The destination port for any FWD action (identified by fwd->port_id) must be started in advance by calling doca_flow_port_start().

Note

A single pipe cannot be created with both a changeable shared encap/decap action and a changeable FWD action simultaneously. This configuration is not supported.

Changeable port forwarding is also supported. When creating the pipe, the port_id must be set to 0xffff, then different forwarding port_id values can be set when adding each entry.


fwd->port_id = 0xffff;


Shared Resources

DOCA Flow supports several types of shared resources:

  • Meters

  • Counters

  • RSS queues

  • PSPs

  • Encap

  • Decap

  • IPsec SA

Shared resources can be used by several pipes and can save device and memory resources while promoting better performance.

To create and configure a shared resource, the user should go through the steps detailed in the following subsections.

Creating Shared Resource Configuration Object

Call doca_flow_cfg_create(&flow_cfg), passing a pointer to struct doca_flow_cfg to be used to fill the required parameters for the shared resource.

Note

The struct doca_flow_cfg object is used for configuring other resources besides the aforementioned shared resources, but this section only refers to the configuration of shared resources.


Setting Number of Shared Resources per Shared Resource Type

This can be done by calling doca_flow_cfg_set_nr_shared_resource(). Refer to the API documentation for details on the configuration process.

Conclude the configuration by calling doca_flow_init().
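The configuration sequence can be sketched as follows. The DOCA_FLOW_SHARED_RESOURCE_* enum values are assumptions based on the library's naming; check the API reference for the exact symbols and parameter order:

```c
struct doca_flow_cfg *flow_cfg;

doca_flow_cfg_create(&flow_cfg);
/* e.g., two shared counters and three shared meters */
doca_flow_cfg_set_nr_shared_resource(flow_cfg, 2, DOCA_FLOW_SHARED_RESOURCE_COUNTER);
doca_flow_cfg_set_nr_shared_resource(flow_cfg, 3, DOCA_FLOW_SHARED_RESOURCE_METER);
doca_flow_init(flow_cfg);
```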

Configuring Shared Resource

When shared resources are allocated, they are assigned identifiers starting from 0 and increasing incrementally. For example, if the user configures two shared counters, they bear the identifiers 0 and 1.

Note

Note that each resource has its own identifier space. So, if users have two shared counters and three meters, they would bear identifiers 0..1 and 0..2 respectively.

Configuring the shared resources requires the user to call doca_flow_shared_resource_set_cfg().

Binding Shared Resource

A shared resource must be bound by calling doca_flow_shared_resources_bind() which binds the resource to a pointer. The object to which the resource is bound is usually a struct doca_flow_port pointer.

Using Shared Resources

After a resource has been configured, it can be used by referring to its ID.

Meters and counters are referenced through struct doca_flow_monitor during pipe creation or entry addition.

Querying Shared Resource

Querying shared resources can be done by calling doca_flow_shared_resources_query(). The function accepts the resource type and an array of resource numbers and returns an array of struct doca_flow_shared_resource_result with the results.
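For instance, querying two shared counters might be sketched as follows. The enum value and the exact result layout are assumptions; consult the API reference:

```c
uint32_t counter_ids[2] = {0, 1};
struct doca_flow_shared_resource_result results[2] = {0};

doca_flow_shared_resources_query(DOCA_FLOW_SHARED_RESOURCE_COUNTER,
                                 counter_ids, results, 2);
/* each results[i] now holds the statistics of counter counter_ids[i] */
```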

Shared Meter Resource

A shared meter can be used in multiple pipe entries (hardware steering mode support only).

The shared meter action marks a packet with one of three colors: Green, Yellow, and Red. The packet color can then be matched in the next pipe, and an appropriate action may be taken. For example, packets marked in red are usually dropped. So, the next pipe to meter action may have an entry which matches on red and has fwd type DOCA_FLOW_FWD_DROP.

DOCA Flow supports three marking algorithms based on RFCs: 2697, 2698, and 4115.

RFC 2697 – Single-rate Three Color Marker (srTCM)

rfc2697-version-1-modificationdate-1761115069630-api-v2.png

CBS (committed burst size) is the bucket size which is granted credentials at a CIR (committed information rate). If CBS overflow occurs, credentials are passed to the EBS (excess burst size) bucket. Packets passing through the meter consume credentials. A packet is marked green if it does not exceed the CBS, yellow if it exceeds the CBS but not the EBS, and red otherwise. A packet can have an initial color upon entering the meter. A pre-colored yellow packet will start consuming credentials from the EBS.

RFC 2698 – Two-rate Three Color Marker (trTCM)

rfc2698-version-1-modificationdate-1761115070200-api-v2.png

CBS and CIR are defined as in RFC 2697. PBS (peak burst size) is a second bucket which is granted credentials at a PIR (peak information rate). There is no overflow of credentials from the CBS bucket to the PBS bucket. The PIR must be equal to or greater than the CIR. Packets consuming CBS credentials consume PBS credentials as well. A packet is marked red if it exceeds the PIR. Otherwise, it is marked either yellow or green depending on whether it exceeds the CIR or not. A packet can have an initial color upon entering the meter. A pre-colored yellow packet starts consuming credentials from the PBS.

RFC 4115 – trTCM without Peak-rate Dependency

rfc4115-version-1-modificationdate-1761115069893-api-v2.png

EBS is a second bucket which is granted credentials at an EIR (excess information rate) and receives overflowed credentials from the CBS. For the packet marking algorithm, refer to RFC 4115.

All information rates (i.e., CIR, PIR, EIR) are measured in one of two units:

  • Per bytes (DOCA_FLOW_METER_LIMIT_TYPE_BYTES); or

  • Per packets (DOCA_FLOW_METER_LIMIT_TYPE_PACKETS)

The following sections present the steps for configuring and using shared meters to mark packets.

Bandwidth Measurement

When testing a DOCA Flow meter, it is important to understand that the rate-limiting parameters (CIR, PIR, EIR) apply to the information rate, not the full L2 Ethernet frame. The information rate typically refers to the inner IP layer of the packet.

Because of this, to accurately measure the metered bandwidth and verify that it matches your settings, you must perform a mathematical adaptation to exclude the non-IP layer overhead from your byte count.

Measurement Method

The recommended method is to take two measurements, 1 second apart, after the traffic stream has been stable for at least 5 seconds.

Terminology:

  • pckts1 and bytes1: The total number of packets and bytes measured at time X (where X ≥ 5 seconds).

  • pckts2 and bytes2: The total number of packets and bytes measured at time X+1 second.

The true metered bandwidth (in bytes per second) is the adapted byte count, which is calculated by subtracting the total overhead from the total bytes measured during that one-second interval:


Adapted Byte Count = (bytes2 - bytes1) - (overhead_bytes * (pckts2 - pckts1))

Here, overhead_bytes is the size of all non-IP-layer headers for your specific packet type.

Overhead Examples

The overhead_bytes value depends on the packet's structure. Common examples include:

  • Simple IP packets: 18 bytes

    • (14-byte MAC header + 4-byte FCS)

  • VLAN packets: 22 bytes

    • (14-byte MAC header + 4-byte VLAN tag + 4-byte FCS)

  • MPLS tunnel packets: 50 bytes

    • (14-byte MAC header + 4-byte MPLS + 20-byte Tunnel IP + 8-byte Tunnel UDP + 4-byte FCS)
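As a worked example with hypothetical numbers: for a stable stream of 100,000 VLAN frames of 1,522 bytes per second, the adapted byte count removes the 22-byte per-frame overhead, leaving only the 1,500-byte IP payloads:

```c
#include <stdint.h>

/* Adapted byte count, per the formula above:
 * (bytes2 - bytes1) - overhead_bytes * (pckts2 - pckts1) */
uint64_t adapted_byte_count(uint64_t bytes1, uint64_t bytes2,
                            uint64_t pckts1, uint64_t pckts2,
                            uint64_t overhead_bytes)
{
    return (bytes2 - bytes1) - overhead_bytes * (pckts2 - pckts1);
}

/* Example: 100,000 VLAN frames of 1,522 bytes in one second with 22 bytes of
 * overhead each leaves 100,000 * 1,500 = 150,000,000 B/s of IP payload:
 * adapted_byte_count(0, 152200000, 0, 100000, 22) == 150000000 */
```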

Verifying the Rate

You can then compare the calculated bandwidth against your meter's settings:

  • For byte-based limits (DOCA_FLOW_METER_LIMIT_TYPE_BYTES): The expected bandwidth is the Adapted Byte Count calculated above.

  • For packet-based limits (DOCA_FLOW_METER_LIMIT_TYPE_PACKETS): The expected bandwidth is the packet delta: (pckts2 - pckts1).

Shared IPsec SA Resource

The Shared IPsec Security Association (SA) resource is used to manage the IPsec ESP protocol for both encryption and decryption. This resource is referenced from the doca_flow_actions struct via the doca_flow_crypto_actions field.

Default Hardware Offload Behavior

By default, the resource manages the sequence number (SN) state in hardware (i.e., sn_offload is enabled). This includes:

  • Encryption: Automatically incrementing the sequence number (SN) for each packet.

  • Decryption: Performing anti-replay protection. The anti-replay status (syndrome) is stored in meta.u32[0].

Manual Software SN Control

To manage the sequence number in software, the sn_offload feature must be disabled in the port configuration passed to doca_flow_port_start (see the DOCA API References for details).

Note

When sn_offload is disabled, the following hardware offload fields in the resource configuration are ignored:

  • sn_offload_type

  • win_size

  • sn_initial

  • lifetime_threshold


Querying the Resource

When sn_offload is enabled, you can query the shared IPsec SA resource to retrieve its current state:

  • For an encryption resource: The query returns the current sequence number (SN).

  • For a decryption resource: The query returns the lower bound of the anti-replay window.

Note

Querying the resource is not supported when sn_offload is disabled.


State Maintenance

To ensure a valid resource state during operation, the doca_flow_crypto_ipsec_resource_handle function should be called periodically.

Shared Encap Resource

The shared encap resource defines a specific, reusable encapsulation configuration. Each unique configuration, primarily defined by its reformat type (is_l2) and size, is represented by a shared encap ID.

This ID allows the same encapsulation to be applied across multiple pipes and entries.

Key characteristics:

  • This feature is supported only in hardware steering mode.

  • Once a shared encap resource is configured and assigned an ID, its settings (like type and size) cannot be modified.

  • The "shared encap action" in a pipe uses this resource ID to encapsulate a packet with the pre-configured tunnel information.

Shared Decap Resource

The shared decap resource defines a specific, reusable decapsulation configuration. Each unique configuration, primarily defined by its reformat type (is_l2) and size, is represented by a shared decap ID.

This ID allows the same decapsulation to be applied across multiple pipes and entries.

Key characteristics:

  • This feature is supported only in hardware steering mode.

  • Once a shared decap resource is configured and assigned an ID, its settings (like type and size) cannot be modified.

  • If the is_l2 flag is set to false (indicating a non-L2 decap), Ethernet information must be provided in the configuration.

  • The "shared decap action" in a pipe uses this resource ID to decapsulate the packet according to the pre-configured rules.

Shared PSP Resource

The PSP shared resource is used for PSP encryption. The resource should be pointed to from the doca_flow_crypto_actions struct in doca_flow_actions.

The resource should be configured with a key to encrypt the packets. Refer to the DOCA API References on PSP key generation for details about key handling on the decryption side.

Creating a Pipe

Once all pipe parameters are defined, call doca_flow_pipe_create to create the pipe.

Info

Refer to the DOCA API documentation to see which fields are optional and may be skipped. It is typically recommended to set optional fields to 0 when not in use. See Miss Pipe and Control Pipe for more information.

On success, this function returns a handle to the pipe, which is required when adding new entries. If creation fails, the function returns NULL. An error struct, if provided by the user, will be populated with the error details.

A pipe acts as a container for its entries. When a pipe is destroyed, all entries bound to it are automatically removed. For more details, see the Pipe Entry section.

Implementing Priority

DOCA Flow does not have an explicit priority field for pipes or entries. Priority is managed through the order of operation and rule insertion.

  • Chaining (Recommended) – The most explicit way to implement priority is to create a chain. Create a "miss" rule in a high-priority pipe that forwards traffic to the next, lower-priority pipe (e.g., match PIPE1, and on miss, jump to PIPE2).

  • Insertion Order (Overlapping Entries) – Priority can also be determined by the order in which entries are added to pipes.

    • If two pipes have overlapping match criteria, the pipe whose entries were added last has the higher priority.

    • For example, to make PIPE1 (high-priority) take precedence over PIPE2 (low-priority), you must add all entries for PIPE2 first, followed by the overlapping entries for PIPE1.
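The chaining approach can be sketched with the fwd_miss argument of doca_flow_pipe_create(), which sends traffic that misses all of the high-priority pipe's entries to the lower-priority pipe (pipe1, pipe2, and the prepared pipe_cfg are placeholders):

```c
/* On miss in high-priority pipe1, jump to lower-priority pipe2 */
struct doca_flow_fwd fwd_miss = {0};

fwd_miss.type = DOCA_FLOW_FWD_PIPE;
fwd_miss.next_pipe = pipe2;
result = doca_flow_pipe_create(pipe_cfg, NULL /* fwd set per entry */, &fwd_miss, &pipe1);
```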

Pipe Sizing and Caching

When you define the number of entries for a pipe, this number must account for both active entries and entries held in internal caches.

This means the actual number of entries your application can add will be less than the total defined size. The total number of cached entries is limited to (1K * number of queues), and this cache size cannot exceed 2% of the total pipe size.

Pipe Entry (doca_flow_pipe_add_entry)

An entry is a specific instance inside a pipe. When defining a pipe, users define the match criteria (the subset of fields to be matched), the type of actions to be performed on matched packets, the monitor, and, optionally, the FWD action.

When a user calls doca_flow_pipe_add_entry() to add an entry, they must define the values for the match criteria and action parameters. If a Forward (FWD) action was not defined as constant for the entire pipe, it is mandatory to specify the FWD action for each entry.

Concurrency and Queue Affinity

DOCA Flow is designed to support millions of new entries per second. To achieve this, its architecture requires that entry-addition operations be parallelized.

This is managed by assigning a dedicated pipe_queue ID to each CPU core that will be adding entries. Having a unique queue ID per core prevents the need for data-structure locking and enables efficient, parallel interaction with the hardware.

pipe-entry-queue-diagram-version-1-modificationdate-1761115073090-api-v2.png

Warning

Each core must use its own dedicated pipe_queue ID when calling doca_flow_pipe_add_entry. Using the same pipe_queue ID from different cores simultaneously will cause a race condition and lead to undefined behavior.


Usage Notes

  • Applications must not add, remove, or update pipe entries from within a doca_flow_entry_process_cb callback.

  • Applications should distribute entry additions evenly across all queues. This prevents cache fragmentation and ensures efficient resource use. In a worst-case scenario where entries are not balanced, the number of successfully inserted entries may be slightly lower than the total number defined during pipe creation.

Failure Path

Entry insertion can fail in two places: add_entry and add_entry_cb.

  • When add_entry fails, no cleanup is required.

  • When add_entry succeeds, a handle is returned to the user. If the subsequent add_entry_cb fails, the user is responsible for releasing the handle through an rm_entry call. This rm_entry call is expected to return DOCA_SUCCESS and to invoke doca_rm_entry_cb with a successful return code.

Pipe Entry Counting

By default, no counter is added. If defined in monitor, a unique counter is added per entry.

Note

Having a counter per entry affects performance and should be avoided if it is not required by the application.

The retrieved statistics are stored in struct doca_flow_query.

Note

Counters have a granularity of 1 second.


Pipe Entry Aged Query

When a user calls doca_flow_aging_handle(), this query is used to get the aged-out entries by the time quota in microseconds. The user callback is invoked by this API with the aged entries.

Since the number of flows can be very large, the query of aged flows is limited by a quota in microseconds. This means that it may return without all flows and requires the user to call it again. When the query has gone over all flows, a full cycle is done.
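The resumable, quota-bounded scan described above can be sketched as a toy model. All names are illustrative assumptions, not the DOCA API, and each flow is assumed to cost one microsecond of the quota:

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the quota-bounded aging query: each call scans flows
 * until the time quota is spent, remembering where it stopped; a full
 * cycle completes once every flow has been visited. */
struct aging_ctx {
    uint32_t nb_flows; /* total flows to scan */
    uint32_t cursor;   /* next flow to examine */
};

/* Returns true when the scan wrapped around (full cycle done),
 * false when the quota ran out and the caller should call again. */
bool aging_handle(struct aging_ctx *ctx, uint32_t quota_us)
{
    while (quota_us-- > 0) {
        ctx->cursor++;
        if (ctx->cursor == ctx->nb_flows) {
            ctx->cursor = 0;
            return true;  /* all flows visited */
        }
    }
    return false;         /* quota exhausted, resume later */
}
```

The application loop mirrors real usage: keep invoking the handler until it reports that a full cycle is done.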

Pipe Entry (doca_flow_pipe_update_entry)

Updating an entry requires available space in the pipe. If the pipe is full, the function returns DOCA_ERROR_AGAIN. To proceed with the update, you may need to release an existing entry to free up space.

Pipes with Multiple Actions

DOCA Flow allows you to define a pipe with an array of multiple, distinct actions. This feature is useful for creating multiple flow entries that share the same match criteria but require different actions.

When adding or updating an entry (e.g., with doca_flow_pipe_add_entry), you select which action to execute by providing its index in the action_idx argument.

Note

Even if a pipe has multiple actions defined, only one action will be executed for any given packet that matches an entry.

Consider a case where two flows are needed with identical match criteria but different actions:

  1. Pipe creation: A pipe is created with two actions defined in its actions array:

    • Index 0: An action to modify_mac.

    • Index 1: An action to modify_ip.

  2. Entry addition:

    • Entry A is added (e.g., match L4_PORT=80) with its action_idx set to 0. Packets matching this entry will have their MAC modified.

    • Entry B is added (e.g., match L4_PORT=80) with its action_idx set to 1. Packets matching this entry will have their IP modified.
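The selection mechanism can be modeled in a few lines: the pipe owns an array of actions fixed at creation, and each entry stores only an index into it. This is an illustrative sketch (the stand-in actions and types are assumptions, not the DOCA API):

```c
#include <stdint.h>

/* Toy model of per-entry action selection. */
typedef uint32_t (*pipe_action_t)(uint32_t pkt);

/* Stand-ins for the pipe's action array (index 0 and index 1). */
uint32_t toy_modify_mac(uint32_t pkt) { return pkt | 0x1; }
uint32_t toy_modify_ip(uint32_t pkt)  { return pkt | 0x2; }

struct toy_entry {
    uint32_t match_port; /* shared match criteria */
    uint8_t action_idx;  /* which pipe action to execute */
};

/* Exactly one action runs for a packet that matches an entry. */
uint32_t toy_process(const pipe_action_t *actions,
                     const struct toy_entry *e, uint32_t pkt)
{
    return actions[e->action_idx](pkt);
}
```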

Pipe Chaining, Priority, and Miss Handling

This section describes how to chain pipes together to implement priority, how to use Control Pipes to resolve complex matching conflicts, and how the Port Default Miss handler works.

Implementing Priority with fwd_miss (Miss-Pipe Chaining)

DOCA Flow implements priority by chaining pipes. A pipe's fwd_miss handler defines what to do with a packet if it does not match any entry within that pipe. This allows you to create a "miss-pipe" chain:

  1. A packet is looked up in Pipe 1 (the highest priority table).

  2. If it matches an entry, the action is taken.

  3. If it does not match, the fwd_miss handler forwards the packet to Pipe 2.

  4. The lookup process repeats in Pipe 2.

This fwd_miss behavior is configured in the struct doca_flow_fwd of a pipe. When creating a pipe, the fwd_miss struct's next_pipe field specifies where to send non-matching packets.

The fwd_miss handler only supports two forwarding types:

  • DOCA_FLOW_FWD_PIPE: Forwards the packet to another pipe for further processing.

  • DOCA_FLOW_FWD_DROP: Drops the packet.

Note

Other forwarding types, such as forwarding to a port or an RSS queue, are not supported in a fwd_miss handler.
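The miss-pipe chain above can be sketched as a linked list of lookup tables, where a NULL miss target models DOCA_FLOW_FWD_DROP. This is a toy model with assumed names, not the DOCA API:

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_DROP (-1)

/* Each pipe holds exact-match keys and a miss target (next pipe). */
struct toy_pipe {
    const uint32_t *keys;
    size_t nb_keys;
    const struct toy_pipe *miss; /* NULL models DOCA_FLOW_FWD_DROP */
    int id;
};

/* Walk the chain: return the id of the pipe whose entry matched,
 * or TOY_DROP if every pipe missed. */
int toy_lookup(const struct toy_pipe *p, uint32_t key)
{
    for (; p != NULL; p = p->miss)
        for (size_t i = 0; i < p->nb_keys; i++)
            if (p->keys[i] == key)
                return p->id;
    return TOY_DROP;
}
```

A packet missing Pipe 1 falls through to Pipe 2, exactly as in the numbered steps above.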


Root Pipes vs. Chained Pipes

A pipe's role is defined by the is_root flag in its struct doca_flow_pipe_cfg:

  • is_root = true: This is a root pipe. It is executed immediately on packet arrival. Only one root pipe is allowed. If you need multiple root-level "pipes", you must create a single root control pipe that forwards packets to different non-root pipes.

  • is_root = false: This is a non-root pipe (a "next pipe"). It is not executed on packet arrival and can only be reached by being the target of a DOCA_FLOW_FWD_PIPE action (e.g., from another pipe's entry or fwd_miss handler).

Control Pipes and Prioritized Entries

Internally, DOCA Flow implements the fwd_miss handler as a flow entry with the lowest possible priority. This can create conflicts.

For example, consider a pipe that:

  1. Matches specific VXLAN flows and sends them to RSS.

  2. Has a fwd_miss handler to hairpin all other (non-matching) packets.

If a new VXLAN packet arrives for an IP that does not have a specific flow entry, it will not match (1) and will incorrectly fall through to (2), where it is hairpinned instead of being sent to RSS.

The solution is to use a Control Pipe. A Control Pipe allows you to add entries with explicit priorities, giving you fine-grained control to resolve these conflicts.

  • You add prioritized entries to a Control Pipe by calling doca_flow_pipe_control_add_entry().

  • This function includes a priority field (where 0 is the highest and 7 is the lowest), which overrides the default insertion-order priority.

Example for Resolving Conflicts with Control Entries

Using the scenario above, you can fix the conflict with a Control Pipe:

  1. Configure fwd_miss: Set the pipe's fwd_miss handler to hairpin packets. This creates an internal "hairpin" rule at the lowest priority (e.g., priority 7).

  2. Add a control entry: Call doca_flow_control_pipe_add_entry() to add a new, higher-priority rule:

    • Priority: 6 (higher than the fwd_miss rule's priority of 7).

    • Match: Match all VXLAN packets (not just specific IPs).

    • Forward: Forward to the RSS pipe.

As a result, the new VXLAN packet no longer matches a specific IP, but it will match the new "all VXLAN" control entry at priority 6. It is correctly forwarded to RSS and never falls through to the lowest-priority (7) fwd_miss rule.
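The resolution logic reduces to "the matching entry with the smallest priority value wins." The following toy model (assumed names and stand-in match predicates, not the DOCA API) reproduces the scenario above:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct ctrl_entry {
    bool (*match)(uint32_t pkt);
    uint8_t priority; /* 0 = highest, 7 = lowest */
    int target;       /* forwarding target id */
};

/* Stand-in predicates for the example. */
bool toy_is_vxlan(uint32_t pkt)  { return (pkt & 0xF0) == 0x40; }
bool toy_match_all(uint32_t pkt) { (void)pkt; return true; }

/* Return the target of the matching entry with the best priority,
 * or -1 if nothing matched. */
int ctrl_lookup(const struct ctrl_entry *e, size_t n, uint32_t pkt)
{
    int best = -1, best_prio = 8;
    for (size_t i = 0; i < n; i++)
        if (e[i].match(pkt) && e[i].priority < best_prio) {
            best_prio = e[i].priority;
            best = e[i].target;
        }
    return best;
}
```

With an "all VXLAN" entry at priority 6 and the hairpin miss rule at priority 7, a VXLAN packet without a specific flow entry resolves to the priority-6 target.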

Port Default Miss

This is the final fallback action for the port, used when a pipe's fwd_miss is not set (i.e., NULL).

By default, this forwards all non-matched traffic to the kernel and is active as soon as the port is started. The target of this default miss can be controlled using the following API:

Copy
Copied!
            

enum doca_flow_port_default_miss_type {
    DOCA_FLOW_PORT_DEFAULT_MISS_KERNEL = 0,
    DOCA_FLOW_PORT_DEFAULT_MISS_RSS,
    DOCA_FLOW_PORT_DEFAULT_MISS_DROP,
};

doca_error_t
doca_flow_port_update_default_miss(struct doca_flow_port *port,
                                   enum doca_flow_port_default_miss_type default_miss,
                                   bool counter_en);

doca_error_t
doca_flow_port_query_default_miss(struct doca_flow_port *port,
                                  struct doca_flow_resource_query *data);

  • The target of the default miss (KERNEL, RSS, or DROP) can be updated at any time.

  • The counter (counter_en) can only be enabled before the first use of or update to the default miss, and it cannot be disabled once set.

  • The DOCA_FLOW_PORT_DEFAULT_MISS_RSS option distributes packets over all existing RSS queues, hashing each packet using its L3/L4 information. Packets without a hashable L3/L4 header will receive a constant hash value.

doca_flow_pipe_lpm

The doca_flow_pipe_lpm pipe performs Longest Prefix Match (LPM) matching.

LPM Logic

LPM matching is performed on a single field specified at pipe creation, such as the outer destination IP. Each entry consists of a value and a mask (e.g., 10.0.0.0/8, 10.10.0.0/16). The LPM match is defined as the entry with the maximum number of matching bits.

For example, given two entries, 10.0.0.0/8 and 10.7.0.0/16:

  • An incoming IP of 10.1.9.2 matches 10.0.0.0/8.

  • An incoming IP of 10.7.9.2 matches 10.7.0.0/16, as its 16-bit prefix is the longest match.
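The longest-prefix selection in this example can be verified with a small self-contained sketch. The helper names are illustrative, not the DOCA API; only the LPM rule itself is taken from the text:

```c
#include <stddef.h>
#include <stdint.h>

struct lpm_entry {
    uint32_t value;     /* network value, host byte order here */
    uint8_t prefix_len; /* e.g., 8 for /8 */
};

#define IPV4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Return the index of the entry with the longest matching prefix,
 * or -1 if no entry matches. */
int lpm_lookup(const struct lpm_entry *e, size_t n, uint32_t ip)
{
    int best = -1, best_len = -1;
    for (size_t i = 0; i < n; i++) {
        uint32_t mask = e[i].prefix_len ?
            0xFFFFFFFFu << (32 - e[i].prefix_len) : 0;
        if ((ip & mask) == (e[i].value & mask) &&
            (int)e[i].prefix_len > best_len) {
            best_len = e[i].prefix_len;
            best = (int)i;
        }
    }
    return best;
}
```

Note that 10.7.9.2 matches both 10.0.0.0/8 and 10.7.0.0/16; the /16 entry wins because more bits match.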

Exact Match (EM) Logic

In addition to LPM, the pipe can simultaneously apply Exact Match (EM) logic. The EM result is combined with the LPM result using a logical AND. If a packet matches the LPM logic but fails the EM logic, it is considered a pipe miss.

EM logic is supported on a combination of the following fields:

  • meta.u32[1]

  • inner.eth.dst_mac

  • tun.vxlan_tun_id

A field (e.g., inner destination MAC) cannot be used for both LPM and EM at the same time.

The primary benefit of EM logic is to allow multiple entries to share the same LPM but have different EM values. For example, an application can have multiple entries for the same IP prefix 1.1.1.1/32 that match different meta values, such as 42, 555, and 1020.

Tip

Support for meta.u32[1] allows for EM on any field, as its value can be copied to meta.u32[1] in a preceding pipe.


Enabling EM Logic

Enabling EM logic is a two-step process:

  1. At pipe creation:

    • Specify the LPM field using the match parameter.

    • Enable the EM fields by fully masking them in the match_mask parameter (e.g., set match_mask.meta.u32[1] to all 0xFFs, and/or inner.eth.dst_mac, and/or tun.vxlan_tun_id).

    • If using tun.vxlan_tun_id, match_mask.tun.type must be set to DOCA_FLOW_TUN_VXLAN.

  2. At entry addition:

    • Provide the LPM value and its corresponding prefix mask (e.g., 1.1.1.1 and /32) in the match and match_mask structures, respectively.

    • Provide the EM values (e.g., meta.u32[1] = 42) in the match structure. The match_mask is only for the LPM field's prefix mask.

General Behavior and Limitations

  • Actions and forwarding: The actions and fwd configurations for an LPM pipe work the same as for a basic DOCA Flow pipe.

  • Root pipe: A doca_flow_pipe_lpm cannot be a root pipe. A root pipe must be created to forward packets to the LPM pipe.

  • LPM field support: LPM matching can only be performed on inner/outer IP addresses and inner/outer MAC addresses.

  • Monitoring: The monitor (doca_flow_monitor) for an LPM pipe only supports non-shared counters. Other monitoring capabilities are not supported.

  • Insertion latency: Entry insertion for an LPM pipe can, in some cases, take milliseconds.

    • Best practice: Entries should be added from a control path rather than a fast data path.

    • Performance: To get the best insertion performance, add entries in large batches.

doca_flow_pipe_acl

doca_flow_pipe_acl performs access-control list (ACL) matching. ACL matching is based on the five-tuple fields of doca_flow_match. Each entry consists of a value and a mask (e.g., 10.0.0.0/8, 10.10.0.0/16) for the IP address fields, a port range or a specific port in the port fields, a protocol, and a priority for the entry.

ACL entry port configuration:

  • Mask port is 0 ==> Any port

  • Mask port is equal to match port ==> Exact port (i.e., the match port with mask 0xffff)

  • Mask port > match port ==> Match port is used as port from and mask port is used as port to
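The three port-configuration cases above can be captured in a single predicate. This is an illustrative model of the stated convention, not the DOCA API:

```c
#include <stdbool.h>
#include <stdint.h>

/* Interpret the ACL port/mask convention:
 * mask == 0           -> any port
 * mask == match       -> exact port
 * mask >  match       -> the inclusive range [match, mask]
 */
bool acl_port_hit(uint16_t match, uint16_t mask, uint16_t pkt_port)
{
    if (mask == 0)
        return true;                  /* any port */
    if (mask == match)
        return pkt_port == match;     /* exact port */
    return pkt_port >= match && pkt_port <= mask; /* port range */
}
```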

Monitor actions are not supported in ACL. FWD of the DOCA Flow ACL pipe works the same as the basic DOCA Flow pipe.

ACL supports the following types of FWD:

  • DOCA_FLOW_FWD_PORT

  • DOCA_FLOW_FWD_PIPE

  • DOCA_FLOW_FWD_DROP

doca_flow_pipe_acl insertion max latency can be measured in milliseconds in some cases; therefore, it is better to insert entries from the control path. To get the best insertion performance, entries should be added in large batches.

Note

An ACL pipe can be a root pipe.

Note

An ACL pipe can be in ingress and egress domain.

Note

An ACL pipe must be accessed on a single queue. Different ACL pipes may be accessed on different queues.

Note

Adding an entry to the ACL pipe after sending an entry with flag DOCA_FLOW_NO_WAIT is not supported.

Note

Removing an entry from an ACL pipe is not supported.


doca_flow_pipe_ordered_list

doca_flow_pipe_ordered_list allows the user to define a specific order of actions and multiply the same type of actions (i.e., specific ordering between counter/meter and encap/decap).

An ordered list pipe is defined by an array of action sequences. Each entry is an instance of one of these sequences. An ordered list pipe may define up to 8 different action arrays, and each action array may contain up to 4 elements. Resource allocation may be optimized when combining multiple action arrays in one ordered list pipe.

doca_flow_pipe_hash

The doca_flow_pipe_hash API allows for creating a pipe where entries are matched by an index, which is typically the result of a hash calculation on the packet.

Key characteristics of a hash pipe:

  • A doca_flow_match structure is provided only at pipe creation.

  • This doca_flow_match mask defines which packet fields are included in the hash calculation.

  • Other features (e.g., monitor, actions, fwd) behave identically to a basic DOCA Flow pipe.

  • The nb_flows (number of entries) defined in doca_flow_pipe_attr must be set to a power of 2.
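A plausible motivation for the power-of-2 requirement is that the entry index can then be derived from the hash with a simple bit mask. The sketch below illustrates that assumption; it is not the DOCA implementation:

```c
#include <stdint.h>

/* Map a packet hash to an entry index when nb_flows is a power of
 * two: masking with (nb_flows - 1) keeps the low log2(nb_flows)
 * bits, which is equivalent to hash % nb_flows but cheaper. */
uint32_t hash_to_index(uint32_t hash, uint32_t nb_flows)
{
    /* Valid only when nb_flows is 2^n. */
    return hash & (nb_flows - 1);
}
```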

Hash Map Algorithms

The hash pipe's behavior is defined by its map algorithm, which can be set using doca_flow_pipe_cfg_set_hash_map_algorithm().

Info

If no algorithm is specified, the default DOCA_FLOW_PIPE_HASH_MAP_ALGORITHM_HASH is used.

The available algorithms are:

  • DOCA_FLOW_PIPE_HASH_MAP_ALGORITHM_HASH – The default algorithm. Uses a standard hash function on the masked fields to calculate the index.

  • DOCA_FLOW_PIPE_HASH_MAP_ALGORITHM_RANDOM – Directs packets to entries randomly. No match mask is required.

  • DOCA_FLOW_PIPE_HASH_MAP_ALGORITHM_IDENTITY – Uses an identity function for mapping. This creates a one-to-one relationship between the hash value and the index.

  • DOCA_FLOW_PIPE_HASH_MAP_ALGORITHM_FLOODING – Duplicates the packet to match the number of entries, causing the packet to hit all entries in the pipe.

    • Best practices for this design are shown in the "Flow Hash Flooding Pipe" sample.

    • The pipe is limited to a maximum of 254 entries.

    • This algorithm is not supported in the VNF egress direction.

    • Packet order is only guaranteed for the destination of the first entry.

    • SECURE_EGRESS limitation (pre-ConnectX-8) – If a hash pipe is in the SECURE_EGRESS domain, its flooding destination pipe cannot be a SECURE_EGRESS or SECURE_INGRESS pipe.

  • DOCA_FLOW_PIPE_HASH_MAP_ALGORITHM_SELECT_ENABLED – Consistently directs packets to the same, single index within the pipe.

Dynamic Algorithm Selection

It is possible to specify the hash algorithm dynamically when forwarding a packet from another pipe.

When DOCA_FLOW_FWD_HASH_PIPE is used as a forwarding action, the application can specify which algorithm to apply. This allows a single hash pipe to be used in multiple ways, processing packets with different logic as needed.

Hardware Steering Mode

Users can enable hardware steering mode by setting devarg dv_flow_en to 2.

The following is an example of running DOCA with hardware steering mode:

Copy
Copied!
            

.... -a 03:00.0,dv_flow_en=2 -a 03:00.1,dv_flow_en=2 ....

The following is an example of running DOCA with software steering mode:

Copy
Copied!
            

.... -a 03:00.0 -a 03:00.1 ....

The dv_flow_en=2 means that hardware steering mode is enabled.

In the struct doca_flow_cfg, the mode_args string (set using doca_flow_cfg_set_mode_args()) describes the DOCA application mode. If it is set with hws (e.g., "vnf,hws", "switch,hws", "remote_vnf,hws"), then hardware steering mode is enabled.

In switch mode, fdb_def_rule_en=0,vport_match=1,repr_matching_en=0,dv_xmeta_en=4 should be added to DPDK PMD devargs, which makes DOCA Flow switch module take over all the traffic.

To create an entry by calling doca_flow_pipe_add_entry, the parameter flags can be set as DOCA_FLOW_WAIT_FOR_BATCH or DOCA_FLOW_NO_WAIT:

  • DOCA_FLOW_WAIT_FOR_BATCH means that this flow entry waits to be pushed to hardware. Batched flows can then be pushed all at once, which reduces the number of push operations and enhances the insertion rate.

  • DOCA_FLOW_NO_WAIT means that the flow entry is pushed to hardware immediately.

The parameter usr_ctx is handled in the callback set in struct doca_flow_cfg.

doca_flow_entries_process processes all the flows in this queue. After the flow is handled and the status is returned, the callback is executed with the status and usr_ctx.

If the user does not set the callback in doca_flow_cfg, the user can get the status using doca_flow_entry_get_status to check if the flow has completed offloading or not.

Isolated Mode

By default, DOCA Flow operates in non-isolated mode, where the DOCA application can process any received packets and send packets to the NIC without kernel involvement. This means that packets like ICMP ping are not received by the kernel, resulting in no replies when pinging a host running a DOCA application.

In isolated mode, packets matching root pipe entries are processed by the DOCA application, while other packets are handled directly by the kernel. To use isolated mode effectively, it's recommended to configure root pipe entries with matches followed by a forward operation to a next pipe, where additional matches, actions, monitoring, and forwarding can be specified. Unmatched packets are then received and sent by the kernel.

To enable isolated mode, update the mode_args string in struct doca_flow_cfg using doca_flow_cfg_set_mode_args(), adding "isolated" to the existing mode arguments (separated by commas). For example:

  • doca_flow_cfg_set_mode_args(cfg, "vnf,hws,isolated")

  • doca_flow_cfg_set_mode_args(cfg, "switch,isolated")

Pipe Resize

The move to HWS improves performance because rule insertion is implemented in hardware rather than software. However, this move imposes additional limitations, such as the need to commit in advance on the size of the pipes (the number of rule entries). For applications that require pipe sizes to grow over time, a static size can be challenging: Committing to a pipe size too small can cause the application to fail once the number of rule entries exceeds the committed number, and pre-committing to an excessively high number of rules can result in memory over-allocation.

This is where pipe resizing comes in handy. This feature allows the pipe size to increase during runtime with support for all entries in a new resized pipe.

Info

Pipe resizing is supported in a basic pipe and a control pipe.

Increasing Pipe Size

It is possible to set a congestion level by percentage (CONGESTION_PERCENTAGE). Once the number of entries in the pipe exceeds this value, a callback is invoked. For example, for a pipe with 1000 entries and a CONGESTION_PERCENTAGE of 80%, the CONGESTION_REACHED callback is invoked after the 800th entry is added.
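The threshold check in this example can be expressed as a one-line predicate. This is a model of the behavior described above, with assumed names:

```c
#include <stdbool.h>
#include <stdint.h>

/* True once the number of used entries reaches the configured
 * congestion percentage of the pipe size. With pipe_size = 1000 and
 * threshold_pct = 80, this fires at the 800th entry. */
bool congestion_reached(uint32_t nb_entries_used,
                        uint32_t pipe_size,
                        uint32_t threshold_pct)
{
    /* Multiply before dividing to avoid integer truncation. */
    return nb_entries_used * 100u >= pipe_size * threshold_pct;
}
```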

Following the CONGESTION_REACHED callback, the application should call the pipe resize API (resize()). The following are optional callbacks during the resize callback:

  • A callback on the new number of entries allocated to the pipe

  • A callback on each entry that existed in the smaller pipe and is now allocated to the resized pipe

Info

The pipe pointer remains the same for the application to use even after being resized.

Upon completion of the internal transfer of all entries from the small pipe to the resized pipe, a RESIZED callback is invoked.

A CONGESTION_REACHED callback is received exactly once before the RESIZED callback. Receiving another CONGESTION_REACHED only happens after calling resize() and receiving its completion with a RESIZED callback.

List of Callbacks

  • CONGESTION_REACHED – on the updated number of entries in the pipe (if pipe is resizable)

    Info

    A CONGESTION_REACHED callback can occur either after adding a small number of entries or while entries are being moved from the small pipe to the resized pipe. The application must always call pipe resize after receiving the CONGESTION_REACHED callback to handle both cases.

  • RESIZED – upon completion of the resize operation

    Note

    Calling pipe resize returns immediately. It starts an internal process that ends later with the RESIZED callback.

  • NR_ENTRIES_CHANGED (optional) – on the new max number of entries in the pipe

  • ENTRY_RELOCATE (optional) – on each entry moved from the small pipe to the resized pipe

Order of Operations for Pipe Resizing

  1. Set a process callback on flow configuration:

    Copy
    Copied!
                

    struct doca_flow_cfg *flow_cfg;

    doca_flow_cfg_create(&flow_cfg);
    doca_flow_cfg_set_cb_pipe_process(flow_cfg, <pipe-process-callback>);

    Info

    This informs on OP_CONGESTION_REACHED and OP_RESIZED operations when applicable.

  2. Set the following pipe attribute configurations:

    Copy
    Copied!
                

    struct doca_flow_pipe_cfg *pipe_cfg;

    doca_flow_pipe_cfg_create(&pipe_cfg, port);
    doca_flow_pipe_cfg_set_nr_entries(pipe_cfg, <initial-number-of-entries>);
    doca_flow_pipe_cfg_set_is_resizable(pipe_cfg, true);
    doca_flow_pipe_cfg_set_congestion_level_threshold(pipe_cfg, <CONGESTION_PERCENTAGE>);
    doca_flow_pipe_cfg_set_user_ctx(pipe_cfg, <pipe-user-context>);

    /* Optional, repeated for each pipe queue number to be excluded from entries process: */
    doca_flow_pipe_cfg_set_excluded_queue(pipe_cfg, pipe_queue);

  3. Start adding entries:

    Copy
    Copied!
                

    /* Basic pipe */
    doca_flow_pipe_add_entry();

    /* Control pipe */
    doca_flow_pipe_control_add_entry();

  4. Once the number of entries in the pipe crosses the congestion threshold, an OP_CONGESTION_REACHED operation callback is received.

  5. Mark the pipe's congestion threshold event and, upon return, call doca_flow_pipe_resize(). For this call, add the following parameters:

    • The new threshold percentage for calculating the new size.

    • A callback on the new pipe size (optional):

      Copy
      Copied!
                  

      doca_flow_pipe_resize_nr_entries_changed_cb nr_entries_changed_cb

    • A callback on the entries to be transferred to the resized pipe:

      Copy
      Copied!
                  

      doca_flow_pipe_resize_entry_relocate_cb entry_relocation_cb

  6. Call doca_flow_entries_process() per queue in a loop to trigger the transfer of entries (per queue). It is relevant for both a basic pipe and a control pipe.

  7. At this phase, adding new entries to the pipe is permitted. The entries are added directly to the resized pipe and therefore do not need to be transferred.

  8. Once all entries are transferred on all queues, an OP_RESIZED operation callback is received (in the context of the last thread calling entries process). At this point, a new OP_CONGESTION_REACHED operation callback can be received again.

At this point calling doca_flow_entries_process() can be stopped for a control pipe. For a basic pipe an additional call is required to complete the call to doca_flow_pipe_add_entry().

Info

doca_flow_entries_process() has the following roles:

  • Triggering entry transfer from the smaller to the bigger pipe (until an OP_RESIZED callback is received)

  • Following up on previous add-entry API calls (relevant for basic pipes only)

  • If a specific queue is excluded on the resized pipe, calling entries process on that queue is not required in order to receive the OP_RESIZED operation callback.

  • The OP_RESIZED callback can only be received on a pipe non-excluded queue.

Excluded Queues

There are cases where an application uses only a subset of queues for rule operations such as add/remove/update. On the other hand, resizing a pipe requires calling doca_flow_entries_process() on all queues to get an OP_RESIZED operation callback. This is where the excluded queues feature becomes handy: it enables excluding the pipe's queues that do not take part in rule operations.

  • When calling pipe resize, it is sufficient to process entries only on the non-excluded queues to get an OP_RESIZED operation callback for that pipe. It is still allowed to process entries of any queue (for any other pipe with this queue included).

  • Before and during pipe destroy it is allowed to process entries (in parallel) on all pipe's excluded queues.

However, there are limitations when configuring a pipe's excluded queues:

  • It is not allowed to add/remove/update entries on a pipe's excluded queues. There is no validity check or enforcement of this restriction; it is up to the application to maintain it. It is allowed to process entries of all queues at all times, since the doca_flow_entries_process() API is not per pipe.

Excluded queue syntax

Copy
Copied!
            

struct doca_flow_pipe_cfg *pipe_cfg;

doca_flow_pipe_cfg_create(&pipe_cfg, port);
...
/* The next configuration should be repeated for each excluded queue number individually */
doca_error_t doca_flow_pipe_cfg_set_excluded_queue(struct doca_flow_pipe_cfg *pipe_cfg, uint16_t pipe_queue);

/* pipe_cfg - pointer to the DOCA Flow pipe configuration struct.
 * pipe_queue - the pipe queue number to exclude.
 * Return code:
 * - DOCA_SUCCESS - in case of success.
 * - DOCA_ERROR_INVALID_VALUE - in case of invalid input.
 */

Warning
  • An excluded queue number must be between 0 and (max queues - 1).

  • It is not allowed to exclude all queues of a pipe. At least one of the pipe's queues must remain non-excluded.

Hairpin Configuration

In switch mode, if dev is set in struct doca_flow_port_cfg (using doca_flow_port_cfg_set_dev()), then an internal hairpin is created for direct wire-to-wire forwarding. Users may specify the hairpin configuration using mode_args. The supported options are as follows:

  • hairpinq_num=[n] – the hairpin queue number

  • use_huge_mem – determines whether the Tx buffer uses hugepage memory

  • lock_rx_mem – locks Rx queue memory

Teardown

Pipe Entry Teardown

When an entry is terminated by the user application or ages out, the user should call the entry destroy function, doca_flow_pipe_rm_entry(). This frees the pipe entry and cancels hardware offload.

Pipe Teardown

When a pipe is terminated by the user application, the user should call the pipe destroy function, doca_flow_pipe_destroy(). This destroys the pipe and the pipe entries that match it.

When all pipes of a port are terminated by the user application, the user should call the pipe flush function, doca_flow_port_pipes_flush(). This destroys all pipes and all pipe entries belonging to this port.

Warning

During doca_flow_pipe_destroy() execution, the application must avoid adding/removing entries or checking for aged entries of any other pipes.


Port Teardown

When the port is not used anymore, the user should call the port stop function, doca_flow_port_stop(). This stops the DOCA port, disables the traffic, destroys the port and frees all resources of the port.

Flow Teardown

When the DOCA Flow is not used anymore, the user should call the flow destroy function, doca_flow_destroy(). This releases all the resources used by DOCA Flow.

Info

In DOCA Flow, a scratch area of size up to DOCA_FLOW_META_MAX bytes is available throughout the pipeline. This area functions like transient memory, allowing applications to store, modify, and match metadata values across multiple pipes.

You can set metadata values manually or copy them from packet fields. These values can then be matched in later pipes, with support for bit masking in both matching and modification operations.
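The bit-masked set and match semantics can be sketched with two helpers. The names are illustrative assumptions, not the DOCA API; only the masking behavior is taken from the text:

```c
#include <stdint.h>

/* SET: write only the masked bits of the metadata word, leaving the
 * other bits untouched. */
uint32_t meta_set(uint32_t meta, uint32_t value, uint32_t mask)
{
    return (meta & ~mask) | (value & mask);
}

/* MATCH: compare only the masked bits in a later pipe. */
int meta_match(uint32_t meta, uint32_t value, uint32_t mask)
{
    return (meta & mask) == (value & mask);
}
```

An earlier pipe can thus tag a packet in one byte of a u32 field, and a later pipe can match on just that byte without disturbing the rest of the word.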

Metadata Actions

Metadata can be modified using the following action types depending on their masks or descriptors:

  • SET – Sets a metadata value from a pipe action or a specific entry action. The width of the value is defined by the action descriptor.

  • COPY – Copies a value from a packet field, which can include other metadata fields. The width is also specified by the descriptor.

Info

Refer to DOCA API References for details on struct doca_flow_meta.


Behavior and Usage Considerations

Metadata fields behave similarly to variables in programming languages but have specific characteristics within DOCA Flow:

  • Some fields may be internally used by DOCA Flow actions and can be overwritten with opaque values

  • Most fields do not retain their values when the packet crosses between NIC and FDB domains (i.e., if there is a transfer between two pipes belonging to the NIC and FDB domains, the field data might be lost)

  • Some fields support software datapath integration, allowing promotion to or from the software Rx/Tx paths.

Metadata Field Breakdown

Field

Description

pkt_meta

Intended exclusively for application use. Preserves its content across NIC/FDB domain boundaries. Available in the software Rx datapath. Values set in Tx can be observed in Rx.

u32[0-3]

Always available for application use. Content may not be preserved across NIC/FDB domain boundaries. May be used internally by DOCA Flow for certain pipe types or actions.

u32[4]

Always available for application use

u32[5-max]

Optionally available, depending on the hardware model, firmware configuration, and DOCA Flow operation mode


Pipe Design Consideration

Some pipe types or actions use specific bytes in the scratch area for internal operations. So, if:

  1. PIPE-1 sets metadata

  2. PIPE-A (intermediate) modifies the same bytes internally

  3. PIPE-2 reads those bytes

Then the values may be overwritten by PIPE-A, leading to incorrect results in PIPE-2.

Tip

When designing a pipe tree, ensure metadata fields used across multiple pipes are not also internally consumed by intermediate pipes.


Scratch Area Usage by Pipe Type

The following table shows which scratch area bytes (metadata fields) are used internally by different DOCA Flow pipe types.

If the application writes to any of these fields before or after they are used internally by a pipe, the values may be overwritten.

Pipe Type/Action

Elements Used in Scratch

Hash Pipe

u32[3] bits [31:16]

LPM Pipe

u32[0]

LPM Exact Match Pipe

u32[0], u32[1]

ACL

u32[0] – u32[3]

IPsec Anti-Replay

u32[0] bits [7:0], u32[1]

METER COLOR

u32[1] bits [7:0]

NAT64

u32[0] – u32[2]

Note

When designing the pipeline, avoid reading or writing to metadata fields that are used internally by intermediate pipes.


Supported Metadata Fields per Hardware and Mode

The number of available user metadata fields (starting from u32[5]) depends on the hardware generation and operating mode. The following table details which fields are available for application use.

Note

Attempting to use an unavailable metadata field during pipe creation will cause the operation to fail.

Hardware Type and Mode

Available Fields

Before ConnectX-7, VNF mode, over PF (switchdev mode)

u32[5]

Before ConnectX-7, VNF mode, over VF / PF (non-switchdev)

u32[5] – u32[6]

Before ConnectX-7, SWITCH mode

None

ConnectX-7 and above, VNF mode, over PF (switchdev mode)

u32[5] – u32[8]

ConnectX-7 and above, VNF mode, over VF / PF (non-switchdev)

u32[5] – u32[9]

ConnectX-7 and above, SWITCH mode

u32[5] – u32[7]

Tip

The number of available fields listed above is the maximum. This number can be reduced by specific host configurations, firmware settings, or other enabled features.

Always verify hardware capabilities and operating mode when designing pipelines that rely on these extended metadata fields. An API to query these capabilities will be provided in a future release.


In situations where there is a port without a pipe defined, or with a pipe defined but without any entry, the default behavior is that all packets arrive at the port in software.

(Figure: packet processing without a flow)

Once entries are added to the pipe, a packet that has no match continues to the port in software. If it is matched, the rules defined in the pipe are executed.

(Figure: packet processing with a flow)

If the forwarding target is RSS, the packet is forwarded to software according to the RSS definition. If the target is a port, the packet is redirected back to the wire. If the target is another pipe, matching continues with that pipe.

Note that the number of pipes impacts performance: the more matches and actions a packet goes through, the longer the hardware takes to process it. When there is a very large number of entries, the hardware must access main memory to retrieve the entry context, which increases latency.

DOCA Flow supports tracing and debugging of DOCA Flow applications, enabling the collection of predefined internal key performance indicators (KPIs) and pipeline visualization.

Installation

In addition to the production libraries, DOCA's SDK development packages include a developer-oriented package with additional trace and debug features:

  • .deb based systems – libdoca-sdk-flow-trace

  • .rpm based systems – doca-sdk-flow-trace

These packages install the trace-version of the libraries under the following directories:

  • .deb based systems – /opt/mellanox/doca/lib/<arch>/trace

  • .rpm based systems – /opt/mellanox/doca/lib64/trace

Using Trace Libraries

Runtime Linking

The trace libraries are designed to let users link their existing (production) program against the trace library without recompiling it. To do so, simply update the LD_LIBRARY_PATH environment variable so that the OS prioritizes loading libraries from the trace directory listed above.

The following is an example of such an update on the Ubuntu 22.04 BlueField image:

Copy
Copied!
            

LD_LIBRARY_PATH=/opt/mellanox/doca/lib/aarch64-linux-gnu/trace:${LD_LIBRARY_PATH} doca_ipsec_security_gw <program parameters>


Compilation

The trace-level development packages in the previous section provide additional compilation definitions (doca-flow-trace.pc) to be used in addition to the regular compilation definitions for the DOCA Flow SDK library (doca-flow.pc). It is recommended to use these compilation definitions for the following scenarios:

  • Static linking of the trace-level DOCA Flow library into your program

  • Regular (dynamic) linking of the trace-level DOCA Flow library into your program during development and testing

Although the latter could also be determined at runtime as explained in the previous section, many developers find it handy to compile directly against the trace version during initial development phases.
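As an illustration, a build system can select between the two pkg-config definitions at configure time. The following is a hypothetical meson.build fragment (the `flow_trace` option and target names are made up for this example, and a matching `option()` entry in meson_options.txt is assumed):

```meson
# Hypothetical fragment: choose the trace flavor of the DOCA Flow library at
# configure time, e.g. `meson setup build -Dflow_trace=true`.
# Assumes meson_options.txt contains:
#   option('flow_trace', type: 'boolean', value: false)
flow_dep = get_option('flow_trace') ? dependency('doca-flow-trace') : dependency('doca-flow')

executable('my_flow_app', 'main.c', dependencies: [flow_dep])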

Trace Features

DOCA Log – Trace Level

DOCA's trace logging level (DOCA_LOG_LEVEL_TRACE) is compiled as part of this trace version of the library. That is, any program compiled against the library can activate this additional logging level through DOCA's API or even through DOCA's built-in argument parsing (ARGP) library:

Copy
Copied!
            

LD_LIBRARY_PATH=/opt/mellanox/doca/lib/aarch64-linux-gnu/trace:${LD_LIBRARY_PATH} doca_ipsec_security_gw <program parameters> --sdk-log-level 70


DOCA Flow – Additional Sanity Checks

When using the trace version of the library, additional input sanitization checks are performed, at the cost of minor performance overhead. These checks are meant to assist developers in their early steps with the library, as they provide early detection of, and improved logging for, common coding mistakes.

This section provides DOCA Flow sample implementations on top of the BlueField.

Info

All the DOCA samples described in this section are governed under the BSD-3 software license agreement.

Sample Prerequisites

A DOCA Flow-based program can either run on the host machine or on the BlueField.

Flow-based programs require an allocation of huge pages, hence the following commands are required:

Copy
Copied!
            

$ echo '1024' | sudo tee -a /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
$ sudo mkdir /mnt/huge
$ sudo mount -t hugetlbfs -o pagesize=2M nodev /mnt/huge


Running the Sample

  1. Refer to the following documents:

  2. To build a given sample, run the following command. If you downloaded the sample from GitHub, update the path in the first line to reflect the location of the sample file:

    Copy
    Copied!
                

    cd /opt/mellanox/doca/samples/doca_flow/<sample_name>
    meson /tmp/build
    ninja -C /tmp/build

    Note

    The binary doca_<sample_name> will be created under /tmp/build/.

  3. Sample (e.g., flow_aging) usage:

    Copy
    Copied!
                

    Usage: doca_flow_aging [DPDK Flags] -- [DOCA Flags]

    DOCA Flags:
      -h, --help              Print a help synopsis
      -v, --version           Print program version information
      -l, --log-level         Set the (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
      --sdk-log-level         Set the SDK (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
      -j, --json <path>       Parse command line flags from an input json file

  4. For additional information per sample, use the -h option after the -- separator:

    Copy
    Copied!
                

    /tmp/build/doca_<sample_name> -- -h

  5. The following is an example from an execution on the DPU:

    • CLI example for running the samples with "vnf" mode:

      Copy
      Copied!
                  

      /tmp/build/doca_<sample_name> -- -a aux/2 -a aux/3 -l 60

    • CLI example for running the VNF samples with vnf,hws mode:

      Copy
      Copied!
                  

      /tmp/build/doca_<sample_name> -- -a aux/2,dv_flow_en=2 -a aux/3,dv_flow_en=2 -l 60

    • CLI example for running the switch samples with switch,hws mode:

      Copy
      Copied!
                  

      /tmp/build/doca_<sample_name> -- -r pci/03:00.0,pf0sf[2-3] -l 60

      Note

      When running on the BlueField with switch,hws mode, it is not necessary to configure the OVS.

      The DOCA switch sample hides the extra fdb_def_rule_en=0,vport_match=1,repr_matching_en=0,dv_xmeta_en=4 DPDK devargs behind a simple -r flag that specifies the PCIe ID and representor information.

      Note

      When running on the DPU using the command above, sub-functions must be enabled according to the BlueField Scalable Function User Guide.

      Note

      When running on the host, virtual functions must be used according to the instructions in the DOCA Virtual Functions User Guide.

      Note

      For additional information regarding the command line syntax of the device and device representor identifiers, please refer to the matching section in the DOCA Arg Parser Guide.

Field String Supported Actions

The following is a list of all the API fields available for matching criteria and action execution.

String Field

Path in the Structure – Match

Path in the Structure – Actions

Set

Add (Dst / Src)

Copy (Dst / Src)

Condition (A / B)

meta.data (bit_offset < 32)

meta.pkt_meta

meta.pkt_meta

meta.data (bit_offset ≥ 32)

meta.u32[i]

meta.u32[i]

parser_meta.hash.result

None. See section "Copy Hash Result" for details.

N/A

N/A

N/A

parser_meta.port.id

parser_meta.port_id

N/A

N/A

N/A

parser_meta.ipsec.syndrome

parser_meta.ipsec_syndrome

N/A

N/A

N/A

parser_meta.psp.syndrome

parser_meta.psp_syndrome

N/A

N/A

N/A

parser_meta.random.value

parser_meta.random

N/A

N/A

N/A

parser_meta.meter.color

parser_meta.meter_color

N/A

N/A

N/A

parser_meta.packet_type.l2_outer

parser_meta.outer_l2_type

N/A

N/A

N/A

N/A

N/A

N/A

N/A

parser_meta.packet_type.l3_outer

parser_meta.outer_l3_type

N/A

N/A

N/A

N/A

N/A

N/A

N/A

parser_meta.packet_type.l4_outer

parser_meta.outer_l4_type

N/A

N/A

N/A

N/A

N/A

N/A

N/A

parser_meta.packet_type.l2_inner

parser_meta.inner_l2_type

N/A

N/A

N/A

N/A

N/A

N/A

N/A

parser_meta.packet_type.l3_inner

parser_meta.inner_l3_type

N/A

N/A

N/A

N/A

N/A

N/A

N/A

parser_meta.packet_type.l4_inner

parser_meta.inner_l4_type

N/A

N/A

N/A

N/A

N/A

N/A

N/A

parser_meta.outer_ip_fragmented.flag

parser_meta.outer_ip_fragmented

N/A

N/A

N/A

N/A

N/A

N/A

N/A

parser_meta.inner_ip_fragmented.flag

parser_meta.inner_ip_fragmented

N/A

N/A

N/A

N/A

N/A

N/A

N/A

parser_meta.outer_integrity.l3_ok

parser_meta.outer_l3_ok

N/A

N/A

N/A

N/A

N/A

N/A

N/A

parser_meta.outer_integrity.ipv4_checksum_ok

parser_meta.outer_ip4_checksum_ok

N/A

N/A

N/A

N/A

N/A

N/A

N/A

parser_meta.outer_integrity.l4_ok

parser_meta.outer_l4_ok

N/A

N/A

N/A

N/A

N/A

N/A

N/A

parser_meta.outer_integrity.l4_checksum_ok

parser_meta.outer_l4_checksum_ok

N/A

N/A

N/A

N/A

N/A

N/A

N/A

parser_meta.inner_integrity.l3_ok

parser_meta.inner_l3_ok

N/A

N/A

N/A

N/A

N/A

N/A

N/A

parser_meta.inner_integrity.ipv4_checksum_ok

parser_meta.inner_ip4_checksum_ok

N/A

N/A

N/A

N/A

N/A

N/A

N/A

parser_meta.inner_integrity.l4_ok

parser_meta.inner_l4_ok

N/A

N/A

N/A

N/A

N/A

N/A

N/A

parser_meta.inner_integrity.l4_checksum_ok

parser_meta.inner_l4_checksum_ok

N/A

N/A

N/A

N/A

N/A

N/A

N/A

outer.eth.dst_mac

outer.eth.dst_mac

outer.eth.dst_mac

outer.eth.src_mac

outer.eth.src_mac

outer.eth.src_mac

outer.eth.type

outer.eth.type

outer.eth.type

outer.eth_vlan0.tci

outer.eth_vlan[0].tci

outer.eth_vlan[0].tci

outer.eth_vlan1.tci

outer.eth_vlan[1].tci

outer.eth_vlan[1].tci

outer.ipv4.src_ip

outer.ip4.src_ip

outer.ip4.src_ip

outer.ipv4.dst_ip

outer.ip4.dst_ip

outer.ip4.dst_ip

outer.ipv4.dscp_ecn

outer.ip4.dscp_ecn

outer.ip4.dscp_ecn

outer.ipv4.next_proto

outer.ip4.next_proto

outer.ip4.next_proto

outer.ipv4.ttl

outer.ip4.ttl

outer.ip4.ttl

outer.ipv4.version_ihl

outer.ip4.version_ihl

outer.ip4.version_ihl

outer.ipv4.total_len

outer.ip4.total_len

outer.ip4.total_len

outer.ipv4.identification

outer.ip4.identification

outer.ip4.identification

outer.ipv4.flags_fragment_offset

outer.ip4.flags_fragment_offset

outer.ip4.flags_fragment_offset

outer.ipv6.src_ip

outer.ip6.src_ip

outer.ip6.src_ip

outer.ipv6.dst_ip

outer.ip6.dst_ip

outer.ip6.dst_ip

outer.ipv6.traffic_class

outer.ip6.traffic_class

outer.ip6.traffic_class

outer.ipv6.flow_label

outer.ip6.flow_label

outer.ip6.flow_label

outer.ipv6.next_proto

outer.ip6.next_proto

outer.ip6.next_proto

outer.ipv6.hop_limit

outer.ip6.hop_limit

outer.ip6.hop_limit

outer.ipv6.payload_len

outer.ip6.payload_len

outer.ip6.payload_len

outer.udp.src_port

outer.udp.l4_port.src_port

outer.udp.l4_port.src_port

outer.udp.dst_port

outer.udp.l4_port.dst_port

outer.udp.l4_port.dst_port

outer.transport.src_port

outer.transport.src_port

outer.transport.src_port

outer.transport.dst_port

outer.transport.dst_port

outer.transport.dst_port

outer.tcp.src_port

outer.tcp.l4_port.src_port

outer.tcp.l4_port.src_port

outer.tcp.dst_port

outer.tcp.l4_port.dst_port

outer.tcp.l4_port.dst_port

outer.tcp.flags

outer.tcp.flags

outer.tcp.flags

outer.tcp.data_offset

outer.tcp.data_offset

outer.tcp.data_offset

outer.tcp.seq_num

outer.tcp.seq_num

outer.tcp.seq_num

outer.tcp.ack_num

outer.tcp.ack_num

outer.tcp.ack_num

outer.icmp4.type

outer.icmp.type

outer.icmp.type

outer.icmp4.code

outer.icmp.code

outer.icmp.code

outer.icmp4.ident

outer.icmp.ident

outer.icmp.ident

outer.icmp6.type

outer.icmp.type

outer.icmp.type

outer.icmp6.code

outer.icmp.code

outer.icmp.code

tunnel.gre.protocol

tun.protocol

tun.protocol

tunnel.gre_key.value

tun.gre_key

tun.gre_key

tunnel.nvgre.protocol

tun.protocol

tun.protocol

tunnel.nvgre.nvgre_vs_id

tun.nvgre_vs_id

tun.nvgre_vs_id

tunnel.nvgre.nvgre_flow_id

tun.nvgre_flow_id

tun.nvgre_flow_id

tunnel.vxlan.vni

tun.vxlan_tun_id

tun.vxlan_tun_id

tunnel.vxlan_gpe.vni

tun.vxlan_tun_id

tun.vxlan_tun_id

tunnel.vxlan_gbp.vni

tun.vxlan_tun_id

tun.vxlan_tun_id

tunnel.vxlan_gpe.next_proto

tun.vxlan_gpe_next_protocol

tunnel.vxlan_gpe.flags

tun.vxlan_gpe_flags (1)

tunnel.vxlan_gbp.policy_id

tun.vxlan_gbp_group_policy_id

tunnel.vxlan.rsvd1

tun.vxlan_tun_rsvd1 (2)

tunnel.vxlan_gpe.rsvd1

tun.vxlan_tun_rsvd1 (2)

tunnel.vxlan_gbp.rsvd1

tun.vxlan_tun_rsvd1 (2)

tunnel.gtp.teid

tun.gtp_teid

tun.gtp_teid

tunnel.gtp_ext_hdr.next_ext

tun.gtp_next_ext_hdr_type

tun.gtp_next_ext_hdr_type

N/A

N/A

N/A

N/A

N/A

N/A

N/A

tunnel.gtp_psc.qfi

tun.gtp_ext_psc_qfi

tun.gtp_ext_psc_qfi

tunnel.esp.spi

tun.esp_spi

tun.esp_spi

tunnel.esp.sn

tun.esp_sn

tun.esp_sn

tunnel.psp.nexthdr

tun.psp.nexthdr

tun.psp.nexthdr

tunnel.psp.hdrextlen

tun.psp.hdrextlen

tun.psp.hdrextlen

tunnel.psp.res_cryptofst

tun.psp.res_cryptofst

tun.psp.res_cryptofst

tunnel.psp.s_d_ver_v

tun.psp.s_d_ver_v

tun.psp.s_d_ver_v

tunnel.psp.spi

tun.psp.spi

tun.psp.spi

tunnel.psp.iv

tun.psp.iv

tun.psp.iv

tunnel.psp.vc

tun.psp.vc

tun.psp.vc

tunnel.mpls[0].label

tun.mpls[0].label

tun.mpls[0].label

tunnel.mpls[1].label

tun.mpls[1].label

tun.mpls[1].label

tunnel.mpls[2].label

tun.mpls[2].label

tun.mpls[2].label

tunnel.mpls[3].label

tun.mpls[3].label

tun.mpls[3].label

tunnel.mpls[4].label

tun.mpls[4].label

tun.mpls[4].label

tunnel.geneve.ver_opt_len

tun.geneve.ver_opt_len

tun.geneve.ver_opt_len

tunnel.geneve.o_c

tun.geneve.o_c

tun.geneve.o_c

tunnel.geneve.next_proto

tun.geneve.next_proto

tun.geneve.next_proto

tunnel.geneve.vni

tun.geneve.vni

tun.geneve.vni

tunnel.geneve_opt[i].type

None. See section "Copy Geneve Options" for details.

tunnel.geneve_opt[i].class

tunnel.geneve_opt[i].data

inner.eth.dst_mac

inner.eth.dst_mac

inner.eth.src_mac

inner.eth.src_mac

inner.eth.type

inner.eth.type

inner.eth_vlan0.tci

inner.eth_vlan[0].tci

inner.eth_vlan1.tci

inner.eth_vlan[1].tci

inner.ipv4.src_ip

inner.ip4.src_ip

inner.ipv4.dst_ip

inner.ip4.dst_ip

inner.ipv4.dscp_ecn

inner.ip4.dscp_ecn

inner.ipv4.next_proto

inner.ip4.next_proto

inner.ipv4.ttl

inner.ip4.ttl

inner.ipv4.version_ihl

inner.ip4.version_ihl

inner.ipv4.total_len

inner.ip4.total_len

inner.ipv4.identification

inner.ip4.identification

inner.ipv4.flags_fragment_offset

inner.ip4.flags_fragment_offset

inner.ipv6.src_ip

inner.ip6.src_ip

inner.ipv6.dst_ip

inner.ip6.dst_ip

inner.ipv6.traffic_class

inner.ip6.traffic_class

inner.ipv6.flow_label

inner.ip6.flow_label

inner.ipv6.next_proto

inner.ip6.next_proto

inner.ipv6.hop_limit

inner.ip6.hop_limit

inner.ipv6.payload_len

inner.ip6.payload_len

inner.udp.src_port

inner.udp.l4_port.src_port

inner.udp.dst_port

inner.udp.l4_port.dst_port

inner.transport.src_port

inner.transport.src_port

inner.transport.dst_port

inner.transport.dst_port

inner.tcp.src_port

inner.tcp.l4_port.src_port

inner.tcp.dst_port

inner.tcp.l4_port.dst_port

inner.tcp.flags

inner.tcp.flags

inner.tcp.data_offset

inner.tcp.data_offset

inner.tcp.seq_num

inner.tcp.seq_num

inner.tcp.ack_num

inner.tcp.ack_num

inner.icmp4.type

inner.icmp.type

inner.icmp4.code

inner.icmp.code

inner.icmp4.ident

inner.icmp.ident

inner.icmp6.type

inner.icmp.type

inner.icmp6.code

inner.icmp.code

  1. This is a mandatory field for VXLAN-GPE matching.

  2. tun.vxlan_tun_rsvd1 modifications only work for traffic with the default UDP destination port (i.e., 4789 for VXLAN and VXLAN-GBP, and 4790 for VXLAN-GPE).

Non-Matchable Field Strings

Users can modify fields which are not included in the doca_flow_match structure.

Copy Hash Result

Users can copy the matcher hash calculation result into other fields using the "parser_meta.hash.result" string.

Copy GENEVE Options

Users can copy the GENEVE option type/class/data using the following strings:

  • "tunnel.geneve_opt[i].type" – Copy from/to option type (only for option configured with DOCA_FLOW_PARSER_GENEVE_OPT_MODE_MATCHABLE).

  • "tunnel.geneve_opt[i].class" – Copy from/to option class (only for option configured with DOCA_FLOW_PARSER_GENEVE_OPT_MODE_MATCHABLE).

  • "tunnel.geneve_opt[i].data" – Copy from/to option data, the bit offset is from the start of the data.

i is the index of the option in the tlv_list array provided to doca_flow_parser_geneve_opt_create.

© Copyright 2025, NVIDIA. Last updated on Nov 20, 2025