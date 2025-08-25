Before using any DOCA Flow function, it is mandatory to call DOCA Flow initialization, doca_flow_init() , which initializes all resources required by DOCA Flow.

The DOCA Flow mode (mode_args) defines the basic traffic handling in DOCA Flow. It creates some miss rules when a DOCA Flow port initializes. Currently, DOCA Flow supports three modes:

vnf – A packet arriving from one of the device's ports is processed and can be sent to another port. By default, missed packets go to RSS. The following diagram shows the basic traffic flow in vnf mode. Packet1 first misses and is forwarded to host RSS. The app captures this packet, decides how to process it, and then creates a pipe entry. Packet2 hits this pipe entry and the entry's actions are executed (for example, for VXLAN: decap, modify, and encap); the packet is then sent out from P1.

switch – Used for internal switching; only representor ports are allowed (for example, uplink representors and SF/VF representors). A packet is forwarded from one port to another. If a packet arrives from an uplink and does not hit the rules defined by the user's pipes, the packet is received on all RSS queues of the uplink's representor. The following diagram shows the basic traffic flow in switch mode. Packet1 first misses to the host RSS queues. The app captures this packet, decides which representor the packet goes to, and then sets the rule. Subsequent packets hit this rule and go to representor0. The doca_dev field is mandatory in doca_flow_port_cfg (set using doca_flow_port_cfg_set_dev()), and isolated mode should be specified.

Note The application must avoid initializing the VF/SF representor ports through the DPDK API (i.e., rte_eth_dev_configure(), rte_eth_rx_queue_setup(), and rte_eth_dev_start() must not be called for VF/SF representor ports).

DOCA Flow switch mode unifies all ports under the switch manager port for traffic management, meaning that all traffic is handled by the switch manager port. Users only need to create an RSS pipe on the switch manager port to receive the missed traffic, and they should manage pipes only on the switch manager port.

Note In switch mode, the default fwd_miss of a pipe (when fwd_miss is not configured by the user) is as follows:

Forward to kernel in isolated mode

Forward to the port RSS in non-isolated mode

Switch mode can work with two different mode_args configurations: with or without expert. The way to retrieve the port_id of the miss traffic's source depends on this configuration:

If expert is not set, traffic that misses to software is tagged with port_id information in the mbuf CQE field, allowing users to deduce the source port_id. In addition, users can set the destination port_id in the mbuf meta, and the packet is sent directly to the destination port based on that meta information.

Info Please refer to the "Flow Switch to Wire" sample for more information regarding port_id management with missed traffic mbufs.

If expert is set, the port_id is not added to the packet. Users can configure the pipes freely to implement their own solution.

Note Traffic cloned from the VF to the RSS pipe loses its port_id information due to a firmware limitation.

remote-vnf – Remote mode is a BlueField-only mode with two physical ports (uplinks). Users must use doca_flow_port_pair to pair one physical port with one of its representors. A packet from this uplink that does not hit any of the user's rules is first received on this representor. Users must also use doca_flow_port_pair to pair the two physical uplinks. If a packet received from one uplink hits a rule whose FWD action is the other uplink, the packet is sent out from that uplink. The following diagram shows the basic traffic flow in remote-vnf mode. Packet1, from BlueField uplink P0, first misses to host VF0. The app captures this packet and decides whether to drop it or forward it to the other uplink (P1). Then, after rules are set on P0 using gRPC, packet2 hits the rule and is either dropped or sent out from P1.

The DOCA Flow API serves as an abstraction layer for network acceleration. Packet processing in a network function is described from ingress to egress; therefore, a pipe must be attached to the origin port. Once a packet arrives at the ingress port, hardware execution starts as defined by the DOCA API.

doca_flow_port is an opaque object, as the DOCA Flow API is designed to be independent of any specific packet I/O framework (e.g., DPDK).

To begin using a DOCA Flow port, the user must call doca_flow_port_start(). This function associates the application's underlying port with a DOCA Flow port.

During this setup, the user must assign a logical port ID using the doca_flow_port_cfg_set_port_id() API.

This port_id is later referenced in:

doca_flow_parser_meta.port_id - when matching packets received on a specific port

doca_flow_fwd.port_id - when forwarding packets to a specific port

For DPDK-based applications, the DPDK port_id can be used directly as the logical port identifier.

A DOCA device must be specified during doca_flow_port_start() to provide the necessary hardware context for the port.

For physical, virtual, or subfunction devices (PF, VF, or SF), use the doca_flow_port_cfg_set_dev() API to associate a doca_dev object with the port.

For representor ports, use the doca_flow_port_cfg_set_dev_rep() API to associate a doca_dev_rep object instead.

Providing a doca_dev_rep is what explicitly indicates that the port is a representor.

The use of DOCA Flow actions requires that sufficient memory be allocated for them in advance, per port. This is done using the doca_flow_port_cfg_set_actions_mem_size() API function. The recommended way to get precise information about the expected actions memory consumption is as follows:

During development, allocate a fixed, high amount of actions memory, regardless of your actual pipeline.

Once the steering pipeline is ready, examine the program's actual consumption using the DOCA Flow Tune tool.

Using DOCA Flow Tune for calibration, update the parameter passed to the setter function.

Please note that in the hot upgrade use case, the total actions memory size set for all instances should not exceed the maximum resource limit of this hardware port.

DOCA Flow ports can be initialized multiple times from different instances. Each instance prepares its pipeline, but only one actively receives port traffic at a time. The instance actively handling the port traffic depends on the operation state set by the doca_flow_port_cfg_set_operation_state() function:

DOCA_FLOW_PORT_OPERATION_STATE_ACTIVE – The instance actively handles incoming and outgoing traffic

DOCA_FLOW_PORT_OPERATION_STATE_ACTIVE_READY_TO_SWAP – The instance handles traffic actively when no other active instance is available

DOCA_FLOW_PORT_OPERATION_STATE_STANDBY – The instance handles traffic only when no active or active_ready_to_swap instance is available

DOCA_FLOW_PORT_OPERATION_STATE_UNCONNECTED – The instance does not handle traffic, regardless of the state of other instances

If the doca_flow_port_cfg_set_operation_state() function is not called, the default state DOCA_FLOW_PORT_OPERATION_STATE_ACTIVE is applied.

Note When a port is configured with a state that expects to handle traffic, it takes effect only after root pipes are created for this port.

When the active port is closed, either gracefully or due to a crash, the standby instance automatically becomes active without any action required.

The port operation state can be modified after the port is started using the doca_flow_port_operation_state_modify() function.

This operation state mechanism allows upgrading the DOCA Flow program without losing any traffic.

To upgrade an existing DOCA Flow program with ports started in DOCA_FLOW_PORT_OPERATION_STATE_ACTIVE state (Instance A):

Open a new Instance B and start its ports in DOCA_FLOW_PORT_OPERATION_STATE_STANDBY state.

Modify Instance A's ports from DOCA_FLOW_PORT_OPERATION_STATE_ACTIVE to DOCA_FLOW_PORT_OPERATION_STATE_UNCONNECTED state. At this point, Instance B starts receiving traffic.

Close Instance A.

Open a new Instance C with DOCA_FLOW_PORT_OPERATION_STATE_UNCONNECTED state. Instance C is the upgraded version of Instance A.

Create the entire pipeline for Instance C.

Change Instance C's state from DOCA_FLOW_PORT_OPERATION_STATE_UNCONNECTED to DOCA_FLOW_PORT_OPERATION_STATE_ACTIVE. At this point, Instance B stops receiving traffic and Instance C starts.

Instance B can either be closed or kept as a backup should Instance C crash.

This mechanism also facilitates swapping two different DOCA Flow programs without losing any traffic.

To swap between two existing DOCA Flow programs with ports started in DOCA_FLOW_PORT_OPERATION_STATE_ACTIVE and DOCA_FLOW_PORT_OPERATION_STATE_STANDBY states (Instance A and Instance B, respectively):

Modify Instance A's ports from DOCA_FLOW_PORT_OPERATION_STATE_ACTIVE to DOCA_FLOW_PORT_OPERATION_STATE_ACTIVE_READY_TO_SWAP.

Modify Instance B's ports from DOCA_FLOW_PORT_OPERATION_STATE_STANDBY to DOCA_FLOW_PORT_OPERATION_STATE_ACTIVE. At this point, Instance B starts receiving traffic.

Modify Instance A's ports from DOCA_FLOW_PORT_OPERATION_STATE_ACTIVE_READY_TO_SWAP to DOCA_FLOW_PORT_OPERATION_STATE_STANDBY.
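The way the operation states decide which instance receives traffic, including the upgrade and swap procedures above, can be modeled in a few lines of plain C. This is a sketch under stated assumptions, not DOCA Flow code: the enum, struct, and function names are hypothetical, traffic is assumed to go to the open instance with the highest-priority state (ACTIVE, then ACTIVE_READY_TO_SWAP, then STANDBY), and a state is assumed to count only once the instance's root pipes exist.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the four DOCA_FLOW_PORT_OPERATION_STATE_* values. */
enum op_state { OP_UNCONNECTED, OP_STANDBY, OP_ACTIVE_READY_TO_SWAP, OP_ACTIVE };

/* One instance that started the same port (model only). */
struct instance {
	enum op_state state;
	bool open;               /* false once the instance is closed or has crashed */
	bool root_pipes_created; /* a traffic-handling state takes effect only after this */
};

/* Higher value wins; 0 means the state never handles traffic. */
static int state_priority(enum op_state s)
{
	switch (s) {
	case OP_ACTIVE:
		return 3;
	case OP_ACTIVE_READY_TO_SWAP:
		return 2;
	case OP_STANDBY:
		return 1;
	default:
		return 0; /* OP_UNCONNECTED */
	}
}

/* Returns the index of the instance that receives port traffic, or -1 if none. */
static int traffic_owner(const struct instance *inst, size_t n)
{
	int owner = -1;
	int best = 0;

	for (size_t i = 0; i < n; i++) {
		if (!inst[i].open || !inst[i].root_pipes_created)
			continue; /* closed, crashed, or pipeline not ready yet */
		int prio = state_priority(inst[i].state);
		if (prio > best) {
			best = prio;
			owner = (int)i;
		}
	}
	return owner;
}
```

Stepping through the swap procedure with this model, Instance A keeps the traffic after moving to ACTIVE_READY_TO_SWAP because no ACTIVE instance exists yet, and traffic moves to Instance B as soon as B becomes ACTIVE. Closing the owner (a crash) hands traffic to the next-highest state without any further call.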

Supported only in switch mode – the mode_args string must include "switch".

Only the switch port supports states; its representors are affected by its state. Starting a representor port or calling the modify function with a non-active operation state should fail.

Two instances cannot be in the same operation state simultaneously, except for DOCA_FLOW_PORT_OPERATION_STATE_UNCONNECTED . If a new instance initializes a port while another instance with the same operation state exists, a DOCA_ERROR_ALREADY_EXIST error is returned either during port startup or root pipe creation.
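The exclusivity rule can be expressed as a simple check. In this hypothetical C model (not the DOCA Flow implementation), MODEL_ALREADY_EXIST stands in for DOCA_ERROR_ALREADY_EXIST, and others holds the states already held by the other instances of the same port.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the DOCA_FLOW_PORT_OPERATION_STATE_* values. */
enum op_state { OP_UNCONNECTED, OP_STANDBY, OP_ACTIVE_READY_TO_SWAP, OP_ACTIVE };

/* Hypothetical result codes: MODEL_ALREADY_EXIST models DOCA_ERROR_ALREADY_EXIST. */
enum model_result { MODEL_OK, MODEL_ALREADY_EXIST };

/* Checks whether an instance may take `requested` on a port whose other
 * instances currently hold the states in `others`. */
static enum model_result check_state_unique(const enum op_state *others, size_t n,
					    enum op_state requested)
{
	if (requested == OP_UNCONNECTED)
		return MODEL_OK; /* any number of instances may be unconnected */
	for (size_t i = 0; i < n; i++)
		if (others[i] == requested)
			return MODEL_ALREADY_EXIST;
	return MODEL_OK;
}
```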

A pipe is a template that defines packet processing without adding any specific hardware rule. The pipe template includes the following elements:

Match

Monitor

Actions

Forward

The following diagram illustrates a pipe structure.

The creation phase allows the hardware to efficiently build the execution pipe. After the pipe is created, specific entries can be added. A subset of the pipe may be used (e.g., skipping the monitor completely, just using the counter, etc.).

DOCA Flow allows defining criteria for matching on a packet or for taking actions on a matched packet by modifying it. The information defining these criteria is provided through the following pointers:

Match or action pointer – given at pipe or entry creation

Mask pointer – optionally given at pipe creation

Defining criteria for matching or actions on a packet can be done at the pipe level, where it applies to all packets of a pipe, or specified on a per entry basis, where each entry defines the operation on either the match, actions, or both.

In DOCA Flow terminology, when a field is identified as CHANGEABLE at pipe creation, this means that the actual criterion of the field is deferred to entry creation. Different entries can provide different criteria for a CHANGEABLE field.

A match or action field can be categorized, during pipe creation, as one of the following:

IGNORED – Ignored in either the match or action taking process

CHANGEABLE – When the actual behavior is deferred to the entry creation stage

SPECIFIC – Value is used as is in either match or action process

If a mask is provided, this is called explicit matching or action applying. If the mask pointer is NULL, it is called implicit matching or action applying. The following subsections describe the logic governing matching and action applying.

When a field value is specified as 0xffff, it means that all the field's bits are set (e.g., for TTL it means 0xff and for an IPv4 address it means 0xffffffff).

Matching is the process of selecting packets based on their fields' values and steering them for further processing. Processing can either be further matching or actions applying.

The packet enters the green filter, which masks the packet field value P with the value A. The output value, P&A, is then compared to the value B; if they are equal, the packet matches.

The values of A and B are evaluated according to the values of the pipe configuration and entry configuration fields, according to the tables in sections "Implicit Matching" and "Explicit Matching".

Implicit matching (mask pointer is NULL), shown as Match Type: Pipe Match Value (V), Pipe Match Mask (M), Entry Match Value (E) → Filter (A), Rule (B):

Ignore: 0, NULL, N/A → 0, 0

Constant: 0<V<0xffff, NULL, N/A → 0xffff, V

Changeable (per entry): 0xffff, NULL, 0≤E≤0xffff → 0xffff, E

Explicit matching (mask provided), shown as Match Type: Pipe Match Value (V), Pipe Match Mask (M), Entry Match Value (E) → Filter (A), Rule (B):

Constant: V!=0xffff, 0<M≤0xffff, 0≤E≤0xffff → M, M&V

Changeable: V==0xffff, 0<M≤0xffff, 0≤E≤0xffff → M, M&E

Ignored: 0≤V<0xffff, M==0, 0≤E≤0xffff → 0, 0
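The filter/rule logic of the implicit and explicit matching tables can be captured in a few lines of C for a single 16-bit field. This is an illustrative model, not DOCA Flow code: struct filter_rule, derive_filter_rule(), and field_matches() are hypothetical names, and the mask is passed as a pointer so that NULL selects implicit matching, mirroring the match_mask argument.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FIELD_ALL_ONES 0xffffu /* "0xffff": all bits of the field are set */

/* Filter (A) and rule (B) for one 16-bit field (model only). */
struct filter_rule {
	uint16_t filter; /* A */
	uint16_t rule;   /* B */
};

/* Derives (A, B) from pipe value v, optional pipe mask m (NULL = implicit)
 * and entry value e, following the matching tables. */
static struct filter_rule derive_filter_rule(uint16_t v, const uint16_t *m, uint16_t e)
{
	struct filter_rule fr = {0, 0};

	if (m == NULL) {                 /* implicit matching */
		if (v == 0)
			return fr;               /* ignore: A = 0, B = 0 */
		fr.filter = FIELD_ALL_ONES;
		fr.rule = (v == FIELD_ALL_ONES) ? e : v; /* changeable : constant */
		return fr;
	}
	if (*m == 0)
		return fr;                   /* explicit, ignored */
	fr.filter = *m;
	fr.rule = (v == FIELD_ALL_ONES) ? (uint16_t)(*m & e) : (uint16_t)(*m & v);
	return fr;
}

/* A packet field value p matches when (P & A) == B. */
static bool field_matches(uint16_t p, struct filter_rule fr)
{
	return (uint16_t)(p & fr.filter) == fr.rule;
}
```

With V = 0x5020 and M = 0xf0f0, the model yields A = 0xf0f0 and B = 0x5020, so 0x512f matches but 0x5030 does not.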

Implicit action applying (mask pointer is NULL), shown as Action Type: Pipe Action Value (V), Pipe Action Mask (M), Entry Action Value (E) → Action on the Field:

Ignore: 0, NULL, N/A → none

Constant: 0<V<0xffff, NULL, N/A → set to V

Changeable: 0xffff, NULL, E → set to E

Implicit action applying example:

Pipe action destination IPv4 address (V) is 255.255.255.255 (all bits set)

No mask provided

Entry value is 192.168.0.1

Result – The action field is changeable. Therefore, the value is provided by the entry. If a match on the packet occurs, the packet destination IPv4 address is changed to 192.168.0.1.

Note Meta only supports changeable action.





Info Assume P is the packet's field value.

Explicit action applying (mask provided), shown as Action Type: Pipe Action Value (V), Pipe Action Mask (M), Entry Action Value (E) → Action on the Field:

Constant: V!=0xffff, 0≤M≤0xffff, 0≤E≤0xffff → set to (~M & P) | (M & V); in words, modify only the bits that are set in the mask to the values in V

Changeable: V==0xffff, 0<M≤0xffff, 0≤E≤0xffff → set to (~M & P) | (M & E)

Ignored: 0≤V<0xffff, M==0, 0≤E≤0xffff → none

Explicit action applying example:

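Pipe action destination IPv4 address (V) is 192.168.10.1

Mask is provided and equals 255.255.0.0

Entry value is ignored

Result – If a match on the packet occurs, the first two octets of the packet's destination IPv4 address change to 192.168, while the last two octets keep their original values (for example, 10.0.3.4 becomes 192.168.3.4), since only the bits set in the mask are modified.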
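The action tables can likewise be modeled for a 32-bit field such as an IPv4 address. This sketch is plain C rather than DOCA Flow code; apply_action() and the IPV4() helper are hypothetical, and addresses are kept in host byte order for readability. The explicit branch implements the table's formula (~M & P) | (M & V), or (~M & P) | (M & E) for a changeable field.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: IPv4 address a.b.c.d as a host-order integer. */
#define IPV4(a, b, c, d) (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
			  ((uint32_t)(c) << 8) | (uint32_t)(d))
#define ALL_ONES_32 0xffffffffu

/* Returns the new field value for packet value p, pipe value v, optional
 * pipe mask m (NULL = implicit) and entry value e. Model only. */
static uint32_t apply_action(uint32_t p, uint32_t v, const uint32_t *m, uint32_t e)
{
	if (m == NULL) {                       /* implicit action applying */
		if (v == 0)
			return p;                      /* ignored: no modification */
		return (v == ALL_ONES_32) ? e : v; /* changeable : constant */
	}
	if (*m == 0)
		return p;                          /* explicit, ignored */
	uint32_t src = (v == ALL_ONES_32) ? e : v; /* changeable takes the entry value */
	return (~*m & p) | (*m & src);         /* modify only the masked bits */
}
```

Applied to the explicit example with a packet destination of 10.0.3.4, the model yields 192.168.3.4. A mask of all ones with a pipe value of 0 sets the field to 0, matching the note on configuring 0 through action_mask.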

Match is a mandatory parameter when creating a pipe. Using the doca_flow_match struct, users must define the packet fields to be matched by the pipe.

For each doca_flow_match field, users select whether the field type is:

Ignore (match any) – the value of the field is ignored in a packet. In other words, match on any value of the field.

Constant (specific) – all entries in the pipe have the same value for this field. Users should not set a value for each entry.

Changeable – the value of the field is defined per entry. Users must provide it when adding an entry.

Note L4 type, L3 type, and tunnel type cannot be changeable.

Note gtp_next_ext_hdr_type supports only the PSC type (0x85).

The match field type can be defined either implicitly or explicitly using the doca_flow_pipe_cfg_set_match(struct doca_flow_pipe_cfg *cfg, const struct doca_flow_match *match, const struct doca_flow_match *match_mask) function. If match_mask == NULL, it is done implicitly. Otherwise, it is explicit.

The tables in the following subsections use a 16-bit field (such as the layer-4 destination port) as an example, where:

P stands for the packet field value

V stands for the pipe match field value

M stands for the pipe mask field value

E stands for the match entry field value

Note The same concept applies to any other field (such as an IP address occupying 32 bits).

Implicit matching (mask pointer is NULL), shown as Match Type: Pipe Match Value (V), Pipe Match Mask (M), Entry Match Value (E) → Filter (A), Rule (B):

Ignore: 0, NULL, N/A → 0, 0

Constant: 0<V<0xffff, NULL, N/A → 0xffff, V

Changeable (per entry): 0xffff, NULL, 0≤E≤0xffff → 0xffff, E

To match implicitly, the following considerations should be taken into account.

Ignored fields – The field is zeroed, and the pipeline performs no comparison on the field.

Constant fields – These are fields that have a constant value among all entries. For example, in the following, the tunnel type is VXLAN:

```c
match.tun.type = DOCA_FLOW_TUN_VXLAN;
```

These fields must be configured only once, at the pipe build stage, not once per new pipeline entry.

Changeable fields – These are fields whose value may change per entry. For example, the following matches on an outer destination IPv4 address whose value is set per entry (outer 5-tuple):

```c
match.outer.ip4.dst_ip = 0xffffffff;
```

The following is an example of a match, where:

Outer 5-tuple
  L3 type is IPv4 – constant among entries by design
  L4 type is UDP – constant among entries by design
  Tunnel type is DOCA_FLOW_TUN_VXLAN – constant among entries by design
  IPv4 destination address varies per entry
  UDP destination port is always DOCA_VXLAN_DEFAULT_PORT
  VXLAN tunnel ID varies per entry
  The rest of the packet fields are ignored

Inner 5-tuple
  L3 type is IPv4 – constant among entries by design
  L4 type is TCP – constant among entries by design
  IPv4 source and destination addresses vary per entry
  TCP source and destination ports vary per entry
  The rest of the packet fields are ignored



```c
// filter creation
static void build_underlay_overlay_match(struct doca_flow_match *match)
{
	// outer
	match->outer.l3_type = DOCA_FLOW_L3_TYPE_IP4;
	match->outer.l4_type_ext = DOCA_FLOW_L4_TYPE_EXT_UDP;
	match->tun.type = DOCA_FLOW_TUN_VXLAN;
	match->outer.ip4.dst_ip = 0xffffffff;
	match->outer.udp.l4_port.dst_port = DOCA_VXLAN_DEFAULT_PORT;
	match->tun.vxlan_tun_id = 0xffffffff;
	// inner
	match->inner.l3_type = DOCA_FLOW_L3_TYPE_IP4;
	match->inner.l4_type_ext = DOCA_FLOW_L4_TYPE_EXT_TCP;
	match->inner.ip4.dst_ip = 0xffffffff;
	match->inner.ip4.src_ip = 0xffffffff;
	match->inner.tcp.l4_port.src_port = 0xffff;
	match->inner.tcp.l4_port.dst_port = 0xffff;
}

// create entry specifying specific values to match upon
doca_error_t add_entry(struct doca_flow_pipe *pipe, struct doca_flow_port *port,
		       struct doca_flow_pipe_entry **entry)
{
	struct doca_flow_match match = {};
	struct doca_flow_actions actions = {}; // no per-entry action values in this example
	struct entries_status status = {};
	doca_error_t result;

	match.outer.ip4.dst_ip = BE_IPV4_ADDR(7, 7, 7, 1);
	match.tun.vxlan_tun_id = RTE_BE32(9876);
	match.inner.ip4.src_ip = BE_IPV4_ADDR(8, 8, 8, 1);
	match.inner.ip4.dst_ip = BE_IPV4_ADDR(9, 9, 9, 1);
	match.inner.tcp.l4_port.src_port = rte_cpu_to_be_16(5678);
	match.inner.tcp.l4_port.dst_port = rte_cpu_to_be_16(1234);

	result = doca_flow_pipe_add_entry(0, pipe, &match, &actions, NULL, NULL, 0, &status, entry);
	return result;
}
```

Note The fields of the doca_flow_meta struct inside the match are not subject to implicit match rules and must be paired with explicit mask values.





Explicit matching (mask provided), shown as Match Type: Pipe Match Value (V), Pipe Match Mask (M), Entry Match Value (E) → Filter (A), Rule (B):

Constant: V!=0xffff, 0<M≤0xffff, 0≤E≤0xffff → M, M&V

Changeable: V==0xffff, 0<M≤0xffff, 0≤E≤0xffff → M, M&E

Ignored: 0≤V<0xffff, M==0, 0≤E≤0xffff → 0, 0

In this case, there are two doca_flow_match items (match and match_mask), and the following considerations apply:

Ignored fields – M equals zero. As the table shows, the rule (B) equals 0 in this case. Since the mask is also 0, the filtered packet value is 0, so the comparison always succeeds:

```c
match_mask.inner.ip4.dst_ip = 0;
```

Constant fields – These are fields that have a constant value. For example, in the following, the inner 5-tuple matches on IPv4 destination addresses belonging to the 0.0.0.0/24 subnet, and this match is constant among all entries:

```c
// BE_IPV4_ADDR converts 4 numbers A,B,C,D to a big endian representation of IP address A.B.C.D
match.inner.ip4.dst_ip = 0;
match_mask.inner.ip4.dst_ip = BE_IPV4_ADDR(255, 255, 255, 0);
```

Similarly, in the following, the inner 5-tuple matches on IPv4 destination addresses belonging to the 1.2.0.0/16 subnet, and this match is constant among all entries. The last two octets of match.inner.ip4.dst_ip are ignored because the match_mask of 255.255.0.0 is applied:

```c
// BE_IPV4_ADDR converts 4 numbers A,B,C,D to a big endian representation of IP address A.B.C.D
match.inner.ip4.dst_ip = BE_IPV4_ADDR(1, 2, 3, 4);
match_mask.inner.ip4.dst_ip = BE_IPV4_ADDR(255, 255, 0, 0);
```

Once a field is defined as constant, the field's value cannot be changed per entry.

Tip Users should set constant fields to zero when adding entries, for better code readability.

A more complex constant match can be achieved as follows:

```c
match_mask.outer.tcp.l4_port.dst_port = rte_cpu_to_be_16(0xf0f0);
match.outer.tcp.l4_port.dst_port = rte_cpu_to_be_16(0x5020);
```

The following ports are matched:

0x5020 - 0x502f

0x5120 - 0x512f

...

0x5f20 - 0x5f2f
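The port ranges listed for the 0xf0f0 mask follow directly from the explicit constant rule (P & M) == (M & V). The following brute-force check, a hypothetical plain C helper rather than DOCA code, enumerates every 16-bit value and confirms the set: 256 ports, from 0x5020 up to 0x5f2f. It works in host byte order; applying rte_cpu_to_be_16() to both the value and the mask, as the snippet does, swaps the bytes of both consistently and therefore selects the same ports.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts the 16-bit values p with (p & mask) == (value & mask), and reports
 * the lowest and highest such value through lo and hi. */
static size_t count_masked_matches(uint16_t value, uint16_t mask, uint16_t *lo, uint16_t *hi)
{
	size_t count = 0;
	uint16_t rule = (uint16_t)(value & mask);

	for (uint32_t p = 0; p <= 0xffff; p++) {
		if ((uint16_t)(p & mask) != rule)
			continue;
		if (count == 0)
			*lo = (uint16_t)p; /* first match is the lowest */
		*hi = (uint16_t)p;     /* last match seen is the highest */
		count++;
	}
	return count;
}
```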



Changeable fields

The following example matches on either the FTP or the TELNET well-known port number and forwards packets to a server after modifying the destination IP address and destination port number. Both FTP and TELNET traffic are forwarded to the same server: FTP to port 8000 and TELNET to port 9000.

```c
// At pipe creation
doca_flow_pipe_cfg_set_name(pipe_cfg, "PORT_MAPPER");
doca_flow_pipe_cfg_set_type(pipe_cfg, DOCA_FLOW_PIPE_BASIC);
match.outer.l4_type_ext = DOCA_FLOW_L4_TYPE_EXT_TCP;              // selects the tcp header fields
match.outer.tcp.l4_port.dst_port = rte_cpu_to_be_16(0xffff);      // V: changeable
match_mask.outer.tcp.l4_port.dst_port = rte_cpu_to_be_16(0xffff); // M
doca_flow_pipe_cfg_set_match(pipe_cfg, &match, &match_mask);
actions.outer.l3_type = DOCA_FLOW_L3_TYPE_IP4;
actions.outer.ip4.dst_ip = 0xffffffff;       // changeable: value set per entry
actions.outer.l4_type_ext = DOCA_FLOW_L4_TYPE_EXT_TCP;
actions.outer.tcp.l4_port.dst_port = 0xffff; // changeable: value set per entry
actions_arr[0] = &actions;
doca_flow_pipe_cfg_set_actions(pipe_cfg, actions_arr, NULL, NULL, 1);
doca_flow_pipe_cfg_set_is_root(pipe_cfg, true);
// ... create the pipe with doca_flow_pipe_create()

// Adding entries
// FTP
match.outer.tcp.l4_port.dst_port = rte_cpu_to_be_16(20); // E
actions.outer.ip4.dst_ip = server_addr;
actions.outer.tcp.l4_port.dst_port = rte_cpu_to_be_16(8000);
result = doca_flow_pipe_add_entry(0, pipe, &match, &actions, NULL, NULL, 0, &status, entry);
// TELNET
match.outer.tcp.l4_port.dst_port = rte_cpu_to_be_16(23); // E
actions.outer.ip4.dst_ip = server_addr;
actions.outer.tcp.l4_port.dst_port = rte_cpu_to_be_16(9000);
result = doca_flow_pipe_add_entry(0, pipe, &match, &actions, NULL, NULL, 0, &status, entry);
```

Relaxed matching mode grants users full control on matching fields and guarantees that no fields are implicitly added by DOCA Flow.

Relaxed matching mode provides full control to the DOCA application developer over the match design, without adding implicit match logic by the DOCA Flow library. This approach increases user responsibility to prevent unintended side effects caused by packet layout similarities (e.g., between UDP and TCP source/destination ports) or by skipping header type validation before matching on header fields. For instance, matching solely on a UDP destination port does not verify the presence of an L4 header or confirm that the L4 header is of UDP type.

To ensure effective design, early-stage pipes should classify packet types to filter out undesired packets. As the pipeline progresses, later stages can focus on more specific packet types based on prior matches. Examples include:

Match on VXLAN VNI:
  For early-stage pipes, ensure the packet contains a VXLAN header (this can be achieved in a single pipe):
    The first pipe verifies that the packet has a UDP header by matching the L4 packet type to UDP or the L3 next_proto field to UDP
    The second pipe matches the UDP destination port to the commonly used VXLAN value (4789)
  For later-stage pipes, match on the VXLAN VNI field

Match on UDP destination port:
  For early-stage pipes, verify that the packet contains a UDP header by matching the L4 packet type to UDP or the L3 next_proto field to UDP
  For later-stage pipes, match on the UDP destination port field



Consider the following strict matching mode example. There are three pipes:

Basic pipe A with match.outer.l4_type_ext = DOCA_FLOW_L4_TYPE_EXT_TCP and match.outer.tcp.flags = 1

Basic pipe B with match.outer.l4_type_ext = DOCA_FLOW_L4_TYPE_EXT_UDP and match.outer.udp.l4_port.src_port = 8080

Control pipe C with two entries to direct TCP traffic to pipe A and UDP traffic to pipe B. The first entry has match.outer.l4_type_ext = DOCA_FLOW_L4_TYPE_EXT_TCP while the second has match.outer.l4_type_ext = DOCA_FLOW_L4_TYPE_EXT_UDP.

As a result, the hardware matches on the L4 header type twice:

When the packet enters the filter in control pipe C to decide the next pipe

When the packet enters the filter of pipe A or B to match on the L4 header fields

With particularly large pipelines, such double matches decrease performance and increase the memory footprint in hardware. Relaxed matching mode gives the user greater control of the match logic to eliminate the implicitly added matches, consequently reducing hardware memory footprint and improving performance as well.

Parser meta matching is particularly useful for matching on a specific packet type. In relaxed mode, type selectors in the outer, inner, and tun parts of the doca_flow_match struct are used only for the type cast of the underlying unions. Header-type (packet type) matches are available using the parser_meta API.

For example, the scenario from the previous section can be rewritten by changing the match of control pipe C (with the same pipes A and B):

Basic pipe A with match.outer.l4_type_ext = DOCA_FLOW_L4_TYPE_EXT_TCP and match.outer.tcp.flags = 1

Basic pipe B with match.outer.l4_type_ext = DOCA_FLOW_L4_TYPE_EXT_UDP and match.outer.udp.l4_port.src_port = 8080

Control pipe C with two entries to direct TCP traffic to pipe A and UDP traffic to pipe B. The first entry has match.parser_meta.outer_l4_type = DOCA_FLOW_L4_META_TCP while the second has match.parser_meta.outer_l4_type = DOCA_FLOW_L4_META_UDP.

As a result, the hardware performs the L4 header-type match only once, when the packet enters the filter of control pipe C. In basic pipes A and B, match.outer.l4_type_ext is used only to select the match.outer.tcp or match.outer.udp structure when the match struct is inspected.

The following code snippets are used to demonstrate the redesign of a pipeline with relaxed matching for non-tunnel match cases.

The following is the code before the redesign:

```c
static void pipe_match_build(struct doca_flow_match *match)
{
	match->outer.l3_type = DOCA_FLOW_L3_TYPE_IP4;
	match->outer.ip4.dst_ip = 0xffffffff;
	match->outer.l4_type_ext = DOCA_FLOW_L4_TYPE_EXT_UDP;
	match->outer.udp.l4_port.src_port = rte_cpu_to_be_16(22); // header fields are big endian
}
```

Without relaxed matching, the following hardware matches were performed for the code snippet above:

L3 header type is IPv4 – constant among entries by design

L4 header type is UDP – constant among entries by design

IPv4 destination address varies per entry

UDP source port is constant among entries

The rest of the packet fields are ignored

With relaxed matching, the following pipeline stages, where pipe1 forwards packets to pipe2, should be considered to achieve a similar match as above:

```c
static void pipe1_match_build(struct doca_flow_match *match)
{
	match->parser_meta.outer_l3_type = DOCA_FLOW_L3_META_IPV4;
	match->parser_meta.outer_l4_type = DOCA_FLOW_L4_META_UDP;
}

static void pipe2_match_build(struct doca_flow_match *match)
{
	match->outer.l3_type = DOCA_FLOW_L3_TYPE_IP4;
	match->outer.ip4.dst_ip = 0xffffffff;
	match->outer.l4_type_ext = DOCA_FLOW_L4_TYPE_EXT_UDP;
	match->outer.udp.l4_port.src_port = rte_cpu_to_be_16(22);
}
```

The following code snippets demonstrate the redesign of a pipeline with relaxed matching for tunnel match cases. The following is the code before the redesign:

```c
static void pipe_match_build(struct doca_flow_match *match)
{
	match->outer.l3_type = DOCA_FLOW_L3_TYPE_IP4;
	match->outer.l4_type_ext = DOCA_FLOW_L4_TYPE_EXT_UDP;
	match->tun.type = DOCA_FLOW_TUN_VXLAN;
	match->outer.ip4.dst_ip = 0xffffffff;
	match->outer.udp.l4_port.src_port = rte_cpu_to_be_16(22);
	match->tun.vxlan_tun_id = 0xffffffff;
}
```

Without relaxed matching, the following hardware matches were performed for the code snippet above:

L3 type is IPv4 – constant among entries by design

L4 type is UDP – constant among entries by design

Tunnel type is DOCA_FLOW_TUN_VXLAN – constant among entries by design

IPv4 destination address varies per entry

UDP source port is always 22

VXLAN tunnel ID varies per entry

The rest of the packet fields are ignored

With relaxed matching, the following pipeline stages, where pipe1 forwards packets to pipe2, should be considered to achieve a similar match as above:

```c
static void pipe1_match_build(struct doca_flow_match *match)
{
	match->parser_meta.outer_l3_type = DOCA_FLOW_L3_META_IPV4;
	match->parser_meta.outer_l4_type = DOCA_FLOW_L4_META_UDP;
}

static void pipe2_match_build(struct doca_flow_match *match)
{
	match->outer.l4_type_ext = DOCA_FLOW_L4_TYPE_EXT_UDP;
	match->outer.udp.l4_port.src_port = rte_cpu_to_be_16(22);
	match->outer.udp.l4_port.dst_port = DOCA_VXLAN_DEFAULT_PORT;
	match->outer.l3_type = DOCA_FLOW_L3_TYPE_IP4;
	match->outer.ip4.dst_ip = 0xffffffff;
	match->tun.type = DOCA_FLOW_TUN_VXLAN;
	match->tun.vxlan_tun_id = 0xffffffff;
}
```





With relaxed matching, header type fields within the outer, inner, or tun structs no longer indicate a match on the specific protocol. They are used solely as selectors for the relevant header fields. For example, to match on outer.ip4.dst_ip, users must set outer.l3_type = DOCA_FLOW_L3_TYPE_IP4. With this match, the L3 header is checked for the IPv4 destination address; however, there is no check that the L3 header is of IPv4 type. It is the user's responsibility to make sure that packets arriving at such a filter indeed have an L3 header of type IPv4.

To match on a specific protocol/tunnel type, consider the following:

To match on an inner/outer L3/L4 protocol type, users can use relevant doca_flow_parser_meta fields as explained above. For example, for outer protocols, parser_meta.outer_l[3,4]_type fields can be used.

To match on a specific tunnel type, users should match on a tunnel according to its specification. For example, for a VXLAN tunnel, a match on UDP destination port 4789 can be used. Another option is to use the L3 next protocol field. For example, for IPv4 with next header GRE, one can match on the IPv4 header's next protocol field to match the GRE IP protocol number 47.
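The per-stage checks described above can be illustrated with a software parser. The sketch below is plain C operating on a raw buffer that starts at the outer IPv4 header; it is not DOCA Flow code, and the function and macro names are hypothetical. Its point is the order of checks that relaxed matching leaves to the pipeline designer: confirm IPv4, then UDP (IP protocol 17), then UDP destination port 4789, and only then read the VXLAN VNI. A GRE check uses the IPv4 protocol number 47 instead.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IPPROTO_UDP_NUM 17
#define IPPROTO_GRE_NUM 47
#define VXLAN_UDP_PORT 4789

/* True if the buffer holds an IPv4 header whose next protocol is GRE. */
static bool ipv4_is_gre(const uint8_t *pkt, size_t len)
{
	return len >= 20 && (pkt[0] >> 4) == 4 && pkt[9] == IPPROTO_GRE_NUM;
}

/* Stores the VNI and returns true only if the packet really is IPv4/UDP/VXLAN. */
static bool ipv4_vxlan_vni(const uint8_t *pkt, size_t len, uint32_t *vni)
{
	if (len < 20 || (pkt[0] >> 4) != 4)
		return false;                        /* stage 1: L3 header is IPv4 */
	size_t ihl = (size_t)(pkt[0] & 0x0f) * 4;
	if (ihl < 20 || pkt[9] != IPPROTO_UDP_NUM || len < ihl + 8 + 8)
		return false;                        /* stage 1: next protocol is UDP */
	const uint8_t *udp = pkt + ihl;
	uint16_t dport = (uint16_t)((udp[2] << 8) | udp[3]);
	if (dport != VXLAN_UDP_PORT)
		return false;                        /* stage 2: UDP dst port is VXLAN */
	const uint8_t *vx = udp + 8;             /* stage 3: VNI is bytes 4..6 of VXLAN header */
	*vni = ((uint32_t)vx[4] << 16) | ((uint32_t)vx[5] << 8) | vx[6];
	return true;
}
```

Reading the VNI bytes without the earlier checks would return garbage for a TCP packet of the same length, which is the software analogue of matching on a tunnel field in relaxed mode without an earlier type-validating stage.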

More relaxed matching design best practices can be found in the samples Flow Drop, Flow VXLAN Encap, and LPM with Exact Match Logic.

Note With relaxed matching, to achieve match-all functionality, either of the following methods can be used during pipe creation:

Set the match_mask structure to NULL and set the match structure to all zeroes

Set the match_mask structure to all zeroes, while the match structure may have any setting

Note With relaxed matching, if any of the selectors is used without setting a relevant field, pipe/entry creation fails with the following error message:

```
failed building active opcode - active opcode <opcode number> is protocol only
```

When setting actions, they are executed in the following order:

Crypto (decryption)

Decapsulation

Pop

Meta

Outer

Tun

Push

Encapsulation

Crypto (encryption)

The desc_array field in struct doca_flow_action_descs is executed in array order, after the actions above have been executed.
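The fixed ordering can be pictured as a sorted sequence of stages. The following C sketch is a hypothetical model (the stage names and the bitmask encoding are illustrative, not DOCA Flow identifiers): whatever subset of actions a pipe configures, execution follows the enum order, with the desc_array entries last.

```c
#include <stddef.h>

/* Hypothetical labels for the action stages, declared in execution order. */
enum action_stage {
	STAGE_CRYPTO_DECRYPT,
	STAGE_DECAP,
	STAGE_POP,
	STAGE_META,
	STAGE_OUTER,
	STAGE_TUN,
	STAGE_PUSH,
	STAGE_ENCAP,
	STAGE_CRYPTO_ENCRYPT,
	STAGE_DESC_ARRAY, /* desc_array entries run last, in array order */
	STAGE_COUNT
};

/* Writes the stages present in `configured` (bit i set = stage i used) to
 * `out` in execution order and returns how many were written. */
static size_t execution_order(unsigned configured, enum action_stage *out)
{
	size_t n = 0;

	for (int s = 0; s < STAGE_COUNT; s++)
		if (configured & (1u << s))
			out[n++] = (enum action_stage)s;
	return n;
}
```

For example, a pipe that sets encapsulation, a meta modification, and decapsulation executes them as decap, meta, then encap, regardless of the order in which the fields were filled in.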

Note Modifying a field while simultaneously using it as a source for other modifications should be avoided, as the sequence of modification actions cannot be guaranteed.





Similarly to setting pipe match, actions also have a template definition.

As with doca_flow_match, in the creation phase only the subset of actions that should be executed per packet is defined. This is done in a similar way to match, namely by classifying each field of doca_flow_actions as one of the following:

Ignored field – field is zeroed, modify is not used.

Constant fields – when a field must be modified per packet, but the value is the same for all packets, a one-time value on action definitions can be used

Changeable fields – fields that may have more than one possible value, and the exact values are set by the user per entry:

```c
actions.outer.ip4.dst_ip = 0xffffffff;
```

Note The action_mask should be set to 0xffffffff and the action to 0 if the user wants to set this field to 0.

It is possible to force constant modification or per-entry modification with an action mask. For example, the following applies a constant UDP source port to all entries:

```c
static void create_constant_modify_actions(struct doca_flow_actions *actions,
					   struct doca_flow_actions *actions_mask,
					   struct doca_flow_action_descs *descs)
{
	(void)descs; /* not used in this example */

	actions->outer.l4_type_ext = DOCA_FLOW_L4_TYPE_EXT_UDP;
	/* Non-zero value with a non-zero mask: apply this value to all entries */
	actions->outer.udp.src_port = 0x1234;
	actions_mask->outer.udp.src_port = 0xffff;
}
```





The action descriptor can be used to copy between a packet field and metadata. For example:

```c
#include <stddef.h> /* offsetof */

/* Bit offset of meta.u32[idx]: the byte offset from offsetof(), multiplied by 8 */
#define META_U32_BIT_OFFSET(idx) (offsetof(struct doca_flow_meta, u32[(idx)]) << 3)

static void create_copy_packet_to_meta_actions(struct doca_flow_match *match,
					       struct doca_flow_action_desc *desc)
{
	(void)match; /* not used in this example */

	desc->type = DOCA_FLOW_ACTION_COPY;
	/* Source: outer IPv4 source address, starting at bit 0 */
	desc->field_op.src.field_string = "outer.ipv4.src_ip";
	desc->field_op.src.bit_offset = 0;
	/* Destination: metadata, starting at meta.u32[1] */
	desc->field_op.dst.field_string = "meta.data";
	desc->field_op.dst.bit_offset = META_U32_BIT_OFFSET(1);
}
```





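Creating a pipe is possible using a list of multiple actions. For example:

```c
/* pipe_cfg must already have been created with doca_flow_pipe_cfg_create() */
static doca_error_t create_multi_actions_for_pipe_cfg(struct doca_flow_pipe_cfg *pipe_cfg)
{
	struct doca_flow_actions actions_0 = {0}, actions_1 = {0};
	struct doca_flow_actions *actions_arr[2];

	actions_arr[0] = &actions_0; /* selected by entries with action_idx 0 */
	actions_arr[1] = &actions_1; /* selected by entries with action_idx 1 */

	/* No action masks or action descriptors in this example */
	return doca_flow_pipe_cfg_set_actions(pipe_cfg, actions_arr, NULL, NULL, 2);
}
```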





Summary of action behavior by pipe creation settings (action_desc, pipe actions, pipe actions mask) and entry creation settings (entry actions):

| action_desc type | action_desc configuration | Pipe actions | Pipe actions mask | Entry actions | Behavior |
|---|---|---|---|---|---|
| DOCA_FLOW_ACTION_AUTO / action_desc = NULL | No specific config | 0 | 0 | N/A | Field ignored, no modification |
| | | 0 | mask != 0 | N/A | Apply 0 and mask to all entries |
| | | val != 0 && val != 0xFF | mask != 0 | N/A | Apply val and mask to all entries |
| | | val = 0xFF | mask = 0 | N/A | Apply 0xFF to all entries |
| | | val = 0xFF | mask != 0 | Define val per entry | Apply entry's val and mask |
| DOCA_FLOW_ACTION_ADD (add field value or from src) | Define only the dst field and width | val != 0 | N/A | N/A | Add val to dst for all entries |
| | | val == 0 | N/A | Define val per entry | Add entry's val to dst |
| | Define the src and dst fields and width | Define the source and destination fields: meta field → header field, header field → meta field, or meta field → meta field | N/A | N/A | Add data from src fields to dst for all entries |
| DOCA_FLOW_ACTION_COPY (copy field to another field) | N/A | Define the source and destination fields: meta field → header field, header field → meta field, or meta field → meta field | N/A | N/A | Copy data between fields for all entries |
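To make the DOCA_FLOW_ACTION_AUTO rows of the action table concrete, the following minimal C sketch classifies a 32-bit pipe action field from its pipe value and mask. It is a hypothetical model, not part of the DOCA API: the enum and function names are invented for illustration, and "0xFF" in the table is read as "all bits set" for the field width.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical behaviors mirroring the DOCA_FLOW_ACTION_AUTO table rows */
enum field_behavior {
	FIELD_IGNORED,        /* val 0, mask 0: no modification */
	FIELD_CONST_ZERO,     /* val 0, mask != 0: apply 0 and mask to all entries */
	FIELD_CONST_VALUE,    /* other val, mask != 0: apply val and mask to all entries */
	FIELD_CONST_ALL_ONES, /* all-ones val, mask 0: apply all-ones to all entries */
	FIELD_PER_ENTRY,      /* all-ones val, mask != 0: each entry supplies its val */
	FIELD_NOT_IN_TABLE    /* combination not listed in the table */
};

static enum field_behavior classify_auto_field(uint32_t val, uint32_t mask)
{
	const uint32_t all_ones = UINT32_MAX; /* "0xFF" for a 32-bit field */

	if (val == 0)
		return mask == 0 ? FIELD_IGNORED : FIELD_CONST_ZERO;
	if (val == all_ones)
		return mask == 0 ? FIELD_CONST_ALL_ONES : FIELD_PER_ENTRY;
	return mask != 0 ? FIELD_CONST_VALUE : FIELD_NOT_IN_TABLE;
}
```

For instance, classify_auto_field(0, 0xffffffff) yields FIELD_CONST_ZERO, which matches the second row of the table.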

If a meter policer is used, all policers on the pipe can share the same configuration, or each entry can have its own configuration. The meter policer is determined by the FWD action. If an entry has a NULL FWD action, the policer FWD action is taken from the pipe.

If a mirror is used, it can be shared across the pipe or configured with a specific value per entry.

The monitor also includes the aging configuration. If the aging time is set, the entry ages out when the timeout passes without any packet matching the entry.

For example:

```c
static void build_entry_monitor(struct doca_flow_monitor *monitor, void *user_ctx)
{
	(void)user_ctx; /* not used in this example */

	/* Age out the entry after 10 seconds without a hit */
	monitor->aging_sec = 10;
}
```

Refer to Pipe Entry Aged Query for more information.

The FWD (forwarding) action is the last action in a pipe, and it directs where the packet goes next. Users may configure one of the following destinations:

Send to software (representor)

Send to wire

Jump to next pipe

Drop packets

The FWD action may be set at pipe creation, but it can also be unique per entry.

Pipe forwarding can be set either at creation time or deferred to entry addition:

If fwd.type is not DOCA_FLOW_FWD_CHANGEABLE, any match on the pipe, on any entry, is forwarded to the specified target.

If fwd.type is DOCA_FLOW_FWD_CHANGEABLE, the target is whatever is defined in the fwd.type field of the specific entry.

Putting this logic in a table looks like this:

| Pipe fwd.type | Entry fwd.type | Actual forward | Comment |
|---|---|---|---|
| Equals DOCA_FLOW_FWD_CHANGEABLE | X | X | X must not equal DOCA_FLOW_FWD_CHANGEABLE |
| X != DOCA_FLOW_FWD_CHANGEABLE | Does not care | X | |
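The forwarding table can be read as a small resolution rule. The following C sketch encodes it with a hypothetical enum (fwd_kind and its values are stand-ins invented for illustration, not the DOCA enum); FWD_NONE signals a combination the table disallows.

```c
#include <assert.h>

/* Hypothetical stand-ins for forward types; not the DOCA enum */
enum fwd_kind { FWD_NONE, FWD_PORT, FWD_PIPE, FWD_RSS, FWD_DROP, FWD_CHANGEABLE };

/* Returns the forward that actually applies to a packet hitting an entry */
static enum fwd_kind resolve_fwd(enum fwd_kind pipe_fwd, enum fwd_kind entry_fwd)
{
	if (pipe_fwd != FWD_CHANGEABLE)
		return pipe_fwd; /* fixed at pipe creation: entry FWD does not matter */
	if (entry_fwd == FWD_CHANGEABLE)
		return FWD_NONE; /* invalid: the entry must supply a concrete FWD */
	return entry_fwd;        /* changeable pipe: the entry decides */
}
```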

When a pipe includes a meter monitor <cir, cbs>, it must have fwd defined as well as the policer.

If a pipe is created with a dedicated constant mirror that has its own FWD, the pipe's FWD comes either from the mirror FWD or from the pipe FWD; the two are mutually exclusive. A mirror with a FWD must not be specified on a pipe that also has a FWD.

If a mirror FWD is not configured, the FWD comes from the pipe configuration. The FWD of a pipe with a mirror cannot be direct RSS; only shared RSS from a NULL FWD is allowed.

The following is an RSS forwarding example:

```c
/* queues points to a uint16_t array holding 4 queue numbers */
fwd.type = DOCA_FLOW_FWD_RSS;
fwd.rss_type = DOCA_FLOW_RESOURCE_TYPE_NON_SHARED;
fwd.rss.queues_array = queues;
fwd.rss.outer_flags = DOCA_FLOW_RSS_IPV4 | DOCA_FLOW_RSS_UDP;
fwd.rss.nr_queues = 4;
```

queues_array points to a uint16_t array that contains the queue numbers. When a port is started, its number of queues is defined, and the queues are numbered from 0 up to the number of queues minus 1. The RSS queue array may contain any subset of these queue numbers. To direct a specific match to a single queue, use RSS forwarding with a single queue.
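As a sketch of this constraint, the hypothetical helper below (not a DOCA function) checks that an RSS queue array only references queues that exist on the port, assuming the port was started with nb_port_queues queues numbered from 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if every queue number lies in [0, nb_port_queues - 1] */
static bool rss_queues_valid(const uint16_t *queues, size_t nr_queues,
			     uint16_t nb_port_queues)
{
	if (queues == NULL || nr_queues == 0)
		return false; /* RSS forwarding needs at least one queue */
	for (size_t i = 0; i < nr_queues; i++) {
		if (queues[i] >= nb_port_queues)
			return false; /* queue does not exist on the port */
	}
	return true;
}
```

A one-element array such as {2} is valid on an 8-queue port and steers the match to that single queue.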

Changeable RSS forwarding is supported. When creating the pipe, num_of_queues must be set to 0xffffffff; different RSS forwarding information can then be set when adding each entry.

```c
fwd->num_of_queues = 0xffffffff;
```

When forwarding to a port, the packet is directed to that port. In many cases the complete pipe, including forwarding the packet back to the wire, is executed in hardware, and the packet never reaches software.

Example code for forwarding to a port:

```c
/* calloc() (stdlib.h) zero-initializes the structure */
struct doca_flow_fwd *fwd = calloc(1, sizeof(*fwd));
if (fwd == NULL)
	return DOCA_ERROR_NO_MEMORY; /* assuming the caller returns doca_error_t */
fwd->type = DOCA_FLOW_FWD_PORT;
/* Must be the same port_id that was set with doca_flow_port_cfg_set_port_id() */
fwd->port_id = port_id;
```

The forwarding type is DOCA_FLOW_FWD_PORT, and the only data required is the port_id as defined in DOCA_FLOW_PORT.

Changeable port forwarding is also supported. When creating the pipe, port_id must be set to 0xffff; different forwarding port_id values can then be set when adding each entry.

```c
fwd->port_id = 0xffff;
```





DOCA Flow supports several types of resources that can be shared. The supported resource types are:

Meters

Counters

RSS queues

Mirrors

PSPs

Encap

Decap

IPsec SA

Shared resources can be used by several pipes and can save device and memory resources while promoting better performance.

To create and configure a shared resource, the user should go through the steps detailed in the following subsections.

Call doca_flow_cfg_create(&flow_cfg), passing the address of a struct doca_flow_cfg pointer. The created object is then used to set the required parameters for the shared resources.

Note The struct doca_flow_cfg object is used for configuring other resources besides the aforementioned shared resources, but this section only refers to the configuration of shared resources.





Set the number of shared resources of each type by calling doca_flow_cfg_set_nr_shared_resource(). Refer to the API documentation for details on the configuration process.

Conclude the configuration by calling doca_flow_init() .

When shared resources are allocated, they are assigned identifiers starting from 0 and increasing by one. For example, if the user configures two shared counters, they bear the identifiers 0 and 1.

Note Each resource type has its own identifier space. So, if users have two shared counters and three shared meters, they bear identifiers 0..1 and 0..2, respectively.
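The per-type identifier spaces can be modeled as one count per resource type. This is a hypothetical illustration: res_type, shared_nr, and shared_id_valid() are invented names, and the resource types listed are a subset chosen for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for shared resource types; not the DOCA enum */
enum res_type { RES_METER, RES_COUNTER, RES_RSS, RES_MIRROR, RES_TYPE_COUNT };

/* Number of shared resources configured per type before initialization */
struct shared_nr {
	uint32_t nr[RES_TYPE_COUNT];
};

/* Each type has its own ID space: valid IDs are 0 .. nr[type] - 1 */
static bool shared_id_valid(const struct shared_nr *cfg, enum res_type type, uint32_t id)
{
	return type < RES_TYPE_COUNT && id < cfg->nr[type];
}
```

With two counters and three meters configured, counter ID 1 and meter ID 2 are both valid, while counter ID 2 is not.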

To configure a shared resource, call doca_flow_shared_resource_set_cfg().

A shared resource must be bound by calling doca_flow_shared_resources_bind(), which binds the resource to the object referenced by a pointer. The object to which the resource is bound is usually a struct doca_flow_port pointer.

After a resource has been configured, it can be used by referring to its ID.

In the case of meters, counters, and mirrors, they are referenced through struct doca_flow_monitor during pipe creation or entry addition.

Querying shared resources can be done by calling doca_flow_shared_resources_query() . The function accepts the resource type and an array of resource numbers, and returns an array of struct doca_flow_shared_resource_result with the results.

A shared meter can be used in multiple pipe entries (hardware steering mode support only).

The shared meter action marks a packet with one of three colors: green, yellow, or red. The packet color can then be matched in the next pipe, and an appropriate action taken. For example, packets marked red are usually dropped, so the pipe that follows the meter action may have an entry that matches on red with fwd type DOCA_FLOW_FWD_DROP.

DOCA Flow supports three marking algorithms based on RFCs: 2697, 2698, and 4115.

RFC 2697 – Single-rate Three Color Marker (srTCM)

CBS (committed burst size) is the bucket size which is granted credentials at a CIR (committed information rate). If CBS overflow occurs, credentials are passed to the EBS (excess burst size) bucket. Packets passing through the meter consume credentials. A packet is marked green if it does not exceed the CBS, yellow if it exceeds the CBS but not the EBS, and red otherwise. A packet can have an initial color upon entering the meter. A pre-colored yellow packet will start consuming credentials from the EBS.
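To illustrate the srTCM description above, the following is a minimal software model of RFC 2697 color-aware marking. It is not the hardware meter and not a DOCA API; all names are hypothetical, credits are counted in bytes, and refill is passed in directly as a number of credits (CIR multiplied by elapsed time) instead of being tracked with a clock. Color-blind marking is the special case of a green pre-color.

```c
#include <assert.h>
#include <stdint.h>

enum srtcm_color { SR_GREEN, SR_YELLOW, SR_RED };

/* Committed bucket tc (size cbs) and excess bucket te (size ebs), in bytes */
struct srtcm {
	uint64_t cbs, ebs;
	uint64_t tc, te;
};

/* Adds credits to tc; whatever overflows tc goes to te, and the rest is discarded */
static void srtcm_refill(struct srtcm *m, uint64_t credits)
{
	uint64_t room_c = m->cbs - m->tc;

	if (credits <= room_c) {
		m->tc += credits;
		return;
	}
	m->tc = m->cbs;
	credits -= room_c;
	uint64_t room_e = m->ebs - m->te;
	m->te += credits < room_e ? credits : room_e;
}

/* Marks a packet of len bytes with pre-color in. A green packet tries tc first;
 * green or yellow then tries te; otherwise (or if pre-colored red) it is red. */
static enum srtcm_color srtcm_mark(struct srtcm *m, uint64_t len, enum srtcm_color in)
{
	if (in == SR_GREEN && m->tc >= len) {
		m->tc -= len;
		return SR_GREEN;
	}
	if (in != SR_RED && m->te >= len) {
		m->te -= len;
		return SR_YELLOW;
	}
	return SR_RED;
}
```

With cbs = 100 and ebs = 50 and both buckets full, an 80-byte packet is green, a following 40-byte packet is yellow, and a following 30-byte packet is red.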

RFC 2698 – Two-rate Three Color Marker (trTCM)

CBS and CIR are defined as in RFC 2697. PBS (peak burst size) is a second bucket which is granted credentials at a PIR (peak information rate). There is no overflow of credentials from the CBS bucket to the PBS bucket. The PIR must be equal to or greater than the CIR. Packets consuming CBS credentials consume PBS credentials as well. A packet is marked red if it exceeds the PIR. Otherwise, it is marked either yellow or green depending on whether it exceeds the CIR or not. A packet can have an initial color upon entering the meter. A pre-colored yellow packet starts consuming credentials from the PBS.
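A comparable sketch for the trTCM of RFC 2698, under the same assumptions (hypothetical names, byte credits, refill passed in as credit counts), shows the two independent buckets and why a yellow packet consumes only peak credits.

```c
#include <assert.h>
#include <stdint.h>

enum trtcm_color { TR_GREEN, TR_YELLOW, TR_RED };

/* Peak bucket tp (size pbs, filled at PIR) and committed bucket tc
 * (size cbs, filled at CIR); credits never overflow between them */
struct trtcm {
	uint64_t pbs, cbs;
	uint64_t tp, tc;
};

static uint64_t add_capped(uint64_t cur, uint64_t add, uint64_t cap)
{
	return cap - cur <= add ? cap : cur + add;
}

/* pir_credits = PIR * elapsed time, cir_credits = CIR * elapsed time */
static void trtcm_refill(struct trtcm *m, uint64_t pir_credits, uint64_t cir_credits)
{
	m->tp = add_capped(m->tp, pir_credits, m->pbs);
	m->tc = add_capped(m->tc, cir_credits, m->cbs);
}

/* Color-aware marking; pass TR_GREEN as in for color-blind operation */
static enum trtcm_color trtcm_mark(struct trtcm *m, uint64_t len, enum trtcm_color in)
{
	if (in == TR_RED || m->tp < len)
		return TR_RED;      /* exceeds the peak rate */
	if (in == TR_YELLOW || m->tc < len) {
		m->tp -= len;       /* yellow consumes peak credits only */
		return TR_YELLOW;
	}
	m->tp -= len;               /* green consumes both buckets */
	m->tc -= len;
	return TR_GREEN;
}
```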

RFC 4115 – trTCM without Peak-rate Dependency

EBS is a second bucket, which is granted credentials at an EIR (excess information rate) and receives overflow credentials from the CBS bucket. For the packet marking algorithm, refer to RFC 4115.

The following sections present the steps for configuring and using shared meters to mark packets.

The IPsec Security Association (SA) shared resource is used for the IPsec ESP encryption protocol. The resource should be referenced from the doca_flow_crypto_actions struct inside doca_flow_actions.

By default, the resource manages the state of the sequence number (SN): it increments the SN for each packet on the encryption side and performs anti-replay protection on the decryption side. The anti-replay syndrome is stored in meta.u32[0].
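Anti-replay protection is commonly implemented with a sliding window, as described for ESP in RFC 4303. The sketch below is a generic illustration of that technique, assuming a 64-packet window and 64-bit sequence numbers; it does not describe how the IPsec SA resource implements it internally, and all names are hypothetical. replay_lower_bound() is analogous to the window lower bound mentioned for decryption-side queries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define REPLAY_WIN 64 /* assumed window size: one bit per sequence number */

/* Bit i of bitmap set means sequence number (top - i) was already received */
struct replay_window {
	uint64_t top;    /* highest sequence number accepted so far */
	uint64_t bitmap;
};

/* Returns true and records sn if it is new and inside the window;
 * returns false for replays and for numbers below the window */
static bool replay_check_update(struct replay_window *w, uint64_t sn)
{
	if (sn > w->top) {
		uint64_t shift = sn - w->top; /* slide the window forward */
		w->bitmap = shift >= REPLAY_WIN ? 1 : (w->bitmap << shift) | 1;
		w->top = sn;
		return true;
	}
	uint64_t diff = w->top - sn;
	if (diff >= REPLAY_WIN)
		return false; /* too old */
	if (w->bitmap & (1ULL << diff))
		return false; /* replayed */
	w->bitmap |= 1ULL << diff;
	return true;
}

/* Lowest sequence number still inside the window */
static uint64_t replay_lower_bound(const struct replay_window *w)
{
	return w->top >= REPLAY_WIN - 1 ? w->top - (REPLAY_WIN - 1) : 0;
}
```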

To control the SN in software, sn_offload should be disabled per port in the configuration for doca_flow_port_start (see DOCA API documentation for details). Once sn_offload is disabled, the following fields are ignored: sn_offload_type , win_size , sn_initial , and lifetime_threshold .

When a shared resource query is called for an IPsec SA resource, the current SN is retrieved for an encryption resource, and the lower bound of the anti-replay window is retrieved for a decryption resource. IPsec SA can only be queried when sn_offload is enabled.

To maintain a valid state of the resource during its usage, doca_flow_crypto_ipsec_resource_handle should be called periodically.

The mirror shared resource is used to clone packets to other pipes, vports (switch mode only), RSS queues (VNF mode only), or drop.

Info The maximum number of supported mirrors is 4K.

Info The maximum number of mirror clone destinations is 254.

A next_pipe clone destination cannot be mixed with port or RSS destinations. The only supported next_pipe configuration is one where both the clone destination and the original destination are next_pipe.

The register copy of a packet is not saved after mirroring.

Note In switch mode, the following mirror limitations apply: A mirror should be cloned to a DOCA_FLOW_DIRECTION_BIDIRECTIONAL pipe.

The register copy of a packet is not saved after mirroring.

A mirror should not be cloned to an RSS pipe directly.

Encap is supported only when cloning a packet to a wire port.

A mirror must not be configured on a resizable pipe.

Usable mirror IDs range from 0 to nr_shared_mirrors - 1. ID 0 cannot be used when creating a pipe.

If mirror creation fails, users should check the resulting syndrome for failure details.

To maintain the order of the mirrored packets relative to the non-mirrored ones, set the forward destination of the first mirror target to the same destination as that of the non-mirrored packets, as illustrated in the following diagram:

In NVIDIA® BlueField®-3, NVIDIA® ConnectX®-7, and lower, when using the mirror action in the egress domain, mirrored packets cannot preserve the order with the non-mirrored packets due to the high latency of the mirror operation. To maintain the order, use DOCA_FLOW_FWD_DROP as the target forward as illustrated in the following diagram:

The encap shared resource is used for encapsulation. A shared encap ID represents one kind of encap configuration and can be used in multiple pipes and entries (hardware steering mode support only).

The shared encap action encapsulates the packet with the configured tunnel information.

The decap shared resource is used for decapsulation. A shared decap ID represents one kind of decap configuration and can be used in multiple pipes and entries (hardware steering mode support only).

The shared decap action decapsulates the packet. Ethernet information should be provided when is_l2 is false.

The PSP shared resource is used for PSP encryption. The resource should be pointed to from the doca_flow_crypto_actions struct in doca_flow_actions .

The resource should be configured with a key to encrypt the packets. See the NVIDIA DOCA Library API documentation for PSP key generation for reference on key handling on the decrypt side.

Once all parameters are defined, the user should call doca_flow_pipe_create to create a pipe.

The return value of the function is a handle to the pipe. This handle should be given when adding entries to pipe. If a failure occurs, the function returns NULL , and the error reason and message are put in the error argument if provided by the user.

Refer to the DOCA Library APIs to see which fields are optional and may be skipped. It is typically recommended to set optional fields to 0 when not in use. See Miss Pipe and Control Pipe for more information.

Once a pipe is created, a new entry can be added to it. These entries are bound to a pipe, so when a pipe is destroyed, all the entries in the pipe are removed. Please refer to section Pipe Entry for more information.

There is no priority between pipes or entries. Priority can be implemented by matching the highest priority first and, if a miss occurs, jumping to the next pipe. There can be more than one pipe on a root as long as the pipes do not overlap. If entries overlap, the priority is set according to the order in which the entries are added. So, if two pipes have overlapping matching and PIPE1 has higher priority than PIPE2, users should add an entry to PIPE1 after all entries are added to PIPE2.

The number of entries defined at pipe creation also includes the entries held in caches. This means that the available pipe size is smaller than defined, depending on the number of entries currently in the caches.

The total number of entries in caches is up to 1K × the number of queues, limited to 2% of the pipe size.
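A worked version of this rule, as hypothetical helpers: it assumes "1K" means 1024 and reads "limited to 2% of the pipe size" as taking the smaller of the two limits.

```c
#include <assert.h>
#include <stdint.h>

/* Worst-case number of pipe entries that may sit in the per-queue caches */
static uint32_t max_cached_entries(uint32_t pipe_size, uint32_t nr_queues)
{
	uint64_t per_queue_limit = 1024ULL * nr_queues;
	uint64_t pct_limit = (uint64_t)pipe_size * 2 / 100;

	return (uint32_t)(per_queue_limit < pct_limit ? per_queue_limit : pct_limit);
}

/* Lower bound on usable entries, assuming the caches hold their maximum */
static uint32_t min_available_entries(uint32_t pipe_size, uint32_t nr_queues)
{
	return pipe_size - max_cached_entries(pipe_size, nr_queues);
}
```

For a 1,000,000-entry pipe on 8 queues, the per-queue limit (8192) is the smaller one; for a 100,000-entry pipe, the 2% cap (2000) applies instead.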

An entry is a specific instance inside of a pipe. When defining a pipe, users define match criteria (subset of fields to be matched), the type of actions to be done on matched packets, monitor, and, optionally, the FWD action.

When a user calls doca_flow_pipe_add_entry() to add an entry, they should define the values that are not constant among all entries in the pipe. If the pipe does not define FWD, the entry must define it as well.

DOCA Flow is designed to support concurrency in an efficient way. Since the expected rate is going to be in millions of new entries per second, it is mandatory to use a similar architecture as the data path. Having a unique queue ID per core saves the DOCA engine from having to lock the data structure and enables the usage of multiple queues when interacting with hardware.

Each core is expected to use its own dedicated pipe_queue number when calling doca_flow_pipe_add_entry(). Using the same pipe_queue from different cores causes a race condition with unexpected results.

Note Applications are expected to avoid adding, removing, or updating pipe entries from within a doca_flow_entry_process_cb .

Entry insertion can fail in two places: add_entry and add_entry_cb.

When add_entry fails, no cleanup is required.

When add_entry succeeds, a handle is returned to the user. If the subsequent add_entry_cb fails, the user is responsible for releasing the handle through a rm_entry call. This rm_entry call is expected to return DOCA_SUCCESS and is expected to invoke doca_rm_entry_cb with a successful return code.

By default, no counter is added. If defined in monitor, a unique counter is added per entry.

Note Having a counter per entry affects performance and should be avoided if it is not required by the application.

The retrieved statistics are stored in struct doca_flow_query .

Note Counters have a granularity of 1 second.





doca_flow_aging_handle() queries the aged-out entries within a time quota given in microseconds, and invokes the user callback with the aged entries.

Since the number of flows can be very large, the query of aged flows is limited by this time quota. A call may therefore return before all flows have been checked, in which case the user must call it again. When the query has gone over all flows, a full cycle is done.
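The resumable behavior can be sketched as a scan that keeps a cursor between calls. This is a hypothetical model: the real doca_flow_aging_handle() takes a time quota in microseconds, whereas here budget counts examined entries so that the result is deterministic, and last-hit timestamps are passed in explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct aging_scan {
	size_t cursor; /* next entry to examine */
};

typedef void (*aged_cb)(size_t entry_idx, void *ctx);

/* Example callback: counts aged entries in a size_t pointed to by ctx */
static void count_aged(size_t entry_idx, void *ctx)
{
	(void)entry_idx;
	(*(size_t *)ctx)++;
}

/* Examines up to budget entries from the cursor and calls cb for each entry
 * idle for at least aging_sec (now_sec must not be older than any last hit).
 * Returns true when this call completed a full cycle over all entries. */
static bool aging_scan_step(struct aging_scan *s, const uint64_t *last_hit_sec,
			    size_t nr_entries, uint64_t now_sec, uint64_t aging_sec,
			    size_t budget, aged_cb cb, void *ctx)
{
	while (budget > 0 && s->cursor < nr_entries) {
		size_t i = s->cursor++;

		if (now_sec - last_hit_sec[i] >= aging_sec)
			cb(i, ctx);
		budget--;
	}
	if (s->cursor < nr_entries)
		return false; /* quota exhausted: call again to continue */
	s->cursor = 0;        /* full cycle done: next call starts over */
	return true;
}
```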

Updating an entry requires available space in the pipe. If the pipe is full, the function returns DOCA_ERROR_AGAIN . To proceed with the update, you may need to release an existing entry to free up space.

Users can define multiple actions per pipe. This gives the user the option to specify a different action per entry in the pipe by providing the action_idx in struct doca_flow_actions . Note that even with multiple actions defined for a pipe, any packet processed will still result in at most one action being executed.

For example, to create multiple flows with identical match fields but different actions, users can define two actions during pipe creation, Action_0 and Action_1 . These actions are respectively assigned indices 0 and 1 in the pipe configuration's actions array. Action_0 includes modify_mac , while Action_1 includes modify_ip . Users can then add two types of entries to the pipe: the first entry uses Action_0 by setting the action_idx field in struct doca_flow_actions to 0, and the second entry uses Action_1 by setting action_idx to 1.

Note Only one root pipe is allowed. If more than one is needed, create a control pipe as root and forward the packets to relevant non-root pipes.

To set priority between pipes, users must use miss pipes. Miss pipes allow looking up entries associated with pipe X and, if there are no matches, jumping to pipe X+1 and performing a lookup on entries associated with pipe X+1.

The following figure illustrates the hardware table structure:

The first lookup is performed on the table with priority 0. If no hits are found, then it jumps to the next table and performs another lookup.

A miss pipe is implemented in DOCA Flow through the fwd_miss forward configuration. When a pipe is created with fwd_miss configured, a packet that does not match the pipe is steered to the pipe set in the next_pipe field (of struct doca_flow_fwd) of fwd_miss.

Note fwd_miss is of type struct doca_flow_fwd, but it only implements two of its forward types:

DOCA_FLOW_FWD_PIPE – forwards the packet to another pipe

DOCA_FLOW_FWD_DROP – drops the packet

Other forwarding types (e.g., forwarding to a port or sending to an RSS queue) are not supported.
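Lookup through a chain of pipes connected by fwd_miss can be modeled in a few lines. In this hypothetical sketch (names invented, not the DOCA API), each pipe holds value/mask rules, miss_next plays the role of fwd_miss with DOCA_FLOW_FWD_PIPE, and MISS_DROP plays the role of DOCA_FLOW_FWD_DROP; the chain is assumed to be acyclic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MISS_DROP (-1) /* stands in for a fwd_miss of type drop */

struct rule {
	uint32_t value, mask; /* hit when (key & mask) == value */
	int result;           /* non-negative result id */
};

struct model_pipe {
	const struct rule *rules;
	size_t nr_rules;
	int miss_next; /* index of the fwd_miss next_pipe, or MISS_DROP */
};

/* Looks key up in pipe start; on a miss, follows fwd_miss until a rule hits
 * or the chain ends in a drop. Returns the result id or MISS_DROP. */
static int chain_lookup(const struct model_pipe *pipes, int start, uint32_t key)
{
	for (int p = start; p != MISS_DROP; p = pipes[p].miss_next) {
		for (size_t i = 0; i < pipes[p].nr_rules; i++) {
			const struct rule *r = &pipes[p].rules[i];

			if ((key & r->mask) == r->value)
				return r->result;
		}
	}
	return MISS_DROP;
}
```

Placing the higher-priority rules in the pipe that is looked up first is what gives them precedence in this model.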

next_pipe is defined as doca_flow_pipe and created by doca_flow_pipe_create. To distinguish a miss pipe from a general one, is_root is introduced in struct doca_flow_pipe_cfg. If is_root is true, the pipe is a root pipe executed on packet arrival. Otherwise, the pipe is a next_pipe.

When fwd_miss is not null, the packet that does not match the criteria is handled by next_pipe which is defined in fwd_miss .

Internally, if fwd_miss is not null and its forwarding type is DOCA_FLOW_FWD_PIPE, doca_flow_pipe_create creates a lowest-priority flow that always jumps to the group of the fwd_miss next_pipe, whose flows then handle the packets. If the fwd_miss forwarding type is DOCA_FLOW_FWD_DROP, the packets are dropped instead.

For example, consider a case where VXLAN packets are forwarded to RSS and all other packets are hairpinned. The miss_pipe handles the other (non-VXLAN) packets, and its match is for general Ethernet packets. The fwd_miss targets miss_pipe with type DOCA_FLOW_FWD_PIPE. The VXLAN pipe is created by doca_flow_pipe_create() with this fwd_miss.

Since, in this example, the jump flow is for general Ethernet packets, some VXLAN packets may match it and cause conflicts. For example, a VXLAN flow entry for ipA is created. When a VXLAN packet with ipB arrives, no flow entry exists for ipB, so the packet hits miss_pipe and is hairpinned.

A control pipe is introduced to handle the conflict. After creating a control pipe, the user can add control entries with different matches, forwarding, and priorities when there are conflicts.

The user can add a control entry by calling doca_flow_control_pipe_add_entry() .

priority must be defined as higher than the lowest priority (3) and lower than the highest one (0).

The other parameters have the same meaning as the parameters of doca_flow_pipe_create. In the example above, a control entry for VXLAN is created, so VXLAN packets with ipB hit the control entry.
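The VXLAN conflict can be illustrated with a hypothetical model of control pipe matching, in which the matching entry with the numerically lowest priority wins (0 highest, 3 lowest, as stated above). The key stands in for a UDP destination port, with 4789 as the VXLAN port; the struct and function names are invented, and keeping the earlier entry on equal priorities is an assumption of the model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct ctrl_entry {
	uint32_t value, mask; /* hit when (key & mask) == value */
	uint8_t priority;     /* 0 = highest, 3 = lowest */
	int fwd_id;           /* stand-in for the entry's forward target */
};

/* Returns the fwd_id of the best matching entry, or -1 if nothing matches */
static int ctrl_lookup(const struct ctrl_entry *e, size_t n, uint32_t key)
{
	int best = -1;
	int best_prio = 4; /* worse than any valid priority */

	for (size_t i = 0; i < n; i++) {
		if ((key & e[i].mask) == e[i].value && e[i].priority < best_prio) {
			best = e[i].fwd_id;
			best_prio = e[i].priority;
		}
	}
	return best;
}
```

With a catch-all entry at priority 2 (hairpin) and a VXLAN entry at priority 1 (RSS), VXLAN traffic resolves to the VXLAN entry even though the catch-all also matches it.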

doca_flow_pipe_lpm uses longest prefix match (LPM) matching. LPM matching is limited to a single field of the match provided by the user at pipe creation (e.g., the outer destination IP). Each entry consists of a value and a mask (e.g., 10.0.0.0/8, 10.10.0.0/16, etc.). The LPM match is defined as the entry with the maximum number of matching bits. For example, with the two entries 10.7.0.0/16 and 10.0.0.0/8, the IP 10.1.9.2 matches 10.0.0.0/8 and the IP 10.7.9.2 matches 10.7.0.0/16, because 16 bits is the longest matched prefix.
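A minimal linear-scan sketch of IPv4 longest prefix match, reproducing the example above. It only illustrates the matching rule and says nothing about how doca_flow_pipe_lpm is implemented; the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Builds a host-order IPv4 address from its four octets */
#define IPV4(a, b, c, d) (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
			  ((uint32_t)(c) << 8) | (uint32_t)(d))

struct lpm_entry {
	uint32_t prefix; /* host byte order */
	uint8_t len;     /* prefix length, 0..32 */
	int id;
};

static uint32_t prefix_mask(uint8_t len)
{
	return len == 0 ? 0 : UINT32_MAX << (32 - len); /* avoid shifting by 32 */
}

/* Returns the id of the matching entry with the longest prefix, or -1 on a miss */
static int lpm_lookup(const struct lpm_entry *t, size_t n, uint32_t addr)
{
	int best = -1;
	int best_len = -1;

	for (size_t i = 0; i < n; i++) {
		uint32_t m = prefix_mask(t[i].len);

		if ((addr & m) == (t[i].prefix & m) && t[i].len > best_len) {
			best = t[i].id;
			best_len = t[i].len;
		}
	}
	return best;
}
```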

In addition to the longest prefix match logic, LPM supports exact match (EM) logic on the meta.u32 , inner destination MAC and VNI. Only index 1 is supported for meta.u32 . Any combination of these three fields can be chosen for EM. However, if inner destination MAC is chosen for LPM, then it should not be chosen for EM as well. If more than one field is chosen for EM, a logical AND is applied. Support for EM on meta allows working with any single field by copying its value to the meta.u32[1] on pipes before LPM. EM is performed at the same time as LPM matching (i.e., a logical AND is applied for both logics). For example, if there is a match on LPM logic, but the value in the fields chosen for EM is not exactly matched, this constitutes an LPM pipe miss.

To enable EM logic in an LPM pipe, two steps are required:

Provide match_mask to the LPM pipe creation with meta.u32[1] fully masked and/or inner.eth.dst_mac and/or tun.vxlan_tun_id, while setting match_mask.tun.type to DOCA_FLOW_TUN_VXLAN. Thus, the match parameter is responsible for choosing the field for LPM logic, while the match_mask parameter is responsible for enabling EM logic. The separation into two parameters distinguishes the LPM field from the EM fields when both could be used for LPM (e.g., destination IP address and source MAC address).

Per entry, provide the exact-match values in the match structure. The match_mask is used only for LPM-related masks and is not involved in EM logic.

EM logic allows inserting many entries with different meta values for the same LPM-related data. For example, with IPv4-based LPM and exact match enabled, an LPM pipe can have 1.1.1.1/32 with meta values 42, 555, and 1020. When a packet with 1.1.1.1 goes through such an LPM pipe, its meta value is compared against 42, 555, and 1020.

The actions and FWD of the DOCA Flow LPM pipe work the same as the basic DOCA Flow pipe.

Note The monitor only supports non-shared counters in the LPM pipe.

doca_flow_pipe_lpm insertion max latency can be measured in milliseconds in some cases; therefore, it is better to insert entries from the control path. For the best insertion performance, add entries in large batches.

Note An LPM pipe cannot be a root pipe. You must create a pipe as root and forward the packets to the LPM pipe.

Note An LPM pipe can only do LPM matching on inner and outer IP and MAC addresses.

Note For monitoring, an LPM pipe only supports non-shared counters and does not support other capabilities of doca_flow_monitor .





doca_flow_pipe_acl uses access-control list (ACL) matching. ACL matching is performed on the five-tuple of doca_flow_match. Each entry consists of a value and a mask (e.g., 10.0.0.0/8, 10.10.0.0/16, etc.) for the IP address fields, a port range or a specific port for the port fields, the protocol, and the priority of the entry.

ACL entry port configuration:

Mask port is 0 → any port

Mask port equals match port → exact port (i.e., a port with mask 0xffff)

Mask port > match port → the match port is used as the start of the port range, and the mask port as the end
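The three port rules can be expressed as a small predicate. In this hypothetical sketch (invented name, not a DOCA function), a non-zero mask smaller than the match port is not covered by the rules above and is treated as a non-match purely as a modeling choice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* match and mask are the entry's port value and port mask */
static bool acl_port_match(uint16_t match, uint16_t mask, uint16_t port)
{
	if (mask == 0)
		return true;                          /* any port */
	if (mask == match)
		return port == match;                 /* exact port */
	if (mask > match)
		return port >= match && port <= mask; /* range [match, mask] */
	return false; /* mask < match: not covered by the rules, model only */
}
```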

Monitor actions are not supported in ACL. FWD of the DOCA Flow ACL pipe works the same as the basic DOCA Flow pipe.

ACL supports the following types of FWD:

DOCA_FLOW_FWD_PORT

DOCA_FLOW_FWD_PIPE

DOCA_FLOW_FWD_DROP

doca_flow_pipe_acl insertion max latency can be measured in milliseconds in some cases; therefore, it is better to insert entries from the control path. For the best insertion performance, add entries in large batches.

Note An ACL pipe can be a root pipe.

Note An ACL pipe can be in ingress and egress domain.

Note An ACL pipe must be accessed on a single queue. Different ACL pipes may be accessed on different queues.

Note Adding an entry to the ACL pipe after sending an entry with flag DOCA_FLOW_NO_WAIT is not supported.

Note Removing an entry from an ACL pipe is not supported.





doca_flow_pipe_ordered_list allows the user to define a specific order of actions and to repeat actions of the same type (e.g., specific ordering between counter/meter and encap/decap).

An ordered list pipe is defined by an array of actions (i.e., sequences of actions). Each entry can be an instance of one of these sequences. An ordered list pipe may consist of up to 8 different action arrays, and each action array may contain up to 4 elements. Resource allocation may be optimized when multiple action arrays are combined in one ordered list pipe.

The doca_flow_pipe_hash API allows users to insert entries by index, where the index represents the result of a packet hash calculation.

A hash pipe accepts a doca_flow_match only during pipe creation

It uses a mask to define which fields are included in the hash calculation

Other functions such as monitor , actions , action_descs , and FWD behave identically to a basic DOCA Flow pipe

Note Ensure the nb_flows in doca_flow_pipe_attr is set to a power of 2.
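One common reason for a power-of-two table size is that a hash value can be reduced to an index with a single bit mask, with every index reachable. The sketch below illustrates that idea with hypothetical helpers; it is an assumption about the motivation, not a statement of how DOCA Flow computes indices.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool is_power_of_two(uint32_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Keeps the low bits of the hash, yielding an index in [0, nb_flows) */
static uint32_t hash_to_index(uint32_t hash, uint32_t nb_flows)
{
	assert(is_power_of_two(nb_flows));
	return hash & (nb_flows - 1);
}
```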





In a hash pipe configuration, users can set the map algorithm using the doca_flow_pipe_cfg_set_hash_map_algorithm() API.

Info If no algorithm is explicitly set, the default algorithm is DOCA_FLOW_PIPE_HASH_MAP_ALGORITHM_HASH .

The available hash map algorithms include:

DOCA_FLOW_PIPE_HASH_MAP_ALGORITHM_HASH – The default algorithm, which uses a standard hash function for packet indexing

DOCA_FLOW_PIPE_HASH_MAP_ALGORITHM_RANDOM – Packets are directed to entries randomly. When this algorithm is selected, a match mask is not required.

DOCA_FLOW_PIPE_HASH_MAP_ALGORITHM_IDENTITY – Uses an identity function for packet mapping, ensuring a direct one-to-one relationship between the hash value and the index

DOCA_FLOW_PIPE_HASH_MAP_ALGORITHM_FLOODING – Each packet is duplicated to match the number of entries in the hash pipe, resulting in the packet hitting all entries

DOCA_FLOW_PIPE_HASH_MAP_ALGORITHM_SELECT_ENABLED – Packets are consistently directed to the same index within the pipe, providing predictable behavior

Users can configure multiple hash algorithms within a single hash pipe. When using multiple algorithms, the forwarding pipe must specify which algorithm to apply for each packet using the DOCA_FLOW_FWD_HASH_PIPE with the desired algorithm. This approach ensures that packets are processed using the correct logic based on application needs.

Users can enable hardware steering mode by setting devarg dv_flow_en to 2 .

The following is an example of running DOCA with hardware steering mode:

```
.... -a 03:00.0,dv_flow_en=2 -a 03:00.1,dv_flow_en=2 ....
```

The following is an example of running DOCA with software steering mode:

```
.... -a 03:00.0 -a 03:00.1 ....
```

The dv_flow_en=2 means that hardware steering mode is enabled.

In struct doca_flow_cfg, mode_args (set using doca_flow_cfg_set_mode_args()) represents the DOCA application mode. If it includes hws (e.g., "vnf,hws", "switch,hws", "remote_vnf,hws"), hardware steering mode is enabled.

In switch mode, fdb_def_rule_en=0,vport_match=1,repr_matching_en=0,dv_xmeta_en=4 should be added to the DPDK PMD devargs, which makes the DOCA Flow switch module take over all the traffic.

When creating an entry with doca_flow_pipe_add_entry, the flags parameter can be set to DOCA_FLOW_WAIT_FOR_BATCH or DOCA_FLOW_NO_WAIT:

DOCA_FLOW_WAIT_FOR_BATCH means that the flow entry waits to be pushed to hardware, so that a batch of flows can be pushed at once. This reduces the number of pushes and increases the insertion rate.

DOCA_FLOW_NO_WAIT means that the flow entry is pushed to hardware immediately.
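The effect of batching on the number of pushes can be sketched with a hypothetical per-queue counter model (names invented, not the DOCA API). It assumes the usual pattern in which the last entry of a batch is added with the "no wait" flag, which pushes all pending entries together.

```c
#include <assert.h>
#include <stdint.h>

enum add_flag { ADD_WAIT_FOR_BATCH, ADD_NO_WAIT };

struct queue_model {
	uint32_t pending; /* entries not yet pushed to hardware */
	uint32_t pushes;  /* number of pushes issued */
	uint32_t pushed;  /* entries pushed so far */
};

/* Holds the entry; a "no wait" entry pushes everything pending in one go */
static void queue_add_entry(struct queue_model *q, enum add_flag flag)
{
	q->pending++;
	if (flag == ADD_NO_WAIT) {
		q->pushes++;
		q->pushed += q->pending;
		q->pending = 0;
	}
}
```

Adding 64 entries with only the last one flagged "no wait" costs a single push in this model, versus 64 pushes when every entry is flagged "no wait".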

The parameter usr_ctx is handled in the callback set in struct doca_flow_cfg .

doca_flow_entries_process processes all the flows in this queue. After the flow is handled and the status is returned, the callback is executed with the status and usr_ctx .

If the user does not set the callback in doca_flow_cfg , the user can get the status using doca_flow_entry_get_status to check if the flow has completed offloading or not.

In non-isolated mode (default), any received packets (following an RSS forward, for example) can be processed by the DOCA application, bypassing the kernel. In the same way, the DOCA application can send packets to the NIC without kernel knowledge. This is why, by default, no replies are received when pinging a host with a running DOCA application. If only specific packet types (e.g., DNS packets) should be processed by the DOCA application, while other packets (e.g., ICMP ping) should be handled directly by the kernel, then isolated mode becomes relevant.

In isolated mode, packets that match root pipe entries are steered to the DOCA application (as usual) while other packets are received/sent directly by the kernel.

If you plan to create a pipe with matches followed by action/monitor/forward operations, it is advised, for functional and performance reasons, that root pipe entries include the matches followed by a next-pipe forward operation. All the planned match, action, monitor, and forward operations can then be specified in the next pipe. Unmatched packets are received and sent by the kernel.

Info In switch mode, DPDK must be in isolated mode. DOCA Flow may be in isolated or non-isolated mode.

To activate isolated mode, two configurations are required:

DOCA configuration: Update the string member mode_args (struct doca_flow_cfg), which represents the DOCA application mode, using doca_flow_cfg_set_mode_args(), and add "isolated" (separated by a comma) to the other mode arguments. For example:

    doca_flow_cfg_set_mode_args(cfg, "vnf,hws,isolated");
    doca_flow_cfg_set_mode_args(cfg, "switch,isolated");

DPDK configuration: Set isolated_mode to 1 (struct application_port_config). For example, if DPDK is initialized by the API dpdk_queues_and_ports_init(struct application_dpdk_config *app_dpdk_config):

    struct application_dpdk_config app_dpdk_config = {
        .port_config = {
            .isolated_mode = 1,
            .nb_ports = ...
            ...
        },
        ...
    };
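Because isolated mode has to be configured in two places, an application may want to sanity-check the pair before initializing. The sketch below is hypothetical (mode_args_has_token() and isolated_config_valid() are not DOCA APIs). It treats mode_args as a comma-separated list, as in the examples above, and enforces only two rules from this section: "isolated" in mode_args requires isolated_mode = 1 on the DPDK side, and switch mode requires DPDK isolated mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True if 'token' is a whole comma-separated item of 'mode_args'
 * (e.g. "isolated" in "vnf,hws,isolated", but not in "vnf,isolatedx"). */
static bool mode_args_has_token(const char *mode_args, const char *token)
{
    size_t tlen;
    const char *p = mode_args;

    assert(mode_args != NULL && token != NULL);
    tlen = strlen(token);
    while (*p != '\0') {
        const char *end = strchr(p, ',');
        size_t len = end ? (size_t)(end - p) : strlen(p);

        if (len == tlen && strncmp(p, token, tlen) == 0)
            return true;
        if (end == NULL)
            break;
        p = end + 1;
    }
    return false;
}

/* Hypothetical check of the two rules stated in this section. */
static bool isolated_config_valid(const char *mode_args, int dpdk_isolated_mode)
{
    bool need_dpdk_isolated = mode_args_has_token(mode_args, "isolated") ||
                              mode_args_has_token(mode_args, "switch");

    return !need_dpdk_isolated || dpdk_isolated_mode == 1;
}
```

Note that "switch" with DPDK isolated but without "isolated" in mode_args passes the check, since DOCA Flow itself may run non-isolated in switch mode.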

The move to HWS (hardware steering) improves performance because rule insertion is implemented in hardware rather than in software. However, this move imposes additional limitations, such as the need to commit in advance to the size of the pipes (the number of rule entries). For applications whose pipe sizes must grow over time, a static size can be challenging: committing to a pipe size that is too small can cause the application to fail once the number of rule entries exceeds the committed number, while pre-committing to an excessively high number of rules can result in memory over-allocation.

This is where pipe resizing comes in handy: it allows the pipe size to increase at runtime, with all existing entries carried over to the new, resized pipe.

Info Pipe resizing is supported in a basic pipe and a control pipe.

It is possible to set a congestion level by percentage ( CONGESTION_PERCENTAGE ). Once the number of entries in the pipe exceeds this value, a callback is invoked. For example, for a pipe with 1000 entries and a CONGESTION_PERCENTAGE of 80%, the CONGESTION_REACHED callback is invoked after the 800th entry is added.
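The threshold in the example above works out as 1000 * 80 / 100 = 800 entries. The sketch below computes it with integer arithmetic; rounding down for sizes that do not divide evenly is an assumption, as this section does not specify how DOCA Flow rounds. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Entry count at which the congestion callback is expected:
 * nr_entries * percentage / 100, rounded down (assumption). */
static uint32_t congestion_threshold_entries(uint32_t nr_entries, uint32_t percentage)
{
    assert(percentage <= 100);
    /* Widen before multiplying so large pipes cannot overflow. */
    return (uint32_t)(((uint64_t)nr_entries * percentage) / 100);
}

/* True when the entry just added is the one that reaches the threshold,
 * following the example above (callback right after the 800th entry). */
static bool congestion_reached_on_add(uint32_t nr_entries_after_add,
                                      uint32_t nr_entries, uint32_t percentage)
{
    return nr_entries_after_add == congestion_threshold_entries(nr_entries, percentage);
}
```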

Following the CONGESTION_REACHED callback, the application should call the pipe resize API (doca_flow_pipe_resize()). The following optional callbacks can be provided with the resize call:

A callback on the new number of entries allocated to the pipe

A callback on each entry that existed in the smaller pipe and is now allocated to the resized pipe

Info The pipe pointer remains the same for the application to use even after being resized.

Upon completion of the internal transfer of all entries from the small pipe to the resized pipe, a RESIZED callback is invoked.

A CONGESTION_REACHED callback is received exactly once before the RESIZED callback. Receiving another CONGESTION_REACHED only happens after calling resize() and receiving its completion with a RESIZED callback.

CONGESTION_REACHED – on the updated number of entries in the pipe (if the pipe is resizable) Info A CONGESTION_REACHED callback can also be received after adding a small number of entries, or while entries are being moved from the small pipe to the resized pipe. To handle such cases, the application must always call pipe resize after receiving the CONGESTION_REACHED callback.

RESIZED – upon completion of the resize operation Note Calling pipe resize returns immediately. It starts an internal process that ends later with the RESIZED callback.

NR_ENTRIES_CHANGED (optional) – on the new max number of entries in the pipe

ENTRY_RELOCATE (optional) – on each entry moved from the small pipe to the resized pipe
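The ordering rules above (one CONGESTION_REACHED, then resize, then RESIZED, and only then a new CONGESTION_REACHED) can be tracked on the application side with a small state machine. The following is a hypothetical sketch, not part of DOCA Flow; it only models the congestion-driven flow described here, and the optional NR_ENTRIES_CHANGED and ENTRY_RELOCATE callbacks do not affect its state.

```c
#include <assert.h>
#include <stdbool.h>

enum resize_state {
    RS_IDLE,      /* no pending congestion (or the last resize finished) */
    RS_CONGESTED, /* CONGESTION_REACHED received, resize not yet called */
    RS_RESIZING,  /* resize called, waiting for RESIZED */
};

struct resize_tracker {
    enum resize_state state;
    unsigned int nr_resizes; /* completed resize operations */
};

/* Each handler returns false if the event breaks the documented order. */
static bool on_congestion_reached(struct resize_tracker *t)
{
    if (t->state != RS_IDLE)
        return false;
    t->state = RS_CONGESTED;
    return true;
}

static bool on_resize_called(struct resize_tracker *t)
{
    if (t->state != RS_CONGESTED)
        return false;
    t->state = RS_RESIZING;
    return true;
}

static bool on_resized(struct resize_tracker *t)
{
    if (t->state != RS_RESIZING)
        return false;
    t->state = RS_IDLE; /* a new CONGESTION_REACHED may now arrive */
    t->nr_resizes++;
    return true;
}
```

A handler returning false flags an event the application did not expect, for example a second CONGESTION_REACHED before doca_flow_pipe_resize() was called.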

Set a process callback on the flow configuration:

    struct doca_flow_cfg *flow_cfg;
    doca_flow_cfg_create(&flow_cfg);
    doca_flow_cfg_set_cb_pipe_process(flow_cfg, <pipe-process-callback>);

Info This callback reports the OP_CONGESTION_REACHED and OP_RESIZED operations when applicable.

Set the following pipe attribute configurations:

    struct doca_flow_pipe_cfg *pipe_cfg;
    doca_flow_pipe_cfg_create(&pipe_cfg, port);
    doca_flow_pipe_cfg_set_nr_entries(pipe_cfg, <initial-number-of-entries>);
    doca_flow_pipe_cfg_set_is_resizable(pipe_cfg, true);
    doca_flow_pipe_cfg_set_congestion_level_threshold(pipe_cfg, <CONGESTION_PERCENTAGE>);
    doca_flow_pipe_cfg_set_user_ctx(pipe_cfg, <pipe-user-context>);
    /* Optional: call once for each pipe queue number to exclude from entries processing */
    doca_flow_pipe_cfg_set_excluded_queue(pipe_cfg, pipe_queue);

Start adding entries using doca_flow_pipe_add_entry() (basic pipe) or doca_flow_pipe_control_add_entry() (control pipe).

Once the number of entries in the pipe crosses the congestion threshold, an OP_CONGESTION_REACHED operation callback is received. In the callback, mark the pipe's congestion threshold event and, after the callback returns, call doca_flow_pipe_resize() with the following parameters:

The new threshold percentage for calculating the new size.

A callback on the new pipe size (optional):

    doca_flow_pipe_resize_nr_entries_changed_cb nr_entries_changed_cb

A callback on the entries to be transferred to the resized pipe (optional):

    doca_flow_pipe_resize_entry_relocate_cb entry_relocation_cb

Call doca_flow_entries_process() per queue in a loop to trigger the transfer of entries (per queue). This is relevant for both a basic pipe and a control pipe. During this phase, adding new entries to the pipe is permitted; they are added directly to the resized pipe and therefore do not need to be transferred. Once all entries are transferred on all queues, an OP_RESIZED operation callback is received (in the context of the last thread calling entries process). From this point, a new OP_CONGESTION_REACHED operation callback can be received.

At this point, the application can stop calling doca_flow_entries_process() for a control pipe. For a basic pipe, an additional call is required to complete the pending doca_flow_pipe_add_entry() calls.
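The transfer phase can be pictured with the following model, in which each call to the hypothetical sim_process_queue() stands in for doca_flow_entries_process() on one queue and moves at most budget entries from the small pipe to the resized pipe. The per-call budget and all names are assumptions made for illustration; the real transfer granularity is internal to DOCA Flow. Completing pending basic-pipe additions after OP_RESIZED is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_NB_QUEUES 4

struct sim_resize {
    unsigned int to_move[SIM_NB_QUEUES]; /* entries left in the small pipe, per queue */
    unsigned int in_resized_pipe;        /* entries now held by the resized pipe */
    bool resized_reported;               /* OP_RESIZED already delivered? */
};

/* Entries added during the transfer go straight into the resized pipe. */
static void sim_add_during_resize(struct sim_resize *r)
{
    r->in_resized_pipe++;
}

/* Moves up to 'budget' entries of 'queue'. Returns true only for the call
 * that finishes the last queue, i.e. the caller that sees OP_RESIZED. */
static bool sim_process_queue(struct sim_resize *r, unsigned int queue, unsigned int budget)
{
    unsigned int n;

    assert(queue < SIM_NB_QUEUES);
    n = r->to_move[queue] < budget ? r->to_move[queue] : budget;
    r->to_move[queue] -= n;
    r->in_resized_pipe += n;

    if (r->resized_reported)
        return false;
    for (unsigned int q = 0; q < SIM_NB_QUEUES; q++)
        if (r->to_move[q] != 0)
            return false;
    r->resized_reported = true;
    return true;
}
```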

Info doca_flow_entries_process() has the following roles:

Triggering the transfer of entries from the smaller pipe to the bigger pipe (until an OP_RESIZED callback is received)

Completing previous add-entry calls (relevant for basic pipes only)

If a specific queue is excluded on the resized pipe, calling doca_flow_entries_process() on that queue is not required in order to receive the OP_RESIZED operation callback.

The OP_RESIZED callback can only be received on one of the pipe's non-excluded queues.

There are cases where an application uses only a subset of queues for rule operations such as add/remove/update. Resizing a pipe, on the other hand, requires calling doca_flow_entries_process() on all queues to get an OP_RESIZED operation callback. This is where the excluded queues feature comes in handy: it allows excluding the pipe's queues on which no rule operations take place.

When calling pipe resize, it is sufficient to process entries only on the non-excluded queues to get an OP_RESIZED operation callback for that pipe. Processing entries on any queue is still allowed (for example, for any other pipe that includes this queue).
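Which queues have to be processed before OP_RESIZED can arrive follows directly from the excluded set. The hypothetical helpers below represent pipe queues as bits of a 64-bit mask (an assumption that limits the model to 64 queues); they do not call DOCA Flow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit i set in 'excluded' means pipe queue i is excluded. Returns the mask
 * of queues on which entries must be processed to receive OP_RESIZED. */
static uint64_t queues_required_for_resize(unsigned int nb_queues, uint64_t excluded)
{
    uint64_t all;

    assert(nb_queues <= 64);
    all = (nb_queues == 64) ? UINT64_MAX : ((UINT64_C(1) << nb_queues) - 1);
    return all & ~excluded;
}

/* True once every non-excluded queue appears in 'processed'. Extra bits
 * (excluded queues processed anyway) are allowed and ignored. */
static bool required_queues_processed(unsigned int nb_queues, uint64_t excluded,
                                      uint64_t processed)
{
    uint64_t required = queues_required_for_resize(nb_queues, excluded);

    return (processed & required) == required;
}
```

For example, with four queues and queues 1 and 2 excluded (mask 0x6), only queues 0 and 3 (mask 0x9) need to be processed.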

Before and during pipe destruction, entries may be processed (in parallel) on all of the pipe's excluded queues.

However, there are limitations when configuring a pipe's excluded queues:

It is not allowed to add, remove, or update entries on a pipe's excluded queues. There is no validity check or enforcement of this rule, so it is up to the application to maintain this restriction. Processing entries on any queue is allowed at all times, since the doca_flow_entries_process() API is not per pipe.

    struct doca_flow_pipe_cfg *pipe_cfg;
    doca_flow_pipe_cfg_create(&pipe_cfg, port);
    ...
    doca_error_t doca_flow_pipe_cfg_set_excluded_queue(struct doca_flow_pipe_cfg *pipe_cfg, uint16_t pipe_queue);
    /*
     * pipe_cfg   - pointer to DOCA Flow pipe configuration struct.
     * pipe_queue - the pipe queue number to exclude.
     * Return code:
     *   - DOCA_SUCCESS - in case of success.
     *   - DOCA_ERROR_INVALID_VALUE - in case of invalid input.
     */

Warning An excluded queue number must be between 0 and (max queues - 1).

It is not allowed to exclude all queues of a pipe. At least one of the pipe's queues must remain non-excluded.
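Since the add/remove/update restriction is not enforced, an application might also pre-validate its exclusion list before calling doca_flow_pipe_cfg_set_excluded_queue(). The hypothetical excluded_queues_valid() below checks only the two rules in the warnings above (queue range, and at least one non-excluded queue); counting duplicate queue numbers once is a choice of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static bool excluded_queues_valid(const uint16_t *excluded, size_t nb_excluded,
                                  uint16_t max_queues)
{
    size_t distinct = 0;

    assert(excluded != NULL || nb_excluded == 0);
    for (size_t i = 0; i < nb_excluded; i++) {
        bool seen = false;

        if (excluded[i] >= max_queues) /* must be in [0, max_queues - 1] */
            return false;
        for (size_t j = 0; j < i; j++)
            if (excluded[j] == excluded[i])
                seen = true;
        if (!seen)
            distinct++;
    }
    /* At least one of the pipe's queues must remain non-excluded. */
    return distinct < max_queues;
}
```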

In switch mode, if dev is set in struct doca_flow_port_cfg (using doca_flow_port_cfg_set_dev()), an internal hairpin is created for direct wire-to-wire forwarding. Users may specify the hairpin configuration using mode_args. The supported options are as follows:

hairpinq_num=[n] – the hairpin queue number

use_huge_mem – determines whether the Tx buffer uses hugepage memory

lock_rx_mem – locks Rx queue memory
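The sketch below parses these options on the application side, for example to log the effective hairpin configuration. It assumes the options are written as items of the same comma-separated mode_args list used for "isolated" (e.g., "switch,hairpinq_num=4,lock_rx_mem"); that combined syntax, struct hairpin_opts and parse_hairpin_opts() are assumptions, not DOCA APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct hairpin_opts {
    long hairpinq_num; /* -1 when not given */
    bool use_huge_mem;
    bool lock_rx_mem;
};

/* Returns false on a malformed hairpinq_num value; unknown items are skipped. */
static bool parse_hairpin_opts(const char *mode_args, struct hairpin_opts *out)
{
    static const char key[] = "hairpinq_num=";
    const size_t klen = sizeof(key) - 1;
    const char *p = mode_args;

    assert(mode_args != NULL && out != NULL);
    out->hairpinq_num = -1;
    out->use_huge_mem = false;
    out->lock_rx_mem = false;

    while (*p != '\0') {
        const char *end = strchr(p, ',');
        size_t len = end ? (size_t)(end - p) : strlen(p);

        if (len > klen && strncmp(p, key, klen) == 0) {
            char *num_end;
            long n = strtol(p + klen, &num_end, 10);

            /* The whole item must be consumed by the number. */
            if (num_end != p + len || n < 0)
                return false;
            out->hairpinq_num = n;
        } else if (len == strlen("use_huge_mem") && strncmp(p, "use_huge_mem", len) == 0) {
            out->use_huge_mem = true;
        } else if (len == strlen("lock_rx_mem") && strncmp(p, "lock_rx_mem", len) == 0) {
            out->lock_rx_mem = true;
        }
        if (end == NULL)
            break;
        p = end + 1;
    }
    return true;
}
```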

When an entry is terminated by the user application or ages out, the user should call the entry destroy function, doca_flow_pipe_rm_entry(). This frees the pipe entry and cancels its hardware offload.

When a pipe is terminated by the user application, the user should call the pipe destroy function, doca_flow_pipe_destroy(). This destroys the pipe and all the entries that belong to it.

When all pipes of a port are terminated by the user application, the user should call the pipe flush function, doca_flow_port_pipes_flush() . This destroys all pipes and all pipe entries belonging to this port.

Warning During doca_flow_pipe_destroy() execution, the application must avoid adding/removing entries or checking for aged entries of any other pipes.

When the port is not used anymore, the user should call the port stop function, doca_flow_port_stop() . This stops the DOCA port, disables the traffic, destroys the port and frees all resources of the port.
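The scope of each teardown call can be summarized with a small bookkeeping model. Nothing here calls DOCA Flow: the sim_* functions only mirror what doca_flow_pipe_rm_entry(), doca_flow_pipe_destroy(), doca_flow_port_pipes_flush() and doca_flow_port_stop() are described as releasing, and having port stop flush any remaining pipes is an assumption based on "frees all resources of the port". The concurrency restriction in the warning above is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_MAX_PIPES 8

struct sim_port {
    unsigned int entries[SIM_MAX_PIPES]; /* live entries per pipe */
    bool pipe_alive[SIM_MAX_PIPES];
    bool started;
};

/* Mirrors doca_flow_pipe_rm_entry(): frees one entry of a live pipe. */
static void sim_rm_entry(struct sim_port *p, unsigned int pipe)
{
    assert(p->pipe_alive[pipe] && p->entries[pipe] > 0);
    p->entries[pipe]--;
}

/* Mirrors doca_flow_pipe_destroy(): the pipe goes away with its entries. */
static void sim_pipe_destroy(struct sim_port *p, unsigned int pipe)
{
    assert(p->pipe_alive[pipe]);
    p->entries[pipe] = 0;
    p->pipe_alive[pipe] = false;
}

/* Mirrors doca_flow_port_pipes_flush(): destroys every pipe of the port. */
static void sim_pipes_flush(struct sim_port *p)
{
    for (unsigned int i = 0; i < SIM_MAX_PIPES; i++)
        if (p->pipe_alive[i])
            sim_pipe_destroy(p, i);
}

/* Mirrors doca_flow_port_stop(): flushing here is an assumption. */
static void sim_port_stop(struct sim_port *p)
{
    sim_pipes_flush(p);
    p->started = false;
}

static unsigned int sim_live_entries(const struct sim_port *p)
{
    unsigned int n = 0;

    for (unsigned int i = 0; i < SIM_MAX_PIPES; i++)
        n += p->entries[i];
    return n;
}
```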