DOCA Flow Connection Tracking
This guide provides an overview and configuration instructions for DOCA Flow CT API.
DOCA Flow Connection Tracking (CT) is a 5-tuple table which supports the following:
Track 5-tuple sessions (or 6-tuple when a zone is available)
Zone based – virtual tables
Aging (i.e., removes idle connections)
Sets metadata for a connection
Bidirectional packet handling
High rate of connections per second (CPS)
The CT module makes it simple and efficient to track connections by leveraging hardware resources. The module supports both autonomous and managed mode.
DOCA Flow CT pipe handles non-encapsulated TCP and UDP packets. The CT pipe only supports forward next pipe or miss next pipe actions:
All packets matching known connection 6-tuples are forwarded to the CT's forward pipe
Non-matching packets are forwarded to the miss pipe
The user application must handle packets accordingly.
The DOCA Flow CT API is built around four major parts:
CT module manipulation – configuring CT module resources
CT connection entry manipulation – adding, removing, or updating connection entries
Callbacks – handling asynchronous entry processing results
Pipe and entry statistics
Aging
Aging time is a time in seconds that sets the maximum allowed time for a session to be maintained without a packet seen. If that time elapses with no packet being detected, the session is terminated.
To support aging, a dedicated aging thread is started to poll and check counters for all connections.
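The aging rule above can be sketched as a periodic sweep over a session table. This is a software illustration only and not DOCA's aging thread: the `session` struct, its fields, and the `sweep_aged()` helper are hypothetical names introduced for this example, and the choice to expire a session exactly when its idle time reaches the timeout is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-session aging state (not a DOCA type). */
struct session {
	uint64_t last_seen_s; /* time the last packet was seen, in seconds */
	uint32_t timeout_s;   /* aging time: maximum allowed idle time, in seconds */
	int active;           /* 1 while the session is maintained */
};

/* A session is aged once it has been idle for at least timeout_s. */
static int session_is_aged(const struct session *s, uint64_t now_s)
{
	assert(now_s >= s->last_seen_s);
	return (now_s - s->last_seen_s) >= s->timeout_s;
}

/* One polling pass: terminate every aged session and return how many were removed. */
static size_t sweep_aged(struct session *tbl, size_t n, uint64_t now_s)
{
	size_t removed = 0;

	for (size_t i = 0; i < n; i++) {
		if (tbl[i].active && session_is_aged(&tbl[i], now_s)) {
			tbl[i].active = 0;
			removed++;
		}
	}
	return removed;
}
```

In this sketch a caller would refresh `last_seen_s` whenever a packet for the session is observed and call `sweep_aged()` from a polling loop.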
Autonomous Mode
In this mode, DOCA runs multiple CT workers internally, to handle connections in parallel.
The connection's lifecycle is controlled by the connection state encapsulated in the packet and time-based aging.
CT workers establish and close connections automatically based on the connection's state stored in packet meta.
Packet meta is defined as follows:
uint32_t src : 1;      /**< Source port in multi-port E-Switch mode */
uint32_t hairpin : 1;  /**< Subject to forward using hairpin. */
uint32_t type : 2;     /**< CT packet type: New, End or Update */
uint32_t data : 28;    /**< Zone set by user or reserved after CT pipe. */
data – CT table matches packet meta (zone) and 5-tuples
type – can have the following values:
NONE – (known) if packet hit any connection rule
NEW – if new TCP or UDP connection
END – if TCP connection closed
src and hairpin – used for forwarding pipe and worker to deliver packet
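As a rough illustration of the packet meta layout, the sketch below mirrors the four bit-fields in a local struct and shows how a zone value and a packet type might be read and written. The struct name `ct_pkt_meta`, the enum `ct_pkt_type` with its numeric values, and both helper functions are assumptions made for this example; the source names the types but does not give their encodings, and the bit order of C bit-fields is compiler-dependent.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of the packet meta layout listed above (32 bits in total). */
struct ct_pkt_meta {
	uint32_t src : 1;     /* source port in multi-port E-Switch mode */
	uint32_t hairpin : 1; /* subject to forward using hairpin */
	uint32_t type : 2;    /* CT packet type */
	uint32_t data : 28;   /* zone set by user, or reserved after CT pipe */
};

/* Numeric values are assumed for illustration only. */
enum ct_pkt_type {
	CT_PKT_NONE = 0, /* packet hit a known connection rule */
	CT_PKT_NEW = 1,  /* new TCP or UDP connection */
	CT_PKT_END = 2   /* TCP connection closed */
};

/* Build the meta a user might set before the CT pipe: only the zone is filled in. */
static struct ct_pkt_meta ct_meta_for_zone(uint32_t zone)
{
	struct ct_pkt_meta m = {0};

	assert(zone < (1u << 28)); /* data is a 28-bit field */
	m.data = zone;
	return m;
}

/* Map a type value to a printable name. */
static const char *ct_pkt_type_name(uint32_t type)
{
	switch (type) {
	case CT_PKT_NONE: return "NONE";
	case CT_PKT_NEW:  return "NEW";
	case CT_PKT_END:  return "END";
	default:          return "UNKNOWN";
	}
}
```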
Managed Mode
The application is responsible for managing the worker threads in this mode, parsing and handling the connection's lifecycle.
Managed mode uses DOCA Flow CT management APIs to create or destroy the connections.
The CT aging module notifies on aged out connections by calling callbacks.
Users can create connection rules with a different pattern, meta, or counter for each packet direction.
Users are responsible for defining meta and mask to match and modify.
Users can create one rule of a connection first, then create another rule using API doca_flow_ct_entry_add_dir().
To enable DOCA Flow CT on the DPU, perform the following:
Enable iommu.passthrough in Linux boot commands (or disable SMMU from the DPU BIOS):
Run:
sudo vim /etc/default/grub
Set GRUB_CMDLINE_LINUX="iommu.passthrough=1".
Run:
sudo update-grub
sudo reboot
Define huge pages (see prerequisites of DOCA Flow).
Configure DPU firmware with LAG_RESOURCE_ALLOCATION=1:
sudo mlxconfig -d <device-id> s LAG_RESOURCE_ALLOCATION=1
Info: Obtain device-id from the output of the mst status -v command.
Perform power cycle on the host and Arm sides.
If working with a single port, set the DPU into e-switch mode:
sudo devlink dev eswitch set pci/<pcie-address> mode switchdev
sudo devlink dev param set pci/<pcie-address> name esw_multiport value false cmode runtime
Info: Retrieve pcie-address from the output of the mst status -v command.
If working with two PF ports, set the DPU into multi-port e-switch mode (for the 2 PCIe devices):
sudo devlink dev param set pci/<pcie-address> name esw_multiport value true cmode runtime
Info: Retrieve pcie-address from the output of the mst status -v command.
DOCA Flow CT supports actions based on meta and NAT operations. Each action can be defined as either shared or non-shared.
Shared Actions
Actions that can be shared between entries. Shared actions are predefined and reused in multiple entries.
The user gets a handle per shared action created and uses this handle as a reference to the action where required.
It is the user's responsibility to track shared actions and to remove them when they are no longer needed.
Shared actions are defined using a control queue (see struct doca_flow_ct_cfg).
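Because tracking shared actions is left to the application, one possible approach is a small table that counts how many entries currently reference each handle. The sketch below is purely illustrative: the `shared_action_tracker` type and all `tracker_*` functions are hypothetical and not part of DOCA Flow CT. Handles are modeled as plain `uint32_t` values, and the values themselves are whatever the library returned when the shared action was created.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SHARED 8 /* hypothetical capacity for this sketch */

struct shared_action_tracker {
	uint32_t handle[MAX_SHARED];
	uint32_t users[MAX_SHARED]; /* entries currently referencing the handle */
	int in_use[MAX_SHARED];
};

/* Return the slot index of a tracked handle, or -1 if it is not tracked. */
static int tracker_find(const struct shared_action_tracker *t, uint32_t handle)
{
	assert(t != NULL);
	for (int i = 0; i < MAX_SHARED; i++)
		if (t->in_use[i] && t->handle[i] == handle)
			return i;
	return -1;
}

/* Start tracking a handle. Returns 0 on success, -1 if duplicate or full. */
static int tracker_add(struct shared_action_tracker *t, uint32_t handle)
{
	if (tracker_find(t, handle) >= 0)
		return -1;
	for (int i = 0; i < MAX_SHARED; i++) {
		if (!t->in_use[i]) {
			t->in_use[i] = 1;
			t->handle[i] = handle;
			t->users[i] = 0;
			return 0;
		}
	}
	return -1;
}

/* Record that an entry was created with this handle. */
static int tracker_ref(struct shared_action_tracker *t, uint32_t handle)
{
	int i = tracker_find(t, handle);

	if (i < 0)
		return -1;
	t->users[i]++;
	return 0;
}

/* Record that an entry using this handle was removed. */
static int tracker_unref(struct shared_action_tracker *t, uint32_t handle)
{
	int i = tracker_find(t, handle);

	if (i < 0 || t->users[i] == 0)
		return -1;
	t->users[i]--;
	return 0;
}

/* A tracked handle with no remaining users is a candidate for removal. */
static int tracker_can_remove(const struct shared_action_tracker *t, uint32_t handle)
{
	int i = tracker_find(t, handle);

	return i >= 0 && t->users[i] == 0;
}
```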
Non-shared Actions
Actions provided with their data during entry create/update.
These actions are completely managed by DOCA Flow CT and cannot be reused in multiple flows (e.g., NAT operations).
Action Sets in Pipe Creation
Users must define action sets during DOCA Flow CT pipe creation (as with any other pipe).
Only actions for meta and NAT are accepted (according to struct doca_flow_ct_actions).
During entry create/update, different actions can be provided per direction (different action content and/or different type).
Feature Enable
To enable user actions, configure the following parameters:
User action templates during DOCA Flow CT pipe creation
Maximum number of user actions (nb_user_actions on DOCA Flow CT init)
Using Actions in Autonomous Mode
Init
Configure the following parameters on doca_flow_ct_init():
nb_ctrl_queues – number of control queues for defining shared actions
nb_user_actions – maximum number of shared actions
worker_cb – callbacks required to communicate with the user
Create DOCA Flow CT Pipe
Configure action sets on doca_flow_pipe_create().
Create Shared Actions
Use doca_flow_ct_actions_add_shared() with one of the control queues.
Shared actions can be added at any time before use.
Implement Worker Callbacks
Callbacks are called from each worker thread to acquire synchronization with the user code and on the first packet of a flow.
On doca_flow_ct_rule_pkt_cb:
Determine how the packet should be treated
If rules are required, return the action handles to use
Using Actions in Managed Mode
Init
Configure the following parameters on doca_flow_ct_init():
nb_ctrl_queues – number of control queues for defining shared actions
nb_user_actions – maximum number of shared actions
Create DOCA Flow CT Pipe
Configure action sets on doca_flow_pipe_create().
Create Shared Actions
Use doca_flow_ct_actions_add_shared() with one of the control queues.
Shared actions can be added at any time before use.
Add Entry
Entry can be created in one of the following ways:
Using an action handle of a predefined shared action
Using action data which is specific to the flow, not sharable (e.g., for NAT operations)
The entry can have different actions and/or different action types per direction.
Remove Entry
Non-shared actions associated with an entry are implicitly destroyed by DOCA Flow CT.
Shared actions are not destroyed. They can be used by the user until they decide to remove them.
Update Entry
Entry actions can be updated per direction. All combinations of shared/non-shared actions are applicable (e.g., update from shared to non-shared).
For the library API reference, refer to DOCA Flow and CT API documentation in the NVIDIA DOCA Library APIs.
The pkg-config (*.pc file) for the Flow CT library is included in DOCA's regular definitions: doca.
The following sections provide additional details about the library API.
enum doca_flow_ct_flags
DOCA Flow CT configuration optional flags.
Flag | Description
DOCA_FLOW_CT_FLAG_STATS = 1u << 0 | Enable internal pipe counters for packet tracking purposes. Call doca_flow_pipe_dump(<ct_pipe>) to dump counter values. Each call dumps values changed.
DOCA_FLOW_CT_FLAG_WORKER_STATS = 1u << 1 | Enable worker thread internal debug counter periodical dump. Autonomous mode only.
DOCA_FLOW_CT_FLAG_NO_AGING = 1u << 2 | Disable aging
DOCA_FLOW_CT_FLAG_SW_PKT_PARSING = 1u << 3 | Enable CT worker software packet parsing to support VLAN, IPv6 options, or special tunnel types
DOCA_FLOW_CT_FLAG_MANAGED = 1u << 4 | Enable managed mode in which user application is responsible for managing packet handling, and calling the CT API to manipulate CT connection entries
DOCA_FLOW_CT_FLAG_ASYMMETRIC = 1u << 5 | Allows different 6-tuple table definitions for the origin and reply directions. Default to symmetric mode, uses same meta and reverse 5-tuples for reply direction. Managed mode only.
DOCA_FLOW_CT_FLAG_ASYMMETRIC_COUNTER = 1u << 6 | Enable different counters for the origin and reply directions. Managed mode only.
DOCA_FLOW_CT_FLAG_NO_COUNTER = 1u << 7 | Disable counter and aging to save aging thread CPU cycles
DOCA_FLOW_CT_FLAG_DEFAULT_MISS = 1u << 8 | Check TCP SYN flags and UDP in CT miss flow to identify ADD type packets.
DOCA_FLOW_CT_FLAG_WIRE_TO_WIRE = 1u << 9 | Hint traffic comes from uplink wire and forwards to uplink wire.
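The flags are bit values that are combined with bitwise OR into the flags field of the configuration. The sketch below copies the values from the table into a local enum so that it builds without the DOCA headers, and adds a hypothetical `ct_flags_consistent()` helper. The helper only encodes the "managed mode only" and "autonomous mode only" notes from the table; it is an assumption about how an application might pre-check its own configuration, not a check performed by the library.

```c
#include <assert.h>
#include <stdint.h>

/* Local copy of the flag values from the table above. */
enum ct_flags {
	CT_FLAG_STATS = 1u << 0,
	CT_FLAG_WORKER_STATS = 1u << 1,
	CT_FLAG_NO_AGING = 1u << 2,
	CT_FLAG_SW_PKT_PARSING = 1u << 3,
	CT_FLAG_MANAGED = 1u << 4,
	CT_FLAG_ASYMMETRIC = 1u << 5,
	CT_FLAG_ASYMMETRIC_COUNTER = 1u << 6,
	CT_FLAG_NO_COUNTER = 1u << 7,
	CT_FLAG_DEFAULT_MISS = 1u << 8,
	CT_FLAG_WIRE_TO_WIRE = 1u << 9
};

/* Hypothetical application-side sanity check.
 * Returns 1 if mode-specific flags agree with the selected mode, 0 otherwise. */
static int ct_flags_consistent(uint32_t flags)
{
	const uint32_t managed_only = CT_FLAG_ASYMMETRIC | CT_FLAG_ASYMMETRIC_COUNTER;
	const uint32_t autonomous_only = CT_FLAG_WORKER_STATS;
	int managed = (flags & CT_FLAG_MANAGED) != 0;

	if (!managed && (flags & managed_only))
		return 0; /* asymmetric options need managed mode */
	if (managed && (flags & autonomous_only))
		return 0; /* worker stats apply to autonomous mode */
	return 1;
}
```

For example, `CT_FLAG_MANAGED | CT_FLAG_ASYMMETRIC` passes the check, while `CT_FLAG_ASYMMETRIC` alone does not.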
enum doca_flow_ct_rule_opr
Options for handling flows in autonomous mode with shared actions. The decision is taken on the first flow packet.
Operation | Description
DOCA_FLOW_CT_RULE_OK | Flow should be defined in the CT pipe using the required shared actions handles
DOCA_FLOW_CT_RULE_DROP | Flow should not be defined in the CT pipe. The packet should be dropped.
DOCA_FLOW_CT_RULE_TX_ONLY | Flow should not be defined in the CT pipe. The packet should be transmitted.
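To make the three outcomes concrete, the following sketch shows a first-packet policy that returns one of them. Everything here is a stand-in: the `demo_rule_opr` enum mirrors the three operations with assumed numeric values, and `demo_first_packet_policy()` with its port-based rules is an invented example policy. It does not reproduce the real callback signature.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the three operations in the table. */
enum demo_rule_opr {
	DEMO_RULE_OK,     /* define the flow in the CT pipe */
	DEMO_RULE_DROP,   /* do not define the flow; drop the packet */
	DEMO_RULE_TX_ONLY /* do not define the flow; transmit the packet */
};

/* Hypothetical policy evaluated on the first packet of a flow. */
static enum demo_rule_opr demo_first_packet_policy(uint16_t dst_port)
{
	if (dst_port == 23) /* example: refuse Telnet */
		return DEMO_RULE_DROP;
	if (dst_port == 53) /* example: pass DNS without tracking it */
		return DEMO_RULE_TX_ONLY;
	return DEMO_RULE_OK; /* everything else gets a CT rule */
}
```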
struct direction_cfg
Managed mode configuration for origin or reply direction.
Field | Description
bool match_inner | 5-tuple match pattern applies to packet inner layer
struct doca_flow_meta *zone_match_mask | Mask to indicate meta field and bits to match
struct doca_flow_meta *meta_modify_mask | Mask to indicate meta field and bits to modify on connection packet match
struct doca_flow_ct_worker_callbacks
Set of callbacks for using shared actions in autonomous mode.
Field | Description
doca_flow_ct_sync_acquire_cb worker_init | Called at the start of a worker thread to sync with the user context
doca_flow_ct_sync_release_cb worker_release | Called at the end of a worker thread
doca_flow_ct_rule_pkt_cb rule_pkt | Called on the first packet of a flow
struct doca_flow_ct_cfg
DOCA Flow CT configuration.
uint32_t nb_arm_queues;
uint32_t nb_ctrl_queues;
uint32_t nb_user_actions;
uint32_t nb_arm_sessions[DOCA_FLOW_CT_SESSION_MAX];
uint32_t flags;
struct doca_dev *doca_dev;
void *ib_dev;
void *ib_pd;
uint16_t aging_core;
uint16_t aging_query_delay_s;
doca_flow_ct_flow_log_cb flow_log_cb;
struct doca_flow_ct_aging_ops *aging_ops;
uint32_t base_core_id;
union {
    /* Managed mode configuration for origin and reply direction. */
    struct direction_cfg direction[2];
    /* Below fields are dedicated to autonomous mode */
    struct {
        uint16_t tcp_timeout_s;
        uint16_t tcp_session_del_s;
        uint16_t udp_timeout_s;
        enum doca_flow_tun_type tunnel_type;
        uint16_t vxlan_dst_port;
        enum doca_flow_ct_hash_type hash_type;
        uint32_t meta_user_bits;
        uint32_t meta_action_bits;
        struct doca_flow_meta *meta_zone_mask;
        struct doca_flow_meta *connection_id_mask;
        struct doca_flow_ct_worker_callbacks worker_cb;
    };
};
Where:
Field | Description
uint32_t nb_arm_queues | Number of CT queues. In autonomous mode, also the number of worker threads.
uint32_t nb_ctrl_queues | Number of CT control queues used for defining shared actions
uint32_t nb_user_actions | Maximum number of user actions supported (shared and non-shared). Minimum value is 1K * nb_ctrl_queues.
uint32_t nb_arm_sessions[DOCA_FLOW_CT_SESSION_MAX] | Maximum number of IPv4 and IPv6 CT connections
uint32_t flags | CT configuration flags
struct doca_dev *doca_dev | DOCA device
void *ib_dev | Deprecated
void *ib_pd | Deprecated
uint16_t aging_core | CPU core ID for CT aging thread to bind
uint16_t aging_query_delay_s | CT aging code delay
doca_flow_ct_flow_log_cb flow_log_cb | Flow log callback function, when set
struct doca_flow_ct_aging_ops *aging_ops | User-defined aging logic callback functions; the default aging logic is the fallback
uint32_t base_core_id | Base core ID for the workers
struct direction_cfg direction[2] | Managed mode configuration for origin or reply direction
uint16_t tcp_timeout_s | TCP timeout in seconds
uint16_t tcp_session_del_s | Time to delay or kill TCP session after RST/FIN
uint16_t udp_timeout_s | UDP timeout in seconds
enum doca_flow_tun_type tunnel_type | Encapsulation tunnel type
uint16_t vxlan_dst_port | VXLAN outer UDP destination port in big endian
enum doca_flow_ct_hash_type hash_type | Type of connection hash table: NONE or SYMMETRIC_HASH
uint32_t meta_user_bits | User packet meta bits to be owned by the user
uint32_t meta_action_bits | User packet meta bits to be carried by identified connection packet
struct doca_flow_meta *meta_zone_mask | Mask to indicate meta field and bits saving zone information
struct doca_flow_meta *connection_id_mask | Mask to indicate meta field and bits for CT internal connection ID
struct doca_flow_ct_worker_callbacks worker_cb | Worker callbacks to use shared actions
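The lower bound on nb_user_actions stated in the table can be computed before initialization. The sketch below assumes that "1K" means 1024; that reading, the macro name, and both helper functions are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* "1K" is read here as 1024; this is an assumption. */
#define USER_ACTIONS_PER_CTRL_QUEUE 1024u

/* Minimum nb_user_actions for a given number of control queues. */
static uint32_t min_user_actions(uint32_t nb_ctrl_queues)
{
	assert(nb_ctrl_queues <= UINT32_MAX / USER_ACTIONS_PER_CTRL_QUEUE);
	return USER_ACTIONS_PER_CTRL_QUEUE * nb_ctrl_queues;
}

/* Raise a requested value to the minimum if it is too small. */
static uint32_t clamp_user_actions(uint32_t requested, uint32_t nb_ctrl_queues)
{
	uint32_t min = min_user_actions(nb_ctrl_queues);

	return requested < min ? min : requested;
}
```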
struct doca_flow_ct_actions
This structure is used in the following cases:
For defining shared actions. In this case, action data is provided by the user. The action handle is returned by DOCA Flow CT.
For defining an entry with actions. The structure can be filled with two options:
With action handle of a previously created shared action
With non-shared action data
DOCA Flow CT action structure.
enum doca_flow_resource_type resource_type;
union {
    /* Used when creating an entry with a shared action. */
    uint32_t action_handle;
    /* Used when creating an entry with non-shared action or when creating a shared action. */
    struct {
        uint32_t action_idx;
        struct doca_flow_meta meta;
        struct doca_flow_header_l4_port l4_port;
        union {
            struct doca_flow_ct_ip4 ip4;
            struct doca_flow_ct_ip6 ip6;
        };
    } data;
};
Where:
Field | Description
enum doca_flow_resource_type resource_type | Shared/non-shared action
uint32_t action_handle | Shared action handle
uint32_t action_idx | Actions template index
struct doca_flow_meta meta | Modify meta values
struct doca_flow_header_l4_port l4_port | UDP or TCP source and destination port
struct doca_flow_ct_ip4 ip4 | Source and destination IPv4 addresses
struct doca_flow_ct_ip6 ip6 | Source and destination IPv6 addresses
This section describes DOCA Flow CT samples based on the DOCA Flow CT pipe.
The samples illustrate how to use the library API to manage TCP/UDP connections.
Running the Samples
Refer to the following documents:
NVIDIA DOCA Installation Guide for Linux for details on how to install BlueField-related software.
NVIDIA DOCA Troubleshooting Guide for any issue you may encounter with the installation, compilation, or execution of DOCA samples.
To build a given sample:
cd /opt/mellanox/doca/samples/doca_flow/flow_ct_udp
meson /tmp/build
ninja -C /tmp/build
Info: The binary doca_flow_ct_udp is created under /tmp/build/.
Sample (e.g., doca_flow_ct_udp) usage:
Usage: doca_<sample_name> [DOCA Flags] [Program Flags]

DOCA Flags:
  -h, --help                     Print a help synopsis
  -v, --version                  Print program version information
  -l, --log-level                Set the (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
  --sdk-log-level                Set the SDK (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
  -j, --json <path>              Parse all command flags from an input json file

Program Flags:
  -p, --pci_addr <PCI-ADDRESS>   PCI device address

For additional information per sample, use the -h option:
/tmp/build/<sample_name> -h
The following is a CLI example for running the samples when port 03:00.0 is configured (multi-port e-switch) as manager port:
/tmp/build/doca_<sample_name> -- -p 03:00.0 -l 60
Samples
Flow CT UDP
This sample illustrates how to create a simple UDP pipeline with a CT pipe in it.
The sample logic includes:
Initializing DOCA Flow by indicating mode_args="switch,hws" in the doca_flow_cfg struct.
Initializing DOCA Flow CT.
Starting two DOCA Flow uplink representor ports where port 0 has a special role of being a switch manager port.
Info: Ports are configured according to the parameters provided to doca_dpdk_port_probe() in the main function.
Creating a pipeline on the main port:
Building a UDP pipe to filter non-UDP packets.
Building a CT pipe to hold UDP session entries.
Building a counter pipe with an example 5-tuple entry to which unidentified UDP sessions should be sent.
Building a VXLAN encapsulation pipe to encapsulate all identified UDP sessions.
Building an RSS pipe from which all packets are directed to the sample main thread for parsing and processing.
Packet processing:
The first UDP packet triggers the miss flow as the CT pipe is empty.
5-tuple packet parsing is performed.
doca_flow_ct_add_entry() is called to create a hardware rule according to the parsed 5-tuple info.
A second UDP packet with the same 5-tuple should then be sent. This packet hits the previously inserted HW rule and is directed to port 0 after VXLAN encapsulation.
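The miss-then-hit behavior described in the packet processing steps can be modeled in software. The sketch below is not the sample's code and does not call DOCA: `five_tuple`, `ct_table`, and the helper functions are hypothetical, and a linear array stands in for the hardware table. It also matches the reversed 5-tuple, assuming the symmetric (bidirectional) behavior described earlier for the reply direction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 5-tuple key (IPv4 only, for brevity). */
struct five_tuple {
	uint32_t src_ip, dst_ip;
	uint16_t src_port, dst_port;
	uint8_t proto;
};

#define TABLE_SIZE 16

/* A linear array standing in for the hardware connection table. */
struct ct_table {
	struct five_tuple entry[TABLE_SIZE];
	int count;
};

static int tuple_eq(const struct five_tuple *a, const struct five_tuple *b)
{
	return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
	       a->src_port == b->src_port && a->dst_port == b->dst_port &&
	       a->proto == b->proto;
}

/* Reply-direction key: swap source and destination. */
static struct five_tuple tuple_reverse(const struct five_tuple *t)
{
	struct five_tuple r = *t;

	r.src_ip = t->dst_ip;
	r.dst_ip = t->src_ip;
	r.src_port = t->dst_port;
	r.dst_port = t->src_port;
	return r;
}

/* Return 1 if the packet matches a known connection in either direction. */
static int ct_lookup(const struct ct_table *tbl, const struct five_tuple *t)
{
	for (int i = 0; i < tbl->count; i++) {
		struct five_tuple rev = tuple_reverse(&tbl->entry[i]);

		if (tuple_eq(&tbl->entry[i], t) || tuple_eq(&rev, t))
			return 1;
	}
	return 0;
}

/* Process one packet: 1 on hit, 0 on miss (the connection is then added). */
static int ct_process(struct ct_table *tbl, const struct five_tuple *t)
{
	if (ct_lookup(tbl, t))
		return 1;
	assert(tbl->count < TABLE_SIZE);
	tbl->entry[tbl->count++] = *t;
	return 0;
}
```

In this model the first call to `ct_process()` for a given 5-tuple reports a miss and installs the entry, and later calls for the same tuple or its reverse report a hit.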
Reference:
/opt/mellanox/doca/samples/doca_flow/flow_ct_udp/flow_ct_udp_sample.c
/opt/mellanox/doca/samples/doca_flow/flow_ct_udp/flow_ct_udp_main.c
/opt/mellanox/doca/samples/doca_flow/flow_ct_udp/meson.build
Flow CT UDP Query
This sample illustrates how to query a Flow CT UDP session entry. The query can be done according to session direction (origin or reply). The pipeline is identical to that of the Flow CT UDP sample.
This sample adds the following logic:
Dumping port 0 information into a file at ./port_0_info.txt.
Querying UDP session hardware entry created after receiving the first UDP packet:
Origin total bytes received
Origin total packets received
Reply total bytes received
Reply total packets received
Reference:
/opt/mellanox/doca/samples/doca_flow/flow_ct_udp_query/flow_ct_udp_query_sample.c
/opt/mellanox/doca/samples/doca_flow/flow_ct_udp_query/flow_ct_udp_query_main.c
/opt/mellanox/doca/samples/doca_flow/flow_ct_udp_query/meson.build
Flow CT UDP Update
This sample illustrates how a CT entry can be updated after creation.
The pipeline is identical to that of the Flow CT UDP sample. For non-active UDP sessions, the relevant entry is updated with an aging timeout.
This sample adds the following logic:
Querying all UDP sessions for the total number of packets received in both the origin and reply directions.
Updating entry aging timeout to 2 seconds once a session is not active (i.e., no packets received on either side).
Waiting until all non-active sessions are aged and deleted.
Reference:
/opt/mellanox/doca/samples/doca_flow/flow_ct_udp_update/flow_ct_udp_update_sample.c
/opt/mellanox/doca/samples/doca_flow/flow_ct_udp_update/flow_ct_udp_update_main.c
/opt/mellanox/doca/samples/doca_flow/flow_ct_udp_update/meson.build
Flow CT UDP Single Match
This sample is based on the Flow CT UDP sample. The sample illustrates that a hardware entry can be created with a single match (matching performed in one direction only) in the API call doca_flow_ct_add_entry().
Flow CT Aging
This sample illustrates the use of the DOCA Flow CT aging functionality. It demonstrates how to build a pipe and add different entries with different aging times and user data.
No packets need to be sent for this sample.
The sample logic includes:
Initializing DOCA Flow by indicating mode_args="switch,hws" in the doca_flow_cfg struct.
Initializing DOCA Flow CT.
Starting two DOCA Flow uplink representor ports where port 0 has a special role of being a switch manager port.
Info: Ports are configured according to the parameters provided to doca_dpdk_port_probe() in the main function.
Building a UDP pipe to serve as the root pipe.
Building a counter pipe with an example 5-tuple entry to which CT forwards packets.
Adding 32 entries with a different 5-tuple match, different aging time (3-12 seconds), and setting user data. User data will contain the port ID, entry number, and status.
Handling aging in small intervals and removing each entry after age-out.
Running these commands until all 32 entries age out.
Reference:
/opt/mellanox/doca/samples/doca_flow/flow_ct_aging/flow_ct_aging_sample.c
/opt/mellanox/doca/samples/doca_flow/flow_ct_aging/flow_ct_aging_main.c
/opt/mellanox/doca/samples/doca_flow/flow_ct_aging/meson.build
Flow CT TCP
This sample illustrates how to manage TCP flags with CT to achieve better control over TCP sessions.
The sample expects to receive at least SYN and FIN packets.
The sample logic includes:
Initializing DOCA Flow by indicating mode_args="switch,hws" in the doca_flow_cfg struct.
Initializing DOCA Flow CT.
Starting two DOCA Flow uplink representor ports where port 0 has a special role of being a switch manager port.
Info: Ports are configured according to the parameters provided to doca_dpdk_port_probe() in the main function.
Creating a pipeline on the main port:
Building a TCP pipe to filter non-TCP packets.
Building a CT pipe to hold TCP session entries.
Building a CT miss pipe which forwards all packets to RSS pipe.
Building an RSS pipe from which all packets are directed to the sample main thread for parsing and processing.
Building a TCP flags filter pipe which identifies the TCP flags inside the packets. SYN, FIN, and RST packets are forwarded to the RSS pipe while all others are forwarded to the EGRESS pipe.
Building an EGRESS pipe to forward packets to uplink representor port 1.
Packet processing:
The first TCP packet triggers the miss flow as the CT pipe is empty.
5-tuple packet parsing is performed.
TCP flag is examined.
In case of a SYN flag, a HW entry is created.
For FIN or RST flags, the HW entry is removed and all packets are transferred to uplink representor port 1 using rte_eth_tx_burst().
From this point on, all TCP packets belonging to the above session are offloaded directly to uplink representor port 1.
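The flag handling in the packet processing steps can be sketched as a small classifier. The flag bit values are the standard TCP header bits; the `demo_tcp_action` enum and `demo_classify_tcp()` are hypothetical names, and checking FIN/RST before SYN is a choice made for this sketch rather than something the sample is known to do. A real application would also need to check whether an entry already exists before creating one.

```c
#include <assert.h>
#include <stdint.h>

/* Standard TCP header flag bits. */
#define TCP_FLAG_FIN 0x01u
#define TCP_FLAG_SYN 0x02u
#define TCP_FLAG_RST 0x04u

/* Hypothetical outcomes for a packet that reached software. */
enum demo_tcp_action {
	DEMO_TCP_CREATE_ENTRY, /* SYN: create the HW entry */
	DEMO_TCP_REMOVE_ENTRY, /* FIN or RST: remove the HW entry */
	DEMO_TCP_FORWARD       /* anything else: just forward */
};

/* Classify a packet by its TCP flags byte. */
static enum demo_tcp_action demo_classify_tcp(uint8_t flags)
{
	if (flags & (TCP_FLAG_FIN | TCP_FLAG_RST))
		return DEMO_TCP_REMOVE_ENTRY;
	if (flags & TCP_FLAG_SYN)
		return DEMO_TCP_CREATE_ENTRY;
	return DEMO_TCP_FORWARD;
}
```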
Reference:
/opt/mellanox/doca/samples/doca_flow/flow_ct_tcp/flow_ct_tcp_sample.c
/opt/mellanox/doca/samples/doca_flow/flow_ct_tcp/flow_ct_tcp_main.c
/opt/mellanox/doca/samples/doca_flow/flow_ct_tcp/meson.build
Flow CT TCP Actions
This sample illustrates how to add shared and non-shared actions to CT TCP sessions. The pipeline is identical to that of the Flow CT TCP sample.
The sample expects to receive at least SYN and FIN packets.
This sample adds a shared action on one side of the session that places the value 1 in the packet's metadata, while on the other side of the session a non-shared action is placed. The non-shared action simply swaps the source and destination IP addresses and port numbers.
Reference:
/opt/mellanox/doca/samples/doca_flow/flow_ct_tcp_actions/flow_ct_tcp_actions_sample.c
/opt/mellanox/doca/samples/doca_flow/flow_ct_tcp_actions/flow_ct_tcp_actions_main.c
/opt/mellanox/doca/samples/doca_flow/flow_ct_tcp_actions/meson.build