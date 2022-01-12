



Before using any DOCA flow API, the application must call the DOCA flow initialization function.

int doca_flow_init(const struct doca_flow_cfg *cfg, struct doca_flow_error *error);



The struct doca_flow_cfg contains the following elements:

total_sessions – the estimated scale of HW rules

queues – the number of HW acceleration control queues. It is expected that the same core always uses the same queue_id. In cases where multiple cores access the API using the same queue_id, it is up to the application to use locks between the different cores/threads.

is_hairpin – when set to true, the fwd is a hairpin queue

aging – when set to true, aging is handled by DOCA flow
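
As a minimal sketch of how such a configuration might be filled, the snippet below uses a stand-in struct, demo_flow_cfg, that mirrors the four elements listed above. The field types, the helper name, and the one-queue-per-core policy are illustrative assumptions; a real application would populate struct doca_flow_cfg from the DOCA headers and pass it to doca_flow_init.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in mirroring the elements of struct doca_flow_cfg.
 * Field types are assumptions; use the real DOCA header in an application. */
struct demo_flow_cfg {
    uint32_t total_sessions; /* estimated scale of HW rules */
    uint16_t queues;         /* number of HW acceleration control queues */
    bool is_hairpin;         /* fwd is a hairpin queue when true */
    bool aging;              /* aging handled by DOCA flow when true */
};

/* Build a configuration with one control queue per worker core, so that
 * each core can keep using its own queue_id without locking. */
static struct demo_flow_cfg demo_make_cfg(uint16_t nb_cores, uint32_t sessions)
{
    struct demo_flow_cfg cfg = {0};

    cfg.total_sessions = sessions;
    cfg.queues = nb_cores;
    cfg.is_hairpin = false;
    cfg.aging = true;
    return cfg;
}
```

Sizing queues to the number of cores is one way to satisfy the expectation that each core always uses the same queue_id.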

The DOCA flow API serves as an abstraction layer for network acceleration. The packet processing of an in-network function is described from ingress to egress, so a pipe must be attached to the origin port. Once a packet arrives at the origin port, HW execution starts as defined by the DOCA API. doca_flow_port is an opaque object because the DOCA flow API is not bound to a specific packet delivery API such as DPDK. The first step is to start the DOCA flow port, which attaches the user application ports to the DOCA flow ports.

struct doca_flow_port *doca_flow_port_start(const struct doca_flow_port_cfg *cfg, struct doca_flow_error *error);



The struct doca_flow_port_cfg contains the following elements:

port_id – chosen by the user. IDs must start with 0 and be consecutive.

type – depends on underlying API

devargs – a string containing the exact configuration needed according to the type

priv_data_size – per port, users may define private data where application-specific info can be stored

When DPDK is used, the following configuration must be provided:

enum doca_flow_port_type type = DOCA_FLOW_PORT_DPDK_BY_ID;
const char *devargs = "1";





The devargs parameter points to a string that has the numeric value of the DPDK port_id in decimal. The port must be configured and started before calling this API. Mapping the DPDK port to the DOCA port is required to synchronize application ports with HW ports.
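
Since devargs carries the DPDK port_id as a decimal string, an application that holds the port ID as a number needs to format it first. The helper below is a hypothetical sketch of that step (demo_format_devargs is not part of DOCA); it only uses standard C.

```c
#include <stdint.h>
#include <stdio.h>

/* Write the decimal DPDK port id into buf, as the devargs string expects.
 * Returns 0 on success, -1 if buf is too small to hold the full number. */
static int demo_format_devargs(uint16_t dpdk_port_id, char *buf, size_t len)
{
    int n = snprintf(buf, len, "%u", (unsigned int)dpdk_port_id);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The resulting buffer can then be assigned to the devargs element of the port configuration, together with the DOCA_FLOW_PORT_DPDK_BY_ID type.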

A pipe is a template that defines packet processing without adding any specific HW rule. The template includes the following elements:

Match

Monitor

Actions

Forward

The following diagram illustrates a pipe structure.

The creation phase allows the HW to efficiently build the execution pipe. After the pipe is created, specific entries can be added. Only a subset of the pipe can be used (e.g. skipping the monitor completely, just using the counter, etc).



Match is a mandatory field when creating a pipe. Using the following struct, users must define the fields that should be matched on the pipe. The struct doca_flow_match contains the following elements:

flags – match items for which no value is needed

out_src_mac – outer source MAC address

out_dst_mac – outer destination MAC address

out_eth_type – outer Ethernet layer type

vlan_id – outer VLAN ID

out_src_ip – outer source IP address

out_dst_ip – outer destination IP address

out_l4_type – outer layer 4 protocol type

out_src_port – outer layer 4 source port

out_dst_port – outer layer 4 destination port

tun – tunnel info

in_src_ip – inner source IP address if tunnel is used

in_dst_ip – inner destination IP address if tunnel is used

in_l4_type – inner layer 4 protocol type if tunnel is used

in_src_port – inner layer 4 source port if tunnel is used

in_dst_port – inner layer 4 destination port if tunnel is used

For each field, users choose whether the field is:

Ignored (wild card) – the value of the field is ignored

Constant – all entries in the pipe must have the same value for this field. Users should not put a value for each entry.

Changeable – per entry, the user must provide the value to match

Note: L4 type, L3 type, and tunnel type cannot be changeable.

The match field type can be defined either implicitly or explicitly.



To match implicitly, the following considerations should be taken into account.

Ignored fields: the field is zeroed, and the pipeline has no comparison on the field.

Constant fields: these are fields that have a constant value. For example, as shown in the following, the tunnel type is VXLAN.

match.tun.type = DOCA_FLOW_TUN_VXLAN;





These fields only need to be configured once, not once per new pipeline entry.

Changeable fields: these are fields that may change per entry. For example, the following shows a field of the inner 5-tuple that is set with a full mask.

match.in_dst_ip.ipv4_addr = 0xffffffff;





If the full-mask value is itself the constant value required by the user, then the user should set the field to zero when adding a new entry.
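
The implicit rules can be summarized in a small sketch. This is one reading of the rules above for a 32-bit field, not DOCA's implementation, and the demo_ names are hypothetical: a zeroed field is ignored, a full-mask field is changeable, and any other value is constant.

```c
#include <stdint.h>

enum demo_field_kind {
    DEMO_FIELD_IGNORED,
    DEMO_FIELD_CONSTANT,
    DEMO_FIELD_CHANGEABLE
};

/* Classify a 32-bit template value under the implicit rules:
 * zero -> ignored, all ones -> changeable, anything else -> constant. */
static enum demo_field_kind demo_classify_u32(uint32_t template_value)
{
    if (template_value == 0)
        return DEMO_FIELD_IGNORED;
    if (template_value == UINT32_MAX)
        return DEMO_FIELD_CHANGEABLE;
    return DEMO_FIELD_CONSTANT;
}

/* Value that applies to one entry: changeable fields take the entry's
 * value, constant fields keep the template's value (ignored fields stay
 * zero and are not compared at all). */
static uint32_t demo_effective_u32(uint32_t template_value, uint32_t entry_value)
{
    return demo_classify_u32(template_value) == DEMO_FIELD_CHANGEABLE
               ? entry_value
               : template_value;
}
```

The sketch cannot express the corner case where the full-mask value is wanted as a constant, which is why the text treats that case separately.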

The following is an example of a match on the VXLAN tunnel, where for each entry there is a specific IPv4 destination address, and an inner 5-tuple.

static void build_underlay_overlay_match(struct doca_flow_match *match)
{
    // outer
    match->out_dst_ip.ipv4_addr = 0xffffffff;
    match->out_l4_type = DOCA_PROTO_UDP;
    match->out_dst_port = DOCA_VXLAN_DEFAULT_PORT;
    match->tun.type = DOCA_FLOW_TUN_VXLAN;
    match->tun.vxlan_tun_id = 0xffffffff;
    // inner
    match->in_dst_ip.ipv4_addr = 0xffffffff;
    match->in_src_ip.ipv4_addr = 0xffffffff;
    match->in_src_ip.type = DOCA_FLOW_IP4_ADDR;
    match->in_l4_type = DOCA_PROTO_TCP;
    match->in_src_port = 0xffff;
    match->in_dst_port = 0xffff;
}

Users may provide a mask on a match. In this case, there are two doca_flow_match items: The first will contain constant values, and the second will contain masks.



Ignored fields: the field is zeroed, and the pipeline has no comparison on the field.

match_mask.in_dst_ip.ipv4_addr = 0;

Constant fields: these are fields that have a constant value. For example, as shown in the following, the tunnel type is VXLAN and the mask should be full.

match.tun.type = DOCA_FLOW_TUN_VXLAN;
match_mask.tun.type = 0xffffffff;





Once a field is defined as constant, the field's value cannot be changed per entry. Users must set constant fields to zero when adding entries so as to avoid ambiguity.

Changeable fields: these are fields that may change per entry (e.g. inner 5-tuple). Their value should be zero and the mask should be full.

match.in_dst_ip.ipv4_addr = 0;
match_mask.in_dst_ip.ipv4_addr = 0xffffffff;





Note that for IPs, the prefix mask can be used as well.
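
To make the masking rule concrete, the following sketch shows the usual value-under-mask comparison and how a prefix mask can be derived from a prefix length. It illustrates the general technique, not DOCA's internal implementation. The demo_ helpers are hypothetical, and values are kept in host byte order for readability, whereas the real structures may expect network byte order.

```c
#include <stdbool.h>
#include <stdint.h>

/* Build an IPv4 prefix mask in host byte order, e.g. /24 -> 0xffffff00. */
static uint32_t demo_prefix_mask(unsigned int prefix_len)
{
    if (prefix_len == 0)
        return 0;
    if (prefix_len >= 32)
        return UINT32_MAX;
    return UINT32_MAX << (32 - prefix_len);
}

/* A packet field matches when it equals the expected value on every bit
 * selected by the mask; a zero mask therefore matches anything. */
static bool demo_masked_match(uint32_t pkt, uint32_t value, uint32_t mask)
{
    return (pkt & mask) == (value & mask);
}
```

With a /24 mask, 192.168.1.5 matches a value of 192.168.1.0 while 192.168.2.5 does not, and a zero mask corresponds to an ignored field.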

Similarly to setting pipe match, actions also have a template definition. The struct doca_flow_actions contains the following elements:

decap – the packet is decapsulated when set to true

mod_src_mac – modify source MAC address

mod_dst_mac – modify destination MAC address

mod_src_ip – modify source IP address

mod_dst_ip – modify destination IP address

mod_src_port – modify layer 4 source port

mod_dst_port – modify layer 4 destination port

del_ttl – the TTL value is decreased when set to true

has_encap – the packet is encapsulated when set to true

encap – encap data information

Similarly to doca_flow_match in the creation phase, only the subset of actions that should be executed per packet are defined. This is done in a similar way to match, namely by classifying a field to one of the following:

Ignored field – field is zeroed, modify is not used

Constant fields – when a field must be modified per packet, but the value is the same for all packets, a one-time value on action definitions can be used

Changeable fields – fields that may have more than one possible value, where the exact value is set by the user per entry

actions.mod_dst_ip.ipv4_addr = 0xffffffff;

Boolean fields – Boolean values, encap and decap are considered as constant values. It is not allowed to generate actions with encap=true and to then have an entry without an encap value.

For example:

static void create_decap_inner_modify_actions(struct doca_flow_actions *actions)
{
    actions->decap = true;
    actions->mod_dst_ip.ipv4_addr = 0xffffffff;
}

If a policer should be used, then it is possible to have the same configuration for all policers on the pipe or to have a specific configuration per entry.

The struct doca_flow_monitor contains the following elements:

flags – indicate actions to be included

id – meter ID

cir – committed information rate

cbs – committed burst size

aging – aging time in seconds

user_data – aging user data input

Where:

Committed information rate (CIR) – defines maximum bandwidth

Committed burst size (CBS) – defines maximum local burst size

t(c) is the number of available tokens. For each packet, where b is the number of bytes in the packet, if t(c)-b≥0 the packet can continue and tokens are consumed so that t(c)=t(c)-b. If t(c)-b<0, the packet is dropped.

The t(c) tokens are replenished according to elapsed time, the configured CIR, the configured CBS, and packet arrival. When a packet is received, the t(c) tokens are refilled before anything else. The number of tokens added is relative to the total time passed since the last update, but the total is limited by the CBS value.
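
The token accounting described above can be modeled in software as a single-rate token bucket. The following is a minimal sketch for illustration only: the demo_ names are hypothetical, the units (CIR in bytes per second, CBS in bytes, time in microseconds) are assumptions, and rounding and overflow are ignored. The HW policer itself is configured through cir and cbs rather than implemented by the application.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-rate token bucket; units are assumptions:
 * cir in bytes per second, cbs and tokens in bytes, time in microseconds. */
struct demo_bucket {
    uint64_t cir;     /* committed information rate */
    uint64_t cbs;     /* committed burst size, upper bound for tokens */
    uint64_t tokens;  /* t(c), currently available tokens */
    uint64_t last_us; /* time of the last refill */
};

/* Add tokens for the time elapsed since the last update, capped at CBS. */
static void demo_bucket_refill(struct demo_bucket *b, uint64_t now_us)
{
    uint64_t added = (now_us - b->last_us) * b->cir / 1000000u;

    b->tokens = (b->tokens + added > b->cbs) ? b->cbs : b->tokens + added;
    b->last_us = now_us;
}

/* Refill first, then admit the packet only if t(c) - bytes >= 0. */
static bool demo_bucket_admit(struct demo_bucket *b, uint64_t now_us,
                              uint64_t bytes)
{
    demo_bucket_refill(b, now_us);
    if (b->tokens < bytes)
        return false;   /* t(c) - b < 0: the packet is dropped */
    b->tokens -= bytes; /* t(c) = t(c) - b */
    return true;
}
```

In this model, a larger CBS allows longer bursts above the committed rate, while the CIR alone determines the sustained throughput.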

The monitor also includes the aging configuration. If the aging time is set, the entry ages out when the timeout passes without any match on the entry. User data is used to map user usage. If the user_data field is set, the query API returns this user_data when the entry ages out. If user_data is not configured by the application, the aged pipe entry handle is returned. For example:

static void build_entry_monitor(struct doca_flow_monitor *monitor, void *user_ctx)
{
    monitor->flags |= DOCA_FLOW_MONITOR_AGING;
    monitor->aging = 10;
    monitor->user_data = (uint64_t)user_ctx;
}





Refer to Pipe Entry Aged Query for more information.

The FORWARDING "action" is the last action in a pipe, and it directs where the packet goes next. Users may configure one of the following destinations:

Send to software (representor)

Send to wire

Jump to next pipe

Drop packets

The FORWARDING action may be set for pipe create, but it can also be unique per entry.

A pipe can be defined with constant forwarding (e.g., always send packets on a specific port). In this case, all entries will have the exact same forwarding. If forwarding is not defined when a pipe is created, users must define forwarding per entry. In this instance, entries in the same pipe may have different forwarding actions.

The struct doca_flow_fwd contains the following elements:

type – indicates the forwarding type

rss_flags – RSS offload types

rss_queues – RSS queues array

num_of_queues – number of queues

rss_mark – mark ID of each queue

port_id – destination port ID

next_pipe – next pipe pointer

The type field includes the forwarding action types defined in the following enum:

DOCA_FLOW_FWD_RSS – forwards packets to RSS

DOCA_FLOW_FWD_PORT – forwards packets to port

DOCA_FLOW_FWD_PIPE – forwards packets to another pipe

DOCA_FLOW_FWD_DROP – drops packets

The rss_flags include the RSS fields defined in the following enum:

DOCA_FLOW_RSS_IP – RSS by IP header

DOCA_FLOW_RSS_UDP – RSS by UDP header

DOCA_FLOW_RSS_TCP – RSS by TCP header

The following is an RSS forwarding example:

fwd->type = DOCA_FLOW_FWD_RSS;
fwd->rss_queues = queues;
fwd->rss_flags = DOCA_FLOW_RSS_IP | DOCA_FLOW_RSS_UDP;
fwd->num_of_queues = 4;
fwd->rss_mark = 0x1234;





Here, queues points to the uint16_t array that contains the queue numbers. When a port is started, the number of queues is defined, and the queues are numbered from zero up to the number of queues minus 1. The RSS queue numbers may contain any subset of those predefined queue numbers. For a specific match, a packet may be directed to a single queue by using RSS forwarding with a single queue.
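
Because the RSS queue array must be a subset of the queues defined when the port was started, an application may want to validate it before filling the forwarding struct. The check below is a hypothetical helper written for illustration; nb_port_queues stands for the number of queues the application configured on the port.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Check that every RSS queue index is one of the queues configured when
 * the port was started, i.e. in the range [0, nb_port_queues - 1]. */
static bool demo_rss_queues_valid(const uint16_t *queues, int num_of_queues,
                                  uint16_t nb_port_queues)
{
    if (queues == NULL || num_of_queues <= 0)
        return false;
    for (int i = 0; i < num_of_queues; i++) {
        if (queues[i] >= nb_port_queues)
            return false;
    }
    return true;
}
```

A single-element array passes the check as well, which corresponds to directing a specific match to one queue.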

MARK is an optional parameter that may be communicated to the software. If MARK is set and the packet arrives at the software, the value can be examined using the software API. When DPDK is used, MARK is placed on the struct rte_mbuf. (See the "Action: MARK" section in the official DPDK documentation.) When using the kernel, the MARK value is placed on the struct sk_buff MARK field.

The port_id is given in struct doca_flow_port_cfg.

The packet is directed to the port. In many instances the complete pipe is executed in the HW, including the forwarding of the packet back to the wire, so the packet never arrives at the SW. Example code for forwarding to port:

struct doca_flow_fwd *fwd = malloc(sizeof(struct doca_flow_fwd));
memset(fwd, 0, sizeof(struct doca_flow_fwd));
fwd->type = DOCA_FLOW_FWD_PORT;
fwd->port_id = port_cfg->port_id;





The type of forwarding is DOCA_FLOW_FWD_PORT and the only data required is the port_id as defined in DOCA_FLOW_PORT.

Once all parameters are defined, the create function is called.

struct doca_flow_pipe *
doca_flow_create_pipe(const struct doca_flow_pipe_cfg *cfg,
                      const struct doca_flow_fwd *fwd,
                      const struct doca_flow_fwd *fwd_miss,
                      struct doca_flow_error *error);





The return value of the function is a handle to the pipe. This handle should be given when adding entries to the pipe. If a failure occurs, the function returns NULL, and the error reason and message are put in the error argument if provided by the user.

It is possible to skip optional fields. For example, fwd and fwd_miss are not mandatory, and in the pipe configuration some of the fields might be zeroed when not used. See Miss Pipe and Control Pipe for more information.

Once a pipe is created, a new entry can be added to it. These entries are bound to a pipe, so when a pipe is destroyed, all the entries in the pipe are removed. Please refer to section Pipe Entry for more information.

There is no priority between pipes or entries. Priority can be implemented by matching the highest priority first and, if a miss occurs, jumping to the next pipe. There can be more than one pipe on a root as long as the pipes do not overlap. If entries overlap, the priority is set according to the order in which the entries were added. So, if two pipes have overlapping matches and PIPE1 has higher priority than PIPE2, users should add an entry to PIPE1 after any entry is added to PIPE2.

An entry is a specific instance inside a pipe. When defining a pipe, users define the match criteria (the subset of fields to be matched), the type of actions to be performed on matched packets, the monitor, and, optionally, the FORWARDING action. When adding an entry, users should define the values that are not constant among all entries in the pipe. If FORWARDING was not defined for the pipe, it must also be provided per entry.

struct doca_flow_pipe_entry *
doca_flow_pipe_add_entry(uint16_t pipe_queue,
                         struct doca_flow_pipe *pipe,
                         const struct doca_flow_match *match,
                         const struct doca_flow_actions *actions,
                         const struct doca_flow_monitor *monitor,
                         const struct doca_flow_fwd *fwd,
                         struct doca_flow_error *error);

DOCA flow is designed to support concurrency in an efficient way. Since the expected rate is millions of new entries per second, an architecture similar to that of the data path is required. Having a unique queue ID per core saves the DOCA engine from having to lock the data structure and enables the usage of multiple queues when interacting with HW.

Each core is expected to use its own dedicated pipe_queue number when calling doca_flow_pipe_add_entry. Using the same pipe_queue from different cores causes a race condition and has unexpected results. Upon success, a handle is returned. If a failure occurs, a NULL value is returned and an error message is filled. The application can keep this handle and later remove the entry using it.

int doca_flow_pipe_rm_entry(uint16_t pipe_queue, struct doca_flow_pipe_entry *entry);

By default, no counter is added. If defined in monitor, a unique counter is added per entry.

Note: Having a counter per entry affects performance and should be avoided if it is not required by the application.





When a counter is present, it is possible to query the flow and get the counter's data. The struct doca_flow_query contains the following elements:

total_bytes – total bytes hit

total_pkts – total packets hit

int doca_flow_query(struct doca_flow_pipe_entry *entry, struct doca_flow_query *query_stats);
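
Assuming the counters are cumulative, two snapshots taken some time apart can be turned into an average rate. The sketch below uses a hypothetical stand-in struct (demo_flow_query, with assumed 64-bit fields) instead of the real struct doca_flow_query, so it shows only the arithmetic an application might apply to the queried values.

```c
#include <stdint.h>

/* Hypothetical stand-in for the byte and packet counters returned by a
 * flow query; field names and types are assumptions. */
struct demo_flow_query {
    uint64_t bytes;
    uint64_t pkts;
};

/* Average throughput in bits per second between two snapshots taken
 * interval_us microseconds apart; returns 0 for a zero interval. */
static uint64_t demo_bits_per_sec(const struct demo_flow_query *prev,
                                  const struct demo_flow_query *cur,
                                  uint64_t interval_us)
{
    if (interval_us == 0)
        return 0;
    return (cur->bytes - prev->bytes) * 8u * 1000000u / interval_us;
}
```

Since a counter per entry affects performance, this kind of polling is best limited to the entries the application really needs to track.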

This query is used to get the aged-out entries by the time quota in microseconds. The entry handle or the user_data input are returned by this API.

Since the number of flows can be very large, the query of aged flows is limited by a quota in microseconds. This means that it may return without all flows and requires the user to call it again. When the query has gone over all flows, a full cycle is done. The function returns:

> 0 – the number of aged flows filled in entries array

0 – no aged entries in current call, but cycle is not over

-1 – full cycle of this query is done

The struct doca_flow_aged_query contains the element user_data which contains the aged-out flow contexts.

int doca_flow_handle_aging(uint16_t queue, uint64_t quota, struct doca_flow_aged_query *entries, int len);
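
The three return values suggest a polling loop that keeps calling the aging query until a full cycle is reported. The following sketch shows that control flow under stated assumptions: demo_aged_query is a stand-in for struct doca_flow_aged_query, the query is injected through a function pointer with the same shape as doca_flow_handle_aging, and demo_fake_aging is a scripted test double rather than real HW behavior.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct doca_flow_aged_query. */
struct demo_aged_query {
    uint64_t user_data;
};

/* Same shape as doca_flow_handle_aging, so a test double can be injected. */
typedef int (*demo_aging_fn)(uint16_t queue, uint64_t quota,
                             struct demo_aged_query *entries, int len);

/* Call the aging query until it reports a full cycle (-1) and return the
 * total number of aged entries seen. max_calls bounds the loop. */
static int demo_drain_aged(demo_aging_fn query, uint16_t queue,
                           uint64_t quota_us, int max_calls)
{
    struct demo_aged_query entries[16];
    int total = 0;

    for (int i = 0; i < max_calls; i++) {
        int n = query(queue, quota_us, entries, 16);

        if (n < 0)
            break;  /* -1: full cycle of this query is done */
        total += n; /* > 0: n entries filled; 0: nothing yet, call again */
        /* An application would release the n aged entries here. */
    }
    return total;
}

/* Test double: reports 2 aged entries, then 0, then 3, then -1. */
static int demo_fake_aging(uint16_t queue, uint64_t quota,
                           struct demo_aged_query *entries, int len)
{
    static const int script[] = {2, 0, 3, -1};
    static int call;
    int n = script[call < 3 ? call++ : 3];

    (void)queue;
    (void)quota;
    for (int i = 0; i < n && i < len; i++)
        entries[i].user_data = (uint64_t)i;
    return n;
}
```

Calling demo_drain_aged(demo_fake_aging, 0, 100, 10) walks through the scripted sequence and stops at the -1 result; a return value of 0 does not end the loop because the cycle is not over yet.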

Note: On the root table, the user is not allowed to enter overlapping matches. If they do so, the match behavior is unpredictable. Multiple root tables are supported, but it is the user's responsibility to make sure entries do not overlap between root tables.

To set priority between pipes, users must use miss pipes. A miss pipe allows a lookup on the entries associated with pipe X and, if there are no matches, a jump to pipe X+1 to perform a lookup on the entries associated with pipe X+1.

The following figure illustrates the HW table structure:

The first lookup is performed on the table with priority 0. If no hits are found, then jump to the next table and perform another lookup.
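
The lookup order can be modeled in software as a chain of tables linked by miss pointers. The sketch below is a simplified model for illustration, not the HW implementation: demo_pipe is a hypothetical type, and entries are reduced to exact-match 32-bit keys.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of a pipe: a list of exact-match keys plus
 * an optional miss pipe to jump to when no entry matches. */
struct demo_pipe {
    const uint32_t *keys;
    size_t nb_keys;
    const struct demo_pipe *miss_pipe; /* NULL: no miss pipe configured */
    int id;
};

/* Walk the chain starting at the first pipe. Returns the id of the pipe
 * whose entry matched, or -1 if every pipe in the chain missed. */
static int demo_lookup(const struct demo_pipe *pipe, uint32_t key)
{
    for (; pipe != NULL; pipe = pipe->miss_pipe) {
        for (size_t i = 0; i < pipe->nb_keys; i++) {
            if (pipe->keys[i] == key)
                return pipe->id;
        }
    }
    return -1;
}
```

In this model a key present in both tables is always resolved by the first table, which is how the miss chain expresses priority.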

The way to implement a miss pipe in DOCA flow is to use a miss pipe in FWD. In struct doca_flow_fwd, the field miss_pipe signifies that, if a miss pipe is configured when creating a pipe and a packet does not match that pipe, steering should jump to miss_pipe. miss_pipe is defined as a doca_flow_pipe and created by doca_flow_create_pipe. To distinguish a miss_pipe from a general pipe, is_root is introduced in struct doca_flow_pipe_cfg. If is_root is true, the pipe is a root pipe executed on packet arrival. Otherwise, the pipe is a miss_pipe.

struct doca_flow_pipe *
doca_flow_create_pipe(const struct doca_flow_pipe_cfg *cfg,
                      const struct doca_flow_fwd *fwd,
                      const struct doca_flow_fwd *fwd_miss,
                      struct doca_flow_error *error);





When fwd_miss is not null, a packet that does not match the criteria is handled by next_pipe, which is defined in fwd_miss.

In the internal implementation of doca_flow_create_pipe, if fwd_miss is not null and the forwarding action type of miss_pipe is DOCA_FLOW_FWD_PIPE, a flow with the lowest priority is created that always jumps to the group for the next_pipe of the fwd_miss. Then the flow of next_pipe can handle the packets, or drop the packets if the forwarding action type of miss_pipe is DOCA_FLOW_FWD_DROP.

For example, VXLAN packets are forwarded to RSS and all other packets are hairpinned. The miss_pipe is for the other (non-VXLAN) packets, and its match is for general Ethernet packets. The fwd_miss is defined by miss_pipe and its type is DOCA_FLOW_FWD_PIPE. The VXLAN pipe is created by doca_flow_create_pipe() with fwd_miss provided.

Since, in the example, the jump flow is for general Ethernet packets, it is possible that some VXLAN packets match it and cause conflicts. For example, a VXLAN flow entry for ipA is created. A VXLAN packet with ipB comes in, no flow entry has been added for ipB, so it hits miss_pipe and is hairpinned. A control pipe is introduced to handle the conflict. The control pipe is created without any configuration except for the port. The user can then add different matches with different forwarding and priorities when there are conflicts.

struct doca_flow_pipe *
doca_flow_create_control_pipe(struct doca_flow_port *port,
                              struct doca_flow_error *error);



The user can add a control entry using:

struct doca_flow_pipe_entry *
doca_flow_control_pipe_add_entry(uint16_t pipe_queue,
                                 uint8_t priority,
                                 struct doca_flow_pipe *pipe,
                                 const struct doca_flow_match *match,
                                 const struct doca_flow_match *match_mask,
                                 const struct doca_flow_fwd *fwd,
                                 struct doca_flow_error *error);





priority must be defined as higher than the lowest priority and lower than the highest one.