Flow Steering

Flow steering is a new model which steers network flows based on flow specifications to specific QPs. Those flows can be either unicast or multicast network flows. In order to maintain flexibility, domains and priorities are used. Flow steering uses a methodology of flow attribute, which is a combination of L2-L4 flow specifications, a destination QP and a priority. Flow steering rules may be inserted either by using ethtool or by using InfiniBand verbs. The verbs abstraction uses different terminology from the flow attribute (ibv_flow_attr), defined by a combination of specifications (struct ibv_flow_spec_*).

All flow steering features are enabled in the supported adapter cards. Flow Steering support in InfiniBand is determined according to the MANAGED_FLOW_STEERING flag.

Flow steering defines the concept of domain and priority. Each domain represents a user agent that can attach a flow. The domains are prioritized. A higher priority domain will always supersede a lower priority domain when their flow specifications overlap. Setting a lower priority value will result in a higher priority.
In addition to the domain, there is a priority within each of the domains. Each domain can have at most 2^12 priorities in accordance with its needs.
The following are the domains at a descending order of priority:

  • User Verbs allows a user application QP to be attached to a specified flow when using ibv_create_flow and ibv_destroy_flow verbs

  • ibv_create_flow

    Copy
    Copied!
                

    struct ibv_flow *ibv_create_flow(struct ibv_qp *qp, struct ibv_flow_attr *flow)

    Input parameters:

    • struct ibv_qp - the attached QP.

    • struct ibv_flow_attr - attaches the QP to the flow specified. The flow contains mandatory control parameters and optional L2, L3 and L4 headers. The optional headers are detected by setting the size and num_of_specs fields:
      struct ibv_flow_attr can be followed by the optional flow headers structs:

      Copy
      Copied!
                  

      struct ibv_flow_spec_eth struct ibv_flow_spec_ipv4 struct ibv_flow_spec_tcp_udp struct ibv_flow_spec_ipv6

      For further information, please refer to the ibv_create_flow man page.

  • ibv_destroy_flow

    Copy
    Copied!
                

    int ibv_destroy_flow(struct ibv_flow *flow_id)

    Input parameters:

    ibv_destroy_flow requires struct ibv_low which is the return value of ibv_create_flow in case of success.
    Output parameters:
    Returns 0 on success, or the value of errno on failure.
    For further information, please refer to the ibv_destroy_flow man page.

Ethtool domain is used to attach an RX ring, specifically its QP to a specified flow. Please refer to the most recent ethtool man page for all the ways to specify a flow.

Examples:

  • ethtool –U eth5 flow-type ether dst 00:11:22:33:44:55 loc 5 action 2
    All packets that contain the above destination MAC address are to be steered into rx-ring 2 (its underlying QP), with priority 5 (within the ethtool domain)

  • ethtool –U eth5 flow-type tcp4 src-ip 1.2.3.4 dst-port 8888 loc 5 action 2
    All packets that contain the above destination IP address and source port are to be steered into rx- ring 2. When destination MAC is not given, the user's destination MAC is filled automatically.

  • ethtool -U eth5 flow-type ether dst 00:11:22:33:44:55 vlan 45 m 0xf000 loc 5 action 2
    All packets that contain the above destination MAC address and specific VLAN are steered into ring 2. Please pay attention to the VLAN's mask 0xf000. It is required in order to add such a rule.

  • ethtool –u eth5
    Shows all of ethtool's steering rule

When configuring two rules with the same priority, the second rule will overwrite the first one, so this ethtool interface is effectively a table. Inserting Flow Steering rules in the kernel requires support from both the ethtool in the user space and in kernel (v2.6.28).

Receive Flow Steering (RFS) and Accelerated Receive Flow Steering (aRFS) are kernel features currently available in most distributions. For RFS, packets are forwarded based on the location of the application consuming the packet. aRFS boosts the speed of RFS by adding support for the hardware. By usingaRFS(unlike RFS), the packets are directed to a CPU that is local to the thread running the application.

aRFSis an in-kernel-logic responsible for load balancing between CPUs by attaching flows to CPUs that are used by flow's owner applications. This domain allows the aRFS mechanism to use the flow steering infrastructure to support the aRFS logic by implementing the ndo_rx_flow_steer, which, in turn, calls the underlying flow steering mechanism with the aRFS domain.

Configure the RFS flow table entries (globally and per core).
Note: The functionality remains disabled until explicitly configured (by default it is 0).

  • The number of entries in the global flow table is set as follows:

    Warning

    /proc/sys/net/core/rps_sock_flow_entries

  • The number of entries in the per-queue flow table are set as follows:

    Warning

    /sys/class/net/<dev>/queues/rx-<n>/rps_flow_cnt

Example:

Copy
Copied!
            

# echo 32768 > /proc/sys/net/core/rps_sock_flow_entries # NUM_CHANNELS=`ethtool -l ens6 | grep "Combined:" | tail -1 | awk '{print $2}'` # for f in `seq 0 $((NUM_CHANNELS-1))`; do echo 32768 > /sys/class/net/ens6/queues/rx-$f/rps_flow_cnt; done

The aRFS feature requires explicit configuration in order to enable it. Enabling the aRFS requires enabling the 'ntuple' flag via the ethtool.
For example, to enable ntuple for eth0, run:

Copy
Copied!
            

ethtool -K eth0 ntuple on

aRFS requires the kernel to be compiled with the CONFIG_RFS_ACCEL option. This option is available in kernels 2.6.39 and above. Furthermore, aRFS requires Device Managed Flow Steering support.

Warning

RFS cannot function if LRO is enabled. LRO can be disabled via ethtool.

The mlx_fs_dump is a python tool that prints the steering rules in a readable manner. Python v2.7 or above, as well as pip, anytree and termcolor libraries are required to be installed on the host.

Running example:

Copy
Copied!
            

./ofed_scripts/utils/mlx_fs_dump -d /dev/mst/mt4115_pciconf0 FT: 9 (level: 0x18, type: NIC_RX) +-- FG: 0x15 (MISC) |-- FTE: 0x0 (FWD) to (TIR:0x7e) out.ethtype:IPv4 out.ip_prot:UDP out.udp_dport:0x140 +-- FTE: 0x1 (FWD) to (TIR:0x7e) out.ethtype:IPv4 out.ip_prot:UDP out.udp_dport:0x13f ...

For further information on the mlx_fs_dump tool, please refer to mlx_fs_dump Community post.

© Copyright 2023, NVIDIA. Last updated on May 23, 2023.