Performance Tuning

Warning

In order to improve performance, make sure HW LRO is enabled.
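A quick way to check and enable it, assuming the driver exposes an hw_lro node under dev.mce.<N>.conf like the coalescing settings below (the knob name here is an assumption; see the HW LRO section of this manual for the authoritative name):

#> sysctl dev.mce.0.conf.hw_lro        # assumed knob name; 0 = disabled, 1 = enabled
#> sysctl dev.mce.0.conf.hw_lro=1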

An armed CQ will generate an event when either of the following conditions is met:

  • The number of completions generated since the one that triggered the last event has reached a preconfigured count

  • The timer has expired and an event is pending

The timer can be set to be restarted either upon event generation or upon completion generation.

Setting the timer to restart upon completion generation affects the interrupt rate. When receiving a burst of incoming packets, the timer never reaches its limit, so the interrupt rate is determined by the number of completions and therefore by the size of the packets.
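For example, with the default settings listed further below (mode 1, 32 completions, 3 microseconds), a sustained burst keeps restarting the timer, so an interrupt fires roughly once per 32 completions rather than once per timer expiration.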


In order to modify the timer restart mode, run:


#> sysctl dev.mce.<N>.conf.rx_coalesce_mode=[0/1/2/3]

0: For timer restart upon event generation.
1: For timer restart upon completion generation.
2: For timer restart upon event generation where usecs and pkts values are adaptive/dynamic, depending on the traffic type and network usage.
3: For timer restart upon completion generation where usecs and pkts values are adaptive/dynamic, depending on the traffic type and network usage.
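For example, to switch device 0 (an illustrative device number) to the adaptive event-restart mode and read the value back:

#> sysctl dev.mce.0.conf.rx_coalesce_mode=2
dev.mce.0.conf.rx_coalesce_mode: 0 -> 2
#> sysctl dev.mce.0.conf.rx_coalesce_mode
dev.mce.0.conf.rx_coalesce_mode: 2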

In order to modify the number of completions generated between interrupts, run:



#> sysctl dev.mce.<N>.conf.rx_coalesce_pkts=<x>


In order to modify the timer expiration time, run:


#> sysctl dev.mce.<N>.conf.rx_coalesce_usecs=<x>

Note: The default values are:

  • dev.mce.1.conf.rx_coalesce_mode: 1 - the timer restarts upon completion generation

  • dev.mce.1.conf.rx_coalesce_pkts: 32 - an interrupt is generated every 32 completions

  • dev.mce.1.conf.rx_coalesce_usecs: 3 - the timer counts down 3 microseconds
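
All three coalescing values for a device can be inspected at once; for example, for device 1 (output shown matches the defaults above):

#> sysctl dev.mce.1.conf | grep rx_coalesce
dev.mce.1.conf.rx_coalesce_usecs: 3
dev.mce.1.conf.rx_coalesce_pkts: 32
dev.mce.1.conf.rx_coalesce_mode: 1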

Single NUMA Architecture

When using a server with a single NUMA node, no NUMA tuning is required. Also, make sure to avoid using core number 0 for interrupts and applications.

  1. Find a CPU list:

    #> sysctl -a | grep "group level=\"2\"" -A 1
    <group level="2" cache-level="2">
     <cpu count="12" mask="fff">0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11</cpu>
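
    Alternatively, cpuset(1) reports the CPUs available to the current process; the output below is illustrative:

    #> cpuset -g
    pid -1 mask: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11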

  2. Tune the NICs to work on desirable cores:

    1. Find the device that matches the interface:

      #> sysctl -a | grep mce | grep mlx
      dev.mce.<N>.conf.device_name: mlx5_core1
      dev.mce.<N>.conf.device_name: mlx5_core0

    2. Find the device interrupts.

      vmstat -ia | grep mlx5_core0 | awk '{print $1}' | sed s/irq// | sed s/://
      269
      270
      271

    3. Bind each interrupt to a desirable core.

      cpuset -x 269 -l 1
      cpuset -x 270 -l 2
      cpuset -x 271 -l 3
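
      Steps 2 and 3 can also be combined into a small sh loop that binds each of the device's interrupts to the next core, starting from core 1; a sketch based on the commands above:

      # Bind each mlx5_core0 interrupt to the next core, starting from core 1
      core=1
      for irq in $(vmstat -ia | grep mlx5_core0 | awk '{print $1}' | sed s/irq// | sed s/://); do
          cpuset -x "$irq" -l "$core"
          core=$((core + 1))
      done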

    4. Bind the application to the desirable core.

      cpuset -l 1-11 <app name> <server flag>
      cpuset -l 1-11 <app name> <client flag> <IP>

Warning

Specifying a range of CPUs when using the cpuset command will allow the application to choose any of them. This is important for applications that execute on multiple threads.
The range argument is not supported for interrupt binding.
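
A binding can be verified by querying the interrupt's mask with cpuset -g; the output below is illustrative:

cpuset -g -x 269
irq 269 mask: 1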


Dual NUMA Architecture

  1. Find the CPU list closest to the NIC.

    1. Find the device that matches the interface:

      #> sysctl -a | grep mce | grep mlx
      dev.mce.3.conf.device_name: mlx5_core3
      dev.mce.2.conf.device_name: mlx5_core2
      dev.mce.1.conf.device_name: mlx5_core1
      dev.mce.0.conf.device_name: mlx5_core0

    2. Find the NIC's PCI location:

      #> sysctl -a | grep mlx5_core.0 | grep parent
      dev.mlx5_core.0.%parent: pci3

      Usually, low PCI locations are closest to NUMA number 0, and high PCI locations are closest to NUMA number 1. Here is how to verify the locations:

    3. Find the NIC's pcib by PCI location:

      #> sysctl -a | grep pci.3.%parent
      dev.pci.3.%parent: pcib3

      In "handle", PCI0 is the value for locations near NUMA0, and PCI1 is the value for locations near NUMA1.

    4. Find the cores list of the closest NUMA:

      #> sysctl -a | grep "group level=\"2\"" -A 1
      <group level="2" cache-level="2">
       <cpu count="12" mask="fff">0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11</cpu>
      --
      <group level="2" cache-level="2">
       <cpu count="12" mask="fff000">12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23</cpu>

      Note: Each list of cores refers to a different NUMA.
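
      On FreeBSD versions with NUMA domain support, cpuset(1) can also report a domain's CPUs directly; a sketch, assuming the -g -d form is available (output illustrative):

      cpuset -g -d 0
      domain 0 mask: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11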

  2. Tune the NICs to work on desirable cores: pin both interrupts and application processes to the cores of the NUMA closest to the NIC, found in the previous step.

    Find the device interrupts:

    vmstat -ia | grep mlx5_core0 | awk '{print $1}' | sed s/irq// | sed s/://
    304
    305
    306

    1. Bind each interrupt to a core from the closest NUMA cores list.

      Note: It is best to avoid core number 0.

      cpuset -x 304 -l 1
      cpuset -x 305 -l 2
      cpuset -x 306 -l 3
      ...

    2. Bind the application to the closest NUMA cores list.

      Note: It is best to avoid core number 0.

      cpuset -l 1-11 <app name> <server flag>
      cpuset -l 1-11 <app name> <client flag> <IP>
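
      On a dual-NUMA system it can also help to keep the application's memory on the same domain as its cores; a sketch, assuming a cpuset(1) that supports the -n domain-policy option and that cores 12-23 belong to domain 1:

      # Run the app on NUMA-1 cores, preferring memory allocated from domain 1
      cpuset -n prefer:1 -l 12-23 <app name> <server flag>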

Warning

For best performance, set the CPUs to performance mode in the BIOS configuration.

Warning

Due to FreeBSD's internal memory allocation mechanism for the card at boot, it is preferable to insert the NIC into a NUMA-0 slot for maximum performance.

