Performance Tuning
To improve performance, make sure that hardware LRO (HW LRO) is enabled.
An armed CQ will generate an event when either of the following conditions is met:
- The number of completions generated since the one that triggered the last event has reached a preset threshold.
- The timer has expired and an event is pending.
The timer can be set to be restarted either upon event generation or upon completion generation.
Setting the timer to restart upon completion generation affects the rate at which interrupts are received. During a burst of incoming packets the timer never reaches its limit, so events are generated by the completion count alone, and the interrupt rate depends on the size of the packets.
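The dependence of the interrupt rate on packet size can be approximated with simple arithmetic. The sketch below is a rough estimate and not a driver formula: it assumes one completion per received packet, a sustained burst that saturates the link so the timer never expires, and it ignores Ethernet framing overhead. The function name est_irq_rate and its parameters are illustrative only.

```shell
#!/bin/sh
# est_irq_rate LINK_BPS PKT_BYTES THRESHOLD
# Rough interrupts-per-second estimate when events are generated by the
# completion count only (assumption: one completion per packet, link saturated).
est_irq_rate() {
    pps=$(( $1 / ($2 * 8) ))   # packets per second at line rate
    echo $(( pps / $3 ))       # one event every THRESHOLD completions
}

est_irq_rate 10000000000 1500 32   # prints 26041
est_irq_rate 10000000000 64 32     # prints 610351
```

With the same threshold, small packets produce far more interrupts per second than large ones, which is the effect described above.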
In order to modify the timer restart mode, run:
#> sysctl dev.mce.<N>.conf.rx_coalesce_mode=[0/1/2/3]
0: For timer restart upon event generation.
1: For timer restart upon completion generation.
2: For timer restart upon event generation where usecs and pkts values are adaptive/dynamic, depending on the traffic type and network usage.
3: For timer restart upon completion generation where usecs and pkts values are adaptive/dynamic, depending on the traffic type and network usage.
In order to modify the number of completions generated between interrupts, run:
#> sysctl dev.mce.<N>.conf.rx_coalesce_pkts=<x>
In order to modify the time for the timer to finish, run:
#> sysctl dev.mce.<N>.conf.rx_coalesce_usecs=<x>
Note: The default values are:
dev.mce.1.conf.rx_coalesce_mode: 1 - The timer restarts upon completion generation
dev.mce.1.conf.rx_coalesce_pkts: 32 - An interrupt is generated after 32 completions
dev.mce.1.conf.rx_coalesce_usecs: 3 - The timer counts down 3 microseconds
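The three settings above are usually changed together. The following is a minimal sketch of a helper that validates the mode and prints the matching sysctl commands instead of running them, so they can be reviewed first. The function name coalesce_cmds is hypothetical; the sysctl names are the ones listed above.

```shell
#!/bin/sh
# coalesce_cmds N MODE PKTS USECS
# Print (do not execute) the sysctl commands for interface dev.mce.N.
coalesce_cmds() {
    case "$2" in
        0|1|2|3) ;;  # the only modes described in this section
        *) echo "invalid rx_coalesce_mode: $2 (expected 0-3)" >&2
           return 1 ;;
    esac
    echo "sysctl dev.mce.$1.conf.rx_coalesce_mode=$2"
    echo "sysctl dev.mce.$1.conf.rx_coalesce_pkts=$3"
    echo "sysctl dev.mce.$1.conf.rx_coalesce_usecs=$4"
}

# Example: print the commands that restore the documented defaults on dev.mce.1.
coalesce_cmds 1 1 32 3
```

Once the printed commands look right, they can be applied by piping the output to sh as root.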
Single NUMA Architecture
When using a server with a single NUMA node, no tuning is required. Also, avoid using core number 0 for interrupts and applications.
Find a CPU list:
#> sysctl -a | grep "group level=\"2\"" -A1
<group level="2" cache-level="2">
 <cpu count="12" mask="fff">0,1,2,3,4,5,6,7,8,9,10,11</cpu>
Tune the NICs to work on desirable cores
Find the device that matches the interface:
#> sysctl -a | grep mce | grep mlx
dev.mce.<N>.conf.device_name: mlx5_core1
dev.mce.<N>.conf.device_name: mlx5_core0
Find the device interrupts.
#> vmstat -ia | grep mlx5_core0 | awk '{print $1}' | sed s/irq// | sed s/://
269
270
271
…
Bind each interrupt to a desirable core.
#> cpuset -x 269 -l 1
#> cpuset -x 270 -l 2
#> cpuset -x 271 -l 3
…
Bind the application to the desirable core.
#> cpuset -l 1-11 <app name> <server flag>
#> cpuset -l 1-11 <app name> <client flag> <IP>
Specifying a range of CPUs when using the cpuset command will allow the application to choose any of them. This is important for applications that execute on multiple threads.
The range argument is not supported for interrupt binding.
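Because each interrupt must be bound to a single core, binding many interrupts by hand is tedious. The sketch below reads IRQ numbers from standard input and prints one cpuset command per interrupt, cycling through a range of cores. The function name bind_irqs and the round-robin policy are assumptions made for illustration; the commands are printed rather than executed so they can be checked first.

```shell
#!/bin/sh
# bind_irqs FIRST_CORE LAST_CORE
# Read one IRQ number per line on stdin and print a "cpuset -x IRQ -l CORE"
# command for each, assigning cores round-robin within FIRST_CORE..LAST_CORE.
bind_irqs() {
    first=$1
    last=$2
    core=$first
    while read -r irq; do
        [ -n "$irq" ] || continue            # skip empty lines
        echo "cpuset -x $irq -l $core"
        core=$((core + 1))
        [ "$core" -le "$last" ] || core=$first   # wrap around at the end of the range
    done
}
```

For example, feeding it the output of the vmstat pipeline shown earlier and calling bind_irqs 1 11 keeps all interrupts off core 0. Piping the result to sh as root applies the bindings.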
Dual NUMA Architecture
Find the CPU list closest to the NIC.
Find the device that matches the interface:
#> sysctl -a | grep mce | grep mlx
dev.mce.3.conf.device_name: mlx5_core3
dev.mce.2.conf.device_name: mlx5_core2
dev.mce.1.conf.device_name: mlx5_core1
dev.mce.0.conf.device_name: mlx5_core0
Find the NIC's PCI location:
#> sysctl -a | grep mlx5_core.0 | grep parent
dev.mlx5_core.0.%parent: pci3
Usually, low PCI locations are closest to NUMA number 0, and high PCI locations are closest to NUMA number 1. Here is how to verify the locations:
Find the NIC's pcib by PCI location:
#> sysctl -a | grep pci.3.%parent
dev.pci.3.%parent: pcib3
In "handle", PCI0 is the value for locations near NUMA0, and PCI1 is the value for locations near NUMA1.
Find the cores list of the closest NUMA:
#> sysctl -a | grep "group level=\"2\"" -A1
<group level="2" cache-level="2">
 <cpu count="12" mask="fff">0,1,2,3,4,5,6,7,8,9,10,11</cpu>
--
<group level="2" cache-level="2">
 <cpu count="12" mask="fff000">12,13,14,15,16,17,18,19,20,21,22,23</cpu>
Note: Each list of cores refers to a different NUMA.
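The mask attribute in the output above is a hexadecimal bitmap of the same cores that are listed inside the cpu element: bit N set means core N belongs to the group. A minimal sketch that decodes such a mask is shown below. The function name mask_to_cpus is hypothetical, and the sketch assumes a single hexadecimal word of fewer than 64 bits, which covers the examples in this section.

```shell
#!/bin/sh
# mask_to_cpus HEXMASK
# Print the comma-separated core numbers whose bits are set in HEXMASK.
mask_to_cpus() {
    mask=$((0x$1))
    cpu=0
    out=""
    while [ "$mask" -ne 0 ]; do
        if [ $((mask & 1)) -eq 1 ]; then
            out="${out:+$out,}$cpu"      # append core number, comma-separated
        fi
        mask=$((mask >> 1))
        cpu=$((cpu + 1))
    done
    echo "$out"
}

mask_to_cpus fff      # prints 0,1,2,3,4,5,6,7,8,9,10,11
mask_to_cpus fff000   # prints 12,13,14,15,16,17,18,19,20,21,22,23
```

This can serve as a cross-check that a core list copied from the topology output matches the group it was taken from.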
Tune the NICs to work on desirable cores.
- Pin both interrupts and application processes to the relevant cores.
- Find the closest NUMA to the NIC
- Find the device interrupts
#> vmstat -ia | grep mlx5_core0 | awk '{print $1}' | sed s/irq// | sed s/://
304
305
306
…
Bind each interrupt to a core from the closest NUMA cores list.
Note: It is best to avoid core number 0.
#> cpuset -x 304 -l 1
#> cpuset -x 305 -l 2
#> cpuset -x 306 -l 3
...
Bind the application to the closest NUMA cores list.
Note: It is best to avoid core number 0.
#> cpuset -l 1-11 <app name> <server flag>
#> cpuset -l 1-11 <app name> <client flag> <IP>
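When the closest NUMA node is the one that contains core 0, the core list taken from the topology output has to be trimmed before it is passed to cpuset. The sketch below removes core 0 from a comma-separated core list. The function name drop_core0 is hypothetical, and the input is assumed to be a plain comma-separated list without ranges.

```shell
#!/bin/sh
# drop_core0 CORE_LIST
# Print the comma-separated CORE_LIST with core 0 removed.
drop_core0() {
    echo "$1" | tr ',' '\n' | grep -v '^0$' | paste -sd, -
}

drop_core0 0,1,2,3,4,5,6,7,8,9,10,11   # prints 1,2,3,4,5,6,7,8,9,10,11
```

The result can be used directly as the argument of cpuset -l when starting the application. A list that does not contain core 0, such as the second NUMA node's cores, passes through unchanged.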
For best performance, set the CPU configuration in the BIOS to performance mode.
Because of the way FreeBSD allocates card memory at boot, it is preferable to install the NIC in a slot attached to NUMA 0 for maximum performance.