NVIDIA Messaging Accelerator (VMA) Documentation Rev 9.8.40 LTS

Basic Performance Tuning

Please see the Tuning Guide and VMA Performance Tuning Guide for detailed instructions on how to optimally tune your machines for VMA performance.

VMA Tuning Parameters

| Parameter | Description | Example |
|---|---|---|
| VMA_SPEC | Selects a predefined VMA specification profile, which makes optimized performance easy to measure. The `latency` profile optimizes latency for all use cases and tunes the system to keep a balance between the kernel and VMA. Warning: this profile may limit the maximum bandwidth. | `LD_PRELOAD=libvma.so VMA_SPEC=latency sockperf ping-pong -t 10 -i 11.4.3.1` |
| VMA_RX_POLL | For blocking sockets only. Controls the number of times the RX path is polled for ready packets before the socket goes to sleep (waits for an interrupt in blocked mode). Use -1 (infinite polling) for best latency, or 1 (single poll) for low CPU usage. The default value is 100000. | Server: `VMA_RX_POLL=-1 LD_PRELOAD=libvma.so taskset -c 15 sockperf sr -i 17.209.13.142` Client: `VMA_RX_POLL=-1 LD_PRELOAD=libvma.so taskset -c 15 sockperf pp -i 17.209.13.142 -t 5` |
| VMA_INTERNAL_THREAD_AFFINITY | Controls which CPU core(s) the VMA internal thread is serviced on. The recommended configuration is to run the VMA internal thread on a different core than the application, but on the same NUMA node. | Server: `VMA_INTERNAL_THREAD_AFFINITY=14 VMA_RX_POLL=-1 LD_PRELOAD=libvma.so taskset -c 15 sockperf sr -i 17.209.13.142` Client: `VMA_INTERNAL_THREAD_AFFINITY=14 VMA_RX_POLL=-1 LD_PRELOAD=libvma.so taskset -c 15 sockperf pp -i 17.209.13.142 -t 5` |
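
The VMA_RX_POLL trade-off between latency and CPU usage can be captured in a small wrapper. The following is a minimal sketch and not part of VMA: the function name `vma_rx_poll_for` and the mode names `latency`, `low-cpu`, and `default` are hypothetical, and only the values -1, 1, and 100000 come from the parameter description.

```shell
#!/bin/sh
# Hypothetical helper (not part of VMA): map a tuning goal to a VMA_RX_POLL value.
# Values are taken from the parameter description:
#   -1     = infinite polling (best latency)
#   1      = single poll (low CPU usage)
#   100000 = VMA default
vma_rx_poll_for() {
    case "$1" in
        latency) echo "-1" ;;
        low-cpu) echo "1" ;;
        default) echo "100000" ;;
        *) echo "unknown mode: $1" >&2; return 1 ;;
    esac
}
```

It could then be used as `VMA_RX_POLL=$(vma_rx_poll_for latency) LD_PRELOAD=libvma.so sockperf sr -i 17.209.13.142`.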

Binding VMA to the Closest NUMA

Check which NUMA node your network interface is attached to.

    cat /sys/class/net/<interface_name>/device/numa_node

Example:

    [root@r-host142 ~]# cat /sys/class/net/ens5/device/numa_node
    1

The output above shows that the device is attached to NUMA node 1.
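
This check can also be scripted. The sketch below is illustrative only: the function name `nic_numa_node` and its optional second argument (an alternative sysfs root, added so the function can be tried against a fake directory tree) are assumptions, not part of VMA. It also handles the case where the kernel reports -1, which means no NUMA affinity information is available for the device.

```shell
#!/bin/sh
# Print the NUMA node of a network interface by reading sysfs.
# $1 = interface name, $2 = sysfs root (hypothetical parameter, defaults to /sys).
nic_numa_node() {
    iface=$1
    sysroot=${2:-/sys}
    file="$sysroot/class/net/$iface/device/numa_node"
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 1
    fi
    node=$(cat "$file")
    # The kernel reports -1 when it has no NUMA affinity information.
    if [ "$node" -lt 0 ]; then
        echo "no NUMA affinity reported for $iface" >&2
        return 2
    fi
    echo "$node"
}
```

For the interface in the example, `nic_numa_node ens5` would print the same value as the `cat` command.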

Check which CPUs belong to that NUMA node.

```
[root@r-host144 ~]# lscpu
NUMA node0 CPU(s):     0-13,28-41
NUMA node1 CPU(s):     14-27,42-55
```
The output above shows that:
• CPUs 0-13 and 28-41 belong to NUMA node 0
• CPUs 14-27 and 42-55 belong to NUMA node 1
Since the interface is attached to NUMA node 1, use one of the following CPUs: 14-27 or 42-55.
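
When scripting the CPU selection, it helps to expand a range list such as `14-27,42-55` into individual CPU numbers. The following is a minimal sketch; the function name `expand_cpulist` is hypothetical. On Linux, the same list format is also exposed in `/sys/devices/system/node/node<N>/cpulist`.

```shell
#!/bin/sh
# Expand a CPU list such as "14-27,42-55" into one CPU number per line.
expand_cpulist() {
    echo "$1" | tr ',' '\n' | while IFS=- read -r first last; do
        [ -n "$first" ] || continue
        # A single CPU ("5") has no upper bound, so the range is just itself.
        last=${last:-$first}
        cpu=$first
        while [ "$cpu" -le "$last" ]; do
            echo "$cpu"
            cpu=$((cpu + 1))
        done
    done
}
```

For example, `expand_cpulist "$(cat /sys/devices/system/node/node1/cpulist)"` lists every CPU on NUMA node 1.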

Use the "taskset" command to run the VMA process on a specific CPU.
• Server side:

    LD_PRELOAD=libvma.so taskset -c 15 sockperf sr -i <MLX interface IP>

• Client side:

    LD_PRELOAD=libvma.so taskset -c 15 sockperf pp -i <IP of FIRST machine MLX interface>

In this example, we use CPU 15, which belongs to NUMA node 1. You can also use "numactl --hardware" to display the NUMA topology.
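
The NUMA binding can be combined with the VMA_INTERNAL_THREAD_AFFINITY recommendation (internal thread on a different core, same NUMA node). The sketch below is a dry-run helper that only prints a command line. The function name `vma_pinned_server_cmd` is hypothetical, and the policy of taking the first two CPUs of the first range is an assumption made for illustration, not a VMA rule.

```shell
#!/bin/sh
# Hypothetical dry-run helper: given the CPU list of the NIC's NUMA node
# (for example "14-27,42-55") and the server IP, print a sockperf server
# command that pins the VMA internal thread and the application to two
# different cores on that node.
vma_pinned_server_cmd() {
    cpulist=$1
    ip=$2
    first_range=${cpulist%%,*}        # e.g. "14-27"
    internal_cpu=${first_range%%-*}   # first CPU of the range
    last_cpu=${first_range##*-}       # last CPU of the range
    app_cpu=$((internal_cpu + 1))     # next CPU in the same range
    if [ "$app_cpu" -gt "$last_cpu" ]; then
        echo "first range '$first_range' has fewer than two CPUs" >&2
        return 1
    fi
    echo "VMA_INTERNAL_THREAD_AFFINITY=$internal_cpu VMA_RX_POLL=-1 LD_PRELOAD=libvma.so taskset -c $app_cpu sockperf sr -i $ip"
}
```

With the CPU list of NUMA node 1 from the `lscpu` output, `vma_pinned_server_cmd 14-27,42-55 17.209.13.142` prints a command that uses core 14 for the internal thread and core 15 for the application.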

Configuring the BIOS

Warning

Each machine has its own BIOS parameters. It is important to implement any server manufacturer and Linux distribution tuning recommendations for lowest latency.

When configuring the BIOS, please pay attention to the following:

Enable Max performance mode.
Enable Turbo mode.
Power modes – disable C-states and P-states; do not let the CPU sleep when idle.
Hyperthreading – there is no single right answer as to whether it should be ON or OFF.
• ON means more CPUs are available to handle kernel tasks, so the amortized cost is smaller for each CPU
• OFF means the cache is not shared with other CPUs, so cache utilization is better
If all of your system jitter is under control, it is recommended to turn it OFF; otherwise, keep it ON.
Disable SMI interrupts.
Look for "Processor Power and Utilization Monitoring" and "Memory Pre-Failure Notification" SMIs.
The OS is not aware of these interrupts, so the only way you might notice them is by reading the CPU's model-specific registers (MSRs).
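
As one way to check for SMIs from the OS, the sketch below samples an SMI counter before and after an interval. It rests on several assumptions that are not part of this guide: an Intel CPU that exposes MSR_SMI_COUNT at address 0x34, the `msr` kernel module being loaded, the `rdmsr` utility from msr-tools being installed, and root privileges. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch for Intel CPUs (assumption): MSR 0x34 (MSR_SMI_COUNT) counts the
# SMIs seen since reset. Requires the msr module, rdmsr, and root.

# Difference between two decimal counter readings.
smi_delta() {
    echo $(( $2 - $1 ))
}

# Read the SMI counter of one CPU as an unsigned decimal number.
read_smi_count() {
    rdmsr -u -p "$1" 0x34
}

# Print how many SMIs hit CPU $1 during $2 seconds.
smi_during() {
    cpu=$1
    seconds=$2
    before=$(read_smi_count "$cpu") || return 1
    sleep "$seconds"
    after=$(read_smi_count "$cpu") || return 1
    smi_delta "$before" "$after"
}
```

For example, `smi_during 15 60` reports the number of SMIs observed on CPU 15 over one minute; a non-zero result suggests that some SMI sources are still enabled in the BIOS.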

Please make sure to carefully read your vendor BIOS tuning guide as the configuration options differ per vendor.
