Running VMA - NVIDIA Docs

This section shows how to run a simple network benchmarking test and compare the kernel network stack results to VMA.

Before running a user application, you must set the LD_PRELOAD environment variable to libvma.so. For further information, please refer to the VMA User Manual.

Example:

            $ LD_PRELOAD=libvma.so sockperf server -i 11.4.3.3

Warning

If LD_PRELOAD is assigned libvma.so without a path (as in the example), libvma.so is read from a known library path under your distribution's OS; otherwise, it is read from the specified path.
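
The two cases described in this warning can be sketched as a small shell helper. The function name vma_preload_value is hypothetical, and /usr/lib64/libvma.so is only an assumed install location that depends on your distribution. The helper returns the explicit path when that file exists and otherwise falls back to the bare library name, so that the dynamic loader searches its known library paths.

```shell
# Hypothetical helper: choose the value to assign to LD_PRELOAD.
# $1 is an optional explicit path to libvma.so (an assumption; adjust it
# to wherever the library is installed on your system).
vma_preload_value() {
    candidate="${1:-}"
    if [ -n "$candidate" ] && [ -f "$candidate" ]; then
        # Explicit path: the library is read from exactly this file.
        printf '%s\n' "$candidate"
    else
        # Bare name: the loader looks it up in its known library paths.
        printf '%s\n' "libvma.so"
    fi
}

# Usage (not executed here):
#   LD_PRELOAD="$(vma_preload_value /usr/lib64/libvma.so)" sockperf server -i 11.4.3.3
```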

As a result, a VMA header message should be printed before the output of your application.

            VMA INFO: VMA_VERSION: 9.7.0-1 Release built on Oct 31 2022 14:45:59
VMA INFO: Cmd Line: sockperf
VMA INFO: OFED Version: MLNX_OFED_LINUX-5.8-1.0.1.1:
VMA INFO: ---------------------------------------------------------------------------

The output will always show:

The VMA version
The application’s name (in the above example: Cmd Line: sockperf)

The appearance of the VMA header indicates that the VMA library is loaded with your application.
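
This check can be automated with a small helper, sketched below. The function name vma_header_present and the log file name run.log are hypothetical. The helper only looks for the VMA_VERSION header line in text supplied on standard input, so it assumes you have already captured the application's output (including stderr, in case the header is not written to stdout).

```shell
# Hypothetical helper: succeed (exit status 0) if the text on stdin
# contains the VMA header line that reports the library version.
vma_header_present() {
    grep -q 'VMA INFO: VMA_VERSION:'
}

# Usage (not executed here; run.log is an assumed capture of the
# application's stdout and stderr):
#   vma_header_present < run.log && echo "VMA library is loaded"
```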

Running VMA with Non-Root Permissions

Check that the linker (ld) can find the libvma library.

            ld -lvma --verbose

Set the setuid (set-user-ID) bit to enforce user ownership.

            sudo chmod u+s /usr/lib64/libvma*
sudo chmod u+s /sbin/sysctl

Grant the CAP_NET_RAW and CAP_NET_ADMIN capabilities to the application.

            sudo setcap cap_net_raw,cap_net_admin+ep /usr/bin/sockperf

Launch the application as a non-root user.

            LD_PRELOAD=libvma.so sockperf sr --tcp -i 10.0.0.4 -p 12345
LD_PRELOAD=libvma.so sockperf pp --tcp -i 10.0.0.4 -p 12345 -t10

Benchmarking Example

Prerequisites

Install sockperf, a tool for network performance measurement. This can be done by either:
- Downloading and building it from source: https://github.com/Mellanox/sockperf
- Using yum:
```
yum install sockperf
```
Two machines, one serving as the server and the other as the client, with:
- Management interfaces configured with IP addresses such that the machines can ping each other
- An NVIDIA® NIC physically installed in each machine

Your system must recognize the NVIDIA® NIC. To verify that it does, run:

            lspci | grep Mellanox

Output example:

            $ lspci | grep Mellanox
82:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
82:00.1 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]

Kernel Performance

Kernel Performance Server Side

On the first machine run:

            $ sockperf server -i 11.4.3.3

Server-side example output:

            sockperf: == version #3.7-no.git ==
sockperf: [SERVER] listen on:
[ 0] IP = 11.4.3.3        PORT = 11111 # UDP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: [tid 124545] using recvfrom() to block on socket(s)

Kernel Performance Client Side

On the second machine run:

            $ sockperf ping-pong -t 4 -i 11.4.3.3

Client-side example output:

            sockperf: == version #3.7-no.git ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)
 
[ 0] IP = 11.4.3.3        PORT = 11111 # UDP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=4.000 sec; Warm up time=400 msec; SentMessages=307425; ReceivedMessages=307424
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=3.550 sec; SentMessages=272899; ReceivedMessages=272899
sockperf: ====> avg-lat=  6.488 (std-dev=0.396)
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 6.488 usec
sockperf: Total 272899 observations; each percentile contains 2728.99 observations
sockperf: ---> <MAX> observation =   20.484
sockperf: ---> percentile 99.999 =   17.732
sockperf: ---> percentile 99.990 =    9.364
sockperf: ---> percentile 99.900 =    8.491
sockperf: ---> percentile 99.000 =    7.963
sockperf: ---> percentile 90.000 =    6.975
sockperf: ---> percentile 75.000 =    6.831
sockperf: ---> percentile 50.000 =    6.307
sockperf: ---> percentile 25.000 =    6.212
sockperf: ---> <MIN> observation =    5.887

VMA Latency

Check the VMA performance by running sockperf and using the "VMA_SPEC=latency" environment variable.

VMA Performance Server Side

On the first machine run:

            $ LD_PRELOAD=libvma.so VMA_SPEC=latency sockperf server -i 11.4.3.3

Server-side example output:

            VMA INFO: ---------------------------------------------------------------------------
VMA INFO: VMA_VERSION: 9.7.0-1 Release built on Oct 31 2022 14:45:59
VMA INFO: Cmd Line: sockperf server -i 11.4.3.3
VMA INFO: OFED Version: MLNX_OFED_LINUX-5.8-1.0.1.1:
VMA INFO: ---------------------------------------------------------------------------
VMA INFO: VMA Spec                       Latency        [VMA_SPEC]
VMA INFO: Log Level                      INFO           [VMA_TRACELEVEL]
VMA INFO: Ring On Device Memory TX       16384          [VMA_RING_DEV_MEM_TX]
VMA INFO: Tx QP WRE                      256            [VMA_TX_WRE]
VMA INFO: Tx QP WRE Batching             4              [VMA_TX_WRE_BATCHING]
VMA INFO: Rx QP WRE                      256            [VMA_RX_WRE]
VMA INFO: Rx QP WRE Batching             4              [VMA_RX_WRE_BATCHING]
VMA INFO: Rx Poll Loops                  -1             [VMA_RX_POLL]
VMA INFO: Rx Prefetch Bytes Before Poll  256            [VMA_RX_PREFETCH_BYTES_BEFORE_POLL]
VMA INFO: GRO max streams                0              [VMA_GRO_STREAMS_MAX]
VMA INFO: Select Poll (usec)             -1             [VMA_SELECT_POLL]
VMA INFO: Select Poll OS Force           Enabled        [VMA_SELECT_POLL_OS_FORCE]
VMA INFO: Select Poll OS Ratio           1              [VMA_SELECT_POLL_OS_RATIO]
VMA INFO: Select Skip OS                 1              [VMA_SELECT_SKIP_OS]
VMA INFO: CQ Drain Interval (msec)       100            [VMA_PROGRESS_ENGINE_INTERVAL]
VMA INFO: CQ Interrupts Moderation       Disabled       [VMA_CQ_MODERATION_ENABLE]
VMA INFO: CQ AIM Max Count               128            [VMA_CQ_AIM_MAX_COUNT]
VMA INFO: CQ Adaptive Moderation         Disabled       [VMA_CQ_AIM_INTERVAL_MSEC]
VMA INFO: CQ Keeps QP Full               Disabled       [VMA_CQ_KEEP_QP_FULL]
VMA INFO: TCP nodelay                    1              [VMA_TCP_NODELAY]
VMA INFO: Avoid sys-calls on tcp fd      Enabled        [VMA_AVOID_SYS_CALLS_ON_TCP_FD]
VMA INFO: Internal Thread Affinity       0              [VMA_INTERNAL_THREAD_AFFINITY]
VMA INFO: Thread mode                    Single         [VMA_THREAD_MODE]
VMA INFO: Mem Allocate type              2 (Huge Pages) [VMA_MEM_ALLOC_TYPE]
VMA INFO: ---------------------------------------------------------------------------
sockperf: == version #3.7-no.git ==
sockperf: [SERVER] listen on:
[ 0] IP = 11.4.3.3        PORT = 11111 # UDP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: [tid 124588] using recvfrom() to block on socket(s)

VMA Performance Client Side

On the second machine run:

            $ LD_PRELOAD=libvma.so VMA_SPEC=latency sockperf ping-pong -t 4 -i 11.4.3.3

Client-side example output:

            VMA INFO: --------------------------------------------------------------------------- 
VMA INFO: VMA_VERSION: 9.7.0-1 Release built on Oct 31 2022 14:45:59
VMA INFO: Cmd Line: sockperf
VMA INFO: OFED Version: MLNX_OFED_LINUX-5.8-1.0.1.1:
VMA INFO: ---------------------------------------------------------------------------
VMA INFO: VMA Spec                       Latency        [VMA_SPEC]
VMA INFO: Log Level                      INFO           [VMA_TRACELEVEL]
VMA INFO: Ring On Device Memory TX       16384          [VMA_RING_DEV_MEM_TX]
VMA INFO: Tx QP WRE                      256            [VMA_TX_WRE]
VMA INFO: Tx QP WRE Batching             4              [VMA_TX_WRE_BATCHING]
VMA INFO: Rx QP WRE                      256            [VMA_RX_WRE]
VMA INFO: Rx QP WRE Batching             4              [VMA_RX_WRE_BATCHING]
VMA INFO: Rx Poll Loops                  -1             [VMA_RX_POLL]
VMA INFO: Rx Prefetch Bytes Before Poll  256            [VMA_RX_PREFETCH_BYTES_BEFORE_POLL]
VMA INFO: GRO max streams                0              [VMA_GRO_STREAMS_MAX]
VMA INFO: Select Poll (usec)             -1             [VMA_SELECT_POLL]
VMA INFO: Select Poll OS Force           Enabled        [VMA_SELECT_POLL_OS_FORCE]
VMA INFO: Select Poll OS Ratio           1              [VMA_SELECT_POLL_OS_RATIO]
VMA INFO: Select Skip OS                 1              [VMA_SELECT_SKIP_OS]
VMA INFO: CQ Drain Interval (msec)       100            [VMA_PROGRESS_ENGINE_INTERVAL]
VMA INFO: CQ Interrupts Moderation       Disabled       [VMA_CQ_MODERATION_ENABLE]
VMA INFO: CQ AIM Max Count               128            [VMA_CQ_AIM_MAX_COUNT]
VMA INFO: CQ Adaptive Moderation         Disabled       [VMA_CQ_AIM_INTERVAL_MSEC]
VMA INFO: CQ Keeps QP Full               Disabled       [VMA_CQ_KEEP_QP_FULL]
VMA INFO: TCP nodelay                    1              [VMA_TCP_NODELAY]
VMA INFO: Avoid sys-calls on tcp fd      Enabled        [VMA_AVOID_SYS_CALLS_ON_TCP_FD]
VMA INFO: Internal Thread Affinity       0              [VMA_INTERNAL_THREAD_AFFINITY]
VMA INFO: Thread mode                    Single         [VMA_THREAD_MODE]
VMA INFO: Mem Allocate type              2 (Huge Pages) [VMA_MEM_ALLOC_TYPE]
VMA INFO: ---------------------------------------------------------------------------
sockperf: == version #3.7-no.git ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)
 
[ 0] IP = 11.4.3.3        PORT = 11111 # UDP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=4.000 sec; Warm up time=400 msec; SentMessages=1855851; ReceivedMessages=1855850
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=3.550 sec; SentMessages=1656957; ReceivedMessages=1656957
sockperf: ====> avg-lat=  1.056 (std-dev=0.074)
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 1.056 usec
sockperf: Total 1656957 observations; each percentile contains 16569.57 observations
sockperf: ---> <MAX> observation =    4.176
sockperf: ---> percentile 99.999 =    1.639
sockperf: ---> percentile 99.990 =    1.552
sockperf: ---> percentile 99.900 =    1.497
sockperf: ---> percentile 99.000 =    1.305
sockperf: ---> percentile 90.000 =    1.179
sockperf: ---> percentile 75.000 =    1.054
sockperf: ---> percentile 50.000 =    1.031
sockperf: ---> percentile 25.000 =    1.015
sockperf: ---> <MIN> observation =    0.954
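
Both client runs end with a "Summary: Latency is ... usec" line, which makes the average latency easy to extract when comparing runs. The helper below is a minimal sketch; the function name sockperf_avg_latency and the log file name client.log are hypothetical, and it assumes the summary line has the same layout as in the example output above.

```shell
# Hypothetical helper: read sockperf client output on stdin and print the
# average latency (in usec) taken from the "Summary: Latency is X usec" line.
sockperf_avg_latency() {
    # The value is the second-to-last field of the summary line.
    awk '/Summary: Latency is/ { print $(NF-1) }'
}

# Usage (not executed here; client.log is an assumed capture of the
# client-side output):
#   sockperf_avg_latency < client.log
```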

Comparing Results

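In this test, the average latency with VMA is about 6.14 times lower than with the kernel network stack (6.488 / 1.056 ≈ 6.14, that is, the kernel's average latency is about 614% of VMA's).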

Average latency:

Using Kernel 6.488 usec
Using VMA 1.056 usec

Percentile latencies:

Percentile	Kernel (usec)	VMA (usec)
MAX	20.484	4.176
99.999	17.732	1.639
99.990	9.364	1.552
99.900	8.491	1.497
99.000	7.963	1.305
90.000	6.975	1.179
75.000	6.831	1.054
50.000	6.307	1.031
25.000	6.212	1.015
MIN	5.887	0.954
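
The comparison can be reproduced from the two average latencies with a little arithmetic. The sketch below uses awk for the floating-point division; the function names latency_ratio and latency_reduction_percent are hypothetical and are not part of sockperf or VMA.

```shell
# Hypothetical helpers for comparing two latency figures (both in usec).
# $1 = kernel latency, $2 = VMA latency.

# How many times lower the VMA latency is than the kernel latency.
latency_ratio() {
    awk -v kernel="$1" -v vma="$2" 'BEGIN {
        if (vma <= 0) exit 1
        printf "%.2f\n", kernel / vma
    }'
}

# Latency reduction relative to the kernel figure, in percent.
latency_reduction_percent() {
    awk -v kernel="$1" -v vma="$2" 'BEGIN {
        if (kernel <= 0) exit 1
        printf "%.1f\n", (1 - vma / kernel) * 100
    }'
}

# With the averages from this example:
#   latency_ratio 6.488 1.056               prints 6.14
#   latency_reduction_percent 6.488 1.056   prints 83.7
```

The same helpers can be applied to any row of the percentile table by passing that row's kernel and VMA values.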

To tune your system for best performance, see the Basic Performance Tuning section.

libvma-debug.so

libvma.so is limited to the DEBUG log level. To run VMA with logging more detailed than the DEBUG level, use the libvma-debug.so library that comes with the OFED installation.

Before running your application, set the LD_PRELOAD environment variable to libvma-debug.so (instead of libvma.so).

Example:

            $ LD_PRELOAD=libvma-debug.so sockperf server -i 11.4.3.3

Warning

libvma-debug.so is located in the same library path as libvma.so under your distribution’s OS.

For example, on RHEL 7.x x86_64, libvma-debug.so is located at /usr/lib64/libvma-debug.so.

Warning

NOTE: To compile VMA with a log level higher than DEBUG, run “configure” with the following parameter:

            ./configure --enable-opt-log=none

See section Building VMA from Sources.
