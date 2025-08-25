
Running VMA
This section shows how to run a simple network benchmarking test and compare the kernel network stack results to VMA.
Before running a user application, you must set the LD_PRELOAD environment variable to the libvma.so library. For further information, refer to the VMA User Manual.
Example:
$ LD_PRELOAD=libvma.so sockperf server -i 11.4.3.3
If LD_PRELOAD is set to libvma.so without a path (as in the example), libvma.so is read from a known library path under your distribution's OS. Otherwise, it is read from the specified path.
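The lookup rule above matches how the dynamic loader treats an LD_PRELOAD entry: a value containing a slash is used as a path, while a bare name is searched for in the standard library locations. A minimal sketch, assuming a POSIX shell. The helper name preload_lookup is hypothetical; it only classifies the value and loads nothing:

```shell
# Hypothetical helper: report how a single LD_PRELOAD entry is looked up.
preload_lookup() {
    case "$1" in
        */*) echo "path" ;;    # contains a slash: read from that exact path
        *)   echo "search" ;;  # bare name: found via the library search path
    esac
}

preload_lookup libvma.so              # prints: search
preload_lookup /usr/lib64/libvma.so   # prints: path
```

On a glibc-based system with VMA installed, `ldconfig -p | grep libvma` should list the copy that a bare name resolves to.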
As a result, a VMA header message should precede the output of your running application.
VMA INFO: VMA_VERSION: X.Y.Z-R Release built on MM DD YYYY HH:mm:ss
VMA INFO: Cmd Line: sockperf server -i 11.4.3.3
VMA INFO: OFED Version: OFED-internal-X.X-X.X.X.X:
VMA INFO: ---------------------------------------------------------------------------
The output will always show:
The VMA version
The application’s name (in the above example: Cmd Line: sockperf server -i 11.4.3.3)
The appearance of the VMA header indicates that the VMA library is loaded with your application.
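The header check can be automated by searching captured output for the version banner. A minimal sketch, assuming the banner always starts with the text shown in the example; the helper name vma_loaded is hypothetical:

```shell
# Hypothetical helper: succeed if the text on stdin contains the VMA banner.
vma_loaded() {
    grep -q 'VMA INFO: VMA_VERSION:'
}

# Sample header line copied from the example above.
sample='VMA INFO: VMA_VERSION: X.Y.Z-R Release built on MM DD YYYY HH:mm:ss'

if printf '%s\n' "$sample" | vma_loaded; then
    echo "VMA is loaded"
else
    echo "VMA header not found"
fi
```

In practice the input would be a log file holding the application's captured output rather than a sample string.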
To run the application with VMA as a non-root user:
Check that the linker (ld) can find the libvma library.
ld -lvma --verbose
Set the UID bit to enforce user ownership.
sudo chmod u+s /usr/lib64/libvma*
sudo chmod u+s /sbin/sysctl
Grant the CAP_NET_RAW and CAP_NET_ADMIN capabilities to the application.
sudo setcap cap_net_raw,cap_net_admin+ep /usr/bin/sockperf
Launch the application as a non-root user.
LD_PRELOAD=libvma.so sockperf sr --tcp -i 10.0.0.4 -p 12345
LD_PRELOAD=libvma.so sockperf pp --tcp -i 10.0.0.4 -p 12345 -t 10
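Before launching as a non-root user, it can help to confirm that the capability was actually applied to the binary. A minimal sketch, assuming the capability line comes from `getcap /usr/bin/sockperf`. The helper name has_cap_net_raw is hypothetical and the sample line is illustrative only; the exact layout of getcap output can differ between libcap versions, so the check looks only for the capability name:

```shell
# Hypothetical helper: succeed if a line of getcap output lists cap_net_raw.
has_cap_net_raw() {
    case "$1" in
        *cap_net_raw*) return 0 ;;
        *)             return 1 ;;
    esac
}

# On a real system: line=$(getcap /usr/bin/sockperf)
# Illustrative sample, not captured from a real machine:
line='/usr/bin/sockperf cap_net_admin,cap_net_raw=ep'

if has_cap_net_raw "$line"; then
    echo "cap_net_raw is set"
else
    echo "cap_net_raw is missing"
fi
```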
Prerequisites
Install sockperf – a tool for network performance measurement
This can be done by either:
Downloading and building from source: https://github.com/Mellanox/sockperf
Installing with yum: yum install sockperf
Two machines: one serves as the server and the other as the client
Management interfaces configured with IP addresses so that the machines can ping each other
Physical installation of an NVIDIA® NIC in your machines
Your system must recognize the NVIDIA® NIC. To verify that it does, run:
lspci | grep Mellanox
Output example:
$ lspci | grep Mellanox
82:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
82:00.1 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
Kernel Performance
Kernel Performance Server Side
On the first machine run:
$ sockperf server -i 11.4.3.3
Server-side example output:
sockperf: == version #3.7-no.git ==
sockperf: [SERVER] listen on:
[ 0] IP = 11.4.3.3 PORT = 11111 # UDP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: [tid 124545] using recvfrom() to block on socket(s)
Kernel Performance Client Side
On the second machine run:
$ sockperf ping-pong -t 4 -i 11.4.3.3
Client-side example output:
sockperf: == version #3.7-no.git ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)
[ 0] IP = 11.4.3.3 PORT = 11111 # UDP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=4.000 sec; Warm up time=400 msec; SentMessages=307425; ReceivedMessages=307424
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=3.550 sec; SentMessages=272899; ReceivedMessages=272899
sockperf: ====> avg-lat=6.488 (std-dev=0.396)
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 6.488 usec
sockperf: Total 272899 observations; each percentile contains 2728.99 observations
sockperf: ---> <MAX> observation = 20.484
sockperf: ---> percentile 99.999 = 17.732
sockperf: ---> percentile 99.990 = 9.364
sockperf: ---> percentile 99.900 = 8.491
sockperf: ---> percentile 99.000 = 7.963
sockperf: ---> percentile 90.000 = 6.975
sockperf: ---> percentile 75.000 = 6.831
sockperf: ---> percentile 50.000 = 6.307
sockperf: ---> percentile 25.000 = 6.212
sockperf: ---> <MIN> observation = 5.887
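The value used for the later comparison is the average latency reported on the Summary line. A minimal sketch for extracting it from saved sockperf output, assuming the Summary line has the form shown in the example above; the helper name sockperf_avg_latency is hypothetical:

```shell
# Hypothetical helper: print the average latency (usec) found in
# sockperf output read from stdin.
sockperf_avg_latency() {
    sed -n 's/^sockperf: Summary: Latency is \([0-9.]*\) usec.*/\1/p'
}

# Summary line copied from the example output above.
printf '%s\n' 'sockperf: Summary: Latency is 6.488 usec' | sockperf_avg_latency   # prints: 6.488
```

Running the same helper on the kernel and VMA client logs gives the two numbers needed to compare the runs.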
VMA Latency
Check the VMA performance by running sockperf and using the "VMA_SPEC=latency" environment variable.
VMA Performance Server Side
On the first machine run:
$ LD_PRELOAD=libvma.so VMA_SPEC=latency sockperf server -i 11.4.3.3
Server-side example output:
VMA INFO: VMA_VERSION: X.Y.Z-R Release built on MM DD YYYY HH:mm:ss
VMA INFO: Cmd Line: sockperf server -i 11.4.3.3
VMA INFO: OFED Version: OFED-internal-X.X-X.X.X.X:
VMA INFO: ---------------------------------------------------------------------------
VMA INFO: VMA Spec Latency [VMA_SPEC]
VMA INFO: Log Level INFO [VMA_TRACELEVEL]
VMA INFO: Ring On Device Memory TX 16384 [VMA_RING_DEV_MEM_TX]
VMA INFO: Tx QP WRE 256 [VMA_TX_WRE]
VMA INFO: Tx QP WRE Batching 4 [VMA_TX_WRE_BATCHING]
VMA INFO: Rx QP WRE 256 [VMA_RX_WRE]
VMA INFO: Rx QP WRE Batching 4 [VMA_RX_WRE_BATCHING]
VMA INFO: Rx Poll Loops -1 [VMA_RX_POLL]
VMA INFO: Rx Prefetch Bytes Before Poll 256 [VMA_RX_PREFETCH_BYTES_BEFORE_POLL]
VMA INFO: GRO max streams 0 [VMA_GRO_STREAMS_MAX]
VMA INFO: Select Poll (usec) -1 [VMA_SELECT_POLL]
VMA INFO: Select Poll OS Force Enabled [VMA_SELECT_POLL_OS_FORCE]
VMA INFO: Select Poll OS Ratio 1 [VMA_SELECT_POLL_OS_RATIO]
VMA INFO: Select Skip OS 1 [VMA_SELECT_SKIP_OS]
VMA INFO: CQ Drain Interval (msec) 100 [VMA_PROGRESS_ENGINE_INTERVAL]
VMA INFO: CQ Interrupts Moderation Disabled [VMA_CQ_MODERATION_ENABLE]
VMA INFO: CQ AIM Max Count 128 [VMA_CQ_AIM_MAX_COUNT]
VMA INFO: CQ Adaptive Moderation Disabled [VMA_CQ_AIM_INTERVAL_MSEC]
VMA INFO: CQ Keeps QP Full Disabled [VMA_CQ_KEEP_QP_FULL]
VMA INFO: TCP nodelay 1 [VMA_TCP_NODELAY]
VMA INFO: Avoid sys-calls on tcp fd Enabled [VMA_AVOID_SYS_CALLS_ON_TCP_FD]
VMA INFO: Internal Thread Affinity 0 [VMA_INTERNAL_THREAD_AFFINITY]
VMA INFO: Thread mode Single [VMA_THREAD_MODE]
VMA INFO: Mem Allocate type 2 (Huge Pages) [VMA_MEM_ALLOC_TYPE]
VMA INFO: ---------------------------------------------------------------------------
sockperf: == version #3.7-no.git ==
sockperf: [SERVER] listen on:
[ 0] IP = 11.4.3.3 PORT = 11111 # UDP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: [tid 124588] using recvfrom() to block on socket(s)
VMA Performance Client Side
On the second machine run:
$ LD_PRELOAD=libvma.so VMA_SPEC=latency sockperf ping-pong -t 4 -i 11.4.3.3
Client-side example output:
VMA INFO: VMA_VERSION: X.Y.Z-R Release built on MM DD YYYY HH:mm:ss
VMA INFO: Cmd Line: sockperf ping-pong -t 4 -i 11.4.3.3
VMA INFO: OFED Version: OFED-internal-X.X-X.X.X.X:
VMA INFO: ---------------------------------------------------------------------------
VMA INFO: VMA Spec Latency [VMA_SPEC]
VMA INFO: Log Level INFO [VMA_TRACELEVEL]
VMA INFO: Ring On Device Memory TX 16384 [VMA_RING_DEV_MEM_TX]
VMA INFO: Tx QP WRE 256 [VMA_TX_WRE]
VMA INFO: Tx QP WRE Batching 4 [VMA_TX_WRE_BATCHING]
VMA INFO: Rx QP WRE 256 [VMA_RX_WRE]
VMA INFO: Rx QP WRE Batching 4 [VMA_RX_WRE_BATCHING]
VMA INFO: Rx Poll Loops -1 [VMA_RX_POLL]
VMA INFO: Rx Prefetch Bytes Before Poll 256 [VMA_RX_PREFETCH_BYTES_BEFORE_POLL]
VMA INFO: GRO max streams 0 [VMA_GRO_STREAMS_MAX]
VMA INFO: Select Poll (usec) -1 [VMA_SELECT_POLL]
VMA INFO: Select Poll OS Force Enabled [VMA_SELECT_POLL_OS_FORCE]
VMA INFO: Select Poll OS Ratio 1 [VMA_SELECT_POLL_OS_RATIO]
VMA INFO: Select Skip OS 1 [VMA_SELECT_SKIP_OS]
VMA INFO: CQ Drain Interval (msec) 100 [VMA_PROGRESS_ENGINE_INTERVAL]
VMA INFO: CQ Interrupts Moderation Disabled [VMA_CQ_MODERATION_ENABLE]
VMA INFO: CQ AIM Max Count 128 [VMA_CQ_AIM_MAX_COUNT]
VMA INFO: CQ Adaptive Moderation Disabled [VMA_CQ_AIM_INTERVAL_MSEC]
VMA INFO: CQ Keeps QP Full Disabled [VMA_CQ_KEEP_QP_FULL]
VMA INFO: TCP nodelay 1 [VMA_TCP_NODELAY]
VMA INFO: Avoid sys-calls on tcp fd Enabled [VMA_AVOID_SYS_CALLS_ON_TCP_FD]
VMA INFO: Internal Thread Affinity 0 [VMA_INTERNAL_THREAD_AFFINITY]
VMA INFO: Thread mode Single [VMA_THREAD_MODE]
VMA INFO: Mem Allocate type 2 (Huge Pages) [VMA_MEM_ALLOC_TYPE]
VMA INFO: ---------------------------------------------------------------------------
sockperf: == version #3.7-no.git ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)
[ 0] IP = 11.4.3.3 PORT = 11111 # UDP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=4.000 sec; Warm up time=400 msec; SentMessages=1855851; ReceivedMessages=1855850
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=3.550 sec; SentMessages=1656957; ReceivedMessages=1656957
sockperf: ====> avg-lat=1.056 (std-dev=0.074)
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 1.056 usec
sockperf: Total 1656957 observations; each percentile contains 16569.57 observations
sockperf: ---> <MAX> observation = 4.176
sockperf: ---> percentile 99.999 = 1.639
sockperf: ---> percentile 99.990 = 1.552
sockperf: ---> percentile 99.900 = 1.497
sockperf: ---> percentile 99.000 = 1.305
sockperf: ---> percentile 90.000 = 1.179
sockperf: ---> percentile 75.000 = 1.054
sockperf: ---> percentile 50.000 = 1.031
sockperf: ---> percentile 25.000 = 1.015
sockperf: ---> <MIN> observation = 0.954
Comparing Results
In this test, VMA shows roughly a 6.1x improvement in average latency compared to the kernel network stack (an 83.7% reduction).
Average latency:
Using Kernel 6.488 usec
Using VMA 1.056 usec
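The comparison can be reproduced from the two averages above. A minimal sketch, assuming awk is available; the function name latency_gain is hypothetical. It prints the ratio of kernel to VMA latency and the share of latency removed:

```shell
# Hypothetical helper: compare two average latencies given in usec.
# $1 = kernel latency, $2 = VMA latency
latency_gain() {
    awk -v k="$1" -v v="$2" 'BEGIN {
        printf "speedup: %.2fx\n", k / v                  # kernel latency divided by VMA latency
        printf "reduction: %.1f%%\n", (k - v) / k * 100   # share of kernel latency removed
    }'
}

latency_gain 6.488 1.056
# prints:
# speedup: 6.14x
# reduction: 83.7%
```

Reporting both figures avoids ambiguity, since a ratio of about 6.14 and a reduction of about 83.7% describe the same pair of measurements.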
Percentile latencies (usec):
Percentile   Kernel    VMA
Max          20.484    4.176
99.999       17.732    1.639
99.990       9.364     1.552
99.900       8.491     1.497
99.000       7.963     1.305
90.000       6.975     1.179
75.000       6.831     1.054
50.000       6.307     1.031
25.000       6.212     1.015
Min          5.887     0.954
To tune your system for the best performance, see section Basic Performance Tuning.
Libvma-debug.so
libvma.so is limited to the DEBUG log level. To run VMA with logging more detailed than the DEBUG level, use the libvma-debug.so library that comes with the OFED installation.
Before running your application, set the LD_PRELOAD environment variable to libvma-debug.so (instead of libvma.so).
Example:
$ LD_PRELOAD=libvma-debug.so sockperf server -i 11.4.3.3
libvma-debug.so is located in the same library path as libvma.so under your distribution’s OS.
For example, in RHEL7.x x86_64, libvma-debug.so is located at /usr/lib64/libvma-debug.so.
NOTE: If you need to compile VMA with a log level higher than DEBUG, run “configure” with the following parameter:
./configure --enable-opt-log=none
See section Building VMA from Sources.