Appendix: Sockperf – UDP/TCP Latency and Throughput Benchmarking Tool
This appendix presents sockperf, VMA's sample application for testing latency and throughput over socket API.
Sockperf can be used natively, or with VMA acceleration.
Sockperf is an open source utility. For more general information, see https://github.com/Mellanox/sockperf.
Sockperf's advantage over other network benchmarking utilities is its focus on testing the performance of high-performance systems (as well as testing the performance of regular networking systems). In addition, sockperf covers most of the socket API calls and options.
Specifically, in addition to the standard throughput tests, sockperf:
Measures the latency of each discrete packet at sub-nanosecond resolution (using the TSC register, which counts CPU ticks with very low overhead).
Measures latency for ping-pong mode and for latency under load mode. This means that you can measure the latency of individual packets even under a load of millions of PPS, without waiting for a packet's reply before sending the next packet on schedule (see the example after this list).
Enables spike analysis by providing in each run a histogram with various percentiles of the packets' latencies (for example: median, min, max, 99th percentile, and more) in addition to the average and standard deviation.
Can provide full logs containing every packet's tx/rx times, without affecting the benchmark itself. The logs can be further analyzed with external tools, such as Microsoft Excel or matplotlib.
Supports many optional settings for good coverage of the socket API, while still keeping a very low overhead in the fast path to allow the cleanest results.
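For example, a latency-under-load run might be sketched as follows; the --mps and --reply-every values and the server address are illustrative placeholders, not recommendations:
# sockperf ul -i <server-ip> --mps 100000 --reply-every 100 -m 64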
Sockperf operates by sending packets from the client (also known as the publisher) to the server (also known as the consumer), which then sends all or some of the packets back to the client. This measured round-trip time is the RTT between the two machines on a specific network path, using packets of varying sizes.
The latency for a given one-way path between the two machines is the RTT divided by two.
The average RTT is calculated by summing the round-trip times of all the packets that perform the round trip and then dividing the total by the number of packets.
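As a hypothetical illustration: if 1,000 packets complete the round trip in a total of 4,000 usec, the average RTT is 4,000/1,000 = 4 usec, and the reported one-way latency is 4/2 = 2 usec.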
Sockperf can test the improvement of UDP/TCP traffic latency when running applications with and without VMA.
Sockperf can work as a server (consumer) or execute under-load, ping-pong, playback and throughput tests as a client (publisher).
In addition, sockperf provides more detailed statistical information and analysis, as described in the following section.
Sockperf is installed on the VMA server at /usr/bin/sockperf. For examples of running sockperf, see the sections below.
If you want to use multicast, you must first configure the routing table to map multicast addresses to the Ethernet interface, on both client and server. (See Configuring the Routing Table for Multicast Tests).
Advanced Statistics and Analysis
In each run, sockperf presents additional advanced statistics and analysis information:
In addition to the average latency and standard deviation, sockperf presents a histogram with various percentiles, including:
50th percentile – the latency value below which 50 percent of the observations fall. The 50th percentile is also known as the median, and is different from the statistical average.
99th percentile – the latency value below which 99 percent of the observations fall (and above which 1 percent fall)
These percentiles, and the other percentiles that the histogram provides, are very useful for analyzing spikes in the network traffic.
Sockperf can provide a full log of all packets’ tx and rx times by dumping all the data that it uses for calculating percentiles and building the histogram to a comma separated file. This file can be further analyzed using external tools such as Microsoft Excel or matplotlib.
All these additional calculations and reports are executed after the fast path is completed. This means that using these options has no effect on the benchmarking of the test itself. During runtime of the fast path, sockperf records txTime and rxTime of packets using the TSC CPU register, which has a negligible effect on the benchmark itself, as opposed to using the computer’s clock, which can affect benchmarking results.
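For example, a sketch of dumping the full packet log to a CSV file; the --full-log switch is taken from the upstream sockperf help, and its exact name and syntax may vary between versions:
# sockperf pp -i <server-ip> -m 64 --full-log=latency_log.csv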
Configuring the Routing Table for Multicast Tests
If you want to use multicast, you must first configure the routing table to map multicast addresses to the Ethernet interface, on both client and server.
Example
# route add -net 224.0.0.0 netmask 240.0.0.0 dev eth0
where eth0 is the Ethernet interface.
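On systems that use iproute2 rather than the legacy route tool, an equivalent mapping can typically be added as follows (an assumed alternative, not part of the original procedure):
# ip route add 224.0.0.0/4 dev eth0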
You can also set the interface at runtime in sockperf:
Use "--mc-rx-if -<ip>" to set the address of the interface on which to receive multicast packets (can be different from the route table)
Use "--mc-tx-if -<ip>" to set the address of the interface on which to transmit multicast packets (can be different from the route table)
To measure latency statistics, after the test completes, sockperf calculates the round-trip times (divided by two) between the client and the server for all messages, and then provides the average statistics and a histogram.
UDP Ping-pong
To run UDP ping-pong:
Run the server by using:
# sockperf sr -i <server-ip>
Run the client by using:
# sockperf pp -i <server-ip> -m 64
Where -m/--msg-size is the message size in bytes (minimum default 14).
For more sockperf Ping-pong options run:
# sockperf pp -h
TCP Ping-pong
To run TCP ping-pong:
Run the server by using:
# sockperf sr -i <server-ip> --tcp
Run the client by using:
# sockperf pp -i <server-ip> --tcp -m 64
TCP Ping-pong using VMA
To run TCP ping-pong using VMA:
Run the server by using:
# VMA_SPEC=latency LD_PRELOAD=libvma.so sockperf sr -i <server-ip> --tcp
Run the client by using:
# VMA_SPEC=latency LD_PRELOAD=libvma.so sockperf pp -i <server-ip> --tcp -m 64
Where VMA_SPEC=latency is a predefined specification profile for latency.
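A UDP ping-pong run under VMA can be sketched in the same way (this combination is an illustrative assumption based on the UDP and TCP examples above; <server-ip> is a placeholder). Run the server and then the client by using:
# VMA_SPEC=latency LD_PRELOAD=libvma.so sockperf sr -i <server-ip>
# VMA_SPEC=latency LD_PRELOAD=libvma.so sockperf pp -i <server-ip> -m 64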
To determine the maximum bandwidth and highest message rate for a single-process, single-threaded network application, sockperf attempts to send the maximum amount of data in a specific period of time.
UDP MC Throughput
To run UDP MC throughput:
On both the client and the server, configure the routing table to map the multicast addresses to the interface by using:
# route add -net 224.0.0.0 netmask 240.0.0.0 dev <interface>
Run the server by using:
# sockperf sr -i <server-100g-ip>
Run the client by using:
# sockperf tp -i <server-100g-ip> -m 1472
Where -m/--msg-size is the message size in bytes (minimum default 14).
The following output is obtained:
sockperf: Total of 936977 messages sent in 1.100 sec
sockperf: Summary: Message Rate is 851796 [msg/sec]
sockperf: Summary: BandWidth is 1195.759 MBps (9566.068 Mbps)
For more sockperf throughput options run:
# sockperf tp -h
UDP MC Throughput using VMA
To run UDP MC throughput using VMA:
After configuring the routing table as described in Configuring the Routing Table for Multicast Tests, run the server by using:
# LD_PRELOAD=libvma.so sockperf sr -i <server-ip>
Run the client by using:
# LD_PRELOAD=libvma.so sockperf tp -i <server-ip> -m 1472
The following output is obtained:
sockperf: Total of 4651163 messages sent in 1.100 sec
sockperf: Summary: Message Rate is 4228326 [msg/sec]
sockperf: Summary: BandWidth is 5935.760 MBps (47486.083 Mbps)
UDP MC Throughput Summary
Test | 100 Gb Ethernet | 100 Gb Ethernet + VMA
Message Rate | 851796 [msg/sec] | 4228326 [msg/sec]
Bandwidth | 1195.759 MBps (9566.068 Mbps) | 5935.760 MBps (47486.083 Mbps)
VMA Improvement | | 4740.001 MBps (396.4%)
You can use additional sockperf subcommands.
Usage: sockperf <subcommand> [options] [args]
To display help for a specific subcommand, use:
sockperf <subcommand> --help
To display the program version number, use:
sockperf --version
Option | Description | For help, use
help (h, ?) | Display a list of supported commands. |
under-load (ul) | Run sockperf client for latency under load test. | # sockperf ul -h
ping-pong (pp) | Run sockperf client for latency test in ping pong mode. | # sockperf pp -h
playback (pb) | Run sockperf client for latency test using playback of predefined traffic, based on timeline and message size. | # sockperf pb -h
throughput (tp) | Run sockperf client for one way throughput test. | # sockperf tp -h
server (sr) | Run sockperf as a server. | # sockperf sr -h
For additional information, see https://github.com/Mellanox/sockperf.
Additional Options
The following tables describe additional sockperf options, and their possible values.
Client Options
Short Command | Full Command | Description
-h, -? | --help, --usage | Show the help message and exit.
N/A | --tcp | Use TCP protocol (default UDP).
-i | --ip | Listen on/send to IP <ip>.
-p | --port | Listen on/connect to port <port> (default 11111).
-f | --file | Read multiple ip+port combinations from file <file> (will use IO muxer '-F').
-F | --iomux-type | Type of multiple file descriptors handle [s|select|p|poll|e|epoll|r|recvfrom|x|socketxtreme] (default epoll).
N/A | --timeout | Set select/poll/epoll timeout to <msec>, or -1 for infinite (default is 10 msec).
-a | --activity | Measure activity by printing a '.' for the last <N> messages processed.
-A | --Activity | Measure activity by printing the duration for the last <N> messages processed.
N/A | --tcp-avoid-nodelay | Stop/start delivering TCP messages immediately (enable/disable Nagle). The default is Nagle disabled, except in throughput tests where the default is Nagle enabled.
N/A | --tcp-skip-blocking-send | Enable non-blocking send operation (default OFF).
N/A | --tos | Allow setting TOS.
N/A | --mc-rx-if | IP address of the interface on which to receive multicast packets (can be different from the route table).
N/A | --mc-tx-if | IP address of the interface on which to transmit multicast packets (can be different from the route table).
N/A | --mc-loopback-enable | Enable MC loopback (default disabled).
N/A | --mc-ttl | Limit the lifetime of the message (default 2).
N/A | --mc-source-filter | Set the address <ip, hostname> of the multicast message source from which receiving is allowed.
N/A | --uc-reuseaddr | Enable unicast reuse address (default disabled).
N/A | --lls | Turn on LLS via socket option (value = usec to poll).
N/A | --buffer-size | Set total socket receive/send buffer <size> in bytes (system defined by default).
N/A | --nonblocked | Open non-blocked sockets.
N/A | --recv_looping_num | Set sockperf to loop over recvfrom() until EAGAIN or <N> good received packets; -1 for infinite. Must be used with --nonblocked (default 1).
N/A | --dontwarmup | Do not send warm-up packets on start.
N/A | --pre-warmup-wait | Time to wait before sending warm-up packets (seconds).
N/A | --vmazcopyread | If possible, use VMA's zero copy reads API (see the VMA readme).
N/A | --daemonize | Run as daemon.
N/A | --no-rdtsc | Do not use the TSC register when measuring time; instead use the monotonic clock.
N/A | --load-vma | Load VMA dynamically even when LD_PRELOAD was not used.
N/A | --rate-limit | Use rate limit (packet pacing). When used with VMA, it must be run with VMA_RING_ALLOCATION_LOGIC_TX mode.
N/A | --set-sock-accl | Set socket acceleration before running VMA (available for some NVIDIA® systems).
-d | --debug | Print extra debug information.
Server Options
Short Command | Full Command | Description
N/A | --threads-num | Run <N> threads on server side (requires the '-f' option).
N/A | --cpu-affinity | Set threads affinity to the given core IDs in list format (see: cat /proc/cpuinfo).
N/A | --vmarxfiltercb | If possible, use VMA's receive path packet filter callback API (see the VMA readme).
N/A | --force-unicast-reply | Force the server to reply via unicast.
N/A | --dont-reply | Set the server to not reply to the client messages.
-m | --msg-size | Set the maximum message size that the server can receive, <size> bytes (default 65507).
-g | --gap-detection | Enable gap detection.
Sending Bursts
Use the "-b (--burst=<size>)" option to control the number of messages sent by the client in every burst.
SocketXtreme
sockperf v3.2 and above supports VMA socketXtreme polling mode.
In order to support socketXtreme, sockperf should be configured with the --enable-vma-api parameter and compiled against the compatible vma_extra.h file.
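A typical configuration sketch, assuming a standard autotools build of the sockperf sources and that the compatible vma_extra.h is already installed on the build machine:
# ./autogen.sh
# ./configure --enable-vma-api
# make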
A new iomux type, -x / --socketxtreme, becomes available:
Short Command | Full Command | Description
-F | --iomux-type | Type of multiple file descriptors handle [s|select|p|poll|e|epoll|r|recvfrom|x|socketxtreme] (default epoll).
SocketXtreme should also be enabled for VMA. For further information, please refer to Installing VMA with SocketXtreme.
In order to use socketXtreme, VMA should also be compiled using the --enable-socketxtreme parameter.
socketXtreme requires forcing the client side to bind to a specific IP address. Hence, while running a UDP client with socketXtreme, using --client_ip is mandatory:
--client_ip – Force the client side to bind to a specific IP address (default = 0).
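For example, a sketch of a UDP under-load client running with socketXtreme (the addresses are placeholders, and VMA and sockperf are assumed to have been built with socketXtreme support as described above):
# LD_PRELOAD=libvma.so sockperf ul -i <server-ip> -F x --client_ip <client-local-ip>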
Use "-d (--debug)" to print extra debug information without affecting the results of the test. The debug information is printed only before or after the fast path.
If the following error is received:
sockperf error: sockperf: No messages were received from the server. Is the server down?
Perform troubleshooting as follows:
Make sure that exactly one server is running
Check the connection between the client and server
Check the routing table entries for the multicast/unicast group
Extend test duration (use the "--time" command line switch)
If you used extreme values for the --mps and/or --reply-every switches, try other values or the default values (see the example after this list)
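For example, a sketch of a run that extends the test duration and relaxes the message rate while troubleshooting (values are illustrative):
# sockperf ul -i <server-ip> --time 30 --mps 10000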
If the following error is received, it means that Sockperf is trying to compile against VMA with no socketXtreme support:
In file included from src/Client.cpp:32:0:
src/IoHandlers.h: In member function 'int IoSocketxtreme::waitArrival()':
src/IoHandlers.h:421:71: error: 'VMA_SOCKETXTREME_PACKET' was not declared in this scope
    if (m_rings_vma_comps_map_itr->second->vma_comp_list[i].events & VMA_SOCKETXTREME_PACKET) {
                                                                     ^
src/IoHandlers.h:422:18: error: 'struct vma_api_t' has no member named 'socketxtreme_free_vma_packets'
    g_vma_api->socketxtreme_free_vma_packets(&m_rings_vma_comps_map_itr->second->vma_comp_list[i].packet, 1);
There are two ways to solve this:
Configure sockperf with the --disable-vma-api parameter; or
Use VMA 8.5.1 or above.