Monitoring, Debugging, and Troubleshooting
Networking applications open various types of sockets.
The VMA library holds the following counters:
Per socket state and performance counters
Internal performance counters that accumulate information about select(), poll(), and epoll_wait() usage by the whole application. An additional performance counter logs the CPU usage of VMA during select(), poll(), or epoll_wait() calls. VMA calculates this counter only if the VMA_CPU_USAGE_STATS parameter is enabled; otherwise the counter is unused and displays its default value of zero (see the example after this list).
VMA internal CQ performance counters
VMA internal RING performance counters
VMA internal Buffer Pool performance counters
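For example, to enable the CPU usage counter, the application can be launched with the parameter set (a minimal sketch; the application name is a placeholder):
#VMA_CPU_USAGE_STATS=1 LD_PRELOAD=libvma.so ./my_app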
Use the included vma_stats utility to view the per-socket information and performance counters during runtime.
Note: For TCP connections, vma_stats shows only offloaded traffic, and not "os traffic."
Usage:
#vma_stats [-p pid] [-k directory] [-v view] [-d details] [-i interval]
The following table lists the basic and additional vma_stats utility options.
Parameter Name | Values | Description
-p, --pid | <pid> | Shows VMA statistics for the process with PID <pid>.
-k, --directory | <directory> | Sets the shared memory directory path to <directory>.
-n, --name | <application> | Shows VMA statistics for application <application>.
-f, --find_pid | | Finds the PID and shows statistics for the VMA instance running (default).
-F, --forbid_clean | | When this flag is set, shared objects (files) are not removed.
-i, --interval | <n> | Prints a report every <n> seconds. Default: 1 sec.
-c, --cycles | <n> | Performs <n> report print cycles and exits; use 0 for infinite. Default: 0.
-v, --view | <1|2|3|4|5> | Sets the view type. For example, 3 shows the most detailed view, 4 shows multicast group membership, and 5 shows a netstat-like view of all sockets.
-d, --details | <1|2> | Sets the details mode. Mode 2 shows deltas of counter updates (see Example 2).
-S, --fd_dump | <fd> [<level>] | Dumps statistics for fd number <fd> using log level <level>. Use 0 for all open fds.
-z, --zero | | Zeroes counters.
-l, --log_level | <level> | Sets the VMA log level to <level> (1 <= level <= 7).
-D, --details_level | <level> | Sets the VMA log detail level to <level> (0 <= level <= 3).
-s, --sockets | <list|range> | Logs only sockets that match <list> or <range>; format: 4-16 or 1,9 (or a combination).
-V, --version | | Prints the version number.
-h, --help | | Prints a help message.
Examples
The following sections contain examples of the vma_stats utility.
Example 1
Description
The following example demonstrates basic use of the vma_stats utility.
Command Line
#vma_stats -p <pid>
If there is only a single process running over VMA, it is not necessary to use the -p option, since vma_stats automatically recognizes the process.
Output
If no process with a suitable pid is running over VMA, the output is:
vmastat: Failed to identify process...
If an appropriate process was found, the output is:
fd     ------------ total offloaded -------------  ----- total os ------
        pkt        Kbyte   eagain  error  poll%     pkt    Kbyte  error
14 Rx:  140479898  274374  0       0      100.0     0      0      0
   Tx:  140479902  274502  0       0                0      0      0
--------------------------------------------------------------------------
Output Analysis
A single socket with user fd=14 was created
Received 140479898 packets, 274374 Kilobytes via the socket
Transmitted 140479902 packets, 274502 Kilobytes via the socket
All the traffic was offloaded. No packets were transmitted or received via the OS.
There were no missed Rx polls (see VMA_RX_POLL). This implies that the receiving thread did not enter a blocked state, and therefore there was no context switch to hurt latency.
There were no transmission or reception errors on this socket
Example 2
Description
vma_stats presents not only cumulative statistics, but also enables you to view deltas of VMA counter updates. This example demonstrates the use of the "deltas" mode.
Command Line
#vma_stats -p <pid> -d 2
Output
fd     --------------- offloaded ----------------  ---------- os ----------
        pkt/s   Kbyte/s  eagain/s  error/s  poll%   pkt/s   Kbyte/s  error/s
15 Rx:  15186   29       0         0        0.0     0       0        0
   Tx:  15186   29       0         0                0       0        0
19 Rx:  15186   29       0         0        0.0     0       0        0
   Tx:  15186   29       0         0                0       0        0
23 Rx:  0       0        0         0        0.0     15185   22       0
   Tx:  0       0        0         0                15185   22       0
select() Rx Ready:15185/30372 [os/offload] Timeouts:0 Errors:0 Poll:100.00% Polling CPU:70%
Output Analysis
Three sockets were created (fds: 15, 19, and 23)
Received 15186 packets, 29 Kilobytes during the last second via fds: 15 and 19
Transmitted 15186 packets, 29 Kbytes during the last second via fds: 15 and 19
Not all the traffic was offloaded: on fd 23, 15185 packets (22 KBytes) were transmitted and received via the OS. This means that fd 23 was used for unicast traffic.
No transmission or reception errors were detected on any socket
The application used select for I/O multiplexing
45557 packets were placed in socket ready queues (over the course of the last second): 30372 of them offloaded (15186 via fd 15 and 15186 via fd 19), and 15185 were received via the OS (through fd 23)
There were no missed Select polls (see VMA_SELECT_POLL). This implies that the receiving thread did not enter a blocked state. Thus, there was no context switch to hurt latency.
The CPU usage in the select call is 70%
You can use this information to calculate the division of CPU usage between VMA and the application. For example when the CPU usage is 100%, 70% is used by VMA for polling the hardware, and the remaining 30% is used for processing the data by the application.
Example 3
Description
This example presents the most detailed vma_stats output.
Command Line
#vma_stats -p <pid> -v 3 -d 2
Output
======================================================
Fd=[14]
- Blocked, MC Loop Enabled
- Bound IF = [0.0.0.0:11111]
- Member of = [224.7.7.7]
Rx Offload: 1128530 / 786133 / 0 / 0 [kilobytes/packets/eagains/errors]
Rx byte: cur 1470 / max 23520 / dropped/s 0 / limit 16777216
Rx pkt : cur 1 / max 16 / dropped/s 0
Rx poll: 10 / 276077 (100.00%) [miss/hit]
======================================================
CQ=[0]
Packets dropped: 0
Packets queue len: 0
Drained max: 511
Buffer pool size: 500
======================================================
======================================================
RING_ETH=[0]
Rx Offload: 1192953 / 786133 [kilobytes/packets]
Interrupts: 786137 / 78613 [requests/received]
Moderation: 10 / 181 [frame count/usec period]
======================================================
======================================================
BUFFER_POOL(RX)=[0]
Size: 168000
No buffers error: 0
======================================================
BUFFER_POOL(TX)=[1]
Size: 199488
No buffers error: 0
======================================================
Output Analysis
A single socket with user fd=14 was created
The socket is a member of multicast group: 224.7.7.7
Received 786133 packets, 1128530 Kilobytes via the socket during the last second
No transmitted data
All the traffic was offloaded. No packets were transmitted or received via the OS
There were almost no missed Rx polls (see VMA_RX_POLL)
There were no transmission or reception errors on this socket
The socket's receive buffer size is 16777216 Bytes
There were no dropped packets caused by the socket receive buffer limit (see VMA_RX_BYTES_MIN)
Currently, one packet of 1470 Bytes is located in the socket receive queue
The maximum number of packets ever located simultaneously in the socket's receive queue is 16
No packets were dropped by the CQ
No packets in the CQ ready queue (packets which were drained from the CQ and are waiting to be processed by the upper layers)
The maximum number of packets drained from the CQ during a single drain cycle is 511 (see VMA_CQ_DRAIN_WCE_MAX)
The RING_ETH received 786133 packets during this period
The RING_ETH received 1192953 kilobytes during this period. This includes header bytes.
786137 interrupts were requested by the ring during this period
78613 interrupts were intercepted by the ring during this period
The moderation engine was set to trigger an interrupt for every 10 packets and with maximum time of 181 usecs
There were no retransmissions
The current available buffers in the RX pool is 168000
The current available buffers in the TX pool is 199488
There were no buffer requests that failed (no buffer errors)
Example 4
Description
This example demonstrates how you can get multicast group membership information via vma_stats.
Command Line
#vma_stats –p <pid> -v 4
Output
VMA Group Membership Information
Group fd number
------------------------------
[224.4.1.3]                    15
[224.4.1.2]                    19
Example 5
Description
This is an example of the “netstat like” view of vma_stats (-v 5).
Output
Proto  Offloaded  Local Address   Foreign Address  State    Inode       PID
udp    Yes        0.0.0.0:44522   0.0.0.0:*                 733679757   1576
tcp    Yes        0.0.0.0:11111   0.0.0.0:*        LISTEN   733679919   1618
Output Analysis
Two processes are running VMA
PID 1576 has one UDP socket bound to all interfaces on port 44522
PID 1618 has one TCP listener socket bound to all interfaces on port 11111
Example 6
Description
This is an example of a log of socket performance counters along with an explanation of the results (using VMA_STATS_FILE parameter).
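A minimal sketch of producing such a log, assuming the application is launched under VMA with the VMA_STATS_FILE parameter set (the file path and application name below are placeholders):
#VMA_STATS_FILE=/tmp/vma_stats.log LD_PRELOAD=libvma.so ./my_app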
Output
VMA: [fd=10] Tx Offload: 455 / 233020 / 0 [kilobytes/packets/errors]
VMA: [fd=10] Tx OS info: 0 / 0 / 0 [kilobytes/packets/errors]
VMA: [fd=10] Rx Offload: 455 / 233020 / 0 [kilobytes/packets/errors]
VMA: [fd=10] Rx OS info: 0 / 0 / 0 [kilobytes/packets/errors]
VMA: [fd=10] Rx byte: max 200 / dropped 0 (0.00%) / limit 2000000
VMA: [fd=10] Rx pkt : max 1 / dropped 0 (0.00%)
VMA: [fd=10] Rx poll: 0 / 233020 (100.00%) [miss/hit]
Output Analysis
No transmission or reception errors occurred on this socket (user fd=10).
All the traffic was offloaded. No packets were transmitted or received via the OS.
There were practically no missed Rx polls (see VMA_RX_POLL and VMA_SELECT_POLL). This implies that the receiving thread did not enter a blocked state. Thus, there was no context switch to hurt latency.
There were no dropped packets caused by the socket receive buffer limit (see VMA_RX_BYTES_MIN).
A single socket with user fd=10 was created.
Example 7
Description
This is an example of the vma_stats fd dump option for an established TCP socket, using log level info.
Command Line
#vma_stats --fd_dump 17 info
Output
VMA INFO: ============ DUMPING FD 17 STATISTICS ============
VMA INFO: ==================== SOCKET FD ===================
VMA INFO: Fd number : 17
VMA INFO: Bind info : 22.0.0.4:58795
VMA INFO: Connection info : 22.0.0.3:6666
VMA INFO: Protocol : PROTO_TCP
VMA INFO: Is closed : false
VMA INFO: Is blocking : true
VMA INFO: Rx reuse buffer pending : false
VMA INFO: Rx reuse buffer postponed : false
VMA INFO: Is offloaded : true
VMA INFO: Tx Offload : 12374 / 905105 / 0 / 0 [kilobytes/packets/drops/errors]
VMA INFO: Rx Offload : 12374 / 905104 / 0 / 0 [kilobytes/packets/eagains/errors]
VMA INFO: Rx byte : max 14 / dropped 0 (0.00%) / limit 0
VMA INFO: Rx pkt : max 1 / dropped 0 (0.00%)
VMA INFO: Rx poll : 0 / 905109 (100.00%) [miss/hit]
VMA INFO: Socket state : TCP_SOCK_CONNECTED_RDWR
VMA INFO: Connection state : TCP_CONN_CONNECTED
VMA INFO: Receive buffer : m_rcvbuff_current 0, m_rcvbuff_max 87380, m_rcvbuff_non_tcp_recved 0
VMA INFO: Rx lists size : m_rx_pkt_ready_list 0, m_rx_ctl_packets_list 0, m_rx_ctl_reuse_list 0
VMA INFO: PCB state : ESTABLISHED
VMA INFO: PCB flags : 0x140
VMA INFO: Segment size : mss 1460, advtsd_mss 1460
VMA INFO: Window scaling : ENABLED, rcv_scale 7, snd_scale 7
VMA INFO: Receive window : rcv_wnd 87380 (682), rcv_ann_wnd 87240 (681), rcv_wnd_max 87380 (682), rcv_wnd_max_desired 87380 (682)
VMA INFO: Send window : snd_wnd 87168 (681), snd_wnd_max 8388480 (65535)
VMA INFO: Congestion : cwnd 1662014
VMA INFO: Receiver data : rcv_nxt 12678090, rcv_ann_right_edge 12765330
VMA INFO: Sender data : snd_nxt 12678080, snd_wl1 12678076, snd_wl2 12678066
VMA INFO: Send buffer : snd_buf 255986, max_snd_buff 256000
VMA INFO: Retransmission : rtime 0, rto 3, nrtx 0
VMA INFO: RTT variables : rttest 38, rtseq 12678066
VMA INFO: First unsent : NULL
VMA INFO: First unacked : seqno 12678066, len 14, seqno + len 12678080
VMA INFO: Last unacked : seqno 12678066, len 14, seqno + len 12678080
VMA INFO: Acknowledge : lastack 12678066
VMA INFO: ==================================================
VMA INFO: ==================================================
Output Analysis
Fd 17 is a descriptor of an established TCP socket (22.0.0.4:58795 -> 22.0.0.3:6666)
Fd 17 is offloaded by VMA
The current usage of the receive buffer is 0 bytes, while the max possible is 87380
The connection (PCB) flags are TF_WND_SCALE and TF_NODELAY (0x140)
Window scaling is enabled; receive and send scales equal 7
The congestion window equals 1662014
Unsent queue is empty
There is a single packet of 14 bytes in the unacked queue (seqno 12678066)
The last acknowledge sequence number is 12678066
Additional information about these values can be found on the VMA wiki page.
VMA Logs
Use the VMA logs in order to trace VMA operations. VMA logs can be controlled by the VMA_TRACELEVEL variable. This variable's default value is 3, meaning that the only logs obtained are those with severity of PANIC, ERROR, and WARNING.
You can increase the VMA_TRACELEVEL variable value up to 6 (as described in VMA Configuration Parameters) to see more information about each thread's operation. Use VMA_LOG_DETAILS=3 to add a time stamp to each log line; this helps to check the time difference between different events written to the log.
Use VMA_LOG_FILE=/tmp/my_file.log to save the log output to a file. It is recommended to check these logs for any VMA warnings and errors. Refer to the Troubleshooting section for help resolving the different issues found in the log.
VMA replaces a single '%d' appearing in the log file name with the PID of the process loaded with VMA. This helps when running multiple instances of VMA, each with its own log file name.
When VMA_LOG_COLORS is enabled, VMA uses a color scheme when logging: red for errors and warnings, and dim for low-level debug messages.
Use VMA_HANDLE_SIGSEGV to print a backtrace if a segmentation fault occurs.
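For example, a debug-oriented run that combines these variables might look like this (a minimal sketch; the log path and application name are placeholders):
#VMA_TRACELEVEL=4 VMA_LOG_DETAILS=3 VMA_LOG_FILE=/tmp/vma_%d.log VMA_HANDLE_SIGSEGV=1 LD_PRELOAD=libvma.so ./my_app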
Ethernet Counters
Look at the Ethernet counters (using the ifconfig command) to understand whether the traffic is passing through the kernel or through VMA (Rx and Tx).
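For example (assuming the interface of interest is eth2), you can sample the interface counters before and after a test run and compare them:
#ifconfig eth2 | grep -E 'RX|TX'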
tcpdump
For tcpdump to capture offloaded traffic (on ConnectX-4 and above), please follow instructions in section Offloaded Traffic Sniffer in the MLNX_OFED User Manual.
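Once the Offloaded Traffic Sniffer is enabled per those instructions, a capture can be taken with standard tcpdump usage (a minimal sketch; the interface name and file path are placeholders):
#tcpdump -i eth2 -w /tmp/vma_capture.pcap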
NIC Counters
Look at the NIC counters to monitor HW interface level packets received and sent, drops, errors, and other useful information.
ls /sys/class/net/eth2/statistics/
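For example (eth2 is a placeholder interface name), individual counters can be read directly:
#cat /sys/class/net/eth2/statistics/rx_packets
#cat /sys/class/net/eth2/statistics/rx_dropped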
Peer Notification Service
The peer notification service handles TCP half-open connections, where one side discovers the connection was lost but the other side still sees it as active.
The peer-notification daemon is started at system initialization or manually under super user permissions.
The daemon collects information about TCP connections from all the running VMA processes. Upon VMA process termination (identified as causing TCP half-open connections), the daemon notifies the peers by sending Reset packets, in order to let them delete the TCP connections on their side.
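For example, the daemon can be started manually as root (a minimal sketch; on systems where a service unit is installed, the equivalent service-manager command may be used instead, but the exact unit name may differ):
#vmad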
Troubleshooting
This section lists problems that can occur when using VMA and describes solutions for them.
High Log Level
VMA: WARNING: ***************************************************************
VMA: WARNING: *VMA is currently configured with high log level*
VMA: WARNING: *Application performance will decrease in this log level!*
VMA: WARNING: *This log level is recommended for debugging purposes only*
VMA: WARNING: ***************************************************************
This warning message indicates that you are using VMA with a high log level.
The VMA_TRACELEVEL variable value is set to 4 or more, which is good for troubleshooting but not for live runs or performance measurements.
Solution: Set VMA_TRACELEVEL to its default value 3.
On running an application with VMA, the following error is reported:
ERROR: ld.so: object 'libvma.so' from LD_PRELOAD cannot be preloaded: ignored.
Solution: Check that libvma is properly installed, and that libvma.so is located in /usr/lib (or in /usr/lib64, for 64-bit machines).
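For example (assuming a 64-bit system and a placeholder application name), you can verify the library location and preload it by its full path:
#ls -l /usr/lib64/libvma.so
#LD_PRELOAD=/usr/lib64/libvma.so ./my_app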
On attempting to install vma rpm, the following error is reported:
#rpm -ivh libvma-w.x.y-z.rpm
error: can't create transaction lock
Solution: Install the rpm as a privileged user (root).
The following warning is reported:
VMA: WARNING: **********************************************************
VMA: WARNING: Your current max locked memory is: 33554432. Please change it to unlimited.
VMA: WARNING: Set this user's default to `ulimit -l unlimited`.
VMA: WARNING: Read more about this issue in the VMA's User Manual.
VMA: WARNING: **********************************************************
Solution: When working as root, increase the maximum locked memory to 'unlimited' by using the following command:
#ulimit -l unlimited
When working as a non-privileged user, ask your administrator to increase the maximum locked memory to unlimited.
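To make the limit persistent for non-root users, the administrator can typically add memlock entries to /etc/security/limits.conf (a hedged sketch; the exact file and mechanism may differ between distributions):
* soft memlock unlimited
* hard memlock unlimited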
Lack of huge page resources in the system. The following warning is reported:
VMA WARNING: **************************************************************
VMA WARNING: * NO IMMEDIATE ACTION NEEDED!
VMA WARNING: * Not enough hugepage resources for VMA memory allocation.
VMA WARNING: * VMA will continue working with regular memory allocation.
VMA INFO: * Optional:
VMA INFO: * 1. Switch to a different memory allocation type
VMA INFO: *    (VMA_MEM_ALLOC_TYPE!=2)
VMA INFO: * 2. Restart process after increasing the number of
VMA INFO: *    hugepages resources in the system:
VMA INFO: *    "echo 1000000000 > /proc/sys/kernel/shmmax"
VMA INFO: *    "echo 800 > /proc/sys/vm/nr_hugepages"
VMA WARNING: * Please refer to the memory allocation section in the VMA's
VMA WARNING: * User Manual for more information
VMA WARNING: **************************************************************
This warning message means that you are using VMA with huge page memory allocation enabled (VMA_MEM_ALLOC_TYPE=2), but not enough huge page resources are available in the system. VMA will use contiguous pages instead.
Note that VMA_MEM_ALLOC_TYPE=1 is not supported when working with the Microsoft hypervisor. In this case, use VMA_MEM_ALLOC_TYPE=0 (malloc).
If you want VMA to take full advantage of the performance benefits of huge pages, restart the application after adding more huge page resources to your system as detailed in the warning message above, or try to free unused huge page shared memory segments with the script below.
echo 1000000000 > /proc/sys/kernel/shmmax
echo 800 > /proc/sys/vm/nr_hugepages
If you are running multiple instances of your application loaded with VMA, you will probably need to increase the values used in the above example.
Warning: Check that your host machine has enough free memory after allocating the huge page resources for VMA. Low system memory resources may cause your system to hang.
Note: Use "ipcs -m" and "ipcrm -m shmid" to check and clean unused shared memory segments.
Use the following script to release VMA unused huge page resources:
for shmid in `ipcs -m | grep 0x00000000 | awk '{print $2}'`; do echo 'Clearing' $shmid; ipcrm -m $shmid; done;
Wrong ARP resolution when multiple ports are on the same network.
When two (or more) ports are configured on the same network (e.g. 192.168.1.1/24 and 192.168.1.2/24) VMA will only detect the MAC address of one of the interfaces. This will result in incorrect ARP resolution.
This is due to the way Linux handles ARP responses in this configuration. By default, Linux returns the same MAC address for both IPs. This behavior is called “ARP Flux”.
To fix this, it is required to change some of the interface’s settings:
$ sysctl -w net.ipv4.conf.[DEVICE].arp_announce=1
$ sysctl -w net.ipv4.conf.[DEVICE].arp_ignore=2
$ sysctl -w net.ipv4.conf.[DEVICE].rp_filter=0
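To keep these settings across reboots, they can typically be added to /etc/sysctl.conf (a hedged sketch; ens3f1 is a placeholder device name matching the example below):
net.ipv4.conf.ens3f1.arp_announce = 1
net.ipv4.conf.ens3f1.arp_ignore = 2
net.ipv4.conf.ens3f1.rp_filter = 0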
To verify the issue is resolved, clear the ARP tables on a different server that is on the same network and use the arping utility to verify that each IP reports its own MAC address correctly:
$ ip -s neigh flush all    # clear the arp table on the remote server
$ arping -b -I ens3f1 192.168.1.1
ARPING 192.168.1.1 from 192.168.1.5 ens3f0
Unicast reply from 192.168.1.1 [24:8A:07:9A:16:0A] 0.548ms
$ arping -b -I ens3f1 192.168.1.2
ARPING 192.168.1.2 from 192.168.1.5 ens3f0
Unicast reply from 192.168.1.2 [24:8A:07:9A:16:1A] 0.548ms
VMA process cannot establish connection with daemon (vmad) in Microsoft hypervisor environment.
When working with the Microsoft hypervisor, the VMA daemon must be enabled in order to submit Traffic Control (TC) rules, which offload the traffic to the TAP device in case of plug-out events.
The following warning is reported during VMA startup:
VMA WARNING: ***********************************************************
VMA WARNING: * Can not establish connection with the daemon (vmad).   *
VMA WARNING: * UDP/TCP connections are likely to be limited.           *
VMA WARNING: ***********************************************************
The following warning is reported during any connection establishment/termination:
VMA WARNING: ring_tap[0x1efc910]:135:attach_flow() Add TC rule failed with error=-19
To fix this, run “vmad” as root.
VMA cannot offload traffic when RoCE LAG is enabled.
RoCE LAG is a feature meant for mimicking Ethernet bonding for IB devices and is available for dual port cards only. When in RoCE LAG mode, instead of having an IB device per physical port (for example mlx5_0 and mlx5_1), only one IB device is present for both ports.
The following warning appears upon VMA startup:
VMA WARNING: ******************************************************
VMA WARNING: * Interface bond0 will not be offloaded.
VMA WARNING: * VMA cannot offload the device while RoCE LAG is enabled.
VMA WARNING: * Please refer to VMA Release Notes for more info
VMA WARNING: ******************************************************
VMA WARNING: ******************************************************
VMA WARNING: * Bond bond0 will not be offloaded due to problem with its slaves.
VMA WARNING: * Check warning messages for more information.
VMA WARNING: ******************************************************
RoCE LAG should be disabled in order for VMA to be able to offload traffic.
ConnectX-4 and above HCAs with MLNX_OFED print the following warning with instructions on how to disable RoCE LAG:
VMA WARNING: ******************************************************
VMA WARNING: * Interface bond1 will not be offloaded.
VMA WARNING: * VMA cannot offload the device while RoCE LAG is enabled.
VMA WARNING: * In order to disable RoCE LAG please use:
VMA WARNING: * echo 0 > /sys/class/net/ens4f1/device/roce_lag_enable
VMA WARNING: ******************************************************
VMA WARNING: ******************************************************
VMA WARNING: * Bond bond1 will not be offloaded due to problem with its slaves.
VMA WARNING: * Check warning messages for more information.
VMA WARNING: ******************************************************
Device memory programming is not supported on VMs that lack Blue Flame support.
VMA explicitly disables the Device Memory capability if it detects that Blue Flame support is missing on the node on which the user application was launched with VMA. The following warning message appears:
VMA: WARNING: Device Memory functionality is not used on devices w/o Blue Flame support.