SM Logs
The SM logs record the errors reported by the subnet manager. Treat every error reported in opensm.log as an indicator of InfiniBand (IB) fabric health.
SM log path:
When OpenSM runs standalone (without UFM): /var/log/opensm.log
When OpenSM runs with UFM in a Docker container, first enter the container:
docker exec -it ufm bash
Inside the container, the path is /opt/ufm/files/log/opensm.log
The SM log file should include the message "SUBNET UP" if OpenSM was able to set up the subnet correctly.
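As a quick health check, the shell function below greps an OpenSM log for the "SUBNET UP" message. It is a minimal sketch: the function name subnet_up is hypothetical, and it defaults to the standalone path /var/log/opensm.log. Inside the UFM container, pass /opt/ufm/files/log/opensm.log instead.

```shell
# Hypothetical helper: succeed if the OpenSM log contains "SUBNET UP".
# Defaults to the standalone OpenSM log path; pass another path as $1.
subnet_up() {
  local log="${1:-/var/log/opensm.log}"
  if grep -q "SUBNET UP" "$log"; then
    echo "SUBNET UP found in $log"
  else
    echo "SUBNET UP not found in $log" >&2
    return 1
  fi
}
```

For example, running subnet_up /opt/ufm/files/log/opensm.log inside the container prints a confirmation, or returns a non-zero status when the message is missing.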
Both the maximum size of the SM log file and its rotation frequency can be changed. A new SM log file can be created daily, weekly (the default), or monthly. The log file is bounded both by its maximum size and by the periodic rotation schedule.
To change the maximum OpenSM log file size, set the log_max_size parameter in opensm.conf:
vi /opt/ufm/files/conf/opensm/opensm.conf
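Instead of editing the file interactively, the parameter can be read and set from a script. The sketch below assumes the "name value" line format that opensm.conf uses (for example, log_max_size 0); per the opensm documentation, the value is in MB and 0 means no limit. The function names are hypothetical, and the setter only rewrites an existing uncommented log_max_size line.

```shell
# Hypothetical helpers for log_max_size in opensm.conf (UFM path by default).
OPENSM_CONF="${OPENSM_CONF:-/opt/ufm/files/conf/opensm/opensm.conf}"

get_log_max_size() {
  local conf="${1:-$OPENSM_CONF}"
  # Print the value of the first uncommented log_max_size line.
  awk '$1 == "log_max_size" { print $2; exit }' "$conf"
}

set_log_max_size() {
  local size="$1" conf="${2:-$OPENSM_CONF}"
  # Keep a backup, then rewrite the uncommented log_max_size line in place.
  cp "$conf" "$conf.bak"
  sed -i "s/^log_max_size[[:space:]].*/log_max_size $size/" "$conf"
}
```

For example, set_log_max_size 64 sets a 64 MB limit and leaves the previous file as opensm.conf.bak.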
To change the OpenSM log rotation frequency, edit its logrotate configuration:
vi /etc/logrotate.d/opensm
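The rotation frequency can also be switched with sed. This is a sketch under an assumption: that /etc/logrotate.d/opensm contains one of the standard logrotate directives daily, weekly or monthly on its own line. The function name set_opensm_rotation is hypothetical.

```shell
# Hypothetical helper: replace the frequency directive in a logrotate file.
set_opensm_rotation() {
  local freq="$1" conf="${2:-/etc/logrotate.d/opensm}"
  case "$freq" in
    daily|weekly|monthly) ;;
    *) echo "usage: set_opensm_rotation daily|weekly|monthly [file]" >&2; return 1 ;;
  esac
  # Swap the existing directive for the new one, keeping its indentation.
  sed -i -E "s/^([[:space:]]*)(daily|weekly|monthly)[[:space:]]*\$/\1$freq/" "$conf"
}
```

For example, set_opensm_rotation daily switches the default weekly rotation to daily.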
Locate the subnet manager:
[root@fit229 ~]# sminfo
sminfo: sm lid 8 sm guid 0xa088c203007cdd36, activity count 47086 priority 15 state 3 SMINFO_MASTER
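When scripting, the SM LID can be extracted from the sminfo output. The sketch below assumes the one-line sminfo format shown above; the function name sm_master_lid is hypothetical. It fails unless sminfo reports SMINFO_MASTER.

```shell
# Hypothetical helper: print the master SM's LID from sminfo output.
sm_master_lid() {
  local out
  out=$(sminfo) || return 1
  case "$out" in
    *SMINFO_MASTER*) ;;
    *) echo "sminfo did not report a master SM" >&2; return 1 ;;
  esac
  # Capture the number that follows "sm lid".
  printf '%s\n' "$out" | sed -n 's/.*sm lid \([0-9][0-9]*\).*/\1/p'
}
```

With the output above, sm_master_lid prints 8.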
Query the node description of the SM LID to identify the host running the SM:
[root@fit229 ~]# smpquery nd 8
Node Description:...................fit232 mlx5_0
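To get the node description without the dotted padding, the smpquery output can be trimmed with sed. This is a minimal sketch; the function name node_desc_of is hypothetical, and it assumes the "Node Description:...." line format shown above.

```shell
# Hypothetical helper: print a node's description by LID, stripping the
# "Node Description:" label and the dots that follow it.
node_desc_of() {
  local lid="$1"
  smpquery nd "$lid" | sed -n 's/^Node Description:\.*//p'
}
```

With the example above, node_desc_of 8 prints fit232 mlx5_0.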
Common errors reported in opensm.log:
| Error | Description |
|---|---|
| TIMEOUT | Network timeout; look for a bad cable. |
| Trap 128 | The link state changed. If this occurs often on the same cable, make sure the cable is not damaged. |
| Trap 131 | A bad cable is connected. |
| Trap 144 | The link width/speed or the node description changed. |
| Traps 257-259 | Bad partitions. |
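Because trap 128 matters mainly when it repeats on the same link, it helps to count traps per trap number and source LID. The sketch below assumes notice lines follow the "Reporting Generic Notice ... num:<trap> ... from LID:<lid>" format of the trap 128 example in this section; the function name count_traps is hypothetical.

```shell
# Hypothetical helper: count Generic Notice traps per (trap, LID) pair,
# most frequent first. Defaults to the standalone OpenSM log path.
count_traps() {
  local log="${1:-/var/log/opensm.log}"
  # Keep "trap LID" pairs from notice lines, then count duplicates.
  sed -n 's/.*Reporting Generic Notice .*num:\([0-9][0-9]*\)[^0-9].*from LID:\([0-9][0-9]*\).*/\1 \2/p' "$log" \
    | sort | uniq -c | sort -rn
}
```

Each output line shows a count, a trap number, and a LID; a high count for trap 128 from one LID points to the link to check.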
Example (Error trap 128):
To check the error, run the following command against the LID reported in the log. A very large LinkDownedCounter value on a port means the cable on that port is damaged.
for i in {1..<number of ports>}; do echo Port:$i; perfquery <LID> $i | grep LinkDownedCounter; done
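Bash brace expansion such as {1..$n} does not expand variables, so a reusable version of the loop needs another way to build the port range. The sketch below uses seq; the function name link_downed and its arguments (LID, number of ports) are hypothetical.

```shell
# Hypothetical helper: print LinkDownedCounter for ports 1..N of a LID.
link_downed() {
  local lid="$1" ports="$2" i
  # seq builds the range because {1..$ports} would not expand.
  for i in $(seq 1 "$ports"); do
    echo "Port:$i"
    perfquery "$lid" "$i" | grep LinkDownedCounter
  done
}
```

For example, link_downed 4 64 runs the same query as the loop above for all 64 ports of LID 4.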
For example, the following opensm.log entry reports trap 128 (link state change) from LID 4:
Apr 16 22:11:41 477567 [DA9C8640] 0x02 -> log_notice: Reporting Generic Notice type:1 num:128 (Link state change) from LID:4 GID:fe80::900a:8403:b3:c540
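Running the loop against LID 4 (64 ports):
[root@l-qa-203 ~]# for i in {1..64}; do echo Port:$i; perfquery 4 $i | grep LinkDownedCounter; done
Port:1
LinkDownedCounter:...............2
Port:2
LinkDownedCounter:...............0
Port:3
LinkDownedCounter:...............154222
Port:4
..
Port 3 shows a far higher LinkDownedCounter than the other ports, which points to a damaged cable on that port.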
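On switches with many ports, the loop output can be filtered automatically. The sketch below reads the Port:N / LinkDownedCounter pairs produced by the loop on stdin and prints ports whose counter exceeds a threshold. The function name flag_link_downed and its default threshold of 100 are hypothetical; choose a threshold that fits your fabric.

```shell
# Hypothetical helper: flag ports whose LinkDownedCounter exceeds a threshold.
flag_link_downed() {
  local threshold="${1:-100}"
  awk -v t="$threshold" '
    /^Port:/ { sub(/^Port:/, ""); port = $0 }
    /^LinkDownedCounter:/ { n = $0; sub(/^LinkDownedCounter:\.*/, "", n)
      if (n + 0 > t + 0) print "Port " port ": LinkDownedCounter=" n }
  '
}
```

For example, piping the output above through flag_link_downed reports only port 3.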