Appendix - Supported Port Counters and Events
Port counters and events are available in the following views:
Events and Port Counters area, at the bottom of the UFM window
Error window (Error tab) in the Manage Devices tab
New Monitoring Session window in the Monitor tab (click Create New Session)
Event Log in the Log tab (click Show Event Log)
The following tables list and describe the port counters and events currently supported:
InfiniBand Port Counters
Calculated Port Counters
InfiniBand Port Counters

| Counter | Description |
| --- | --- |
| Xmit Data (in bytes) | Total number of data octets, divided by 4, transmitted on all VLs from the port, including all octets between (and not including) the start of packet delimiter and the VCRC. The count may include packets containing errors. All link packets are excluded. Results are reported as a multiple of four octets. |
| Rcv Data (in bytes) | Total number of data octets, divided by 4, received on all VLs at the port. All octets between (and not including) the start of packet delimiter and the VCRC are included, and the count may include packets containing errors. All link packets are excluded. When the received packet length exceeds the maximum allowed packet length specified in C7-45, the counter may include all data octets exceeding this limit. Results are reported as a multiple of four octets. |
| Xmit Packets | Total number of packets transmitted on all VLs from the port, including packets with errors and excluding link packets. |
| Rcv Packets | Total number of packets, including packets containing errors and excluding link packets, received from all VLs on the port. |
| Rcv Errors | Total number of packets containing errors that were received on the port including: |
| Xmit Discards | Total number of outbound packets discarded by the port when the port is down or congested for the following reasons: |
| Symbol Errors | Total number of minor link errors detected on one or more physical lanes. |
| Link Error Recovery | Total number of times the Port Training state machine has successfully completed the link error recovery process. |
| Link Error Downed | Total number of times the Port Training state machine has failed the link error recovery process and downed the link. |
| Local Integrity Error | The number of times that the count of local physical errors exceeded the threshold specified by LocalPhyErrors. |
| Rcv Remote Physical Error | Total number of packets marked with the EBP delimiter received on the port. |
| Xmit Constraint Error | Total number of packets not transmitted from the switch physical port for the following reasons: |
| Rcv Constraint Error | Total number of packets received on the switch physical port that are discarded for the following reasons: |
| Excess Buffer Overrun Error | The number of times that OverrunErrors consecutive flow control update periods occurred, each having at least one overrun error. |
| Rcv Switch Relay Error | Total number of packets received on the port that were discarded when they could not be forwarded by the switch relay for the following reasons: |
| VL15 Dropped | Number of incoming VL15 packets dropped because of resource limitations (e.g., lack of buffers) in the port. |
| XmitWait | The number of ticks during which the port selected by PortSelect had data to transmit but no data was sent during the entire tick because of insufficient credits or lack of arbitration. |
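The Xmit Data and Rcv Data descriptions state that octets are counted in units of four. The following minimal Python sketch shows how such a value could be turned into bytes and into a rate between two samples. It assumes the value you read is the raw counter in four-octet units; if the value shown is already in bytes, as the "(in bytes)" label suggests it may be, skip the multiplication. The function names are hypothetical and are not part of UFM.

```python
# Assumption: raw data counters are expressed in units of four octets,
# as stated in the Xmit Data / Rcv Data descriptions.
OCTETS_PER_COUNTER_UNIT = 4


def counter_to_bytes(raw_units: int) -> int:
    """Convert a raw data counter value (four-octet units) to bytes."""
    return raw_units * OCTETS_PER_COUNTER_UNIT


def throughput_bytes_per_sec(prev_units: int, curr_units: int,
                             interval_sec: float) -> float:
    """Average byte rate between two samples of the same counter."""
    if interval_sec <= 0:
        raise ValueError("interval_sec must be positive")
    delta = curr_units - prev_units
    if delta < 0:
        # A smaller later sample usually means the counter was reset or wrapped.
        raise ValueError("counter decreased between samples")
    return counter_to_bytes(delta) / interval_sec


if __name__ == "__main__":
    # Two hypothetical samples taken 10 seconds apart.
    print(throughput_bytes_per_sec(1000, 3500, 10.0))
```

Rejecting a decreasing counter, rather than guessing at wrap-around, keeps the sketch honest about what it does not know about the counter width.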
InfiniBand Calculated Port Counters

| Counter | Description |
| --- | --- |
| Normalized XmitData | Effective port bandwidth utilization in % |
| Normalized Congested Bandwidth | Amount of bandwidth that was suppressed due to congestion |
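The exact formula UFM uses for Normalized XmitData is not given here. As a rough illustration of what "bandwidth utilization in %" can mean, the sketch below divides the bits sent in an interval by the bits the link could have carried in that interval. The function name and the `link_rate_gbps` parameter (the effective data rate of the link) are assumptions made for this example only.

```python
def utilization_percent(bytes_sent: int, interval_sec: float,
                        link_rate_gbps: float) -> float:
    """Share of link capacity used during the interval, in percent.

    Assumption: link_rate_gbps is the effective data rate of the link
    in gigabits per second (1 Gb = 1e9 bits).
    """
    if interval_sec <= 0 or link_rate_gbps <= 0:
        raise ValueError("interval_sec and link_rate_gbps must be positive")
    bits_sent = bytes_sent * 8
    capacity_bits = link_rate_gbps * 1e9 * interval_sec
    return 100.0 * bits_sent / capacity_bits


if __name__ == "__main__":
    # Hypothetical sample: 1.25 GB sent in one second on a 100 Gb/s link.
    print(utilization_percent(1_250_000_000, 1.0, 100.0))
```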
Device events are listed as VDM or CDM in the Source column of the Events table in the UFM GUI. For information about defining event policy, see Configuring Event Management.
| Alarm ID | Alarm Name | To Log | Alarm | Default Severity | Default Threshold | Default TTL | Related Object | Category | Description/Message |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 64 | GID Address In Service | 1 | 0 | Info | 1 | 300 | Port | Fabric Notification | |
| 65 | GID Address Out of Service | 1 | 0 | Warning | 1 | 300 | Port | Fabric Notification | |
| 66 | New MCast Group Created | 1 | 0 | Info | 1 | 300 | Port | Fabric Notification | |
| 67 | MCast Group Deleted | 1 | 0 | Info | 1 | 300 | Port | Fabric Notification | |
| 110 | Symbol Error | 1 | 1 | Warning | 200 | 300 | Port | Hardware | |
| 111 | Link Error Recovery | 1 | 1 | Minor | 1 | 300 | Port | Hardware | |
| 112 | Link Downed | 1 | 1 | Critical | 1 | 300 | Port | Hardware | |
| 113 | Port Receive Errors | 1 | 1 | Minor | 5 | 300 | Port | Hardware | |
| 114 | Port Receive Remote Physical Errors | 0 | 0 | Minor | 5 | 300 | Port | Hardware | |
| 115 | Port Receive Switch Relay Errors | 1 | 1 | Minor | 999 | 300 | Port | Fabric Configuration | |
| 116 | Port Xmit Discards | 1 | 1 | Minor | 200 | 300 | Port | Communication Error | |
| 117 | Port Xmit Constraint Errors | 1 | 1 | Minor | 200 | 300 | Port | Communication Error | |
| 118 | Port Receive Constraint Errors | 1 | 1 | Minor | 200 | 300 | Port | Communication Error | |
| 119 | Local Link Integrity Errors | 1 | 1 | Minor | 5 | 300 | Port | Hardware | |
| 120 | Excessive Buffer Overrun Errors | 1 | 1 | Minor | 100 | 300 | Port | Communication Error | |
| 121 | VL15 Dropped | 1 | 1 | Minor | 50 | 300 | Port | Communication Error | |
| 122 | Congested Bandwidth (%) Threshold Reached | 1 | 1 | Minor | 10 | 300 | Port | Hardware | |
| 131 | Non-optimal link width (1X instead of 4X) | 1 | 1 | Minor | 1 | 0 | Port | Hardware | |
| 132 | Non-optimal link width (1X or 4X instead of 12X) | 1 | 1 | Minor | 1 | 0 | Port | Hardware | |
| 140 | Excessive Buffer Overrun Threshold Reached | 1 | 0 | Minor | 11 | 300 | Port | Hardware | |
| 141 | Flow Control Update Watchdog Timer Expired | 1 | 0 | Warning | 1 | 300 | Port | Hardware | |
| 144 | Capability Mask Modified | 1 | 0 | Info | 1 | 300 | Port | Fabric Notification | |
| 145 | System Image GUID changed | 1 | 0 | Info | 1 | 300 | Port | Communication Error | |
| 256 | Bad M_Key | 1 | 0 | Minor | 1 | 300 | Port | Fabric Configuration | |
| 257 | Bad P_Key | 1 | 0 | Minor | 1 | 300 | Port | Fabric Configuration | |
| 258 | Bad Q_Key | 1 | 0 | Minor | 1 | 300 | Port | Fabric Configuration | |
| 259 | Bad P_Key Switch External Port | 1 | 0 | Critical | 1 | 300 | Port | Fabric Configuration | |
| 301 | Logical Server State Changed | 1 | 0 | Info | 1 | 0 | Logical Server | Logical Model | |
| 302 | Logical Server State Change Failed | 1 | 0 | Minor | 1 | 0 | Logical Server | Logical Model | |
| 306 | Logical Server Added | 1 | 0 | Info | 1 | 0 | Logical Server | Logical Model | |
| 307 | Logical Server Removed | 1 | 0 | Info | 1 | 0 | Logical Server | Logical Model | |
| 308 | Logical Server Resources Allocated | 1 | 0 | Info | 1 | 0 | Logical Server | Logical Model | |
| 312 | Compute Resource Released | 1 | 0 | Info | 1 | 0 | Logical Server | Logical Model | |
| 313 | Compute Resource Allocated | 1 | 0 | Info | 1 | 0 | Logical Server | Logical Model | |
| 314 | Logical Server Additional Resources Allocated | 1 | 0 | Info | 1 | 0 | Logical Server | Logical Model | |
| 315 | Logical Server Resources Released | 1 | 0 | Info | 1 | 0 | Logical Server | Logical Model | |
| 316 | Logical Server Compute Resource is Down | 1 | 1 | Critical | 1 | 0 | Logical Server | Logical Model | |
| 317 | Logical Server Compute Resource is Up | 1 | 1 | Warning | 1 | 0 | Logical Server | Logical Model | |
| 328 | Link is Up | 1 | 0 | Info | 1 | 0 | Link | Fabric Topology | |
| 329 | Link is Down | 1 | 0 | Warning | 1 | 0 | Link | Fabric Topology | |
| 331 | Node is Down | 1 | 0 | Warning | 1 | 0 | Site | Fabric Topology | |
| 332 | Node is Up | 1 | 0 | Info | 1 | 300 | Site | Fabric Topology | |
| 336 | Port Action Succeeded | 1 | 0 | Info | 1 | 0 | Port | Maintenance | |
| 337 | Port Action Failed | 1 | 0 | Minor | 1 | 0 | Port | Maintenance | |
| 338 | Device Action Succeeded | 1 | 0 | Info | 1 | 0 | Port | Maintenance | |
| 339 | Device Action Failed | 1 | 0 | Minor | 1 | 0 | Port | Maintenance | |
| 340 | Network Interface Added | 1 | 0 | Info | 1 | 0 | Logical Server | Logical Model | |
| 341 | Network Interface Removed | 1 | 0 | Info | 1 | 0 | Logical Server | Logical Model | |
| 350 | Environment Added | 1 | 0 | Info | 1 | 0 | Env | Logical Model | |
| 351 | Environment Removed | 1 | 0 | Info | 1 | 0 | Env | Logical Model | |
| 352 | Network Added | 1 | 0 | Info | 1 | 0 | Network | Logical Model | |
| 353 | Network Removed | 1 | 0 | Info | 1 | 0 | Network | Logical Model | |
| 370 | Gateway Ethernet Link State Changed | 1 | 0 | Warning | 1 | 0 | Gateway | Gateway | |
| 371 | Gateway Reregister Event Received | 1 | 0 | Warning | 1 | 0 | Gateway | Gateway | |
| 372 | Number of Gateways Changed | 1 | 0 | Warning | 1 | 0 | Gateway | Gateway | |
| 373 | Gateway will be Rebooted | 1 | 0 | Warning | 1 | 0 | Gateway | Gateway | |
| 374 | Gateway Reloading Finished | 1 | 0 | Info | 1 | 0 | Gateway | Gateway | |
| 381 | Switch Upgrade Failed | 1 | 0 | Info | 1 | 0 | Switch | Maintenance | |
| 383 | Host Upgrade Failed | 1 | 0 | Info | 1 | 0 | Computer | Maintenance | |
| 385 | Switch FW Upgrade Started | 1 | 0 | Info | 1 | 0 | Switch | Maintenance | |
| 386 | Switch SW Upgrade Started | 1 | 0 | Info | 1 | 0 | Switch | Maintenance | |
| 388 | Host FW Upgrade Started | 1 | 0 | Info | 1 | 0 | Computer | Maintenance | |
| 389 | Host SW Upgrade Started | 1 | 0 | Info | 1 | 0 | Computer | Maintenance | |
| 391 | Switch Module Removed | 1 | 0 | Info | 1 | 0 | Switch | Fabric Notification | |
| 392 | Module Temperature Threshold Reached | 1 | 0 | Info | 40 | 0 | Module | Hardware | |
| 394 | Module Status FAULT | 1 | 1 | Critical | 1 | 420 | Switch | Module Status | |
| 502 | Device Upgrade Finished | 1 | 0 | Info | 1 | 300 | Device | Maintenance | |
| 545 | SM is not responding | 1 | 1 | Critical | 1 | 300 | Grid | Maintenance | |
| 602 | UFM Server Failover | 1 | 1 | Critical | 1 | 0 | Site | Fabric Notification | |
| 701 | Non-optimal Link Speed | 1 | 1 | Minor | 1 | 0 | Port | Hardware | |
| 907 | Switch is Down | 1 | 1 | Critical | 1 | 0 | Site | Fabric Topology | |
| 908 | Switch is Up | 1 | 1 | Info | 1 | 300 | Site | Fabric Topology | |
| 909 | Director Switch is Down | 1 | 1 | Critical | 1 | 300 | Site | Fabric Topology | |
| 910 | Director Switch is Up | 1 | 1 | Info | 1 | 0 | Site | Fabric Topology | |
| 911 | Module Temperature Low Threshold Reached | 1 | 1 | Warning | 60 | 300 | Module | Hardware | |
| 912 | Module Temperature High Threshold Reached | 1 | 1 | Critical | 60 | 300 | Module | Hardware | |
| 913 | Module High Voltage | 1 | 1 | Warning | 10 | 420 | Switch | Module Status | |
| 914 | Module High Current | 1 | 1 | Warning | 10 | 420 | Switch | Module Status | |
| 915 | BER_ERROR | 1 | 1 | Critical | 1e-8 | 420 | Port | Hardware | |
| 916 | BER_WARNING | 1 | 1 | Warning | 1e-13 | 420 | Port | Hardware | |
| 917 | SYMBOL_BER_ERROR | 1 | 1 | Critical | | 420 | Port | Hardware | |
| 1300 | SM_SAKEY_VIOLATION | 1 | 1 | Warning | 5 | 300 | Port | Fabric Notification | |
| 1301 | SM_SGID_SPOOFED | 1 | 1 | Warning | 5 | 300 | Port | Fabric Notification | |
| 1302 | SM_RATE_LIMIT_EXCEEDED | 1 | 1 | Warning | 5 | 300 | Port | Fabric Notification | |
| 1303 | SM_MULTICAST_GROUPS_LIMIT_EXCEEDED | 1 | 1 | Warning | 5 | 300 | Port | Fabric Notification | |
| 1304 | SM_SERVICES_LIMIT_EXCEEDED | 1 | 1 | Warning | 5 | 300 | Port | Fabric Notification | |
| 1305 | SM_EVENT_SUBSCRIPTION_LIMIT_EXCEEDED | 1 | 1 | Warning | 5 | 300 | Port | Fabric Notification | |
| 1500 | New cable detected | 1 | 0 | Info | 1 | 0 | Link | Hardware | |
| 1502 | Cable detected in a new location | 1 | 0 | Warning | 1 | 0 | Link | Hardware | |
| 1503 | Duplicate Cable Detected | 1 | 0 | Critical | 1 | 0 | Link | Hardware | |
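
When auditing these defaults, it can help to load the rows into records and filter them. The sketch below is one way to do that in Python. It assumes each event is written as one pipe-delimited row with the ten columns named in the table header, and that a value of 1 in the To Log and Alarm columns means the behavior is enabled. The class and function names are hypothetical and do not correspond to any UFM API.

```python
from dataclasses import dataclass
from typing import Iterable, List

EXPECTED_CELLS = 10  # Alarm ID through Description/Message


@dataclass(frozen=True)
class EventDefault:
    alarm_id: int
    name: str
    to_log: bool
    alarm: bool
    severity: str
    threshold: str  # kept as text: values such as 1e-8 or blank cells occur
    ttl: str
    related_object: str
    category: str
    message: str


def parse_row(row: str) -> EventDefault:
    """Parse one pipe-delimited table row into an EventDefault."""
    text = row.strip()
    # Drop exactly one leading and one trailing pipe, if present.
    if text.startswith("|"):
        text = text[1:]
    if text.endswith("|"):
        text = text[:-1]
    cells = [cell.strip() for cell in text.split("|")]
    if len(cells) != EXPECTED_CELLS:
        raise ValueError(f"expected {EXPECTED_CELLS} cells, got {len(cells)}")
    return EventDefault(
        alarm_id=int(cells[0]),
        name=cells[1],
        to_log=cells[2] == "1",
        alarm=cells[3] == "1",
        severity=cells[4],
        threshold=cells[5],
        ttl=cells[6],
        related_object=cells[7],
        category=cells[8],
        message=cells[9],
    )


def alarming(events: Iterable[EventDefault], severity: str) -> List[str]:
    """Names of events with the given severity that raise an alarm by default."""
    return [e.name for e in events if e.alarm and e.severity == severity]


SAMPLE_ROWS = [
    "| 64 | GID Address In Service | 1 | 0 | Info | 1 | 300 | Port | Fabric Notification | |",
    "| 112 | Link Downed | 1 | 1 | Critical | 1 | 300 | Port | Hardware | |",
    "| 259 | Bad P_Key Switch External Port | 1 | 0 | Critical | 1 | 300 | Port | Fabric Configuration | |",
]

if __name__ == "__main__":
    events = [parse_row(r) for r in SAMPLE_ROWS]
    print(alarming(events, "Critical"))
```

The threshold and TTL are kept as strings because some rows carry non-integer values or blank cells; convert them only once you know how a given event interprets them.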
For a list of AHX-related events, refer to "AHX Monitoring Events".